Please use this identifier to cite or link to this item: https://knowledgecommons.lakeheadu.ca/handle/2453/5491
Title: Integrating multi-omics data via latent space construction for breast and bladder cancer analysis
Authors: Boominathan, Arvind Chidambaram
Issue Date: 2025
Abstract: Cancer remains one of the most complex and heterogeneous diseases, driven by intricate interactions across genetic, epigenetic, and transcriptional landscapes. Accurately understanding and predicting tumor characteristics, such as Tumor Mutational Burden (TMB), is critical for effective diagnosis, prognosis, and personalized treatment strategies. This research addresses the inherent challenges of integrating high-dimensional, heterogeneous multi-omics datasets (DNA methylation, gene expression, and Copy Number Alteration (CNA)) for bladder and breast cancer analysis by building a shared latent space that captures and preserves meaningful cross-omics representations. These challenges include data imbalance, high dimensionality, modality-specific noise, and complex non-linear biological interactions. To overcome them, this thesis constructs the shared latent space with deep-learning approaches, namely Deep Multiset Canonical Correlation Analysis (DMCCA) and Graph Attention Networks (GATs). The shared latent space provides a unified representation that captures intricate biological interactions across omics modalities, which improves predictive accuracy for TMB classification. Attention mechanisms further refine this integration by dynamically focusing on the most relevant relational patterns within the multi-omics data, enhancing the model’s ability to capture biological interactions between genes, pathways, and patient profiles. In addition, this study uses oversampling techniques, mainly the Synthetic Minority Oversampling Technique (SMOTE), to offset data imbalance among TMB classes and menopausal status groups.
Compared with baseline supervised machine learning models such as Logistic Regression (LR), Artificial Neural Network (ANN), and Tabular Transformer, the proposed GAT model trained on the shared latent space performed better, achieving an AUC of 0.76 and an accuracy of 76.1% for breast cancer (BRCA) and an AUC of 0.73 and an accuracy of 65.3% for bladder cancer (BLCA), thereby establishing the usefulness of multi-omics integration through shared latent space learning.
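The abstract names SMOTE as the main oversampling technique but this record does not describe how it was applied. As a rough illustration only, the core SMOTE idea (creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in NumPy as follows. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the thesis; a real pipeline would more likely rely on an established library implementation.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper, not the thesis code)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy minority class with two features (illustrative values only).
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class. Oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic samples.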
URI: https://knowledgecommons.lakeheadu.ca/handle/2453/5491
Degree Discipline: Computer Science
Degree Name: Master of Science
Degree Level: Master
Advisor: Alkhateeb, Abedalrhman
Committee Members: Bin Ahmed, Saad; Yassine, Abdulsalam
Appears in Collections: Electronic Theses and Dissertations from 2009

Files in This Item:
File: BoominathanA2025m-2b.pdf
Description: Embargoed until May 12, 2026
Size: 6.09 MB
Format: Adobe PDF

