A hybrid framework for weak signal learning in breast cancer prediction using metabolomics data

dc.contributor.advisor: Alkhateeb, Abedalrhman
dc.contributor.author: Fang, Jiahui
dc.contributor.committeemember: Rathore, M. Mazhar
dc.contributor.committeemember: Deng, Yong
dc.date.accessioned: 2026-05-04T13:17:09Z
dc.date.created: 2026
dc.date.issued: 2026
dc.description.abstract: Clinical mass spectrometry (MS)-based metabolomics prediction in small cohorts is often constrained by weak class separation, class imbalance, and heterogeneous sample reliability. Under these conditions, predictive performance is limited not by a single factor but by the combined effects of unstable feature structure, limited minority-class support, and unequal learning difficulty across samples. Existing methods have addressed some of these challenges separately, but a unified framework for stable learning under weak-signal conditions remains insufficiently developed. This thesis treats weak-signal clinical metabolomics prediction as a structured learning problem rather than a standard supervised classification task. To address this setting, it develops a unified, fold-disciplined framework that integrates transformer representation learning, conditional generative adversarial network (cGAN) augmentation, and curriculum learning (CL) within stratified cross-validation (CV). The framework is designed to provide a more stable representation space, strengthen minority-class support during training, and organize training to better reflect variation in sample reliability. It is evaluated on two breast cancer-related metabolomics datasets with different signal conditions: ST004145 serves as the primary weak-signal dataset, and ST000355 serves as a strong-signal stability-check dataset. On ST004145, the full hybrid model achieved the highest mean area under the receiver operating characteristic (ROC) curve (AUC) among the compared methods (0.6794 ± 0.0871). Ablation analysis further indicated that both cGAN-based minority support and CL-based difficulty-aware training contributed to the final performance. On ST000355, performance differences between models were much smaller, although the proposed model remained highly competitive, with an AUC of 0.9896 ± 0.0195.
These findings suggest that the value of the proposed framework is most evident under weak-signal conditions, where predictive robustness depends on addressing multiple interacting sources of instability within a single training design. Overall, this thesis contributes a more structured methodological perspective on weak-signal clinical metabolomics prediction and supports the usefulness of a unified, fold-disciplined learning framework in small, class-imbalanced clinical cohorts.
dc.identifier.uri: https://knowledgecommons.lakeheadu.ca/handle/2453/5597
dc.language.iso: en
dc.title: A hybrid framework for weak signal learning in breast cancer prediction using metabolomics data
dc.type: Thesis
etd.degree.discipline: Computer Science
etd.degree.grantor: Lakehead University
etd.degree.level: Master
etd.degree.name: Master of Computer Science
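The "fold-disciplined" idea in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the thesis implementation and uses none of its code or data. The transformer encoder is replaced by a small SGD logistic regression. The cGAN is replaced by a hypothetical jittered-copy oversampler (`augment_minority`). The curriculum difficulty is replaced by a hypothetical distance-to-class-centroid score (`difficulty_scores`). All function names are illustrative assumptions. What the sketch shows is the discipline itself: in each stratified fold, augmentation, difficulty scoring, and easy-to-hard training use only the training indices. The held-out fold is scored without augmentation, and AUC is summarized as mean ± standard deviation across folds.

```python
import math
import random
import statistics


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, t in enumerate(labels) if t == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auc(scores, labels):
    """ROC AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def augment_minority(X, y, rng, noise=0.1):
    """HYPOTHETICAL stand-in for cGAN sampling: adds jittered copies of
    minority-class rows until both classes are the same size (binary labels)."""
    minority = min(set(y), key=y.count)
    rows = [x for x, t in zip(X, y) if t == minority]
    n_new = (len(y) - len(rows)) - len(rows)
    new = [[v + rng.gauss(0.0, noise) for v in rng.choice(rows)] for _ in range(n_new)]
    return X + new, y + [minority] * n_new


def difficulty_scores(X, y):
    """HYPOTHETICAL difficulty: distance from each row to its own class centroid."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return [math.dist(x, centroids[t]) for x, t in zip(X, y)]


def train_curriculum_logreg(X, y, difficulty, stages=3, epochs=30, lr=0.05):
    """SGD logistic regression (stand-in for the transformer); stage s trains
    on the easiest (s + 1) / stages fraction of the rows, visited easy to hard."""
    order = sorted(range(len(X)), key=lambda i: difficulty[i])
    w, b = [0.0] * len(X[0]), 0.0
    for s in range(stages):
        subset = order[: math.ceil(len(order) * (s + 1) / stages)]
        for _ in range(epochs):
            for i in subset:
                z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
                p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
                g = p - y[i]  # gradient of the log loss with respect to z
                w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
                b -= lr * g
    return w, b


def fold_disciplined_cv(X, y, k=5, seed=0):
    """Augmentation, difficulty scoring and training see only the training fold;
    the held-out fold is scored as-is. Returns (mean AUC, sample SD of AUC)."""
    rng = random.Random(seed)
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = augment_minority([X[i] for i in train_idx], [y[i] for i in train_idx], rng)
        w, b = train_curriculum_logreg(X_tr, y_tr, difficulty_scores(X_tr, y_tr))
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in test_idx]
        aucs.append(auc(scores, [y[i] for i in test_idx]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic toy data (NOT the ST004145 or ST000355 data): 60 controls, 20 cases,
# with cases shifted by 0.5 in the first two of five features (a weak signal).
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
X += [[data_rng.gauss(0.5 if j < 2 else 0.0, 1.0) for j in range(5)] for _ in range(20)]
y = [0] * 60 + [1] * 20
mean_auc, sd_auc = fold_disciplined_cv(X, y)
print(f"AUC {mean_auc:.4f} ± {sd_auc:.4f}")
```

The held-out fold is never augmented or used for difficulty scoring, so each fold's AUC is computed only on real, unseen samples. If augmentation ran before the split, jittered copies of a test-fold row could land in the training fold and inflate the estimate. This sketch reports the sample standard deviation; the record does not state which variant the thesis uses.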

Files

Original bundle

Name: FangJ2026m-2b.pdf
Size: 3.21 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.23 KB
Format: Item-specific license agreed upon to submission
Description: