A hybrid framework for weak signal learning in breast cancer prediction using metabolomics data

Authors

Fang, Jiahui

Abstract

Clinical mass spectrometry (MS)-based metabolomics prediction in small cohorts is often constrained by weak class separation, class imbalance, and heterogeneous sample reliability. Under these conditions, predictive performance is limited not by a single factor but by the combined effects of unstable feature structure, limited minority-class support, and unequal learning difficulty across samples. Existing methods address some of these challenges separately, but a unified framework for stable learning under weak-signal conditions remains underdeveloped.

This thesis treats weak-signal clinical metabolomics prediction as a structured learning problem rather than a standard supervised classification task. It develops a unified, fold-disciplined framework that integrates transformer representation learning, conditional generative adversarial network (cGAN) augmentation, and curriculum learning (CL) within stratified cross-validation (CV). The framework is designed to provide a more stable representation space, strengthen minority-class support during training, and organize training so that it better reflects variation in sample reliability.

The framework is evaluated on two breast cancer-related metabolomics datasets with different signal conditions. ST004145 serves as the primary weak-signal dataset, and ST000355 serves as a strong-signal stability check. On ST004145, the full hybrid model achieved the highest mean Area Under the ROC Curve (AUC) among the compared methods (0.6794 ± 0.0871). Ablation analysis further indicated that both cGAN minority support and CL difficulty-aware training contributed to the final performance pattern. On ST000355, performance differences between models were much smaller, although the proposed model remained highly competitive, with an AUC of 0.9896 ± 0.0195.

These findings suggest that the value of the proposed framework is most evident under weak-signal conditions, where predictive robustness depends on addressing multiple interacting sources of instability within a single training design. This thesis therefore contributes a more structured methodological perspective on weak-signal clinical metabolomics prediction and supports the usefulness of a unified, fold-disciplined learning framework in small, class-imbalanced clinical cohorts.
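
As a rough illustration of what "fold-disciplined" augmentation inside stratified CV can look like, the following is a minimal sketch and not the thesis's implementation. It assumes binary labels (0/1) and uses deliberately simple stand-ins: jittered copies of minority samples in place of cGAN-generated samples, and a nearest-centroid scorer in place of the transformer model. All function names (`stratified_folds`, `oversample_minority`, `centroid_scorer`, `cross_validate`) are hypothetical. The point it shows is only that augmentation is fitted on the training portion of each fold, never on held-out samples, and that AUC is reported as mean ± standard deviation across folds.

```python
import random
import statistics


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def oversample_minority(X, y, rng, noise=0.05):
    """Stand-in for cGAN augmentation (assumption): add jittered copies
    of minority-class samples until both classes have equal counts."""
    counts = {c: y.count(c) for c in set(y)}
    minority = min(counts, key=counts.get)
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_aug, y_aug = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(pool)
        X_aug.append([v + rng.gauss(0, noise) for v in base])
        y_aug.append(minority)
    return X_aug, y_aug


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC for 0/1 labels; ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(X, y):
    """Toy model (assumption): higher score means closer to the class-1 centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0 = mean([x for x, l in zip(X, y) if l == 0])
    c1 = mean([x for x, l in zip(X, y) if l == 1])
    return lambda x: sqdist(x, c0) - sqdist(x, c1)


def cross_validate(X, y, k=5, seed=0):
    """Return (mean AUC, sample std of AUC) over k stratified folds."""
    rng = random.Random(seed)
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(y)) if i not in test]
        # Fold discipline: augmentation sees only the training portion.
        X_aug, y_aug = oversample_minority(X_tr, y_tr, rng)
        score = centroid_scorer(X_aug, y_aug)
        aucs.append(auc([score(X[i]) for i in held_out],
                        [y[i] for i in held_out]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

Keeping the augmentation call inside the fold loop is the design choice that matters here: if synthetic minority samples were generated before splitting, information from held-out samples could leak into training and inflate the reported AUC.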
