Causal discovery and treatment effect modeling in breast cancer

Authors

Krikun, Elena

Abstract

Modeling breast cancer outcomes remains challenging because of extreme molecular heterogeneity and because associative models, including those built with traditional machine learning, cannot support counterfactual, intervention-based clinical reasoning. Building on recent advances in causal feature selection, multiomics variable selection, and individual treatment effect (ITE) estimation, this thesis proposes a hybrid pipeline within a unified computational multiomics framework. The pipeline integrates high-dimensional data with causal modeling to produce interpretable precision oncology models that extend beyond risk prediction. It was developed using the TCGA-BRCA cohort as the discovery set and validated on the independent retrospective METABRIC cohort to assess transportability.
To address the curse of dimensionality, the framework applies Markov blanket-based local causal discovery across seven data modalities and reduces more than 600,000 initial features to a sparse and stable causal core. This causal representation is then used for survival modeling (C-index = 0.8085, 5-year AUC = 0.8676) and for ITE estimation for chemotherapy, hormone therapy, and targeted therapy. External validation on METABRIC achieved a C-index of 0.7200 and a 5-year AUC of 0.7639, indicating moderate but clear transportability across cohorts and assay platforms. The final causal core integrated clinical, proteomic, and epigenetic signals and identified a long non-coding RNA as a structurally relevant driver.
The treatment-effect stage used treatment-specific arm definitions reconstructed from clinical records together with a robustness-oriented validation protocol. Chemotherapy showed the strongest and most stable beneficial treatment effect, most notably in the triple-negative breast cancer (TNBC) subgroup, where treatment-effect estimates remained consistently protective across estimators and overlap-adjusted variants. Hormone-therapy estimates showed a consistently protective direction in receptor-positive subgroup analyses, although the magnitude of the effect was attenuated under stricter overlap control, indicating residual confounding and limited positivity in the observational setting. Targeted therapy also showed a protective direction under most evaluated techniques, but given the very small number of treated patients and partial estimator disagreement, these effect estimates should be interpreted as exploratory.
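
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the thesis's evaluation code: the function name `concordance_index`, the simplified handling of tied event times (such pairs are skipped), and the four-patient toy data are all assumptions made for illustration.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter follow-up time than subject j. The pair is
    concordant when subject i also has the higher predicted risk.
    Ties in risk score count as half; ties in time are skipped here
    (a simplifying assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event flags (1 = event observed,
# 0 = censored), and model risk scores (higher = worse prognosis).
toy_times = [5.0, 3.0, 9.0, 7.0]
toy_events = [1, 0, 1, 1]
toy_risk = [0.2, 0.9, 0.1, 0.3]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the discovery-cohort (0.8085) and external-validation (0.7200) figures above are read.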

Description

The thesis is embargoed until May 15, 2027.
