Multimodal deep learning for multi-horizon corporate revenue forecasting

Authors

Wu, Qiping

Abstract

Corporate revenue forecasting matters for valuation, portfolio management, and capital allocation, yet it is difficult because financial statements mainly reflect the past, while investors and firms often need forecasts ranging from the next quarter to a rolling one-year horizon. The challenge grows with the forecast horizon, especially in fast-changing industries. This thesis addresses the problem with a forecasting framework that starts from a broad quantitative baseline and then extends to a multimodal approach.

First, the thesis develops a Temporal Fusion Transformer (TFT) baseline for next-quarter revenue forecasting across 155 continuously listed S&P 500 firms. Under a strict chronological evaluation protocol, the TFT model achieves a test Mean Absolute Percentage Error (MAPE) of 9.31%, a Root Mean Squared Error (RMSE) of 1,973 million USD, and a Mean Absolute Error (MAE) of 1,790 million USD. Controlled ablation analysis further shows that accurate short-horizon forecasting depends not only on autoregressive revenue history but also on structured firm context, including sector identity, year-over-year growth, and firm-scale variables such as total assets and equity.

Second, the framework is extended from one-quarter-ahead to four-quarter-ahead forecasting. Forecast accuracy deteriorates as the horizon expands, with MAPE rising from 9.31% one quarter ahead (t+1) to 12.07% four quarters ahead (t+4). A comparison with an LSTM baseline under the same chronological setting suggests that this deterioration is not specific to a single model but reflects a broader limitation of purely financial forecasting approaches. The effect is especially pronounced in technology-oriented firms, highlighting the limits of relying only on lagged financial data in non-linear growth environments.
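The "strict chronological evaluation protocol" mentioned above amounts to splitting each firm's ordered quarterly series by time rather than at random, so every test quarter lies strictly after every training quarter. A minimal sketch (the function name and split fractions are illustrative assumptions, not taken from the thesis):

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split an ordered series into train/val/test without shuffling,
    so the test set lies strictly after the validation set, which in
    turn lies strictly after the training set."""
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# 20 ordered quarterly periods (illustrative stand-ins for one firm's history).
quarters = list(range(20))
train, val, test = chronological_split(quarters)
print(len(train), len(val), len(test))  # 14 3 3
```

Because no shuffling occurs, information from future quarters can never leak into model fitting, which is what makes the reported test metrics a fair proxy for real deployment.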
Third, the work proposes a multimodal TFT framework that integrates earnings-call-derived textual signals into the forecasting pipeline. Focusing on the Mega-Cap 5 companies, the framework uses both Financial Bidirectional Encoder Representations from Transformers (FinBERT) and a locally deployed Llama-3 8B model to extract finance-domain sentiment and richer generative narrative features from quarterly earnings call transcripts. The results show that transcript-based narrative features improve long-horizon forecasting, with the Llama-3 representation delivering the largest gain: the financial-data-only TFT records a MAPE of 53.85%, while the FinBERT+TFT and Llama-3+TFT hybrids reduce it to 48.70% and 43.01%, respectively. Overall, this thesis presents a practically deployable multimodal forecasting framework that bridges the gap between backward-looking financial fundamentals and forward-looking managerial narratives in corporate revenue forecasting.
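The MAPE, RMSE, and MAE figures reported throughout the abstract follow their standard definitions. A minimal sketch of those definitions, using illustrative revenue values rather than thesis data:

```python
import math

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error, in the revenue unit (here, million USD)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean Absolute Error, in the revenue unit."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Illustrative quarterly revenues in million USD (not actual thesis data).
y_true = [20000.0, 21000.0, 19500.0, 22000.0]
y_pred = [19000.0, 21800.0, 20100.0, 21500.0]
print(round(mape(y_true, y_pred), 2), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Note that MAPE is scale-free, which is why it is the natural headline metric when comparing firms whose revenues differ by orders of magnitude, while RMSE and MAE remain in million USD and so are dominated by the largest firms.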
