Trustworthy efficient learning with graph-aware quantum-classical transformers for hyperspectral imaging and NLP tasks
Abstract
This thesis focuses on one practical goal: building parameter-efficient models whose performance stays close to that of current baselines. The central motivation is to control information flow explicitly, rather than letting model capacity grow without accounting for parameter size.
In this work, the potential of hyperspectral imagery is exploited by integrating patch-level graph construction with a transformer-inspired attention module, preserving local spatial coherence while still modeling long-range spectral interactions. The proposed architecture is further extended to multitask prediction by adding an attention head to the presented framework.
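As an illustration only, a minimal PyTorch-style sketch of this idea follows; the k-nearest-neighbour graph over patch embeddings, the patch count, and the single masked attention layer are assumptions made for exposition, not the PatchGraph-MTFormer implementation itself.

    # Hedged sketch (assumed details): build a k-NN graph over HSI patch embeddings,
    # then let an attention layer mix patch features only along graph edges.
    import torch
    import torch.nn as nn

    def knn_graph(patch_feats: torch.Tensor, k: int = 8) -> torch.Tensor:
        # patch_feats: (num_patches, dim); returns a boolean adjacency matrix.
        dists = torch.cdist(patch_feats, patch_feats)           # pairwise distances
        idx = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop the self-match
        adj = torch.zeros(len(patch_feats), len(patch_feats), dtype=torch.bool)
        adj.scatter_(1, idx, True)
        return adj

    class GraphAttentionBlock(nn.Module):
        def __init__(self, dim: int = 64, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
            # Restrict attention to graph neighbours: local spatial coherence is
            # preserved, while stacked blocks can still mix distant patches.
            blocked = ~adj                    # True marks positions attention may not use
            out, _ = self.attn(x, x, x, attn_mask=blocked)
            return self.norm(x + out)

    # Toy usage: 196 patches, each with a 64-dimensional spectral embedding.
    feats = torch.randn(196, 64)
    block = GraphAttentionBlock(dim=64, heads=4)
    mixed = block(feats.unsqueeze(0), knn_graph(feats, k=8))    # shape (1, 196, 64)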
The resulting model uses approximately 3.3 million parameters, which keeps the overall model size substantial. In deep learning, models with fewer parameters are preferred when they can deliver performance comparable to larger models. Hence, a quantum-inspired classical deep learning architecture is presented. In this architecture, a low-dimensional quantum encoding reduces the number of parameters to 35,000 while maintaining performance close to that of the model without the quantum component. To evaluate the strength of the presented model beyond imaging, the same quantum model is combined with a frozen pretrained encoder, DistilBERT, and matched low-dimensional heads, so that compact classical and compact quantum heads can be compared on Natural Language Processing (NLP) tasks.
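The abstract does not detail the quantum encoding itself, so the following sketch only illustrates the parameter-budget argument behind it: a very low-dimensional bottleneck head keeps the trainable count tiny compared with a conventional wide head. The 256-dimensional input, 4-unit bottleneck, layer widths, and class count are assumptions for illustration, not the reported QuantFormer configuration.

    # Hedged sketch of the parameter-budget idea behind a low-dimensional bottleneck head.
    import torch.nn as nn

    def count_params(module: nn.Module) -> int:
        return sum(p.numel() for p in module.parameters() if p.requires_grad)

    n_classes = 16       # e.g. Indian Pines has 16 land-cover classes
    feature_dim = 256    # assumed backbone output dimension

    # A conventional wide classification head.
    wide_head = nn.Sequential(
        nn.Linear(feature_dim, 512), nn.GELU(),
        nn.Linear(512, 512), nn.GELU(),
        nn.Linear(512, n_classes),
    )

    # A compact head with a very low-dimensional bottleneck: the 4-dim code plays
    # the role of a small register that a quantum or quantum-inspired circuit
    # would process with only a handful of trainable parameters.
    compact_head = nn.Sequential(
        nn.Linear(feature_dim, 4), nn.Tanh(),   # squashed code before encoding
        nn.Linear(4, 4), nn.Tanh(),             # stand-in for the tiny trainable circuit
        nn.Linear(4, n_classes),
    )

    # The wide head alone needs roughly 400k parameters; the bottlenecked head
    # needs about 1.1k, showing how a low-dimensional code keeps the budget small.
    print(count_params(wide_head), count_params(compact_head))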
The work follows a clear three-stage path. First, PatchGraph-MTFormer is developed for hyperspectral image (HSI) classification, where locality is important and training data are often limited. It is evaluated on four standard HSI datasets: Indian Pines, Pavia University, Houston 2013, and WHU-Hi-LongKou. Second, QuantFormer is introduced to test whether a compact quantum–classical bottleneck can reduce model size while keeping useful predictive quality in HSI settings.
Third, the same bottleneck idea is transferred to NLP using a frozen encoder and matched
low-dimensional heads, so compact classical and compact quantum heads can be compared fairly.
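For this NLP stage, a hedged sketch of the frozen-encoder protocol using the Hugging Face transformers API could look as follows; the distilbert-base-uncased checkpoint, first-token pooling, 4-dimensional bottleneck, and binary label space are assumptions, and the quantum head is indicated only as a placeholder with a matched interface.

    # Hedged sketch: frozen DistilBERT encoder with matched low-dimensional heads.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    encoder = AutoModel.from_pretrained("distilbert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    for p in encoder.parameters():        # the encoder stays frozen throughout
        p.requires_grad = False

    def make_head(bottleneck: int, n_classes: int) -> nn.Module:
        # The compact classical head and the quantum head share this
        # 768 -> bottleneck -> n_classes shape, so only the middle block differs.
        return nn.Sequential(nn.Linear(768, bottleneck), nn.Tanh(),
                             nn.Linear(bottleneck, n_classes))

    classical_head = make_head(bottleneck=4, n_classes=2)
    # quantum_head = ...  # same 4-dim interface, with a small circuit in the middle

    batch = tokenizer(["a compact head example"], return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state[:, 0]   # first-token pooling
    logits = classical_head(hidden)                          # shape (1, 2)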
Across these stages, experiments show a consistent pattern: careful constraints on representations
can reduce parameter burden while preserving strong results. PatchGraph-MTFormer reaches overall accuracy (OA) values of 99.93% (Indian Pines), 99.74% (Pavia University), 100.00% (Houston 2013), and 99.65%
(WHU-Hi-LongKou), with 91.50% OA on HyperLeaf cultivar classification, while QuantFormer
remains strong with above 99% OA on three airborne HSI benchmarks and about 89.8% OA on
EuroSAT_MS.
Overall, the thesis provides a practical design-and-evaluation workflow for constrained learning:
define bottlenecks clearly, keep protocols fixed, compare under matched settings, and report predictive quality together with parameter counts and runtime cost.
Description
Thesis embargoed until May 1, 2027.
