Generative AI based on medical visual question answering (VQA) techniques

Kaushik, Sarthak

dc.contributor.advisor	Mohammed, Sabah
dc.contributor.author	Kaushik, Sarthak
dc.contributor.committeemember	Fiaidhi, Jinan
dc.contributor.committeemember	Zerpa, Carlos
dc.date.accessioned	2026-05-12T13:47:59Z
dc.date.created	2026
dc.date.issued	2026
dc.description.abstract	Medical visual question answering (MedVQA) lets clinicians ask direct questions about medical images instead of relying on single-label image classification. It is especially useful in gastrointestinal (GI) endoscopy, and in colonoscopy in particular, where clinical questions tend to concern whether findings are present, where they are located, how many there are, or how much disease they represent. Nevertheless, existing GI MedVQA systems still suffer from critical problems, such as unbalanced dataset structure, class imbalance, mismatched answer formats, and poor visual grounding in free-text generation. This dissertation examines these problems using five GI datasets: HyperKvasir, Kvasir-VQA, Kvasir-VQA-x1, ImageCLEF MEDVQA-GI, and LIMUC. Benchmarking reveals a consistent trend across these datasets: for current GI tasks, supervised or otherwise constrained pipelines are more trustworthy than unsupervised, raw zero-shot vision-language generation. This holds particularly where the answer space is fixed or where clinically important classes are uncommon. Building on this finding, the dissertation develops a generative approach to ulcerative colitis severity scoring on LIMUC, with the Mayo endoscopic subscore as the target task. The model is evaluated under explicit output constraints and with clinically relevant analysis. The work then carries this model-level contribution into a physician-support setting through a modular wrapper that interprets queries, retrieves supporting evidence as necessary, generates citation-linked answers, rejects unsupported queries, and maintains traceable system logs. The overall argument of the dissertation is that GI MedVQA can advance toward clinical decision support only through systems that remain visually grounded, whose limits can be traced, and that can be audited under physician supervision.
The work does not claim independent clinical application. Rather, it provides a gradual route from benchmark assessment to safer clinician-facing colonoscopy aid.
dc.identifier.uri	https://knowledgecommons.lakeheadu.ca/handle/2453/5607
dc.language.iso	en
dc.subject	medical visual question answering
dc.subject	gastrointestinal endoscopy
dc.subject	ulcerative colitis
dc.subject	Mayo endoscopic subscore
dc.subject	vision-language models
dc.subject	retrieval-augmented generation
dc.subject	clinical decision support
dc.subject	colonoscopy
dc.title	Generative AI based on medical visual question answering (VQA) techniques
dc.type	Thesis
etd.degree.discipline	Computer Science
etd.degree.grantor	Lakehead University
etd.degree.level	Master
etd.degree.name	Master of Science in Computer Science

Files

Original bundle

Name: KaushikS2026m-2b.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.23 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Theses and Dissertations from 2009