Generative AI based on medical visual question answering (VQA) techniques

dc.contributor.advisorMohammed, Sabah
dc.contributor.authorKaushik, Sarthak
dc.contributor.committeememberFiaidhi, Jinan
dc.contributor.committeememberZerpa, Carlos
dc.date.accessioned2026-05-12T13:47:59Z
dc.date.created2026
dc.date.issued2026
dc.description.abstractMedical visual question answering (MedVQA) lets clinicians ask direct questions about medical images instead of relying on single-label image classification. It is especially useful in gastrointestinal (GI) endoscopy, and in colonoscopy in particular, where clinical questions tend to concern whether findings are present, where they are located, how many there are, or how severe the disease is. Nevertheless, existing GI MedVQA systems still suffer from critical problems, including unbalanced dataset structure, class imbalance, answer-format mismatch, and poor visual grounding in free-text generation. This dissertation examines these problems using five GI datasets: HyperKvasir, Kvasir-VQA, Kvasir-VQA-x1, ImageCLEF MEDVQA-GI, and LIMUC. Benchmarking reveals a consistent trend across these datasets: for current GI tasks, supervised or otherwise constrained pipelines are more trustworthy than unsupervised, raw zero-shot vision-language generation. This matters most when the answer space is fixed or when clinically important classes are rare. Building on this finding, the dissertation develops a generative approach to ulcerative colitis severity scoring on LIMUC, with the Mayo endoscopic subscore as the target task. The model is evaluated under explicit output constraints and with clinically relevant analysis. The work then carries this model-level contribution into a physician-support setting through a modular wrapper that interprets queries, retrieves supporting evidence as needed, generates citation-linked answers, rejects unsupported queries, and maintains traceable system logs. The overall argument of the dissertation is that GI MedVQA can advance toward clinical decision support only through systems that remain visually grounded, whose limits are traceable, and that can be audited under physician supervision.
The work does not claim to be ready for independent clinical application. Rather, it offers a gradual route from benchmark assessment to safer clinician-facing colonoscopy assistance.
dc.identifier.urihttps://knowledgecommons.lakeheadu.ca/handle/2453/5607
dc.language.isoen
dc.subjectmedical visual question answering
dc.subjectgastrointestinal endoscopy
dc.subjectulcerative colitis
dc.subjectMayo endoscopic subscore
dc.subjectvision-language models
dc.subjectretrieval-augmented generation
dc.subjectclinical decision support
dc.subjectcolonoscopy
dc.titleGenerative AI based on medical visual question answering (VQA) techniques
dc.typeThesis
etd.degree.disciplineComputer Science
etd.degree.grantorLakehead University
etd.degree.levelMaster
etd.degree.nameMaster of Science in Computer Science

Files

Original bundle

Name:
KaushikS2026m-2b.pdf
Size:
2.01 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
2.23 KB
Format:
Item-specific license agreed upon to submission