Exploiting semantic similarity models to automate transfer credit assessment in academic mobility
Master of Science
Subject: Natural language processing, Semantic textual similarity, Semantic similarity models
Student mobility, or academic mobility, involves students moving between institutions during their post-secondary education. One of the challenging tasks in this process is assessing the transfer credits to be offered to the incoming student. In general, domain experts compare the learning outcomes (LOs) of the courses and decide, based on their similarity, whether to offer transfer credits. This manual implementation of the task is not only labor-intensive but also influenced by undue bias and administrative complexity. This research work focuses on identifying an algorithm that exploits advancements in the field of Natural Language Processing (NLP) to automate this process effectively. A survey tracing the evolution of semantic similarity helps in understanding the various methods available to calculate the semantic similarity between text data. The basic units of comparison, the learning outcomes, are made up of two components: the descriptor, which states the content covered, and the action word, which states the competency achieved. Bloom's taxonomy defines six levels of competency into which the action words fall. Given the unique structure, domain specificity, and complexity of learning outcomes, a tailor-made algorithm is needed. The proposed algorithm uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of learning outcomes, and a transformer-based semantic similarity model to assess their semantic similarity. The cumulative similarity between the learning outcomes is then aggregated to form a course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset is built by conducting a survey among domain experts with knowledge of both academia and computer science.
The dataset contains 7 course-to-course similarity values annotated by 5 domain experts. Recognizing the inherent need for flexibility in the decision-making process, the aggregation part of the algorithm offers tunable parameters to accommodate different scenarios. As one of the early research works in the field of automating articulation, this thesis establishes the pressing challenges that need to be addressed in the field, namely: the significant decrease in performance of state-of-the-art semantic similarity models as sentence complexity increases, the lack of large datasets to train or fine-tune existing models, the lack of quality in available learning outcomes, and the reluctance to share learning outcomes publicly. While providing an efficient algorithm to assess the similarity between courses with existing resources, this research work helps steer future attempts to apply NLP in the field of articulation by highlighting the persisting research gaps.
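The two-part comparison and the tunable aggregation described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the verb-to-level map, the level-distance taxonomic score, the token-overlap function standing in for a transformer-based model, and the parameters `alpha` and `threshold` are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, heavily abbreviated map from action words to Bloom levels (1-6).
BLOOM_LEVEL: Dict[str, int] = {
    "define": 1, "list": 1,
    "explain": 2, "describe": 2,
    "apply": 3, "implement": 3,
    "analyze": 4, "compare": 4,
    "evaluate": 5, "justify": 5,
    "design": 6, "create": 6,
}

LearningOutcome = Tuple[str, str]  # (action word, descriptor)


def taxonomic_similarity(verb_a: str, verb_b: str) -> float:
    """Assumed score: 1.0 at the same Bloom level, 0.0 at five levels apart."""
    level_a = BLOOM_LEVEL.get(verb_a.lower())
    level_b = BLOOM_LEVEL.get(verb_b.lower())
    if level_a is None or level_b is None:
        return 0.0  # unknown action word: no taxonomic evidence
    return 1.0 - abs(level_a - level_b) / 5.0


def token_jaccard(text_a: str, text_b: str) -> float:
    """Placeholder for a transformer-based similarity model (assumption)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def lo_similarity(
    lo_a: LearningOutcome,
    lo_b: LearningOutcome,
    semantic: Callable[[str, str], float] = token_jaccard,
    alpha: float = 0.7,
) -> float:
    """Weighted mix of descriptor similarity and action-word similarity."""
    semantic_score = semantic(lo_a[1], lo_b[1])
    taxonomic_score = taxonomic_similarity(lo_a[0], lo_b[0])
    return alpha * semantic_score + (1.0 - alpha) * taxonomic_score


def course_similarity(
    course_a: List[LearningOutcome],
    course_b: List[LearningOutcome],
    threshold: float = 0.5,
    alpha: float = 0.7,
) -> float:
    """Fraction of LOs in course_a whose best match in course_b meets threshold."""
    if not course_a or not course_b:
        return 0.0
    matched = sum(
        1
        for lo_a in course_a
        if max(lo_similarity(lo_a, lo_b, alpha=alpha) for lo_b in course_b) >= threshold
    )
    return matched / len(course_a)


if __name__ == "__main__":
    sending = [("explain", "binary search trees"), ("implement", "sorting algorithms")]
    receiving = [("describe", "binary search trees"), ("design", "relational database schemas")]
    print(course_similarity(sending, receiving))
```

In this sketch, raising `threshold` makes the credit decision stricter, while `alpha` shifts weight between content overlap and competency level; both are hypothetical stand-ins for the tunable parameters the abstract mentions. In practice the `semantic` argument would be replaced by a call to a trained sentence-similarity model.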