Exploiting semantic similarity models to automate transfer credit assessment in academic mobility
Abstract
Student mobility or academic mobility involves students moving between institutions during
their post-secondary education, and one of the challenging tasks in this process is to assess
the transfer credits to be offered to the incoming student. In general, this process involves
domain experts comparing the learning outcomes (LOs) of the courses and, based on their
similarity, deciding whether to offer transfer credits to the incoming students. This manual
implementation of the task is not only labor-intensive but also influenced by undue bias and
administrative complexity. This research work focuses on identifying an algorithm that ex-
ploits the advancements in the field of Natural Language Processing (NLP) to effectively
automate this process. A survey tracing the evolution of semantic similarity helps under-
stand the various methods available to calculate the semantic similarity between text data.
The basic units of comparison, the learning outcomes, are made up of two components:
the descriptor, which states the content covered, and the action word, which
indicates the competency achieved. Bloom’s taxonomy defines six levels of
competency into which the action words fall. Given the unique structure, domain specificity,
and complexity of learning outcomes, a tailor-made algorithm is needed.
The proposed algorithm uses a clustering-inspired methodology based on knowledge-based
semantic similarity measures to assess the taxonomic similarity of learning outcomes and a
transformer-based semantic similarity model to assess the semantic similarity of the learning
outcomes. The cumulative similarity between the learning outcomes is further aggregated
to form a course-to-course similarity. Due to the lack of quality benchmark datasets, a new
benchmark dataset is built by conducting a survey among domain experts with knowledge
in both academia and computer science. The dataset contains 7 course-to-course similarity
values annotated by 5 domain experts. Because the decision-making process inherently
requires flexibility, the aggregation part of the algorithm offers tunable parameters
to accommodate different scenarios. Being one of the early research works in the field
of automating articulation, this thesis identifies the immediate challenges that need to be
addressed in the field, namely the significant decrease in performance of state-of-the-art
semantic similarity models as sentence complexity increases, the lack of large datasets
to train or fine-tune existing models, the poor quality of available learning outcomes, and the
reluctance to share learning outcomes publicly. While providing an efficient algorithm to assess
the similarity between courses with existing resources, this research work steers future re-
search attempts to apply NLP in the field of articulation in an ideal direction by highlighting
the persisting research gaps.
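
The aggregation step described above can be illustrated with a minimal sketch. It is not
the algorithm proposed in this work: the abstract does not give the formulas, so every
name and parameter here (taxonomic_similarity, course_similarity, alpha, lo_threshold) is
hypothetical. The sketch assumes the semantic similarity scores have already been computed
by some transformer-based model and are supplied as a matrix, uses the revised names for
the six Bloom levels, and replaces the clustering-inspired, knowledge-based taxonomic
measure with a simple distance between Bloom levels.

```python
from typing import List, Sequence

# Six Bloom levels (revised naming), ordered from lowest to highest competency.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def taxonomic_similarity(level_a: str, level_b: str) -> float:
    """Hypothetical stand-in: 1.0 for identical levels, 0.0 for the two extremes."""
    distance = abs(BLOOM_LEVELS.index(level_a) - BLOOM_LEVELS.index(level_b))
    return 1.0 - distance / (len(BLOOM_LEVELS) - 1)


def course_similarity(
    semantic: Sequence[Sequence[float]],  # semantic[i][j]: LO i of course A vs LO j of course B
    levels_a: List[str],                  # Bloom level of each LO in course A
    levels_b: List[str],                  # Bloom level of each LO in course B
    alpha: float = 0.7,                   # assumed weight of semantic vs taxonomic similarity
    lo_threshold: float = 0.6,            # assumed cut-off for counting an LO as matched
) -> float:
    """Fraction of course A's LOs whose best combined match in course B passes the cut-off."""
    matched = 0
    for i, row in enumerate(semantic):
        # Combined (cumulative) similarity of LO i against every LO of course B.
        best = max(
            alpha * sem + (1 - alpha) * taxonomic_similarity(levels_a[i], levels_b[j])
            for j, sem in enumerate(row)
        )
        if best >= lo_threshold:
            matched += 1
    return matched / len(semantic)


# Made-up scores for two courses with two learning outcomes each.
semantic_scores = [[0.9, 0.2], [0.3, 0.4]]
strict = course_similarity(semantic_scores, ["apply", "analyze"], ["apply", "remember"])
lenient = course_similarity(
    semantic_scores, ["apply", "analyze"], ["apply", "remember"], lo_threshold=0.4
)
print(strict, lenient)
```

Lowering lo_threshold in this sketch makes the decision more lenient, which is one way
tunable parameters could accommodate different transfer-credit scenarios.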