Use of semantic, syntactic and sentiment features to automate essay evaluation

Janda, Harneet Kaur

Abstract

Manual grading of essays is time-consuming and susceptible to inconsistencies and inaccuracies. The task, performed mostly within academic institutions, typically involves grading hundreds of submitted essays, and the major hurdle is assessing them uniformly from the first to the last. Finishing the assessment can take hours or sometimes even days. Automating this tedious manual task not only relieves teachers but also assures students of consistent marking throughout. The challenge in automating it is to identify the aspects of natural language processing (NLP) that are vital for accurate automated essay evaluation. NLP is a subfield of artificial intelligence concerned with enabling computers to understand, and then further process, the language humans use for expression. Since essays are a written form of expression and idea exchange, automating essay assessment leverages progress in NLP and automates one of the largest manual tasks in educational systems. In recent years, abundant research has been done on automating essay evaluation, yet little of it considers the syntax, semantic coherence and sentiment of the essay’s text together. Our proposed system incorporates not just rule-based grammar and surface-level coherence checks but also the semantic similarity of sentences. We propose to use graph-based relationships within the essay’s content and the polarity of opinion expressions. Semantic similarity is determined between the statements of the essay to form these graph-based spatial relationships. Our algorithm uses 23 salient features with high predictive power, fewer than current systems use, while still covering the dimensions that a human grader focuses on.
Using fewer features removes redundancy from the data, so that predictions rest on more representative features and are robust to noisy data. Scores are predicted with neural networks using the data released for the ASAP competition hosted on Kaggle. The agreement between the human graders’ scores and the system’s predictions is measured using Quadratic Weighted Kappa (QWK). Our system achieves a QWK of 0.793. Our results are repeatable and transparent, and every feature is explained in detail, in contrast to existing systems whose authors have not described their methodologies and feature extraction thoroughly enough for the results to be reproduced.
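
For readers unfamiliar with the agreement metric named in the abstract, the following is a minimal sketch of the standard Quadratic Weighted Kappa computation. It is not taken from the thesis: the function name, its parameters, and the handling of the degenerate case are assumptions made for illustration only.

```python
from collections import Counter


def quadratic_weighted_kappa(human, system, min_score, max_score):
    """Standard QWK between two lists of integer scores (illustrative sketch).

    The names and signature are hypothetical, not from the thesis.
    """
    if len(human) != len(system) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_score - min_score + 1  # number of possible score levels
    if n < 2:
        raise ValueError("need at least two possible score levels")

    total = len(human)
    # Observed counts for each (human score, system score) pair.
    observed = Counter((h - min_score, s - min_score)
                       for h, s in zip(human, system))
    # Marginal histograms, used for the chance-agreement expectation.
    hist_h = Counter(h - min_score for h in human)
    hist_s = Counter(s - min_score for s in system)

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            num += weight * observed[(i, j)]
            den += weight * hist_h[i] * hist_s[j] / total

    if den == 0:
        # Both raters gave one identical constant score; kappa is undefined.
        # Returning 1.0 here is an assumption of this sketch.
        return 1.0
    return 1.0 - num / den
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. A call such as `quadratic_weighted_kappa(human_scores, predicted_scores, 1, 6)` would compare two score lists on a 1 to 6 scale; the variable names and the scale are placeholders.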

URI

http://knowledgecommons.lakeheadu.ca/handle/2453/4345

Collections

Electronic Theses and Dissertations from 2009 [1632]