Use of semantic, syntactic and sentiment features to automate essay evaluation
Abstract
Manual grading of essays is time-consuming and prone to inconsistencies and
inaccuracies. Within an academic institution, graders must assess hundreds of
submitted essays, and the major hurdle is applying a homogeneous standard from the
first essay to the last. The assessment can take hours or even days to complete.
Automating this tedious manual task not only relieves teachers but also assures
students of consistent marking throughout. The challenge in automation lies in
identifying the aspects of natural language processing (NLP) that are vital for
accurate automated essay evaluation.
NLP is a subfield of artificial intelligence concerned with enabling computers to
understand and process the language humans use for expression. Since essays are a
written, textual form of expression and idea exchange, automating essay assessment
leverages progress in NLP and removes one of the biggest manual tasks in educational
systems.
In recent years, abundant research has been devoted to automating essay evaluation,
yet little of it considers the syntax, semantic coherence, and sentiment of an essay's
text together. Our proposed system incorporates not only rule-based grammar and
surface-level coherence checks but also the semantic similarity of sentences. We
propose to use graph-based relationships within the essay's content together with
the polarity of opinion expressions: semantic similarity is computed between each
pair of statements in the essay to form these graph-based spatial relationships.
Our algorithm uses 23 salient features with high predictive power, fewer than
current systems use, while still covering the dimensions a human grader focuses on.
Fewer features remove redundancy in the data, so the predictions rest on more
representative features and are robust to noisy data. Scores are predicted with
neural networks using the data released by the ASAP competition hosted on Kaggle.
The agreement between the human grader's scores and the system's predictions is
measured with Quadratic Weighted Kappa (QWK), and our system achieves a QWK of
0.793. Our results are repeatable and transparent, and every feature is explained
in detail, unlike existing systems whose authors do not describe their methodologies
and feature extraction thoroughly enough for the results to be reproduced.
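As a point of reference for the evaluation metric, the sketch below shows one common way to compute Quadratic Weighted Kappa from two lists of integer scores. This is a generic illustration of the metric, not the system's actual evaluation code; the function name and score range are assumptions for the example.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """QWK between two score lists over the range [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    total = len(rater_a)

    # Observed agreement matrix: O[i][j] counts essays scored i by rater A, j by B.
    O = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        O[a - min_rating][b - min_rating] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in O]
    hist_b = [sum(O[i][j] for i in range(n)) for j in range(n)]

    # Weighted disagreement of the observed vs. the chance-expected matrix,
    # with quadratic weights w_ij = (i - j)^2 / (n - 1)^2.
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total
            num += w * O[i][j]
            den += w * expected
    return 1.0 - num / den
```

Perfect agreement yields a QWK of 1.0, chance agreement yields roughly 0.0, and quadratic weighting penalizes large score disagreements more heavily than near misses.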