Argument summarization: enhancing summary generation and evaluation metrics
Abstract
In the current era of mass digital information, the need for effective argument summarization has become paramount. This thesis explores the domain of argument
summarization, focusing on the development of techniques and evaluation metrics to
improve the quality of summarization models. The study first investigates the task of
key point analysis and the challenges associated with previous approaches to it, emphasizing the importance of summary coverage. To address these challenges,
we propose a novel clustering-based framework that leverages the inherent semantics of arguments to identify and group similar arguments. The proposed approach
is evaluated on a benchmark dataset and compared with previous state-of-the-art
methods, demonstrating its effectiveness. In addition to the proposed framework,
this thesis also presents an analysis of previously used evaluation metrics for argument summarization. The commonly used metric, ROUGE, is evaluated, revealing its limitations in capturing the nuanced aspects of argument quality. To address this limitation, we introduce
new evaluation metrics and methods that consider the coverage and redundancy of
the generated summaries, providing more accurate and informative assessments of
summarization models. We further show that our proposed metrics correlate better with actual summary quality, whereas previous metrics fail to capture this relationship.