Adding time-series data to enhance the performance of natural language processing tasks
Abstract
In the past few decades, with the explosion of information, a large number of
computer scientists have devoted themselves to analyzing collected data and applying
their findings to many disciplines. Natural language processing (NLP) has been one of
the most popular areas for data analysis and pattern recognition. A large amount of
data is now available in text format because it is easy to access. Most
modern techniques focus on exploring large sets of textual data to build forecasting
models; they tend to ignore temporal information, which is often
the main factor determining the quality of the analysis, especially from a public
policy perspective. The contribution of this paper is three-fold. First, a dataset called
COVID-News is collected from three news agencies; it consists of article segments
related to wearing masks during the COVID-19 pandemic. Second, we propose a
long short-term memory (LSTM)-based learning model that uses both temporal and
textual information to predict the attitude of the articles from the three news
agencies towards wearing a mask. Third, we add the BERT model to further improve
the performance of the proposed model. Experimental results on the COVID-News
dataset show the effectiveness of the proposed LSTM-based algorithm.