Development of a language model and opinion extraction for text analysis of online platforms

Abdul Qudar, Mohiuddin Md

dc.contributor.advisor	Mago, Vijay
dc.contributor.author	Abdul Qudar, Mohiuddin Md
dc.date.accessioned	2021-06-22T14:13:22Z
dc.date.available	2021-06-22T14:13:22Z
dc.date.issued	2021
dc.identifier.uri	https://knowledgecommons.lakeheadu.ca/handle/2453/4822
dc.description.abstract	Language models are a fundamental component of a wide variety of natural language processing tasks. The proliferation of text data over the last two decades and developments in the field of deep learning have encouraged researchers to explore ways to build language models that achieve results on par with human intelligence. An extensive survey is presented in Chapter 2 exploring the types of language models, with a focus on transformer-based language models owing to the state-of-the-art results achieved and the popularity gained by these models. This survey helped to identify existing shortcomings and research needs. With the advancements of deep learning in the domain of natural language processing, extracting meaningful information from social media platforms, especially Twitter, has become a growing interest among natural language researchers. However, applying existing language representation models to extract information from Twitter often does not produce good results. To address this issue, Chapter 3 introduces two TweetBERT models, which are domain-specific language representation models pre-trained on millions of tweets. TweetBERT models significantly outperform the traditional BERT models in Twitter text mining tasks. Moreover, a comprehensive analysis is presented by evaluating 12 BERT models on 31 different datasets. The results validate our hypothesis that continuously training language models on a Twitter corpus helps to achieve better performance on Twitter datasets. Finally, in Chapter 4, a novel opinion mining system called ONSET is presented. ONSET is proposed mainly to address the need for large amounts of quality data to fine-tune state-of-the-art pre-trained language models. Fine-tuning language models can only produce good results if they are trained with a large amount of relevant data. ONSET is a technique that can fine-tune language models for opinion extraction using unlabelled training data.
This system fine-tunes a language model by first using an unsupervised learning approach, topic modeling, to label aspects, and then applying semi-supervised learning with data augmentation. Extensive experiments performed during this research show that the proposed model can achieve results similar to those that some state-of-the-art models produce with a large quantity of labelled training data.	en_US
dc.language.iso	en_US	en_US
dc.subject	Language model	en_US
dc.subject	Deep learning	en_US
dc.subject	Transformer-based language models	en_US
dc.subject	TweetBERT	en_US
dc.subject	ONSET	en_US
dc.title	Development of a language model and opinion extraction for text analysis of online platforms	en_US
dc.type	Thesis
etd.degree.name	Master of Science	en_US
etd.degree.level	Master	en_US
etd.degree.discipline	Computer Science	en_US
etd.degree.grantor	Lakehead University	en_US

Files in this item

Name: QudarM2021m-1a.pdf
Size: 2.501Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations from 2009 [1612]
