Title: A data warehouse-oriented methodology for qualitative semi-structured web information and social networking sites' user status search
Authors: Kabir, Md. Shahriar
Keywords: Web mining; Webpage extraction and modeling; Identifying emotional and psychological traits (social networking sites); Machine learning; Natural language processing; Text mining
Issue Date: 2019
Abstract: Finding the most desired and useful information among the diverse content embedded in webpages has become more challenging for several reasons: the rapid growth in the number of websites and webpages, frequent dynamic changes and updates to webpage content, and the lack of a well-formed structure in that content. Information search is far from trivial when the plain text, hyperlink text, embedded images, and videos that make up webpage content remain in semi-structured form. Semi-structured webpage content has no predefined structure and resides in hierarchically nested HTML tags within the webpage body. Unlike structured webpage content, heterogeneous semi-structured content cannot be neatly formatted, organized, and modeled directly into a relational database. One of the most important types of information on the web is users' emotion as expressed in status updates posted on social networking sites such as Facebook and Twitter. Publicly posted user statuses are informative enough to reveal users' daily thoughts, feelings, and emotions through textual self-description. The proposed data warehouse-oriented methodology for extracting semi-structured webpage content and modeling it into a database introduces a simplified, less labor-intensive XML-based extraction technique. This technique overcomes the limitations of existing techniques based on pre-defined specification files and wrappers: it adapts to rapid changes in webpage content and can extract the same piece of information from different webpages whose nested HTML structures differ. The methodology also introduces a Multidimensional Fact data modeling technique for storing semi-structured webpage content in a relational database. Our implemented methodology yields qualitative search results, with hyperlinks to the most desired webpages appearing first, in a very small fraction of a minute. [...]
URI: https://knowledgecommons.lakeheadu.ca/handle/2453/5185
Degree discipline: Computer Science
Degree name: Master of Science
Degree level: Master
Advisor: Fiaidhi, Jinan
Committee members: Mohammed, Sabah; Du, Shan; Yassine, Abdulsalam
Appears in Collections: Electronic Theses and Dissertations from 2009

Files in This Item:
File: KabirMds2019-1a.pdf (Adobe PDF, 19.54 MB)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.