dc.contributor.advisor: Fiaidhi, Jinan
dc.contributor.author: Kabir, Md. Shahriar
dc.date.accessioned: 2023-06-27T18:03:20Z
dc.date.available: 2023-06-27T18:03:20Z
dc.date.created: 2019
dc.date.issued: 2019
dc.identifier.uri: https://knowledgecommons.lakeheadu.ca/handle/2453/5185
dc.description.abstract: Finding the most desired and useful information among the diverse content embedded in webpages has become more challenging due to the rapid growth of websites and webpages, the dynamic changes and updates to webpage content, the lack of well-formed structure in that content, and so on. Information search is a non-trivial task when the plain text, hyperlink text, embedded images, and videos that make up webpage content remain in semi-structured form. Semi-structured webpage content does not have a predefined structure and remains in the hierarchically nested HTML tags of a webpage body. Unlike structured webpage content, heterogeneous semi-structured webpage content cannot be neatly formatted, organized, and modeled directly into a relational database. One of the most important information types on the web is the emotion that web users express in the statuses they post on social networking sites such as Facebook and Twitter. A publicly posted user status is informative enough to reveal the user's daily thoughts, feelings, and emotions through textual self-description. The data warehouse-oriented methodology for extracting semi-structured webpage content and modeling it into a database introduces a simplified and less labor-intensive XML-based extraction technique. This technique overcomes the limitations of existing pre-defined specification file and wrapper-based techniques in adapting to rapid changes of webpage content and in extracting the same piece of information from different webpages with differently nested HTML structures. The methodology also introduces a multidimensional fact data modeling technique for storing semi-structured webpage content in a relational database. Our implemented methodology ensures qualitative search results, with hyperlinks to the most desired webpages appearing first, within a small fraction of a minute. [...] [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Web mining [en_US]
dc.subject: Webpage extraction and modeling [en_US]
dc.subject: Identifying emotional and psychological traits (social networking sites) [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Text mining [en_US]
dc.title: A data warehouse-oriented methodology for qualitative semi-structured web information and social networking sites' user status search [en_US]
dc.type: Thesis [en_US]
etd.degree.name: Master of Science [en_US]
etd.degree.level: Master [en_US]
etd.degree.discipline: Computer Science [en_US]
etd.degree.grantor: Lakehead University [en_US]
dc.contributor.committeemember: Mohammed, Sabah
dc.contributor.committeemember: Du, Shan
dc.contributor.committeemember: Yassine, Abdulsalam

