The analysis of Canada's health through social media using machine learning
Shah, Neel J.
Master of Science
Real-time online data processing is quickly becoming an essential tool in the analysis of social media for political trends, advertising, public health awareness programs and policy making. Traditionally, processes associated with offline analysis are productive and efficient only when data collection is a one-time process. Cutting-edge research now requires real-time data analysis, which comes with its own set of challenges, particularly the efficiency of continuous data fetching with present NoSQL and relational databases. In this thesis, I demonstrate a solution that effectively addresses the challenges of real-time analysis using a configurable Elasticsearch search engine. We use a distributed database architecture and pre-built indexing, and standardize the Elasticsearch framework for large-scale text mining. The results from the Elasticsearch engine are visualized in near real time. We apply our solution to the challenges of real-time data processing to social media in order to conduct a large-scale health analysis in Canada. Social media is a crucial source of information on a variety of topics such as health, food, feedback on products, and many others. At present, people use social media to share their daily lifestyles, for example, where they are going, what exercise they are doing, or what they are eating. By analyzing the information collected from these individuals, the health of the population can be gauged. This analysis can become an integral part of the government's efforts to study the health of people on a large scale. This is because public health is becoming the primary concern for many governments around the world, and they believe it is necessary to analyze the present situation within the population before creating any new policies. Traditionally, governments rely on door-to-door surveys, such as a census, or on hospital information to decide their health policies.
This information is limited and sometimes takes too long to collect and analyze well enough to aid decision making. Our approach addresses these problems using advances in natural language processing algorithms and large-scale data analysis. Results show that the proposed method provides a solution in less time, with the same accuracy, compared to the traditional one.