From social media to expert reports: automatically validating and extending complex conceptual models using machine learning approaches
Master of Science
Subject: Complex conceptual models
Given the importance of developing accurate models of any complex system, the modeling process often seeks to be comprehensive by including experts and community members. While many qualitative modeling processes can produce models in the form of maps (e.g., cognitive/concept mapping, causal loop diagrams), they are generally conducted with a facilitator, and the limited capacity of facilitators restricts the number of participants. The need to be either physically present (for face-to-face sessions) or at least in a compatible time zone (for phone interviews) also limits the geographical diversity of participants. In addition, participants may not openly express their beliefs (e.g., on weight discrimination or political views) when they perceive that these beliefs may not be well received by a facilitator or others in the room. In contrast, the naturally occurring exchange of perspectives on social media provides an unobtrusive way to collect beliefs on the causes and consequences of such complex systems. Mining social media also supports a scalable approach and a geographically diverse sample. While obtaining a conceptual model via social media can inform policymakers about popular support for possible policies, the model may stand in stark contrast with an expert-based model. Identifying and reconciling these differences is an important step in integrating social computing with policy making. The technical innovation of this work is a pipeline that applies machine learning approaches to automatically validate large conceptual models, here of obesity and politics, against large text datasets (academic reports or social media such as Twitter). The pipeline generates relevant keywords using the WordNet interface in NLTK, performs topic modelling using the gensim LDA model, recognizes entities using the Google Cloud Natural Language API, and categorizes themes using the count vectorizer and tf-idf transformer from the scikit-learn library.
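As an illustration of the theme categorization step, the following is a minimal sketch of how scikit-learn's count vectorizer and tf-idf transformer can be chained to weight terms in a small corpus. The three example texts, the variable names and the `top_terms` helper are assumptions made for this sketch; they are not data or code from the thesis, and the thesis may configure these components differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical example texts (assumption): not data from the thesis.
docs = [
    "Fast food advertising increases childhood obesity",
    "Regular exercise reduces obesity and stress",
    "Stress leads to overeating and weight gain",
]

# Step 1: raw term counts per document, ignoring common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Step 2: reweight the counts by tf-idf so that terms shared by many
# documents contribute less than terms specific to a few documents.
tfidf = TfidfTransformer().fit_transform(counts)
dense = tfidf.toarray()

# Map column indices back to terms using the fitted vocabulary.
index_to_term = {idx: term for term, idx in vectorizer.vocabulary_.items()}


def top_terms(row, k=3):
    """Return up to k terms with the highest non-zero tf-idf weight in one row."""
    best = row.argsort()[::-1][:k]
    return [index_to_term[int(i)] for i in best if row[i] > 0]


top_terms_per_doc = [top_terms(row) for row in dense]

for text, terms in zip(docs, top_terms_per_doc):
    print(terms, "<-", text)
```

The highest-weighted terms per document could then be matched against the concepts of a conceptual model, for instance to count how often a concept is supported by the text.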
Once the pipeline has validated the model, we propose extensions to it by mining the literature or Twitter conversations and applying Granger causality tests to the time series obtained from the respective data sources. We also recognize that shifts in public opinion on Twitter can alter the results of validating and extending conceptual models with our computational methods. We therefore compare sentiment analysis and sarcasm detection results on these conceptual models. By analyzing these results, we discuss whether the confirmed and extended associations in our conceptual model are an artifact of our method or an accurate reflection of events related to that complex conceptual model. The combination of these machine learning approaches will help us automatically confirm and extend complex conceptual models at a lower cost in money, time and resources. It can also be used to formulate public policies automatically: whereas policies are usually created in response to issues brought before decision makers, our approach draws on issues discussed every day on social media platforms.
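To make the Granger causality step concrete, the sketch below implements the standard F-test form of the idea with NumPy and SciPy: a series x is said to Granger-cause y if adding lagged values of x to a regression of y on its own lags significantly reduces the residual error. The function names, the lag order of 1 and the synthetic series are assumptions chosen for illustration; the thesis does not state which implementation or settings it used.

```python
import numpy as np
from scipy import stats


def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def granger_f_test(cause, effect, lags=1):
    """F-test of whether lags of `cause` improve the prediction of `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    const = np.ones((len(target), 1))
    # Restricted model: effect explained by a constant and its own lags.
    restricted = np.hstack([const, lagged(effect, lags)])
    # Unrestricted model: additionally includes lags of the candidate cause.
    unrestricted = np.hstack([restricted, lagged(cause, lags)])

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_num = lags
    df_den = len(target) - unrestricted.shape[1]
    f_stat = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value


# Synthetic series (assumption): y follows x with a one-step delay plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)

f_xy, p_xy = granger_f_test(x, y, lags=1)  # does x help predict y?
f_yx, p_yx = granger_f_test(y, x, lags=1)  # does y help predict x?
print(f"x -> y: F={f_xy:.1f}, p={p_xy:.3g}")
print(f"y -> x: F={f_yx:.1f}, p={p_yx:.3g}")
```

In a setting like the one described above, x and y might be, for example, per-week counts of tweets mentioning two concepts, and a small p-value in one direction would suggest a candidate directed link to add to the conceptual model. A significant result indicates predictive precedence only, not proof of a causal mechanism.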