Using homophily to analyze and develop link prediction models with deep learning framework
Abstract
Twitter is a prominent social networking platform where users’ short messages or “tweets”
are often used for analysis. However, little attention has been paid to mining information about
medical professions, such as detecting users’ occupations from their biographical content.
Mining such information can be useful for building recommender systems for cost-effective advertising. Conventional classifiers can be used to predict medical occupations, but they
tend to perform poorly because there is a wide variety of occupations. As a result, the main focus
of this research is to use deep learning techniques to examine the textual properties
of Twitter users’ biographical content, their network properties, and the impact of homophily among
Twitter users employed in medical professions. Chapter 2 presents a survey
of the concept of homophily and important social network topics, summarizing the state-of-the-art methods proposed in recent years to identify and
measure the effect of homophily in multiple types of social networks. This enables us to
find open challenges and directions for future research. In Chapter 3, we develop a model
to identify Twitter users working in medical professions using the textual
properties of their bio content. We conduct our analysis by annotating
the content of Twitter users’ bios and propose a method that combines word embeddings with
state-of-the-art neural network models. Finally, Chapter 4 introduces a link
prediction model based on the concept of homophily, using the follower and
following IDs of the Twitter users identified in Chapter 3. Recent research has centered on analyzing rapidly
evolving networks. While predicting links in dynamic networks is difficult, deep learning
techniques and network representation learning algorithms, such as Node2vec, have demonstrated significant improvements in prediction accuracy. However, Node2vec’s stochastic
gradient descent (SGD) optimization is prone to falling into a local optimum, and as a consequence, Node2vec fails to capture the network’s global structure. To address this problem,
we propose NODDLE (integration of NOde2vec anD Deep Learning mEthod), a deep learning system in which we combine Node2vec’s features and feed them into a neural network with four hidden
layers. NODDLE takes advantage of adaptive learning optimizers to improve the performance of link prediction.
Experimental results on different social network datasets show that our approach outperforms
conventional methods.
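
As an illustrative sketch only, and not the implementation used in this work, the idea of combining Node2vec node embeddings into edge features and scoring them with a network of four hidden layers might look as follows. The Hadamard (element-wise product) operator, the layer sizes, the ReLU and sigmoid activations, and the random placeholder embeddings are all assumptions made for this example; training with an adaptive optimizer is omitted.

```python
import math
import random


def hadamard(u, v):
    """Combine two node embeddings into one edge feature vector (element-wise product)."""
    return [a * b for a, b in zip(u, v)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_mlp(sizes, rng):
    """Create random weights for a fully connected network.

    sizes lists the layer widths, e.g. [in, h1, h2, h3, h4, out].
    """
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output unit."""
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = [sigmoid(zi) for zi in z] if i == last else [max(0.0, zi) for zi in z]
    return x[0]


rng = random.Random(0)
dim = 8  # hypothetical embedding size

# Placeholder embeddings; in practice these would come from a trained Node2vec model.
emb = {node: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for node in ("a", "b", "c")}

# Four hidden layers (16, 16, 8, 8) followed by one output unit; widths are assumptions.
net = init_mlp([dim, 16, 16, 8, 8, 1], rng)

# Untrained link score in [0, 1] for the candidate edge (a, b).
score = forward(net, hadamard(emb["a"], emb["b"]))
print(score)
```

In a trained system the weights would be fitted on known links and sampled non-links, so the score printed here is meaningful only as a shape check.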