key: cord-0534528-4rogx69u authors: Rath, Bhavtosh; Salecha, Aadesh; Srivastava, Jaideep title: Detecting Fake News Spreaders in Social Networks using Inductive Representation Learning date: 2020-11-21 journal: nan DOI: nan sha: 7670fbf23f5443bcb602f3d9de6e0ef7c8563971 doc_id: 534528 cord_uid: 4rogx69u An important aspect of preventing fake news dissemination is to proactively detect the likelihood of its spreading. Research in the domain of fake news spreader detection has not been explored much from a network analysis perspective. In this paper, we propose a graph neural network based approach to identify nodes that are likely to become spreaders of false information. Using the community health assessment model and interpersonal trust, we propose an inductive representation learning framework to predict nodes of densely-connected community structures that are most likely to spread fake news, thus making the entire community vulnerable to the infection. Using topology and interaction based trust properties of nodes in real-world Twitter networks, we are able to predict false information spreaders with an accuracy of over 90%. People use social networking platforms like Twitter, Facebook and Whatsapp not only to consume and share information but also to share their opinions about it. Ease of sharing has made it possible to spread information quickly, often without verifying it, resulting in the spread of fake news. This has led to an increase in interest among social media researchers in proposing fake news spreading detection models. In this context, it is important not only to detect false information but also to identify the people who are most likely to believe and spread it, because detecting fake news spreaders can help contain the rapid spread of fake news in social networks.
While most of the related work on fake news detection systems has modeled the content of the news itself, we propose a complementary approach that takes the network topology and historical user activity into account. As the COVID-19 virus spread rapidly around the world in 2020, so did false information regarding various aspects pertaining to it 1 . The need for a fake news spreader detection model has never been more evident. Thus, in this paper we propose a novel spreader detection model using an inductive representation learning framework. The model quickly identifies spreaders before the false information penetrates deeper into a densely connected community and infects more nodes. The main contributions of the paper are as follows: 1. We propose a fake news spreader detection framework using the Community Health Assessment model [6] and interpersonal trust [4]. To the best of our knowledge, this is the first fake news spreader detection model that relies on features extracted from the underlying network structure and historical behavioral data instead of the content. 2. We implement our framework using inductive representation learning [2], where we sample the neighborhood of nodes in a weighted network and aggregate their trust-based features. 3. We evaluate our proposed interpersonal trust based framework on multiple real Twitter networks and show that trust based modeling helps us identify false information spreaders with high accuracy, which makes the technique useful for fake news mitigation. 4. We further observe that our model's accuracy when detecting false information spreaders is higher than that for true information spreaders. This indicates that people are usually able to reason about true information by analyzing the content, and thus trust in their neighbors is not a very significant factor.
However, determining the veracity of false information that is plausibly true from the content itself is difficult, and hence we have to rely on sources we trust to decide whether or not to believe it. This makes nodes that are fake news spreaders and at the same time highly trusted by many people in the network especially dangerous. We acknowledge that not all such uber-spreaders have ill intentions; some might simply be ignorant. They all, nonetheless, have the power to spread false information far and wide, with great speed. We believe this phenomenon needs greater study. The rest of the paper is organized as follows: We first discuss related work, then describe a motivating example for spreader detection from a network structure perspective, and summarize past ideas that the proposed research builds upon. We then explain the proposed framework and how we model interpersonal trust with it, followed by experimental analysis, and finally give our concluding remarks and proposed future work. In this section we first discuss related work on graph neural network architectures. Next, we discuss work related to the application of GNNs to social networks and information dissemination. We then outline other work in the domain of fake news detection, and finally present work on inductive representation learning that we build on. Graph Neural Networks (GNNs) are powerful neural network models that have received increased attention recently because of their applicability to non-Euclidean spaces such as social networks. Numerous mathematical models for GNNs have been proposed [12]. In recent times, research has leveraged GNNs for complex tasks on social graphs, such as political perspective prediction and stance detection. In the field of fake news detection, Bian et al. [8] proposed a graph convolution network based model that utilized propagation paths to detect fake news.
Researchers have also proposed architectures that integrate ideas from generative adversarial networks to build graph-based detectors for rumor identification [14]. Other studies have demonstrated the merit of attention based graph models in modelling and detecting rumors [15], [1]. Notably, Lu et al. [16] developed a graph-aware attention network that uses user representations and the propagation paths taken by a piece of information to predict fake news. Nguyen et al. [17] recently proposed FANG, an inductive learning framework that uses GNNs for social structure representation and fake news detection. Our work also utilizes inductive representation learning in the form of GraphSAGE [2], which generates embeddings by sampling and aggregating features. GraphSAGE generalizes well to unseen and rapidly changing data by dynamically adapting at inference time. Characterizing the differences between the spread of false and true news has also served as motivation for our research. In this regard, Vosoughi et al.'s [11] work on the empirical analysis of the propagation paths taken by false and true news is of interest to us. Jooyeon et al. [10] also proposed a Bayesian nonparametric model to understand the role of content in the diffusion of true and false news and the differences therein. Unlike most previous work that analyzes content features, our approach uses the underlying social graph structure along with user representations built from historical data to build an inductive learning based graph neural network that helps identify the most prevalent information spreaders. To understand the role of network structure in fake news spreader detection, consider the scenario illustrated in Figure 1. The network contains 8 communities. The subscript of a node denotes the community it belongs to. In the context of Twitter, the directed edge B1 → A1 represents that B1 follows A1. Thus, a tweet flows from A1 to B1.
If B1 decides to retweet A1's tweet, we say that B1 has endorsed A1's tweet, and that B1 trusts A1. Communities in social networks are modular groups, where within-group members are tightly connected and intra-community trust is higher, compared to trust between members of different communities, who are at best loosely connected. The more B trusts A, the higher the chance that B will retweet A's tweet, and thus propagate A's message, whether it is true or false. The figure illustrates the spread of fake news starting from D1 as it spreads across the network from A3 through A8. We consider two scenarios for spreader detection: 1. Information reaches the neighborhood of a community: Consider the scenario when a message is propagated by D1, a neighborhood node for community 3. Node A3 is exposed and is likely to spread the information, thus beginning the spread of information into a densely connected community. Thus it is important to predict nodes on the boundary of communities that are likely to become information spreaders. 2. Information penetrates the community: Consider the scenario where A3 decides to propagate a message. Nodes B3, D3 and E3, which are immediate followers of A3, are now exposed to the information. Due to their close proximity, they are vulnerable to believing the endorser. The remaining nodes of the community (C3, F3) are two steps away from A3. Similarly for community 8, when the message has reached node A8, nodes D8 and F8 are one step away and the remaining community members (E8, C8, B8) are two steps away. Intuitively, in a closely-knit community structure, if one of the nodes decides to spread a piece of information, the likelihood of it spreading quickly within the entire community is very high. Thus it is important to detect nodes within a community that are likely to become information spreaders, to protect the health of the entire community. The motivating ideas above were elaborated in [9].
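As a concrete illustration of these roles, the following is our own minimal sketch (function and variable names are ours, not the authors') that partitions the nodes around one community of a directed follower network into the neighbor, boundary and core roles formalized in the next section:

```python
# Minimal illustrative sketch (our own, hypothetical): split nodes of one
# community into neighbor, boundary and core roles. A directed edge u -> v
# means u follows v, so a tweet flows from v to u.

def community_node_types(edges, nodes, community):
    """Return (neighbor, boundary, core) node sets for `community`."""
    out_nbrs = {n: set() for n in nodes}
    in_nbrs = {n: set() for n in nodes}
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)

    # Neighbor nodes: outside the community but directly connected to it.
    neighbor = {n for n in nodes - community
                if (out_nbrs[n] | in_nbrs[n]) & community}
    # Boundary nodes: community nodes with an outgoing edge to a neighbor.
    boundary = {m for m in community if out_nbrs[m] & neighbor}
    # Core nodes: community nodes whose edges all stay inside the community.
    core = {m for m in community if (out_nbrs[m] | in_nbrs[m]) <= community}
    return neighbor, boundary, core
```

For example, with edges [("B", "A"), ("C", "B")] and community {"B", "C"}, node A is a neighbor, B a boundary node and C a core node.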
Next we discuss some concepts used by our proposed model. Consider the scenario described in Figure 1. If a community member believes the information and becomes a spreader, the likelihood of other community members becoming spreaders is high due to the dense connectivity, and hence higher trust, among community members. Using the Community Health Assessment model we propose the ideas of neighbor, boundary and core nodes for every community in a social network. The three types of nodes, from a community (com) perspective, that are affected during the process of news spreading are explained below: 1. Neighbor nodes (N_com): These nodes are directly connected to at least one node of the community. They are not a part of the community. 2. Boundary nodes (B_com): These are community nodes that are directly connected to at least one neighbor node. It is important to note that only community nodes that have an outgoing edge towards a neighbor node are in B_com. 3. Core nodes (C_com): Community nodes that are only connected to members within the community. The idea was proposed in [6] to show how trust plays a more important role in spreading fake news compared to true news. The neighbor, boundary, and core nodes for the communities in Figure 1 are listed in Table I. TABLE I. Neighbor, boundary and core nodes for communities in Figure 1. The Trust in Social Media (TSM) algorithm assigns a pair of complementary trust scores to each node in a network, called Trustingness and Trustworthiness. Trustingness (ti) quantifies the propensity of a node to trust its neighbors and Trustworthiness (tw) quantifies the willingness of the neighbors to trust the node. The TSM algorithm takes a user network, i.e., a directed graph G(V, E), as input, together with a specified convergence criterion or a maximum permitted number of iterations.
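A hedged sketch of this iterative computation, following the update equations of [4], is shown below; initialization, score normalization and convergence checks in the published algorithm may differ, and the believability score used later is included for completeness:

```python
# Hedged sketch of the TSM iteration after [4]; initialization, normalization
# and convergence handling in the published algorithm may differ.
# `edges` maps a directed pair (u, v) to the weight of edge u -> v;
# `s` is the constant involvement score of the network.

def tsm_scores(edges, nodes, s=0.391, iters=100):
    out_e = {n: [] for n in nodes}
    in_e = {n: [] for n in nodes}
    for (u, v), wt in edges.items():
        out_e[u].append((v, wt))
        in_e[v].append((u, wt))
    ti = {n: 1.0 for n in nodes}   # trustingness
    tw = {n: 1.0 for n in nodes}   # trustworthiness
    for _ in range(iters):
        ti_new = {v: sum(wt / (1.0 + tw[x] ** s) for x, wt in out_e[v])
                  for v in nodes}
        tw_new = {u: sum(wt / (1.0 + ti[v] ** s) for v, wt in in_e[u])
                  for u in nodes}
        ti, tw = ti_new, tw_new
    return ti, tw

def believability(ti, tw, sender, receiver):
    # bel(sender -> receiver): trustworthiness of the sender times the
    # trustingness of the receiver (the edge score described below).
    return tw[sender] * ti[receiver]
```

On a two-node cycle with unit edge weights, both scores converge to values between 0 and 1, and the believability of an edge is simply the product of the endpoint scores.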
In each iteration, trustingness and trustworthiness are computed for every node in the network using the equations below:

ti(v) = Σ_{x ∈ out(v)} [ w(v,x) / (1 + (tw(x))^s) ]   (1)

tw(u) = Σ_{v ∈ in(u)} [ w(v,u) / (1 + (ti(v))^s) ]   (2)

where u, v, x ∈ V are user nodes, ti(v) and tw(u) are the trustingness and trustworthiness scores of v and u, respectively, w(v,x) is the weight of the edge from v to x, out(v) is the set of out-edges of v, in(u) is the set of in-edges of u, and s is the involvement score of the network. Involvement is essentially the potential risk a node takes when creating a link in the network, and it is set to a constant empirically. The details of the algorithm are excluded due to space constraints and can be found in [4]. Believability is an edge score derived from the Trustingness and Trustworthiness scores. It quantifies how likely the receiver of a message is to believe its sender. Believability for a directed edge is naturally computed as a function of the trustworthiness of the sender and the trustingness of the receiver. The believability score is thus proportional to these two values, and is jointly computed as:

bel(u → v) = tw(u) × ti(v)   (3)

The idea has been applied in [5], where an RNN model was proposed to identify rumor spreaders in Twitter networks. IV. PROPOSED APPROACH Problem Formulation: Given a directed social network G(V, E) comprising disjoint modular communities (φ), with each community (com ∈ φ) having well-defined neighbor nodes (N_com), boundary nodes (B_com) and core nodes (C_com). Aggregating topology-based (top) and activity-based (act) trust properties from nodes sampled up to depth K (where Nbr_{K=1}(b) ⊆ N_com), we want to predict the boundary nodes b that are most likely to become information spreaders (b_sp). Similarly, we aggregate nodes sampled up to depth K (where Nbr_{K=1}(c) ⊆ B_com) to predict the core nodes c that are most likely to become information spreaders (c_sp). Inductive Representation Learning: As fake news spreads rapidly, the network structure around the spreaders also evolves quickly.
Thus, it is important to have a scalable model that is able to quickly learn meaningful representations for newly seen (i.e. exposed) nodes without relying on the complete network structure. Most graph representation learning techniques, however, employ a transductive approach to learning node representations, which optimizes the embeddings for nodes based on the entire graph structure. We employ an inductive approach inspired by GraphSAGE [2] to generate embeddings for the nodes as the information spreading network gradually evolves. It learns an aggregator function that generalizes to unseen node structures, which could become potential information spreaders. The idea is to simultaneously learn the topological structure and node features from the neighborhood (Nbr) nodes by training a set of aggregator functions instead of individual node embeddings. Using an inductive representation learning model, we learn features of the exposed population (i.e. followers of the spreaders) by aggregating trust-based features from their neighborhood nodes. Figure 2 shows how we model the proposed approach from a community perspective. Nodes outside the solid oval represent N_com, nodes between the solid and dotted ovals represent B_com, and nodes within the dotted oval represent C_com. (a) shows that the false information spread has reached two neighbor nodes (highlighted in red). Three boundary nodes (circled in red) are exposed to the information. In (b) we learn representations for the exposed boundary nodes by aggregating features of their local neighborhood structure (denoted by white nodes). Two of the three boundary nodes that become spreaders are highlighted and the exposed core nodes are circled. Similarly, in (c) we learn representations for the exposed core nodes by aggregating their local neighborhood features. One core node becomes a spreader and the community is now vulnerable to fake news spreading.
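The sample-and-aggregate idea described above can be sketched as follows. This is our own simplification (fixed sample size, a tanh nonlinearity, and a single depth are our assumptions), not the authors' implementation:

```python
import numpy as np

# Illustrative sketch of trust-weighted neighborhood sampling and a one-layer
# mean aggregation step; the fixed sample size and tanh nonlinearity are our
# assumptions, not the paper's.

def sample_neighborhood(nbrs, edge_w, node, size, rng):
    """Sample `size` neighbors of `node` proportional to trust edge weights."""
    cand = nbrs[node]
    p = np.array([edge_w[(node, u)] for u in cand], dtype=float)
    p = p / p.sum()                    # normalize trust weights per node
    idx = rng.choice(len(cand), size=size, replace=True, p=p)
    return [cand[i] for i in idx]

def mean_aggregate(h, node, sampled, W):
    """h_b = sigma(W . MEAN({h(b)} U {h(u) for sampled neighbors u}))."""
    stacked = np.vstack([h[node]] + [h[u] for u in sampled])
    return np.tanh(W @ stacked.mean(axis=0))
```

A node with a heavily weighted (highly trusted) neighbor will therefore draw most of its sampled neighborhood, and hence most of its aggregated representation, from that neighbor.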
The proposed framework is explained as follows: First we generate a weighted information spreading network based on interpersonal trust. We then sample the neighborhood with a probability proportional to the trust based edge weights. For the sampled neighborhood we aggregate the feature representations. Finally, we explain the loss function used to learn the parameters of the model. The graph of the information spreading network has edge weights that quantify the likelihood of trust formation between senders and receivers. Once we compute these edge scores using the techniques mentioned in Table II, we normalize the weights of all out-edges connecting the boundary node, and similarly normalize the weights of all in-edges connecting the boundary node. Instead of sampling the neighborhood as a uniform distribution, we sample a subset of neighbors proportional to the weights of the edges connecting them. Sampling is done recursively up to depth K. The idea is to learn features from neighbors proportional to the level of interpersonal trust. Algorithm 1 explains the sampling strategy. After sampling the neighborhood as an unordered set, we recursively aggregate the embeddings of the sampled nodes up to depth K for each boundary node. The intuition is that at each depth, the boundary nodes incrementally learn trust-based features from the sampled neighborhood. Three aggregation architectures, namely mean, LSTM and pooling, explained in [2], can be used. For simplicity, we only apply the mean aggregator, which takes the mean of the representations h_u^{k-1} where u ∈ Nbr^{k-1}(b). The aggregator is represented below:

h_b^k = σ( W^k · MEAN( {h_b^{k-1}} ∪ {h_u^{k-1}, ∀u ∈ Nbr(b)} ) )   (5)

Algorithm 2 explains the aggregation strategy. The weight matrices in Algorithm 2 are tuned using stochastic gradient descent on a loss function in order to learn the parameters. We train the model to minimize cross-entropy. The loss function is modeled to predict whether the boundary node is an information spreader (b_sp) or a non-spreader (b̄_sp).
The cross-entropy loss is computed as:

L = − Σ_i y_i log ŷ_i   (6)

where y represents the actual class (a 2-dimensional multinomial distribution, [1, 0] for spreader and [0, 1] for non-spreader) and ŷ represents the predicted class distribution. We extend the model to C_com to identify the core node spreaders (c_sp) and non-spreaders (c̄_sp). Considering that boundary nodes have denser neighborhoods compared to core nodes, we later analyze whether the proposed model is more sensitive to the density of the neighborhood structure or to the aggregated features. The implementation code is made publicly available 2 . As explained in the preliminaries section, interpersonal trust has been applied successfully in the past to model the spreading of fake news. Thus we model our node representation learning problem using interpersonal trust to predict whether a node is a spreader or not. We first apply a non-uniform neighborhood sampling strategy using the weighted graph (where edge weights quantify the likelihood of trust formation). We then aggregate two trust features: 1) the likelihood of trusting others and 2) the likelihood of being trusted by others. We use two kinds of interpersonal trust: topology-based (top), computed from the social network topology, and activity-based (act), computed using timeline activity data collected for every node using the Twitter API. We use the trustingness (ti(x)) and trustworthiness (tw(x)) scores of node x obtained from TSM as proxies for the topology-based trust features. As activity-based trust features we use the fraction of timeline statuses of x that are retweets, Σ_{i∈t} 1[i is a retweet] / n(t), and the average number of times x's tweets are retweeted, Σ_{i∈t} n(RT_i) / n(t), where t represents the most recent tweets posted on x's timeline 3 and n(RT_i) is the retweet count of tweet i. For an edge from x to v, the topology-based edge weight is the believability score (bel_xv) and the activity-based edge weight is the number of times x is retweeted by v (RT_xv). The trust-based sampling and aggregation strategies are summarized in Table II.
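The two activity-based trust features above can be computed directly from a user's timeline; the dictionary-based timeline format below is a hypothetical simplification of Twitter API output, not the authors' data schema:

```python
# Sketch of the activity-based trust features described above. Each timeline
# entry is assumed (our simplification) to carry 'is_retweet' (bool) and
# 'retweet_count' (int) for one of the user's n most recent statuses.

def activity_features(timeline):
    """Return (likelihood of trusting others, likelihood of being trusted)."""
    n = len(timeline)
    if n == 0:
        return 0.0, 0.0
    # Trusting others: fraction of the user's statuses that are retweets.
    trusting = sum(1 for t in timeline if t["is_retweet"]) / n
    # Trusted by others: average retweets received per status.
    trusted = sum(t["retweet_count"] for t in timeline) / n
    return trusting, trusted
```

For a timeline where 2 of 4 statuses are retweets and the statuses received 8 retweets in total, this yields (0.5, 2.0).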
We evaluate our proposed model using real world Twitter datasets. We obtained the ground truth of false information and the refuting true information from altnews.in, a popular fact-checking website. The source tweet related to the information was obtained directly as a tweet embedded in the website or through a keyword based search on Twitter. From the source tweet we generated the source tweeter and the retweeters (a proxy for spreaders), the follower-following network of the spreaders (a proxy for the network) and the timeline data for all nodes in the network (to generate trust-based features) using the Twitter API. Besides evaluating our model on the false information (F) and the refuting true information (T) networks separately, we also evaluated it on the network obtained by combining them (F ∪ T). Metadata for the network dataset, aggregated over all news events, is summarized in Table III. We obtained the topology-based measures by running the TSM algorithm on the network to obtain ti and tw for all nodes and bel for all edges. We used the generic settings for the TSM parameters (number of iterations = 100, involvement score = 0.391) by referring to [4]. We found the disjoint modular communities using the Louvain community detection algorithm [7] and identified the neighbor, boundary and core nodes for every community using the Community Health Assessment model. We then generated the activity-based measures from the timeline data of the nodes. The embeddings are generated using the forward propagation method shown in Algorithm 2, assuming that the model parameters are learnt using Equation 6. Due to class imbalance we undersample the majority class to obtain a balanced spreader and non-spreader class distribution. The size of the hidden units is set to 128 and the learning rate to 0.001. We used rectified linear units as the non-linear activation function. The batch size was adjusted for optimal performance depending on the size of the training dataset.
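The class-balancing step mentioned above can be sketched as follows; this is our own minimal version of majority-class undersampling, not the authors' code:

```python
import random

# Minimal sketch of majority-class undersampling (our own, hypothetical).
# Label 1 marks spreaders, 0 marks non-spreaders.

def undersample(examples, labels, seed=0):
    """Return a class-balanced subset of (examples, labels)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    major, minor = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    # Keep every minority example plus an equal-sized random majority sample.
    keep = sorted(minor + rng.sample(major, len(minor)))
    return [examples[i] for i in keep], [labels[i] for i in keep]
```

Given 5 non-spreaders and 2 spreaders, this returns 2 of each, so the classifier does not simply learn the majority class.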
Due to the heavy-tailed nature of the degree distributions of social networks, we downsample before modeling, which ensures that the neighborhood information is stored in dense adjacency lists. This drastically reduces our run time, which is ideal for early detection of spreaders. We also set the sampling depth K=1 because the network constitutes only the immediate follower-following nodes of the spreaders. We compared results for the following models, including baselines: 1) Trusting others: Intuitively, users with a high likelihood of trusting others tend to be spreaders of false information. This model learns a threshold based on the correlation between 'trusting others' features (both topology- and activity-based) and the user ground truth. 2) Trusted by others: Intuitively, users with a high likelihood of being trusted by others tend to be spreaders of false information. Like the previous model, this model learns a threshold based on the correlation between 'trusted by others' features (both topology- and activity-based) and the user ground truth. 3) Interpolation: This model linearly combines 'trusting others' and 'trusted by others' features to find an optimal threshold. 4) LINE: This model applies LINE [13], which serves as a transductive learning baseline. 5) GCN_top: This model implements a graph convolutional network [3] based transductive learning model that aggregates topology features from the neighborhood. 6) GCN_act: This is the graph convolutional network based model that aggregates activity features from the neighborhood. 7) SA_rand GE_top: This model applies inductive learning by sampling the neighborhood as a uniform distribution and aggregating only topology based features. 8) SA_rand GE_act: This model applies inductive learning by sampling the neighborhood as a uniform distribution and aggregating only activity based features. 9) SA_top GE_top: Instead of random sampling, we sample on the believability (bel) weighted network and aggregate topology based features.
10) SA_top GE_act: The sampling approach is identical to 9), but we aggregate the neighborhood's activity based features. 11) SA_act GE_top: We sample the neighborhood non-uniformly on the retweet count (RT) weighted network and aggregate topology based features. 12) SA_act GE_act: The sampling approach is identical to 11), but we aggregate the neighborhood's activity based features. Baseline models 1)-3) are inspired by [5], which considers features based on trust. Baseline model 4) considers features based on network structure only. Proposed models 5)-12) integrate both neighborhood structure and node features. We analyze which combination of sampling and aggregation strategies predicts spreader nodes with the highest accuracy. For evaluation we did an 80-10-10 train-validation-test split of the dataset. We used 5-fold cross validation and four common metrics: Accuracy, Precision, Recall and F1 score. We only show results for the spreader class. We evaluated our proposed model on 10 debunked news events. For each news event we obtained three types of networks: the network for the false information (F), for the true information (T) refuting it, and the network obtained by combining them (F ∪ T). Thus we ran our models on 30 large-scale networks. Boundary node analysis (less dense Nbr): Table IV summarizes the results for boundary node prediction aggregated over all news events. The results show that F performs better than T on almost every metric, while F ∪ T performs poorly. The poor performance of the F ∪ T networks could be attributed to the fact that there is minimal overlap of nodes between the F and T networks (12%), which causes the F ∪ T networks to have sparser communities. Moreover, false and true information spreaders are considered together as the spreader class, which could also affect model performance.
Comparing the baseline models, the Trusted by others model performs better than the Trusting others model, with an improvement in accuracy of 4.8%, 5% and 1.5% for the F, T and F ∪ T networks respectively. The Interpolation model shows a further improvement of 2.3%, 2.3% and 1.1% for the F, T and F ∪ T networks respectively over the trustingness model. The LINE and GCN baselines show significant improvement on all metrics for F networks compared to T or F ∪ T networks. We see a further substantial increase in performance for each type of network using the inductive learning models. Comparing the two random sampler models (i.e. SA_rand GE_top, SA_rand GE_act), we see that the topology-based features of the neighborhood perform better than the activity-based features. A similar trend is observed for the topology-based sampler models (i.e. SA_top GE_top, SA_top GE_act), where the model using the topology-based aggregator performs better than the one using the activity-based aggregator. The same holds for the activity-based sampler models (i.e. SA_act GE_top, SA_act GE_act). Integrating top and act does not show any significant improvement over top-only models. Thus we can conclude that interpersonal trust based modeling in the inductive learning framework is able to predict false information spreaders better than true information spreaders. We also observe that topology-based sampling and aggregation strategies perform better than activity-based strategies. The low performance of the activity-based strategies could be attributed to the fact that many Twitter users are either inactive or have strict privacy settings, so their timeline data could not be retrieved. Also, the 10 most recent activities on a user's timeline might be insufficient data to capture activity-based trust dynamics. For each type of network, we observe that the SA_top GE_top model performs best, with F reaching an accuracy of 93.3%, which is 12.3% and 52.1% higher than for the T and F ∪ T networks respectively.
Figure 3 shows the performance metrics of this model for the 10 news events (N1-N10). We observe a clear distinction in performance, with F networks performing better than T, which in turn perform better than F ∪ T. An interesting observation is the high precision values for T. This is because the percentage of predicted spreaders that are actually non-spreaders tends to be lower for T networks than for F networks. Core node analysis (more dense Nbr): Table V summarizes the results of the model for predicting core nodes, aggregated over all news events. The overall performance trend is identical to the results shown for boundary nodes in Table IV. Among the baseline models, the Interpolation model performs better than the Trusted by others and Trusting others models. The LINE and GCN based models show significant improvement over the trust feature baselines on all metrics. Among the inductive learning models, topology-based trust modeling shows better performance than activity-based trust modeling. Also, F networks perform better than T networks, which in turn perform better than F ∪ T networks. Among the random sampler models, SA_rand GE_top has the highest accuracy, at 84.2%, 72.6% and 65.6% for the F, T and F ∪ T networks respectively. Among the topology-based sampler models, SA_top GE_top performs better than SA_top GE_act, with an increase in accuracy of 2.8%, 4.5% and 7.1% for the F, T and F ∪ T networks respectively. The activity-based sampler models show an identical trend, with SA_act GE_top performing better than SA_act GE_act, with an increase in accuracy of 2.6%, 9% and 4.6% for the F, T and F ∪ T networks respectively. Among all models, SA_top GE_top shows the best overall performance. Figure 4 shows the metric performance of this model for the 10 news events. The true information network for N10 is excluded from the analysis as it did not have sufficient spreaders to train our model on. A clear observation is that the metric performance for the three types of networks is not as distinct as in Figure 3.
Even though the number of core nodes is much higher than the number of boundary nodes, the number of core spreaders is much smaller than the number of boundary node spreaders. Thus the model fails to learn meaningful representations for core nodes due to the smaller training dataset. Summary: Comparing the prediction performance for core and boundary spreaders, we can conclude that our model's performance is more sensitive to the aggregated features and the training dataset size than to the density of the neighborhood. In this paper we proposed a novel fake news spreader detection model for communities using inductive representation learning and community health assessment. Using interpersonal trust based properties we could identify spreaders with high accuracy, and we also showed that the proposed model identifies false information spreaders more accurately than true information spreaders. The key hypothesis we tested is that interpersonal trust plays a significantly more important role in identifying false information spreaders than true information spreaders. Identified false information spreaders can be quarantined and true news spreaders can be promoted, which serves as an effective mitigation strategy. Experimental analysis on Twitter data showed that topology-based modeling yields better results than activity-based modeling. The proposed research can be used to identify people who are likely to become spreaders in real time, owing to its ability to adapt to rapidly evolving information spreading networks. As part of future work we want to test our model on a higher volume of user timeline activity, which would give a better picture of the effectiveness of the activity-based approach. We also want to take into consideration the presence of bots, and to extend the network further in order to sample neighborhoods from greater sampling depths.
REFERENCES
- Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection.
- Inductive representation learning on large graphs.
- Semi-supervised classification with graph convolutional networks.
- Trustingness & trustworthiness: A pair of complementary trust measures in a social network.
- Utilizing computational trust to identify rumor spreaders on Twitter.
- Evaluating vulnerability to fake news in social networks: a community health assessment model.
- Fast unfolding of communities in large networks.
- Rumor detection on social media with bi-directional graph convolutional networks.
- Epidemiology inspired framework for fake news mitigation in social networks.
- Homogeneity-based transmissive process to model true and false news in social networks.
- The spread of true and false news online.
- A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.
- LINE: Large-scale Information Network Embedding.
- Jointly embedding the local and global relations of heterogeneous graph for rumor detection.
- Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media. In: 58th Annual Meeting of the Association for Computational Linguistics.
- FANG. In: 29th ACM International Conference on Information and Knowledge Management.