key: cord-156676-wes5my9e
authors: Masud, Sarah; Dutta, Subhabrata; Makkar, Sakshi; Jain, Chhavi; Goyal, Vikram; Das, Amitava; Chakraborty, Tanmoy
title: Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter
date: 2020-10-09
journal: nan
DOI: nan
sha: 
doc_id: 156676
cord_uid: wes5my9e

Online hate speech, particularly over microblogging platforms like Twitter, has emerged as arguably the most severe issue of the past decade. Several countries have reported a steep rise in hate crimes fueled by malicious hate campaigns. While the detection of hate speech is one of the emerging research areas, the generation and spread of topic-dependent hate in the information network remain under-explored. In this work, we focus on exploring user behaviour, which triggers the genesis of hate speech on Twitter, and how it diffuses via retweets. We crawl a large-scale dataset of tweets, retweets, user activity history, and follower networks, comprising over 161 million tweets from more than 41 million unique users. We also collect over 600k contemporary news articles published online. We characterize different signals of information that govern these dynamics. Our analyses differentiate the diffusion dynamics in the presence of hate from usual information diffusion. This motivates us to formulate the modelling problem in a topic-aware setting with real-world knowledge. For predicting the initiation of hate speech for any given hashtag, we propose multiple feature-rich models, with the best performing one achieving a macro F1 score of 0.65. Meanwhile, to predict the retweet dynamics on Twitter, we propose RETINA, a novel neural architecture that incorporates exogenous influence using scaled dot-product attention. RETINA achieves a macro F1-score of 0.85, outperforming multiple state-of-the-art models. Our analysis reveals the superior ability of RETINA to predict the retweet dynamics of hateful content compared to the existing diffusion models.

For the past half-decade, in synergy with the sociopolitical and cultural rupture worldwide, online hate speech has manifested as one of the most challenging issues of this century, transcending cyberspace. Many hate crimes against minority and backward communities have been directly linked with hateful campaigns circulated over Facebook, Twitter, Gab, and many other online platforms [1], [2]. Online social media has provided an unforeseen speed of information spread, aided by the fact that the power of content generation is handed to every user of these platforms. Extremists have exploited this phenomenon to disseminate hate campaigns to a degree where manual monitoring is too costly, if not impossible.

Thankfully, the research community has been observing a spike of works related to online hate speech, with a vast majority of them focusing on the problem of automatic detection of hate from online text [3]. However, as Ross et al. [4] pointed out, even manual identification of hate speech comes with ambiguity due to the differences in the definition of hate. Also, an important signal of hate speech is the presence of specific words/phrases, which vary significantly across topics/domains. Tracking such a diverse socio-linguistic phenomenon in real time is impossible for automated, large-scale platforms.

An alternative approach can be to track potential groups of users who have a history of spreading hate. As Mathew et al. [5] suggested, such users are often a very small fraction of the total users but generate a sizeable portion of the content. Moreover, the severity of hate speech lies in the degree of its spread, and an early prediction of the diffusion dynamics may help combat online hate speech to a new extent altogether. However, only a tiny fraction of the existing literature seeks to explore the problem quantitatively. Mathew et al. [5] laid an insightful foundation for this problem by analyzing the dynamics of hate diffusion in Gab. However, they do not tackle the problem of modeling the diffusion and restrict themselves to identifying different characteristics of hate speech in Gab.

Hate speech on Twitter: Twitter, as one of the largest micro-blogging platforms with a worldwide user base, has a long history of accommodating hate speech, cyberbullying, and toxic behavior. Recently, it has cracked down on such content multiple times, and a certain fraction of hateful tweets are often removed upon identification. However, a large majority of such tweets still circumvent Twitter's filtering. In this work, we choose to focus on the dynamics of hate speech on Twitter mainly for two reasons: (i) the widespread usage of Twitter compared to other platforms provides scope to grasp the hate diffusion dynamics in a more realistic manifestation, and (ii) it lets us understand how hate speech emerges and spreads even in the presence of some top-down checking measures, in contrast to unmoderated platforms like Gab.

Diffusion patterns of hate vs. non-hate on Twitter: Hate speech is often characterized by the formation of echo-chambers, i.e., only a small group of people engaging with such contents repeatedly. In Figure 1, we compare the temporal diffusion dynamics of hateful vs. non-hate tweets (see Sections VI-A and VI-B for the details of our dataset and hate detection methods, respectively). Following the standard information diffusion terminology, the set of susceptible nodes at any time instance of the spread is defined by all such nodes which have been exposed to the information (followers of those who have posted/retweeted the tweet) up to that instant but did not participate in spreading (did not retweet/like/comment). While hateful tweets are retweeted in a significantly higher magnitude compared to non-hateful ones (see Figure 1(a)), they tend to create fewer susceptible users over time (see Figure 1(b)). This is directly linked to two major phenomena. Primarily, one can relate this to the formation of hate echo-chambers: hateful contents are distributed among a well-connected set of users. Secondarily, as we define susceptibility in terms of follower relations, hateful contents might have been diffusing among connections beyond the follow network, through paid promotion, etc. One can also observe the differences in early growth for the two types of information: while hateful tweets acquire most of their retweets and susceptible nodes in a very short time and then stall, non-hateful ones tend to maintain the spread, though at a lower rate, for a longer time. This characteristic can again be linked to organized spreaders of hate who tend to disseminate hate as early as possible.

Topic-dependence of Twitter hate: Hateful contents show strong topic-affinity: topics related to politics and social issues, for example, incur much more hateful content compared to sports or science. Hashtags in Twitter provide an overall mapping for tweets to topics of discussion. As shown in Figure 2, the degree of hateful content varies significantly for different hashtags. Even when different hashtags share a common theme (such as the topic of discussion for #jamiaunderattack, #jamiaviolence and #jamiaCCTV), they may still incur a different degree of hate. Previous studies [5] tend to denote users as hate-preachers irrespective of the topic of discussion. However, as evident in Figure 3, the degree of hatefulness expressed by a user is dependent on the topic as well. For example, while some users resort to hate speech concerning COVID-19 and China, others focus on topics around the protests against the Citizenship Amendment Act in India.

Exogenous driving forces: With the increasing entanglement of virtual and real social processes, it is only natural that events happening outside the social media platforms tend to shape the platform's discourse. Though only a small number of existing studies attempt to inquire into such inter-dependencies [6], [7], their findings are substantially motivating for problems related to modeling information diffusion and user engagement on Twitter and other platforms. In the case of hate speech, exogenous signals offer an even more crucial attribute to look into, namely global context. For both detecting and predicting the spread of hate speech over short tweets, the knowledge of context is likely to play a decisive role.

[Figure caption fragment: The color of a cell, corresponding to a user and a hashtag, signifies the ratio of hateful to non-hate tweets posted by that user using that specific hashtag.]

Present work: Based on the findings of the existing literature and the analysis we presented above, here we attempt to model the dynamics of hate speech spread on Twitter. We separate the process of spread into hate generation (who will start a hate campaign) and retweet diffusion of hate (who will spread an already started hate campaign via retweeting). To the best of our knowledge, this is the very first attempt to delve into the predictive modeling of online hate speech. Our contributions can be summarized as follows:

1) We formalize the dynamics of hate generation and retweet spread on Twitter, subsuming the activity history of each user and the signals propagated by the localized structural properties of the information network of Twitter induced by follower connections, as well as global endogenous and exogenous signals (events happening inside and outside of Twitter) (see Section III).
2) We present a large dataset of tweets, retweets, user activity history, and the information network of Twitter covering versatile hashtags that trended very recently. We manually annotate a significant subset of the data for hate speech. We also provide a corpus of contemporary news articles published online (see Section VI-A for more details).
3) We extract a rich set of features manifesting the signals mentioned above to design multiple prediction frameworks which forecast, given a user and a contemporary hashtag, whether the user will write a hateful post or not (Section IV). We provide an in-depth feature ablation and ensemble methods to analyze our proposed models' predictive capability, with the best performing one resulting in a macro F1-score of 0.65.
4) We propose RETINA (Retweeter Identifier Network with Exogenous Attention), a neural architecture to predict potential retweeters given a tweet (Section V-B). RETINA encompasses an attention mechanism which dictates the prediction of retweeters based on a stream of contemporary news articles published online. Features representing hateful behavior encoded within the given tweet as well as the activity history of the users further help RETINA to achieve a macro F1-score of 0.85, significantly outperforming several state-of-the-art retweet prediction models.

We have made public our datasets and code along with the necessary instructions and parameters, available at https://github.com/LCS2-IIITD/RETINA.

Hate speech detection. In recent years, the research community has been keenly interested in better understanding, detection, and combating hate speech on online media. Starting with the basic feature-engineered Logistic Regression models [8] , [9] to the latest ones employing neural architectures [10] , a variety of automatic online hate speech detection models have been proposed across languages [11] . To determine the hateful text, most of these models utilize a static-lexicon based approach and consider each post/comment in isolation. With lack of context (both in the form of individual's prior indulgence in the offense and the current world view), the models trained on previous trends perform poorly on new datasets. While linguistic and contextual features are essential factors of a hateful message, the destructive power of hate speech lies in its ability to spread across the network. However, only recently have researchers started using network-level information for hate speech detection [12] , [13] . Rathpise and Adji [14] proposed methods to handle class imbalance in hate speech classification. A recent work showed how the anti-social behavior on social media during COVID-19 led to the spread of hate speech. Awal et al. [15] coined the term, 'disability hate speech' and showed its social, cultural and political contexts. Ziems et al. [16] explained how COVID-19 tweets increased racism, hate, and xenophobia in social media.

While our work does not involve building a new hate speech detection model, hate detection underpins any work on hate diffusion in the first place. Inspired by existing research, we also incorporate hate lexicons as a feature for the diffusion model. The lexicon is curated from multiple sources and manually pruned to suit the Indian context [17]. Meanwhile, to overcome the problem of context, we utilize the timeline of a user to determine her propensity towards hate speech.

Information diffusion and microscopic prediction. Predicting the spread of information on online platforms is crucial in understanding the network dynamics, with applications in marketing campaigns, rumor spreading/stalling, route optimization, etc. The latest in the family of diffusion models is CHASSIS [18]. On the other end of the spectrum, the SIR model [19] effectively captures the presence of R (Recovered) nodes in the system, which are no longer active due to information fatigue. Even though limited in scope, the SIR model serves as an essential baseline for all diffusion models.

Among other techniques, a host of studies employ social media data for both macroscopic (size and popularity) and microscopic (next user(s) in the information cascade) prediction. While highly popular, both DeepCas [20] and DeepHawkes [21] focus only on the size of the overall cascade. Similarly, Khosla et al. [22] utilized social cues to determine the popularity of an image on Flickr. While Independent Cascade (IC) based embedding models [23], [24] led the initial work in supervised learning based microscopic cascade prediction, they failed to capture the cascade's temporal history (either directly or indirectly). Meanwhile, Yang et al. [25] presented a neural diffusion model for microscopic prediction, which employs a recurrent neural architecture to capture the history of the cascade. These models focus on predicting the next user in the cascade from a host of potential candidates. In this regard, TopoLSTM [26] considers only the previously seen nodes in any cascade as the next candidate without using timestamps as a feature. This approximation works well under limited availability of network information and the absence of cascade metadata. Meanwhile, FOREST [27] considers all the users in the global graph (irrespective of one-hop) as potential users, employing a time-window based approach. Work by Wang et al. [28] lies midway between TopoLSTM and FOREST, in that it does not consider any external global graph as input, but employs a temporal, two-level attention mechanism to predict the next node in the cascade. Zhou et al. [29] compiled a detailed outline of recent advances in cascade prediction.

Compared to the models discussed above for microscopic cascade prediction, which aim to answer who will be the next participant in the cascade, our work aims to determine whether a follower of a user will retweet (participate in the cascade) or not. This converts our use case into a binary classification problem and adds negative sampling (in the form of inactive nodes), taking the proposed model closer to a real-world scenario consisting of active and passive social media users.

TABLE I (recovered fragment): Important notations and denotations.
$P^{u_i}$, $P^{u_i}_j$: Probability of $u_i$ retweeting (static vs. $j$-th interval)
$X^T$, $X^N$: Feature tensors for tweet and news
$X^{T,N}$: Output from exogenous attention

The spread of hate and exploratory analysis by Mathew et al. [5] revealed exciting characteristics of the breadth and depth of hate vs. non-hate diffusion. However, their methodology separates the non-haters from haters and studies the diffusion of two cascades independently. Real-world interactions are more convoluted with the same communication thread containing hateful, counter-hateful, and non-hateful comments. Thus, independent diffusion studies, while adequate at the exploratory analysis of hate, cannot be directly extrapolated for predictive analysis of hate diffusion. The need is a model that captures the hate signals at the user and/or group level. By taking into account the user's timeline and his/her network traits, we aim to capture more holistic hate markers.

Exogenous influence. As early as 2012, Myers et al. [7] exposed that external stimuli drive one-third of the information diffusion on Twitter. Later, Hu et al. [30] proposed a model for predicting user engagement on Twitter that is factored by user engagement in 600 real-world events. From employing world news data for enhancing language models [31] to boosting the impact of online advertisement campaigns [32] , exogenous influence has been successfully applied in a wide variety of tasks. Concerning social media discourse, both De et al. [33] in opinion mining and Dutta et al. [6] in chatter prediction corroborated the superiority of models that consider exogenous signals.

Since our data on Twitter was collected based on trending Indian hashtags, it becomes crucial to model exogenous signals, some of which may have triggered a trend in the first place. While a one-to-one mapping of news keywords to trending keywords is challenging to obtain, we collate the most recent news (within a time window) w.r.t. a source tweet as our ground-truth. To our knowledge, this is the first retweet prediction model to consider external influence.

An information network of Twitter can be defined as a directed graph $G = \{U, E\}$, where every user corresponds to a unique node $u_i \in U$, and there exists an ordered pair $(u_i, u_j) \in E$ if and only if the user corresponding to $u_j$ follows user $u_i$ (Table I summarizes important notations and denotations). Typically, the visible information network of Twitter does not associate the follow relation with any further attributes; therefore, any two edges in $E$ are indistinguishable from each other. We associate unit weight to every $e \in E$.

Every user in the network acts as an agent of content generation (tweeting) and diffusion (retweeting). For every user $u_i$ at time $t_0$, we associate an activity history $H_{i,t_0}$, the record of tweets posted and retweeted by $u_i$ up to $t_0$.

The information received by user $u_i$ has three different sources: (a) Peer signals ($S^P_i$): The information network $G$ governs the flow of information from node to node such that any tweet posted by $u_i$ is visible to every user $u_j$ if $(u_i, u_j) \in E$; (b) Non-peer endogenous signals ($S_{en}$): Trending hashtags, promoted contents, etc. that show up on the user's feed even in the absence of peer connection; (c) Exogenous signals ($S_{ex}$): Apart from the Twitter feed, every user interacts with external world events directly (as a participant) or indirectly (via news, blogs, etc.).

Hate generation. The problem of modeling hate generation can be formulated as assigning a probability to each user that signifies their likelihood to post a hateful tweet. With our hypothesis of hateful behavior being a topic-dependent phenomenon, we formalize the modeling problem as learning the parametric function $f_1 : \mathbb{R}^d \to (0, 1)$ such that

$$P(u_i \mid T) = f_1(S_{ex}, S_{en}, H_{i,t}, T \mid \theta_1) \qquad (1)$$

where $T$ is a given topic, $t$ is the instance up to which we obtain the observable history of $u_i$, $d$ is the dimensionality of the input feature space, and $\theta_1$ is the set of learnable parameters. Though ideally $P(u_i \mid T)$ should be dependent on $S^P_i$ as well, the complete follower network for Twitter remains mostly unavailable due to account settings, privacy constraints, inefficient crawling, etc.

Hate diffusion. As already stated, we characterize diffusion as the dynamic process of retweeting in our context. Given a tweet $\tau(t_0)$ posted by some user $u_i$, we formulate the problem as predicting the potential retweeters within the interval $[t_0, t_0 + \Delta t]$. Assuming the probability density of a user $u_j$ retweeting $\tau$ at time $t$ to be $p(t)$, the retweet prediction problem translates to learning the parametric function $f_2 : \mathbb{R}^d \to (0, 1)$ such that

$$\int_{t_0}^{t_0 + \Delta t} p(t)\,dt = f_2(S^P_j, S_{en}, S_{ex}, H_{j,t}, \tau \mid \theta_2) \qquad (2)$$

Eq. 2 is the general form of a parametric equation describing retweet prediction. In our setting, the signal components $S^P_j$, $H_{j,t}$, and the features representing the tweet $\tau$ incorporate the knowledge of hatefulness. Henceforth, we call $\tau$ the root tweet and $u_i$ the root user. It is to be noted that the features representing the peer, non-peer endogenous, and exogenous signals in Eqs. 1 and 2 may differ due to the difference in problem setting.

Beyond organic diffusion. The task of identifying potential retweeters of a post on Twitter is not straightforward. In retrospect, the event of a user retweeting a tweet implies that the user must have been an audience of the tweet at some point of time (similar to 'susceptible' nodes of contagion spread in the SIR/SIS models [19], [34]). For any user, if at least one of his/her followees engages with the retweet cascade, then the subject user becomes susceptible. That is, in an organic diffusion, between any two users $u_i$, $u_j$ there exists a finite path $u_i, u_{i+1}, \ldots, u_j$ in $G$ such that each user (except $u_i$) in this path is a retweeter of the tweet by $u_i$. However, due to account privacy etc., one or more nodes within this path may not be visible. Moreover, contents promoted by Twitter, trending topics, and content searched by users independently may diffuse alongside their organic diffusion path. Searching for such retweeters is impossible without explicit knowledge of these phenomena. Hence, we primarily restrict our retweet prediction to the organic diffusion, though we experiment with retweeters not in the visibly organic diffusion cascade to see how our models handle such cases.
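The notions of susceptible users and organic retweeters described above can be illustrated with a small sketch. It assumes the follower network is available as a dictionary mapping each user to the set of users who follow them (matching the edge direction of $G$); the function names and the toy graph are hypothetical and not taken from the released code.

```python
from collections import deque

def susceptible_users(followers, root, retweeters):
    """Users exposed via a follow edge who did not spread the tweet."""
    spreaders = {root} | set(retweeters)
    exposed = set()
    for user in spreaders:
        exposed |= set(followers.get(user, ()))
    return exposed - spreaders

def organic_retweeters(followers, root, retweeters):
    """Retweeters reachable from the root through a chain of retweeters."""
    retweeters = set(retweeters)
    reached, queue = set(), deque([root])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, ()):
            if follower in retweeters and follower not in reached:
                reached.add(follower)
                queue.append(follower)
    return reached

# Toy network: b and c follow a, d follows b, e follows nobody in the cascade.
followers = {"a": {"b", "c"}, "b": {"d"}, "c": set(), "d": set(), "e": set()}
retweeters = ["b", "d", "e"]
sus = susceptible_users(followers, "a", retweeters)
org = organic_retweeters(followers, "a", retweeters)
```

In this toy case, user e retweets without any visible follow path from the root, which corresponds to the non-organic retweeters discussed above.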

To realize Eq. 1, we signify topics as individual hashtags. We rely purely on manually engineered features for this task so that rigorous ablation study and analysis produce explainable knowledge regarding this novel problem. The extracted features instantiate different input components of $f_1$ in Eq. 1. We formulate this task in a static manner, i.e., assuming that we are predicting at an instance $t_0$, we want to predict the probability of the user posting a hateful tweet within $[t_0, \infty]$. While training and evaluating, we set $t_0$ to be right before the actual tweeting time of the user.

The activity history of user $u_i$, signified by $H_{i,t}$, is substantiated by the following features:

• We use unigram and bigram features weighted by tf-idf values from the 30 most recent tweets posted by $u_i$ to capture its recent topical interest. To reduce the dimensionality of the feature space, we keep the top 300 features sorted by their idf values.

• To capture the history of hate generation by $u_i$, we compute two different features from her most recent 30 tweets: (i) the ratio of hateful vs. non-hate tweets and (ii) a hate lexicon vector $HL = \{h_i \mid h_i \in \mathbb{I}^{+} \text{ and } i = 1, \ldots, |H|\}$, where $H$ is a dictionary of hate words, and $h_i$ is the frequency of the $i$-th lexicon from $H$ among the tweet history.

• Users who receive more attention from fellow users for hate propagation are more likely to generate hate. Therefore, we take the ratio of retweets of previous hateful tweets to non-hateful ones by $u_i$. We also take the ratio of the total number of retweets on hateful and non-hateful tweets of $u_i$.

• Follower count and date of account creation of $u_i$.

• Number of topics (hashtags) $u_i$ has tweeted on up to $t$.
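The hate-history features in the list above (the hateful to non-hate ratio and the lexicon frequency vector over the 30 most recent tweets) can be sketched as follows. This is a minimal illustration under several assumptions: tweets are ordered oldest to newest, tokenization is a plain whitespace split, lexicon entries are single tokens, and a zero count of non-hate tweets is guarded by dividing by one. The function name and placeholder lexicon terms are hypothetical.

```python
from collections import Counter

def hate_history_features(tweets, is_hateful, lexicon, window=30):
    """Return (hate ratio, lexicon frequency vector) over recent tweets."""
    recent = tweets[-window:]
    flags = is_hateful[-window:]
    n_hate = sum(flags)
    n_non_hate = len(flags) - n_hate
    # Assumption: avoid division by zero when every recent tweet is hateful.
    ratio = n_hate / max(n_non_hate, 1)
    counts = Counter(tok for tweet in recent for tok in tweet.lower().split())
    lexicon_vector = [counts[term] for term in lexicon]
    return ratio, lexicon_vector

# Placeholder tokens stand in for real lexicon entries.
tweets = ["you are a term1", "nice day today", "term1 term2 again"]
is_hateful = [True, False, True]
lexicon = ["term1", "term2", "term3"]
ratio, vec = hate_history_features(tweets, is_hateful, lexicon)
```

Multi-word lexicon phrases would need phrase matching rather than token counting; the sketch leaves that out for brevity.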

We compute Doc2Vec [35] representations of the tweets, along with the hashtags present in them as individual tokens.

We then compute the average cosine similarity between the user's recent tweets and the vector representation of the hashtag; this serves as the topical relatedness of the user towards the given hashtag.
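As a rough illustration of the topical relatedness score, the sketch below averages the cosine similarity between each tweet vector and the hashtag vector. It assumes the vectors have already been produced by some embedding model (Doc2Vec in the text) and are passed in as plain lists of floats; the helper names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector has no similarity to anything
    return dot / (norm_a * norm_b)

def topical_relatedness(tweet_vectors, hashtag_vector):
    """Average cosine similarity of a user's tweets to a hashtag."""
    if not tweet_vectors:
        return 0.0
    sims = [cosine(v, hashtag_vector) for v in tweet_vectors]
    return sum(sims) / len(sims)

tweet_vectors = [[1.0, 0.0], [0.0, 1.0]]
hashtag_vector = [1.0, 0.0]
score = topical_relatedness(tweet_vectors, hashtag_vector)
```

Here one tweet is perfectly aligned with the hashtag and one is orthogonal to it, so the average lands halfway between the two.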

To incorporate the information of trending topics over Twitter, we supply the model with a binary vector representing the top 50 trending hashtags for the day the tweet is posted.

We compute the average tf-idf vector for the 60 most recent news headlines from our corpus posted before the time of the tweet. Again we select the top 300 features.

Using the above features, we implement six different classification models (and their variants). Details of the models are provided in Section VI-C.

V. RETWEET PREDICTION

While realizing Eq. 2 for retweeter prediction, we formulate the task in two different settings: the static retweeter prediction task, where $t_0$ is fixed and $\Delta t$ is $\infty$ (i.e., all the retweeters irrespective of their retweet time), and the dynamic retweeter prediction task, where we predict on successive time intervals.

For these tasks, we rely on features both designed manually as well as extracted using unsupervised/self-supervised manner.

For the task of retweet prediction, we extract features representing the root tweet itself, as well as the signals of Eq. 2 corresponding to each user $u_i$ (for which we predict the possibility of retweeting). Henceforth, we indicate the root user by $u_0$.

Here, we incorporate $S^P_i$ using two different features: the shortest path length from $u_0$ to $u_i$ in $G$, and the number of times $u_i$ has retweeted tweets by $u_0$. All the features representing $H_{i,t}$ and $S_{en}$ remain the same as described in Section IV.
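The shortest path length feature can be computed with a breadth-first search, since every edge carries unit weight. The sketch below assumes the network is stored as a dictionary from a user to the users who follow them, so that paths run in the direction information flows; the function name and the toy graph are hypothetical.

```python
from collections import deque

def shortest_path_length(followers, source, target):
    """Hop count from source to target along follower edges, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        for follower in followers.get(user, ()):
            if follower == target:
                return dist + 1
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, dist + 1))
    return None  # target is not reachable in the visible network

followers = {"u0": ["u1", "u2"], "u1": ["u3"], "u2": ["u3"], "u3": []}
d_forward = shortest_path_length(followers, "u0", "u3")
d_backward = shortest_path_length(followers, "u3", "u0")
```

Returning None for unreachable users is one possible convention; a model would need some numeric encoding of that case, which the text does not specify.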

We incorporate two sets of features representing the root tweet $\tau$: the hate lexicon vector, similar to Section IV-A, and the top 300 tf-idf features. We varied the number of features from 100 to 1000, and the best results were obtained with 300.

For the retweet prediction task, we incorporate the exogenous signal in two different ways. To implement the attention mechanism of RETINA, we use Doc2Vec representations of the news articles as well as the root tweet. For the rest of the models, we use the same feature set as Section IV-D.

Guided by Eq. 2, RETINA exploits the features described in Section V-A for both static and dynamic prediction of retweeters.

Exogenous attention. To incorporate external information as an assisting signal to model diffusion, we use a variation of scaled dot-product attention [36] in RETINA (see Figure 4). Given the feature representation of the tweet $X^T$ and the news feature sequence $X^N = \{X^N_1, X^N_2, \ldots, X^N_k\}$, we compute three tensors $Q^T$, $K^N$, and $V^N$, respectively, as follows:

$$Q^T = X^T |_{(-1,0)} W^Q, \qquad K^N = X^N |_{(-1,0)} W^K, \qquad V^N = X^N |_{(-1,0)} W^V \qquad (3)$$

where $W^Q$, $W^K$, and $W^V$ are learnable parameter kernels (we denote them to belong to query, key and value dense layers, respectively, in Figure 4). The operation $(\cdot) |_{(-1,0)} (\cdot)$ signifies tensor contraction according to the Einstein summation convention along the specified axes. In Eq. 3, $(-1, 0)$ signifies the last and first axes of the first and second tensor, respectively. Therefore, $X |_{(-1,0)} Y = \sum_i X[\ldots, i]\, Y[i, \ldots]$. Each of $W^Q$, $W^K$, and $W^V$ is a two-dimensional tensor with $hdim$ columns (last axis).

[Figure 4 caption fragment: (b) Static prediction of retweeters: To predict whether $u_j$ will retweet, the input feature $X^{u_j}$ is normalized and passed through a feed-forward layer, concatenated with $X^{T,N}$, and another feed-forward layer is applied to predict the retweeting probability $P^{u_j}$. (c) Dynamic retweet prediction: In this case, RETINA predicts the user retweet probability for consecutive time intervals, and instead of the last feed-forward layer used in the static prediction, we use a GRU layer.]

Next, we compute the attention weight tensor A between the tweet and news sequence as

where $\mathrm{Softmax}(X[\ldots, i, j]) = \dfrac{e^{X[\ldots,i,j]}}{\sum_j e^{X[\ldots,i,j]}}$. Further, to avoid saturation of the softmax activation, we scale each element of $A$ by $hdim^{-0.5}$ [36].

The attention weight is then used to produce the final encoder feature representation $X^{T,N}$ by computing the weighted average of $V^N$ as follows:

RETINA is expected to aggregate the exogenous signal exposed by the sequence of news inputs according to the feature representation of the tweet into $X^{T,N}$, using the operations mentioned in Eqs. 3-5 via tuning the parameter kernels.
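To make the flow of the exogenous attention concrete, here is a minimal NumPy sketch for a single tweet and its news sequence (no batch axis). It is an illustration under stated assumptions rather than the released implementation: the scaling by $hdim^{-0.5}$ is applied to the scores before the softmax, as in conventional scaled dot-product attention, and the kernels are random placeholders instead of trained parameters. All names and sizes are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()

def exogenous_attention(x_t, x_n, w_q, w_k, w_v):
    """Attend from one tweet vector over a sequence of k news vectors."""
    hdim = w_q.shape[-1]
    query = np.tensordot(x_t, w_q, axes=(-1, 0))   # shape (hdim,)
    keys = np.tensordot(x_n, w_k, axes=(-1, 0))    # shape (k, hdim)
    values = np.tensordot(x_n, w_v, axes=(-1, 0))  # shape (k, hdim)
    # Assumption: scale the scores before the softmax (conventional placement).
    scores = (keys @ query) * hdim ** -0.5         # shape (k,)
    weights = softmax(scores)
    return weights @ values, weights               # weighted average of values

rng = np.random.default_rng(0)
d_in, hdim, k = 8, 4, 5                            # hypothetical sizes
x_t = rng.normal(size=d_in)                        # tweet representation
x_n = rng.normal(size=(k, d_in))                   # k news representations
w_q, w_k, w_v = (rng.normal(size=(d_in, hdim)) for _ in range(3))
x_tn, attn = exogenous_attention(x_t, x_n, w_q, w_k, w_v)
```

The output has one entry per hidden dimension, and the attention weights form a probability distribution over the k news items.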

Final prediction. With $S_{ex}$ being represented by the output of the attention framework, we incorporate the features discussed in Section V-A in RETINA to subsume the rest of the signals (see Eq. 2). For the two separate modes of retweeter prediction (i.e., static and dynamic), we implement two different variations of RETINA.

For the static prediction of retweeters, RETINA predicts the probability of each of the users $u_1, u_2, \ldots, u_n$ to retweet the given tweet with no temporal ordering (see Figure 4(b)). The feature vector $X^{u_i}$ corresponding to user $u_i$ is first normalized and mapped to an intermediate representation using a feed-forward layer. It is then concatenated with the output of the exogenous attention component, $X^{T,N}$, and finally, another feed-forward layer with sigmoid nonlinearity is applied to compute the probability $P^{u_i}$.
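The static prediction head described above can be sketched in a few lines of NumPy. Several details here are assumptions made for illustration only: the normalization is a simple per-vector standardization, the intermediate layer uses a ReLU activation, and the weights are random placeholders. None of these choices are confirmed by the text, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def static_retweet_probability(x_u, x_tn, w1, b1, w2, b2):
    """Normalize user features, project, concatenate, and score."""
    # Assumption: standardize the user feature vector.
    x = (x_u - x_u.mean()) / (x_u.std() + 1e-8)
    # Assumption: ReLU activation in the intermediate feed-forward layer.
    hidden = np.maximum(0.0, x @ w1 + b1)
    joint = np.concatenate([hidden, x_tn])
    return float(sigmoid(joint @ w2 + b2))

rng = np.random.default_rng(1)
d_user, d_hidden, hdim = 10, 6, 4       # hypothetical sizes
x_u = rng.normal(size=d_user)           # features of one candidate user
x_tn = rng.normal(size=hdim)            # output of the exogenous attention
w1 = rng.normal(size=(d_user, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(size=d_hidden + hdim)
b2 = 0.0
p_retweet = static_retweet_probability(x_u, x_tn, w1, b1, w2, b2)
```

In the dynamic variant, the final scoring layer would be replaced by a recurrent unit emitting one probability per time interval.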

As opposed to the static case, in the dynamic setting RETINA predicts the probability of every user $u_i$ to retweet within a time interval $[t_0 + \Delta t_i, t_0 + \Delta t_{i+1}]$, with $t_0$ being the time at which the tweet was published and $\Delta t_0 = 0$. To capture the temporal dependency between predictions in successive intervals, we replace the last feed-forward layer with a Gated Recurrent Unit (GRU), as shown in Figure 4(c). We experimented with other recurrent architectures as well; performance degraded with a simple RNN, and an LSTM brought no gain.

Cost/loss function. In both settings, the task translates to a binary classification problem of deciding whether a given user will retweet or not. Therefore, we use the standard binary cross-entropy loss $L$ to train RETINA:

$$L = -w \cdot t \log(p) - (1 - t) \log(1 - p) \qquad (6)$$

where $t$ is the ground-truth, $p$ is the predicted probability ($P^{u_i}$ in static and $P^{u_i}_j$ in dynamic settings), and $w$ is the weight given to the positive samples to deal with class imbalance.
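A positive-weighted binary cross-entropy of the kind described here can be written as below. The sketch assumes the common form in which only the positive term is multiplied by $w$, and it clips the probability to avoid taking the logarithm of zero; the clipping constant and function name are our own choices.

```python
import math

def weighted_bce(t, p, w, eps=1e-12):
    """Binary cross-entropy with weight w on the positive class."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithms finite
    return -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))

loss_pos = weighted_bce(t=1, p=0.5, w=2.0)  # positive sample, weighted
loss_neg = weighted_bce(t=0, p=0.5, w=2.0)  # negative sample, unweighted
```

With $w = 2$, a misjudged positive sample costs twice as much as an equally misjudged negative one, which is how the weight counteracts the scarcity of retweeters.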

We initially started collecting data based on topics, which led to a tweet corpus spanning multiple years. To narrow down our time frame and ease the mapping of tweets to news, we restricted our time span to the period from 2020-02-03 to 2020-04-14 and made use of trending hashtags. Using Twitter's official API, we tracked and crawled trending hashtags each day within this duration. Overall, we obtained 31,133 tweets from 13,965 users. We also crawled the retweeters for each tweet along with the timestamps. Table II describes the hashtag-wise detailed statistics of the data. To build the information network, we collected the followers of each user up to a depth of 3, resulting in a total of 41,151,251 unique users in our dataset. We also collected the activity history of the users, resulting in a total of 163,042,612 tweets in our dataset. One should note that the lack of a comprehensive dataset (containing textual, temporal, and network signals all in one) is the primary reason why we decided to collect our own dataset in the first place.

We also crawled the online news articles published within this span using the News-please crawler [37]. We managed to collect a total of 683,419 news articles for this period. After filtering for language, title and date, we were left with 319,179 processed items. Their headlines were used as the source of the exogenous signal.

We employ three professional annotators who have experience in analyzing online hate speech to annotate the tweets manually. All of these annotators belong to an age group of 22-27 years and are active on Twitter. As the contextual knowledge of real-world events plays a crucial role in identifying hate speech, we ensure that the annotators are well aware of the events related to the hashtags and topics. Annotators were asked to follow Twitter's policy as the guideline for identifying hateful behavior. We annotated a total of 17,877 tweets with an inter-annotator agreement of 0.58 Krippendorff's α. The low value of inter-annotator agreement is on par with most hate speech annotation to date, pointing out the hardness of the task even for human subjects. This further strengthens the need for contextual knowledge as well as exploiting beyond-the-text dynamics. We select the final tags based on majority voting.
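Majority voting over the three annotators can be sketched as follows. The label strings, tweet identifiers, and function name are hypothetical; with three annotators and two labels, a tie cannot occur.

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by most annotators."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical annotations: three labels per tweet.
annotations = {
    "tweet_1": ["hate", "hate", "non-hate"],
    "tweet_2": ["non-hate", "non-hate", "non-hate"],
    "tweet_3": ["non-hate", "hate", "non-hate"],
}
final_tags = {tid: majority_label(labels) for tid, labels in annotations.items()}
```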

Based on this gold-standard annotated data, we train three different hate speech classifiers based on the designs given by Davidson et al. [9] (dubbed the Davidson model), Waseem and Hovy [8], and Pinkesh et al. [10]. With an AUC score of 0.85 and a macro-F1 of 0.59, the Davidson model emerges as the best performing one. When the existing pre-trained Davidson model was tested on our annotated dataset, it achieved 0.79 AUC and 0.48 macro-F1. This highlights both the limitations of existing hate detection models in capturing newer context and the importance of manual annotations and fine-tuning. We use the fine-tuned model to annotate the rest of the tweets in our dataset (the percentage of hateful tweets for each hashtag is reported in Table II). We use the machine-annotated tags for the features and training labels in our proposed models only, while the hate generation models are tested solely on gold-standard data.

Along with the manual annotation and trained hate detection model, we use a dictionary of hate lexicons proposed in [17]. It contains a total of 209 words/phrases signaling a possible existence of hatefulness in a tweet. Examples of slur terms in the lexicon include words such as harami (bastard), jhalla (faggot), and haathi (elephant/fat). Using these terms is derogatory and a direct offense. In addition, the lexicon has some colloquial terms such as mulla (muslim), bakar (gossip), aktakvadi (terrorist), and jamai (son-in-law), which may carry a hateful sentiment depending on the context in which they are used.

To experiment on our hate generation prediction task, we use a total of 19,032 tweets (each having at least 60 news items mapped to it from the time of its posting) coming from 12,492 users to construct the ground truth. With an 80:20 train-test split, there are 611 hateful tweets among 15,225 in the training data, and 129 out of 3,807 in the testing data. To deal with the severe class imbalance of the dataset, we use both upsampling of positive samples and downsampling of negative samples.
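The two resampling strategies can be sketched as below. This is an illustrative sketch only: the text does not state the target class ratio, so a 1:1 balance is assumed here, and the helper names are hypothetical. The label counts mirror the training split reported above (611 hateful tweets among 15,225).

```python
import random

def split_indices(labels):
    """Separate sample indices into positive (hateful) and negative ones."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos, neg

def downsample_negatives(labels, seed=0):
    """Keep every positive and an equally sized random subset of negatives."""
    rng = random.Random(seed)
    pos, neg = split_indices(labels)
    kept_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    return sorted(pos + kept_neg)

def upsample_positives(labels, seed=0):
    """Keep every sample and repeat random positives until classes match."""
    rng = random.Random(seed)
    pos, neg = split_indices(labels)
    if not pos:
        return sorted(neg)
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    return sorted(pos + neg + extra)

# Label vector with the class counts of the training split (assumed 1:1 target).
train_labels = [1] * 611 + [0] * (15225 - 611)
down_idx = downsample_negatives(train_labels)
up_idx = upsample_positives(train_labels)
print(len(down_idx), len(up_idx))
```

Resampling should be applied to the training split only, so that the test split keeps the original class distribution.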

With all the features discussed in Section IV, the full size of the feature vector is 3,645. We experimented with all our proposed models using this full set of features, as well as with dimensionality reduction techniques applied to it. We use Principal Component Analysis (PCA) with the number of components set to 50. We also conduct experiments selecting the K best features (K = 50) using mutual information.

We implement a total of six different classifiers using Support Vector Machine (with linear and RBF kernels), Logistic Regression, Decision Tree, AdaBoost, and XGBoost [38]. Parameter settings for each of these are reported in Table III. All of the models, PCA, and feature selection are implemented using scikit-learn 6 .

The activity of retweeting, too, shows a skewed pattern similar to hate speech generation. While the maximum number of retweets for a single tweet is 196 in our dataset, the average is only 13.10. We use only those tweets which have more than one retweet and at least 60 news items mapped to them from the time of posting. With an 80:20 train-test split, this results in a total of 3,057 and 765 samples for training and testing, respectively.

For all the Doc2Vec generated feature vectors related to tweets and news headlines, we set the dimensionality to 50 and 500, respectively. For RETINA, we set the parameter hdim and all the intermediate hidden sizes for the rest of the feedforward (except the last one generating logits) and recurrent layers to 64 (see Section V-B).

Hyperparameter tuning of RETINA. For both settings (i.e., static and dynamic prediction of retweeters), we used mini-batch training of RETINA, with both Adam and SGD optimizers. We varied the batch size among 16, 32, and 64, with the best results for a batch size of 16 for the static mode and 32 for the dynamic mode. We also varied the learning rate within the range $10^{-4}$ to $10^{-1}$, and the dynamic model performed best with a learning rate of $10^{-2}$ using the SGD optimizer 7 . The static counterpart produced the best results with the Adam optimizer 8 [39] using default parameters.

To deal with the class imbalance, we set the parameter $w$ in Eq. 6 as $w = \lambda(\log C - \log C^{+})$, where $C$ and $C^{+}$ are the counts of total and positive samples, respectively, in the training dataset, and $\lambda$ is a balancing constant which we vary from 1 to 2.5 in steps of 0.5. We found the best configurations with $\lambda = 2.0$ and $\lambda = 2.5$ for the static and dynamic modes, respectively.

7 https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD
8 https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam
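As a worked illustration of the weighting rule $w = \lambda(\log C - \log C^{+})$, the following sketch evaluates $w$ over the stated grid of $\lambda$ values. The sample counts are hypothetical, since the class counts of the retweet training data are not given here, and the function name is ours. How $w$ enters the loss is defined by Eq. 6 and is not reproduced.

```python
import math

def positive_class_weight(total, positives, lam):
    """Compute w = lam * (log C - log C+) for C total and C+ positive samples."""
    return lam * (math.log(total) - math.log(positives))

# Hypothetical counts, for illustration only.
total_samples = 10000
positive_samples = 500

for lam in (1.0, 1.5, 2.0, 2.5):
    w = positive_class_weight(total_samples, positive_samples, lam)
    print(f"lambda={lam}: w={w:.3f}")
```

Because $\log C - \log C^{+} = \log(C / C^{+})$, the weight grows with the rarity of the positive class, and $\lambda$ scales it linearly.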

In the absence of external baselines for predicting hate generation probability due to the problem's novelty, we rely on ablation analyses of the models proposed for this task. For retweet dynamics prediction, we implement 5 external baselines and two ablation variants of RETINA. Since information diffusion is a vast subject, we approach it from two perspectives: one is the set of rudimentary baselines (SIR, General Threshold), and the other is the set of recently proposed neural models.

SIR [19]: The Susceptible-Infectious-Recovered (Removed) model is one of the earliest predictive models for contagion spread. Two parameters govern the model, the transmission rate and the recovery rate, which dictate the spread of contagion (retweeting in our case) along a social/information network.
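A minimal sketch of how the two SIR parameters drive a cascade over a follower network is given below. It assumes a discrete-time, stochastic network variant in which `beta` (transmission) and `gamma` (recovery) are per-step probabilities; this is one common simplification and not necessarily the configuration used for the baseline. The graph and function names are hypothetical.

```python
import random

def simulate_sir(followers, seeds, beta, gamma, steps, seed=0):
    """Simulate retweet spread; return the set of users who ever retweeted.

    followers: dict mapping a user to the users who see that user's tweets.
    beta: per-step probability that an exposed follower retweets.
    gamma: per-step probability that an active user stops spreading.
    """
    rng = random.Random(seed)
    infectious = set(seeds)
    ever_infected = set(seeds)
    for _ in range(steps):
        newly_infected = set()
        for user in sorted(infectious):
            for follower in followers.get(user, []):
                if follower not in ever_infected and rng.random() < beta:
                    newly_infected.add(follower)
        newly_recovered = {u for u in sorted(infectious) if rng.random() < gamma}
        infectious = (infectious - newly_recovered) | newly_infected
        ever_infected |= newly_infected
        if not infectious:
            break  # nobody is left to spread the tweet
    return ever_infected

# Hypothetical chain of followers: 0 -> 1 -> 2 -> 3.
chain = {0: [1], 1: [2], 2: [3], 3: []}
print(sorted(simulate_sir(chain, seeds=[0], beta=1.0, gamma=1.0, steps=10)))
print(sorted(simulate_sir(chain, seeds=[0], beta=0.0, gamma=1.0, steps=10)))
```

With `beta=1.0` every exposed follower retweets, so the whole chain is reached; with `beta=0.0` the cascade never leaves the seed user.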

Threshold Model [40] : This model assumes that each node has threshold inertia chosen uniformly at random from the interval [0, 1]. A node becomes active if the weighted sum of its active neighbors exceeds this threshold.
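The activation rule of the threshold model can be sketched as follows. The edge weights, the toy graph, and the option to pass fixed thresholds (useful for reproducible checks) are assumptions made for illustration; by default, thresholds are drawn uniformly at random from [0, 1] as described above.

```python
import random

def simulate_linear_threshold(in_weights, seeds, thresholds=None, seed=0):
    """Run the threshold model to convergence and return the active nodes.

    in_weights: dict mapping each node to {neighbor: influence weight}.
    thresholds: optional dict of per-node thresholds; sampled if omitted.
    """
    rng = random.Random(seed)
    nodes = sorted(in_weights)
    if thresholds is None:
        thresholds = {v: rng.random() for v in nodes}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v in active:
                continue
            influence = sum(w for u, w in in_weights[v].items() if u in active)
            # A node activates once its active neighbors' total weight
            # exceeds its threshold; activation is never undone.
            if influence > thresholds[v]:
                active.add(v)
                changed = True
    return active

# Hypothetical weighted graph: a influences b and c, b influences c.
graph = {"a": {}, "b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}}
fixed = {"a": 0.5, "b": 0.5, "c": 0.5}
print(sorted(simulate_linear_threshold(graph, seeds=["a"], thresholds=fixed)))
```

Since activation is monotone, sweeping over the nodes repeatedly until nothing changes reaches the same final active set as a round-by-round synchronous update.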

Using the same feature set as described in Section V-A, we employ four classifiers: Logistic Regression, Decision Tree, Linear SVC, and Random Forest (with 50 estimators). All of these models are used for the static mode of retweet prediction only. Features representing exogenous signals are engineered in the same way as described in Section IV-D.

To overcome the feature engineering step involving combinations of topical, contextual, network, and user-level features, neural methods for information diffusion have gained popularity. While these methods all focus on determining only the next set of users, they are still important for measuring the diffusion performance of RETINA.

TopoLSTM [26]: It is one of the initial works to consider recurrent models for generating next-user prediction probabilities. The model converts the cascades into dynamic DAGs (capturing the temporal signals via node ordering). The sender-receiver based RNN model captures a combination of an active node's static score (based on the history of the cascade) and a dynamic score (capturing future propagation tendencies).

FOREST [27]: It aims to be a unified model that performs both microscopic and macroscopic cascade prediction, combining reinforcement learning (for macroscopic) with a recurrent model (for microscopic). By considering the complete global graph, it performs graph sampling to obtain the structural context of a node as an aggregate of the structural contexts of its one- or two-hop neighbors. In addition, it factors in temporal information via the last m seen nodes in the cascade.

HIDAN [28]: It does not explicitly consider a global graph as input. Any information loss due to the absence of a global graph is compensated for by temporal information, utilized in the form of ordered time differences of node infection. Since HIDAN, like TopoLSTM, does not employ a global graph, it too uses the set of all seen nodes in the cascade as candidate nodes for prediction.

We exercise extensive feature ablation to examine the relative importance of different feature sets. Among the six different algorithms we implement for this task, along with different sampling and feature reduction methods, we choose the best performing model for this ablation study. Following Eq. 1, we remove the feature sets representing H i,t , S ex , S en , and T (see Section IV for corresponding features) in each trial and evaluate the performance.

To investigate the effectiveness of the exogenous attention mechanism for predicting potential retweeters, we remove this component and experiment on static as well as the dynamic setting of RETINA.

Evaluation of classification models on highly imbalanced data requires care to avoid classification bias. We use multiple evaluation metrics for both tasks: macro-averaged F1 score (macro-F1), area under the receiver operating characteristic curve (AUC), and binary accuracy (ACC). As the neural baselines tackle the problem of retweet prediction as a ranking task, we adapt the evaluation of RETINA to make it comparable with these baselines. We rank the predicted probability scores (P ui and P ui j in static and dynamic settings, respectively) and compute mean average precision at top-k positions (MAP@k) and binary hits at top-k positions (HITS@k).

Table IV presents the performances of all the models in predicting the probability of a given user posting a hateful tweet using a given hashtag. It is evident from the results that all six models suffer from the sharp bias in the data; without any class-specific sampling, they tend to lean towards the dominant class (non-hate in this case), resulting in low macro-F1 and AUC compared to very high binary accuracy. SVM with RBF kernel outperforms the rest when no upsampling or downsampling is done, with a macro-F1 of 0.55 (AUC 0.61).
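The gap between binary accuracy and macro-F1 under such imbalance can be made concrete with a short sketch. It uses the class counts of the hate generation test split (129 hateful tweets out of 3,807) together with a hypothetical degenerate classifier that always predicts the dominant non-hate class; the metric implementations are plain-Python versions written for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Test split class counts; the classifier below is a hypothetical worst case.
y_true = [1] * 129 + [0] * (3807 - 129)
y_pred = [0] * 3807  # always predicts non-hate

print(f"ACC      = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

Such a classifier never identifies a hateful tweet, yet its binary accuracy stays above 0.96 while its macro-F1 falls below 0.5, which is why accuracy alone is uninformative for this task.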

Effects of sampling. Downsampling the dominant class results in a substantial leap in the performance of all the models. The effect is almost uniform over all the classifiers except XGBoost. In terms of macro-F1, Decision Tree sets the best overall performance for this task at 0.65. However, the rest of the models lie in a very close range of 0.62-0.64 macro-F1. While the performance gains from downsampling are evident, the effects of upsampling the dominated class are less intuitive. For all the models, upsampling degrades macro-F1 to a large extent, with values in the range 0.44-0.47. However, the AUC scores improve by a significant margin with upsampling for all the models except Decision Tree. AdaBoost achieves the highest AUC of 0.68 with upsampling.

Dimensionality reduction of feature space. Our experiments with PCA and K-best feature selection by mutual information show a heterogeneous effect on different models. While only the SVM with linear kernel shows some improvement with PCA over the original feature set, the rest of the models suffer considerable degradation of macro-F1. However, SVM with RBF kernel achieves the best AUC of 0.68 with PCA. With the top-K best features, the overall gain in performance is not significant for any model except Decision Tree.

We also experiment with combinations of different sampling and feature reduction methods, but none of them achieve a significant gain in performance.

Ablation analysis. We choose Decision Tree with downsampling of the dominant class as our best performing model (in terms of macro-F1 score) and perform ablation analysis. Table V presents the performance of the model with each feature group removed in isolation, along with the full model. Evidently, for predicting hate generation, features representing exogenous signals and user activity history are the most important. Removal of the feature vector signifying trending hashtags, which represents the endogenous signal in our case, also worsens the performance to a significant degree.

Table VI summarizes the performances of the competing models for the retweet prediction task. Here again, binary accuracy presents a very skewed picture of the performance due to class imbalance. While RETINA in dynamic setting outperforms the rest of the models by a significant margin for all the evaluation metrics, TopoLSTM emerges as the best baseline in terms of both MAP@20 and HITS@20.
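The ranking metrics reported for this task can be sketched as follows. Conventions for these metrics vary: here, average precision at k is normalised by min(k, number of actual retweeters), and HITS@k is taken as 1 when at least one actual retweeter appears in the top k, which is one reading of "binary hits". The scores and user names are hypothetical.

```python
def rank_by_score(scores):
    """Order users by predicted retweet probability, highest first."""
    return [u for u, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]

def average_precision_at_k(ranked_users, actual_retweeters, k):
    """Average precision over the top-k ranked users for one tweet."""
    actual = set(actual_retweeters)
    if not actual:
        return 0.0
    hits, score = 0, 0.0
    for rank, user in enumerate(ranked_users[:k], start=1):
        if user in actual:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(actual), k)

def hits_at_k(ranked_users, actual_retweeters, k):
    """1.0 if any actual retweeter is ranked within the top k, else 0.0."""
    actual = set(actual_retweeters)
    return 1.0 if any(u in actual for u in ranked_users[:k]) else 0.0

def mean_over_tweets(metric, rankings, actuals, k):
    """Average a per-tweet metric over all tweets (e.g. to obtain MAP@k)."""
    values = [metric(r, a, k) for r, a in zip(rankings, actuals)]
    return sum(values) / len(values)

# Hypothetical predicted probabilities for one tweet.
scores = {"u1": 0.9, "u2": 0.8, "u3": 0.4, "u4": 0.1}
ranked = rank_by_score(scores)
print(average_precision_at_k(ranked, {"u1", "u3"}, k=3))
print(hits_at_k(ranked, {"u4"}, k=2))
```

MAP@k is then the mean of the per-tweet average precision values, computed here with `mean_over_tweets`.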

In Figure 5, we compare RETINA in static and dynamic settings with TopoLSTM in terms of HITS@k for different values of k. For smaller values of k, RETINA largely outperforms TopoLSTM in both dynamic and static settings. However, with increasing values of k, the three models converge to very similar performances.

Figure 6 provides an important insight regarding the retweet diffusion modeling power of our proposed framework RETINA. Our best performing baseline, TopoLSTM, largely fails to capture the different diffusion dynamics of hate speech in contrast to non-hate (MAP@20 of 0.59 for non-hate vs. 0.43 for hate). On the other hand, RETINA achieves MAP@20 scores of 0.80 and 0.74 in the dynamic setting (0.54 and 0.56 in the static setting) when predicting the retweet dynamics for hate and non-hate contents, respectively. We attribute this superior expressive power to our well-curated feature design, which incorporates hate signals along with the endogenous, exogenous, and topic-oriented influences.

Among the traditional baselines, Logistic Regression gives a macro-F1 score comparable to the best static model; however, owing to memory limitations, it could not be trained with more than 15 news items per tweet. Similarly, SVM-based models could not incorporate even 15 news items per tweet (again due to memory limitations). Meanwhile, an ablation on the number of news items gave the best results at 60 for both static and dynamic models.

We find that the exogenous signal (i.e., the news items) plays a vital role in retweet prediction, much like our findings in Table V for predicting hate generation. With the exogenous attention component removed in static as well as dynamic settings (RETINA-S† and RETINA-D†, respectively, in Table VI), performance drops by a significant margin. However, the performance drop is more significant in RETINA-D† for ranking users according to retweet probability (MAP@k and HITS@k). The impact of exogenous signals on macro-F1 is more visible in the traditional models.
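For readers unfamiliar with the operation underlying the exogenous attention component, the generic scaled dot-product attention can be sketched in plain Python as below. This is not RETINA's implementation: the learned projections and layer sizes of Section V-B are omitted, and treating a tweet vector as the query and news headline vectors as keys and values is an assumption made for illustration.

```python
import math

def scaled_dot_product_attention(query, keys, values):
    """Attend over `values` using softmax(query . key / sqrt(d)) as weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scaled scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights

# Hypothetical 2-dimensional vectors: one tweet (query) and two news items.
tweet_vec = [10.0, 0.0]
news_keys = [[10.0, 0.0], [0.0, 10.0]]
news_values = [[1.0, 0.0], [0.0, 1.0]]

context, weights = scaled_dot_product_attention(tweet_vec, news_keys, news_values)
print(weights)
print(context)
```

The news item whose key aligns with the tweet vector receives almost all of the attention weight, so the resulting context vector is dominated by that item's value.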

To observe the performance of RETINA more closely in the dynamic setting, we analyse its performance over successive prediction intervals. Figure 8 shows the ratio between the predicted and the actual number of retweets arriving at different intervals. The model becomes nearly perfect at predicting new growth as time increases. The high error rate at the initial stage is possibly due to the fact that the retweet dynamics remain uncertain at first and become more predictable as an increasing number of people participate over time. A similar trend is observed when we compare the performance of RETINA in the static setting with varying sizes of actual retweet cascades. Figure 9 shows that RETINA-S performs better with increasing cascade size.

In addition, we vary the size of the user history, i.e., the number of past tweets considered per user. Figure 7 shows that the performance of RETINA in both static and dynamic settings improves as the history size increases from 10 to 30 tweets. Afterward, it either drops or remains the same.

Our attempt to model the genesis and propagation of hate on Twitter brings forth various limitations posed by the problem itself as well as our modeling approaches. We explicitly cover such areas to facilitate the grounds of future developments. 

We have considered the propagation of hateful behavior via retweet cascades only. In practice, there are multiple other forms of diffusion present, and retweets constitute only a subset of the full spectrum. Users susceptible to hateful information often propagate it via new tweets. Hateful tweets are often counteracted with hate speech via reply cascades. Even if not retweeted, replied to, or immediately influencing the generation of newer tweets, a specific hateful tweet can readily set the audience into a hateful state, which may later have repercussions. Identification of such influences would need intricate Natural Language Processing techniques, adaptable to the noisy nature of Twitter data.

As already discussed, online hate speech is vastly dynamic in nature, making it difficult to identify. Depending on the topic, time, cultural demography, target group, etc., the signals of hate speech change. Thus, models like RETINA, which explicitly use hate-based features to predict popularity, need an updated signaling strategy. However, this drawback is only evident if one intends to perceive such endeavors as a simple task of retweet prediction only. We, on the other hand, focus on the retweet dynamics of hateful vs. non-hateful content, which presumes the signals of hateful behavior to be well-defined.

The majority of the existing studies on online hate speech have focused on hate speech detection, with very few seeking to analyze the diffusion dynamics of hate on large-scale information networks. We bring forth the very first attempt to predict the initiation and spread of hate speech on Twitter. Analyzing a large Twitter dataset that we crawled and manually annotated for hate speech, we identified multiple key factors (exogenous information, topic-affinity of the user, etc.) that govern the dissemination of hate.

Based on the empirical observations, we developed multiple supervised models powered by rich feature representation to predict the probability of any given user tweeting something hateful. We proposed RETINA, a neural framework exploiting extra-Twitter information (in terms of news) with attention mechanism for predicting potential retweeters for any given tweet. Comparison with multiple state-of-the-art models for retweeter prediction revealed the superiority of RETINA in general as well as for predicting the spread of hateful content in particular.

With the specific focus of our work being the generation and diffusion of hateful content, our proposed models rely on some general textual/network-based features as well as features signaling hate speech. A possible direction for future work is to replace hate speech with any other targeted phenomenon, such as fraudulent or abusive behavior, or specific categories of hate speech. However, these hate signals require manual intervention when updating the lexicons or adding topical hate tweets to retrain the hate detection model. While the features of the end-to-end model appear to be highly engineered, individual modules take care of the respective preprocessing.

In this study, the mode of hate speech spread we primarily focused on is retweeting, and therefore we restrict ourselves to textual hate. However, spreading hateful content packaged as an image, a meme, or some invented slang has become a new normal of this age and leaves room for future studies.

[1] Report of the independent international fact-finding mission on Myanmar

[2] Fanning the flames of hate: Social media and hate crime

[3] A survey on automatic detection of hate speech in text

[4] Measuring the reliability of hate speech annotations: The case of the European refugee crisis

[5] Spread of hate speech in online social media

[6] Deep exogenous and endogenous influence combination for social chatter intensity prediction

[7] Information diffusion and external influence in networks

[8] Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter

[9] Automated hate speech detection and the problem of offensive language

[10] Deep learning for hate speech detection in tweets

[11] A hierarchically-labeled Portuguese hate speech dataset

[12] ARHNet - leveraging community interaction for detection of religious hate speech in Arabic

[13] The effects of user features on Twitter hate speech detection

[14] Handling imbalance issue in hate speech classification using sampling-based methods

[15] On analyzing antisocial behaviors amid COVID-19 pandemic

[16] Racism is a virus: Anti-Asian hate and counterhate in social media during the COVID-19 crisis

[17] Mind your language: Abuse and offense detection for code-switched languages

[18] CHASSIS: Conformity meets online information diffusion

[19] Containing papers of a mathematical and physical character

[20] DeepCas: An end-to-end predictor of information cascades

[21] DeepHawkes: Bridging the gap between prediction and understanding of information cascades

[22] What makes an image popular?

[23] Representation learning for information diffusion through social networks: An embedded cascade model

[24] A novel embedding method for information diffusion prediction in social network big data

[25] Neural diffusion model for microscopic cascade prediction

[26] Topological recurrent neural network for diffusion prediction

[27] Multi-scale information diffusion prediction with reinforced recurrent networks

[28] Hierarchical diffusion attention network

[29] A survey of information cascade analysis: Models, predictions and recent advances

[30] Predicting user engagement on Twitter with real-world events

[31] CCNet: Extracting high quality monolingual datasets from web crawl data

[32] Event triggered social media chatter: A new modeling framework

[33] Demarcating endogenous and exogenous opinion diffusion process on social networks

[34] A deterministic model for gonorrhea in a nonhomogeneous population

[35] Distributed representations of sentences and documents

[36] Attention is all you need

[37] news-please: A generic news crawler and extractor

[38] XGBoost: A scalable tree boosting system

[39] Adam: A method for stochastic optimization

[40] Maximizing the spread of influence through a social network