Hate versus Politics: Detection of Hate against Policy Makers in Italian Tweets
Authors: Armend Duzha, Cristiano Casadei, Michael Tosi, Fabio Celli
Date: 2021-07-12

Abstract: Accurate detection of hate speech against politicians, policy making and political ideas is crucial to maintain democracy and free speech. Unfortunately, the amount of labelled data necessary for training models to detect hate speech is limited and domain-dependent. In this paper, we address the classification of hate speech against policy makers in Italian tweets, producing the first resource of this type in this language. We collected and annotated 1264 tweets, examined the cases of disagreement between annotators, and performed in-domain and cross-domain hate speech classification with different features and algorithms. We achieved a performance of ROC AUC 0.83 and analyzed the most predictive attributes, also identifying the language features that differ between the anti-policy-maker and anti-immigration domains. Finally, we visualized networks of hashtags to capture the topics used in hateful and normal tweets.

The rise of Natural Language Processing (NLP) tasks focused on hate speech [2] and the analysis of online debates [7] have both highlighted harmful behaviors in social media, such as offensive language against vulnerable groups (e.g., immigrants, minorities) [32], as well as aggressive language against women [34]. An under-researched, yet important, area of investigation is anti-policy hate: hate speech against politicians, policy making and laws at any level (national, regional and local). While anti-policy hate speech has been addressed in Arabic [21], most European languages have been under-researched.
In recent years, scientific research has contributed to the automatic detection of hate speech from text with datasets annotated with hate labels, aggressiveness, offensiveness, and other related dimensions [37]. Scholars have presented systems for the detection of hate speech in social media focused on specific targets, such as immigrants [14], and language domains, such as racism [25], misogyny [18] or cyberbullying [28]. Each type of hate speech has its own vocabulary and its own dynamics, thus the selection of a specific domain is crucial to obtain clean data and to restrict the scope of experiments and learning tasks. We formulated three Research Questions:
- RQ1: How different are hate speech domains, such as anti-immigrant and anti-policy?
- RQ2: Is it possible to perform cross-domain training, exploiting techniques and models trained in one domain (e.g., anti-immigration) to detect hate speech in another domain (e.g., against policy makers)?
- RQ3: Is it possible to identify and track the topics of public debate that are and are not involved in hate speech?
In order to address RQ1, we performed correlation and classification analyses: the former to measure how different language features are related to hate speech in different domains, the latter to test the performance of classifiers in different domains. To address RQ2, we performed cross-domain classification and applied hate speech models trained in an anti-immigration domain to a policy-making domain. Finally, to address RQ3, we extracted the hashtags from tweets labelled as hateful and non-hateful and visualized the network of co-occurrences with a Yifan Hu graph [23]. With this research, we aim to provide actionable insights for evidence-based decision-making [26], as online hate is often a predictor of offline crime [43].
We selected Twitter as the source of data and Italian as the target language for two reasons: 1) there are datasets annotated with anti-immigrant hate speech labels in Italian, but no datasets annotated with anti-policy-making hate speech labels; 2) Italy has had, at least since the elections in 2018, a large audience that pays attention to hyper-partisan sources on Twitter that are prone to produce and retweet messages of hate against policy making [20]. This paper contributes to scientific research in NLP and hate speech detection in two ways: first, the production of a new corpus, annotated with hate speech labels, in an under-resourced language (Italian); second, the classification of hate speech tweets against policy making, and its comparison to the classification of hate speech against immigrants. The paper is structured as follows: after a literature review (Section 2), we collect a stream of tweets in Italian using keywords (i.e., hashtags) related to laws and regulations (Section 3). We then train, test, and evaluate models for hate speech from existing resources, analyze the predictive power of each feature, visualize the results (Section 4), and draw conclusions (Section 5). Hate speech is defined as any expression that is abusive, insulting, intimidating, harassing, and/or incites, supports or facilitates violence, hatred, or discrimination. It is directed against people (individuals or groups) on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth [16]. A recent study defined the relationships between hate speech and related concepts (see Figure 1), highlighting the fact that the phenomena involved make hate speech especially hard to model, with the risk of creating biased data and models prone to overfitting.
In addition to this, the literature also reports cases of annotators' insensitivity to differences in dialects and offenses [39] that make annotation difficult. For these reasons, one of the largest challenges in the field of hate speech is to investigate architectures that are explainable, stable and well-performing across different languages and domains [31].

Fig. 1 Relation between hate speech and related concepts. Source: [31].

Another key issue is that many recent approaches based on word embeddings [9], Deep Learning algorithms and BERT pre-trained transformers [15, 41, 33] are vulnerable to undesirable bias in training data, especially in the political domain [42], and suffer from poor interpretability [27]. In other words, it can be difficult to understand how systems based on Deep Learning techniques make their decisions about hateful/non-hateful messages. Moreover, the decisions taken by such systems might be based on biased and unfair models. A method for explaining the decisions of transformer models is to look at the attention vectors [11]. Yet, studies show that learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and that different attention distributions can nonetheless yield similar predictions [24]. In a policy-making context, the transparency of decisions and the possibility to interpret the results should be considered a priority. Although there are many studies about hate speech detection in Natural Language Processing (NLP) against various targets, such as immigrants, there are few works on hate speech detection against politicians and policy making.
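In its simplest form, the attention-based inspection mentioned above reduces to averaging attention activations per token. Below is a minimal sketch with NumPy, assuming per-example attention tensors of shape (layers, heads, seq_len, seq_len), as returned per example by Hugging Face transformer models when called with output_attentions=True; token_importance is a hypothetical helper, not the implementation used in this paper:

```python
import numpy as np

def token_importance(attention, tokens):
    """Rank tokens by the average attention they receive.

    attention: array of shape (layers, heads, seq_len, seq_len),
    one example's attention weights from a transformer model.
    tokens: the seq_len (sub)tokens of that example.
    """
    # Mean over layers, heads and query positions -> one score per token
    scores = attention.mean(axis=(0, 1, 2))
    return sorted(zip(tokens, scores), key=lambda pair: -pair[1])
```

Aggregating these per-token scores over a corpus gives the kind of ranking that can be rendered as a word cloud, as done in Section 4.3.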
Previous approaches to this task exploited transparent Machine Learning (ML) algorithms, such as Gaussian Naïve Bayes, Random Forests and Support Vector Machines (SVMs), as well as Deep Learning algorithms, such as Convolutional Neural Networks (CNNs), Multi-Layer Perceptrons (MLPs), and Recurrent Neural Networks (RNNs) with long short-term memory (LSTM) or bidirectional long short-term memory (Bi-LSTM) on top of word embeddings extracted from the training set or pre-trained on other resources with transfer learning. These studies show that good results can be obtained with Bi-LSTMs, MLPs and SVMs [21]. Studies that provided useful datasets in the field of hate speech include SemEval 2019, which studied multilingual hate speech against immigrants and women in English and Spanish [4]. In Italian there are two main corpora, both about anti-immigrant hate: the Italian HS corpus [32] and HaSpeeDe-tw2018, the dataset released during the EVALITA campaign in 2018 [36]. The former is a collection of more than 5,700 tweets manually annotated with hate speech, aggressiveness, irony and other forms of potentially harassing communication. The latter is a dataset (3,000 tweets for training and 1,000 for testing) manually annotated with hate speech labels. The results of HaSpeeDe-tw2018, reported in Table 1, are the state of the art in hate speech detection in Italian and show that lexical resources, such as polarity and emotion lexica, are useful for this task [5], [17]. Most hate speech recognition systems at HaSpeeDe-tw2018 exploit SVMs, Recurrent Neural Networks with LSTM or ensemble learning (meta) [3], [29], [13], with word embeddings as features [38], pre-trained or extracted from the training set. Some systems also use cross-platform data (i.e., Facebook and Twitter) and show that this strategy yields similar results for Twitter [12].
Crucially, the best performing systems make use of lexical resources for polarity, subjectivity and emotions [10], showing that word embeddings are more effective when combined with lexical resources. The current state of the art in the HaSpeeDe task on Twitter is 0.808 macro-F1, obtained using transformer-based models [35]. Regarding visualization, the heuristic power of network graphs has been known in computational social science for at least a decade. For example, network graphs of topics or Twitter hashtags can be used to analyze the sentiment polarization of hyper-partisan topics [19]; as another example, networks of replies annotated with personality types can represent the conversational dynamics of neurotic and emotionally stable users [8]. In the next section, we describe how we created the dataset and annotated it with hate speech labels. In order to monitor the reactions of society towards policy making, we retrieved a stream of tweets in Italian from March to May 2020, using snowball sampling. Starting from a set of seed hashtags, for instance #dpcm (decree of the president of the council of ministers), #legge (law) and #leggedibilancio (budget law), we retrieved a sample of tweets and then added the new hashtags contained in this sample to the list of seed hashtags in order to retrieve new tweets. We called this dataset the Policycorpus. We removed duplicates, retweets, and tweets containing only hashtags and URLs. In total we obtained a set of 1264 tweets (1000 for training and 264 for testing). The proportion of hate labels in the Policycorpus is 11% (1124 normal and 140 hate tweets). It is strongly unbalanced, as is the it-HS corpus (17% hate tweets), because it reflects the raw distribution of hate tweets on Twitter. The HaSpeeDe-tw corpus (32% hate tweets) instead has a distribution that oversamples hate tweets.
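The snowball sampling loop described above can be sketched as follows. This is a minimal illustration, not the authors' crawler: fetch_tweets is a hypothetical callable wrapping a Twitter search endpoint, and the deduplication only removes exact duplicate texts.

```python
import re

HASHTAG = re.compile(r"#\w+")

def snowball_sample(seeds, fetch_tweets, rounds=2):
    """Iteratively expand a set of seed hashtags and collect tweets.

    fetch_tweets(tag) -> list of tweet texts matching hashtag `tag`
    (e.g., a thin wrapper around a Twitter search endpoint).
    """
    hashtags = {t.lower() for t in seeds}
    tweets = []
    for _ in range(rounds):
        new_tags = set()
        for tag in sorted(hashtags):
            for text in fetch_tweets(tag):
                tweets.append(text)
                # hashtags found in retrieved tweets become new seeds
                new_tags.update(h.lower() for h in HASHTAG.findall(text))
        hashtags |= new_tags
    # keep only the first occurrence of each exact text (deduplication)
    return hashtags, list(dict.fromkeys(tweets))
```

In practice one would also filter out retweets and hashtag-only tweets, as described for the Policycorpus.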
At the end of the sampling process, the list of seeds included about 60 hashtags referring to:
- Laws, such as #decretorilancio (#relaunchdecree), #leggelettorale (#electorallaw), #decretosicurezza (#securitydecree)
- Politicians and policy makers, such as #Salvini, #decretoSalvini (#Salvinidecree), #Renzi, #Meloni, #DraghiPremier
- Political parties, such as #lega (#league), #pd (#DemocraticParty)
- Political TV shows, such as #ottoemezzo, #nonelarena, #noneladurso, #Piazzapulita
- Topics of the public debate, such as #COVID, #precari (#precariousworkers), #sicurezza (#security), #giustizia (#justice), #ItalExit
- Hyper-partisan slogans, such as #vergognaConte (#shameonConte), #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini)
This is the first corpus in Italian annotated with hate speech against policy makers. We plan to make this resource available upon request. To produce gold-standard labels, we asked two Italian communication experts to manually label the tweets in the Policycorpus, distinguishing between hate and normal tweets according to the following guidelines: by definition, hate speech is any expression that is abusive, insulting, intimidating, harassing, and/or incites violence, hatred, or discrimination. It is directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth. Translated examples: 1) "a clear #NO to #Netherlands that would like us to be users of the #MES economic resources but in exchange for Italy's renunciation of its budgetary autonomy. To Netherlands we say: thank you and goodbye, WE ARE NOT INTERESTED !!" is normal because it does not contain hate, insults, intimidation, violence or discrimination. 2) "... There is a weekly catwalk of the #jackal #no #notAtAll! Listening to a Po #clown after a true PATRIOT, a doctor from #Bergamo, cannot be held, seen or heard.
Giletti should stop inviting certain SLACKERS FROM THE PO VALLEY! #COVID-19 #NonelArena" contains hate speech, including insults like #clown and #jackal. 3) "I have my say ... #Draghi is a great economist but we don't need a #Monti-style economist ... We don't need another technical #government to obey the banking lobby! We need a political leader! We need a #ItalExit! We need the #Lira! #No to #DraghiPremier" is a normal case, despite the strong negative sentiment. It might be controversial for the presence of the term lobby, often used in abusive contexts, but in this case it is not directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation or political conviction. The inter-annotator agreement is k = 0.53. Although the score is not high, it is in line with the score reported in the literature for hate speech against immigrants (k = 0.54) [32] and indicates that the detection of hate speech is a hard task for humans. All the examples of disagreement were discussed and an agreement was reached between the annotators. The cases of disagreement occurred more often when the sentiment of the tweet was negative, mainly due to:
- the use of vulgar expressions not explicitly directed against specific people but generically against political choices;
- the negative interpretation of hyper-partisan hashtags, such as #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini), in tweets without explicit insults or abusive language;
- the substitution of explicit insults with derogatory words, such as the word "circus" instead of "clowns".
In the next section, we report and discuss the results of the experiments. Our goal is to create models of hate speech that automatically predict hateful tweets against policy makers in the Policycorpus.
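For reference, the agreement score reported above is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. A minimal, self-contained sketch (not the annotation tooling actually used for the Policycorpus):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumes the annotators use more than one label overall
    (otherwise expected agreement is 1 and kappa is undefined).
    """
    n = len(labels_a)
    # observed agreement: fraction of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected agreement from each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

A kappa around 0.5, as reported both here and for the anti-immigrant corpus, is conventionally read as moderate agreement.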
First, we describe the features extracted from text; then we perform in-domain and cross-domain classification; finally, we conduct a feature analysis and visualize the hashtag networks. As discussed in Section 2, we aim to develop explainable Artificial Intelligence (AI) models, hence we also exploited ML algorithms based on lexical resources (Lex), such as SVM, AdaBoost and Random Forests, in addition to more advanced techniques, namely neural networks based on the AlBERTo pre-trained transformer model. We ran two different experiments:
- In experiment one, we addressed RQ2, using different algorithms to train models on the existing corpora. We then performed a cross-domain classification, evaluating the models trained on HaSpeeDe-tw and it-HS against the Policycorpus test set (Section 4.2).
- In experiment two, we addressed RQ1 with a feature analysis to understand which features best predict hate speech in the policy-making domain with respect to the anti-immigration domain (Section 4.3).
Finally, to answer RQ3, we visualized the networks of hashtags in order to understand the relationships between topics used in normal and hateful tweets (Section 4.4). We begin by describing the features extracted from text. Building upon previous work presented in the literature, we adopted linguistic resources for the extraction of features to use with ML algorithms. In particular, we used:
- LIWC [40], a linguistic resource available in many languages, including Italian [1], that maps words to 68 psycholinguistic dimensions, such as linguistic dimensions (e.g., pronouns, articles, tense), psychological processes (e.g., cognitive mechanisms, sensations, certainty, causation), human processes (e.g., sex, social life, family), personal concerns (e.g., leisure, money, religion, death) and spoken categories (e.g.,
assent, nonfluencies);
- NRC [30], a linguistic resource that maps words to 10 emotion and polarity features: positive words, negative words, anger, anticipation, fear, sadness, joy, surprise, trust and disgust;
- a further 22 language-independent stylometric features [6], including positive/negative emoticons/emojis, ratios of punctuation, question and exclamation marks, numbers, operators, links, hashtags, mentions, email addresses, parentheses, lowercase/uppercase characters, and the ratio of repeated bigrams.
These dictionaries yield a matrix of 100 features, less sparse than a bag of words. In addition, we used a transformer model trained on Italian tweets, AlBERTo, which extracts a dense matrix of more than 700 embedding features [33]. Hate speech labels are naturally unbalanced, as normal tweets are, fortunately, the large majority, especially in the Policycorpus and the it-HS corpus. As this is a natural condition, we chose to keep the labels unbalanced and measured performance with two metrics: ROC AUC, which is insensitive to class imbalance, and weighted-average F-measure, which takes into account the difference in performance between the two classes. In this experiment, we trained and tested various algorithms, using a training-test split as the evaluation setting: 88%-12% in the it-HS corpus, 75%-25% in HaSpeeDe-tw2018 and 80%-20% in the Policycorpus.

Table 3 Per-class results of the classification on each corpus with the best algorithm (AlBERTo + Neural Networks).

A closer look at the per-class performance obtained with the best algorithm (AlBERTo + neural networks) reveals that, in general, the algorithm has higher performance in the detection of normal tweets and lower performance in the recognition of hate tweets, which have poor recall. The fact that recall is higher in the HaSpeeDe-tw corpus than in the Policycorpus suggests that balancing the number of hate examples with the normal ones has a positive effect on recall.
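Both evaluation metrics are available in scikit-learn. A minimal sketch on toy predictions mimicking the roughly 11% hate rate of the Policycorpus (the numbers below are illustrative, not the paper's results):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

# Toy test set (1 = hate, 0 = normal): 18 normal, 2 hate tweets
y_true = np.array([0] * 18 + [1] * 2)
# Predicted P(hate): two normals get high scores, one hate is missed
y_score = np.array([0.1] * 16 + [0.6, 0.7] + [0.8, 0.3])
y_pred = (y_score >= 0.5).astype(int)

# ROC AUC is rank-based, so it is insensitive to class imbalance
auc = roc_auc_score(y_true, y_score)
# Weighted F1 averages per-class F1 weighted by class support
wf1 = f1_score(y_true, y_pred, average="weighted")
```

Because the weighted average is dominated by the majority (normal) class, a model with poor recall on hate tweets can still post a high weighted F1, which is why the per-class breakdown in Table 3 matters.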
Precision is similar in these two datasets (0.75): the it-HS corpus has higher precision on the hate class, but its recall follows the same pattern as the other two corpora. We present these results in Table 3. In an attempt to address RQ2, we used the models trained on the HaSpeeDe-tw and it-HS corpora in the previous experiment to produce predictions on the Policycorpus test set, thus performing a cross-domain backtest. Given the differences between domains, we expected poor results in this experiment. The results, presented in Table 4, show that the domain shift had a huge impact on the performance of the classifiers, particularly from HaSpeeDe-tw to Policycorpus, where the results measured with weighted-average F1 are below the majority baseline, suggesting that the features are so different that the model cannot use them in the correct way. Surprisingly, the models trained on the it-HS corpus produced good results, but only those trained with ML algorithms, particularly Random Forests and AdaBoost, which are more capable of exploiting weak features. AlBERTo and neural networks in this case performed only slightly better than the majority baseline. We believe that the large training size of the it-HS corpus had a positive effect on cross-domain adaptation. The cross-domain classification highlighted the difference in features between the corpora. To measure this difference, and answer RQ1, we computed the Pearson correlation between the lexical features and the hate speech labels. In Table 5 we present the lexical features most correlated with hate speech in each dataset. Positive correlations indicate the best features for classifying hate messages and negative correlations indicate the best features for classifying normal messages. All these features were used in the classification experiments.
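Since the hate label is binary, the Pearson correlation with each feature is a point-biserial correlation and can be computed directly with NumPy. A minimal sketch; feature_correlations and the feature names are illustrative, not the paper's code:

```python
import numpy as np

def feature_correlations(X, y, names):
    """Pearson correlation of each lexical feature with the hate label.

    X: (n_tweets, n_features) feature matrix; y: 0/1 hate labels.
    Returns (name, r) pairs sorted from the strongest marker of hate
    (most positive r) to the strongest marker of normal tweets.
    """
    y = np.asarray(y, dtype=float)
    corrs = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
    return sorted(zip(names, corrs), key=lambda pair: -pair[1])
```

Sorting by signed correlation reproduces the two ends of Table 5: positive coefficients mark hateful tweets, negative ones mark normal tweets.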
The analysis revealed that stylometric features, such as the ratios of lowercase and uppercase characters, have strong predictive power in the HaSpeeDe-tw2018 dataset, but not in the it-HS corpus, where there is more variety. LIWC features, such as the ratios of sexual, anger and swear words, are among the best predictors of hate speech against politicians. This experiment clearly shows that the most useful features for the detection of hate speech in the anti-immigration domain are punctuation (the more punctuation, the more likely a message is non-hateful) and exclamation marks (the more exclamations, the more likely a message is hateful). In the Policycorpus, sexual and swear words are markers of hateful messages, while lowercase characters, numbers and positive emotions are markers of non-hateful messages. It is interesting to note that lowercase letters correlate with hate speech in the anti-immigration domain, while in the anti-policy domain they correlate with non-hateful messages. The similarity between the best features in it-HS and the Policycorpus explains the good result obtained in the cross-domain classification with ML algorithms. We also exploited the attention vectors of AlBERTo to try to explain the poor performance in the cross-domain classification: using the average activations of each token in the attention vectors, we computed the strongest predictors in the model. The results, represented as word clouds in Figure 2, show the most frequent tokens activated to detect hate and normal labels for each corpus. The clear difference between the tokens used in the anti-immigrant and anti-policy domains is a clue to the poor performance in cross-domain classification. To address RQ3, we treated the 'normal' and 'hate' classes as nodes in a network that we plotted with Yifan Hu trees [23]. In this way we were able to visualize the network of hashtags connected to the 'normal' and 'hate' nodes in the Policycorpus.
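Building the class-to-hashtag edge lists behind such a network is straightforward; below is a dependency-free sketch (hashtag_network is an illustrative helper, not the authors' pipeline). The Yifan Hu layout used in the paper is available in tools such as Gephi, to which the resulting edge lists can be exported:

```python
from collections import defaultdict

def hashtag_network(labeled_tweets):
    """Build class-to-hashtag edge lists from (text, label) pairs.

    label is 'hate' or 'normal'; hashtags linked to both class nodes
    mark the controversial topics of the public debate.
    """
    edges = defaultdict(set)
    for text, label in labeled_tweets:
        for token in text.lower().split():
            if token.startswith("#"):
                edges[label].add(token.strip(".,!?"))
    shared = edges["hate"] & edges["normal"]
    return edges, shared
```

The `shared` set corresponds to the small middle cloud of Figure 3: hashtags reachable from both the 'hate' and the 'normal' node.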
In other words, we were able to visualize hashtags appearing only in hate speech contexts, hashtags appearing only in normal contexts, and hashtags appearing in both networks. The results, depicted in Figure 3, show a pattern with a blue cloud (above) representing the network of hashtags in normal tweets and a red cloud (below) representing the hashtags in hate messages. Between the two, there is a smaller cloud of hashtags used in both contexts. A closer look at these hashtags reveals the most controversial topics of the public debate. These topics include politicians (#Salvini, #Meloni, #Conte, #Draghi), economic issues (#lira, #MES), keywords related to the pandemic (#covid, #pandemia, #mascherine) and political TV shows (#nonelarena).

Fig. 3 Detailed visualization of the hashtag network of a small portion of the Policycorpus with Yifan Hu trees. The blue cloud (above) contains hashtags of normal messages, the red cloud (below) contains the hashtags from hate tweets, and the smaller cloud between the two contains the hashtags appearing in both.

In this paper, we presented a new resource for the analysis of hate speech against policy makers on Twitter. The dataset, named Policycorpus, is the first of this type in Italian, an under-resourced language. We confirmed that the annotation of hate speech is difficult, and detailed the cases of disagreement between annotators. Using this resource, we demonstrated that:
- Deep Learning algorithms and transformer-based models achieve state-of-the-art performance in both domains.
- Machine Learning algorithms are suitable for cross-domain classification from hate speech against immigrants to hate speech against policy makers.
- Hate speech against immigrants can be detected by looking at the style of the written text (i.e., punctuation and exclamations), while hate speech against policy makers relies more on vocabulary and psycholinguistic aspects (i.e., swear words).
We also visualized the spread of hate speech against policy makers on Twitter [22] and identified clusters of hashtags that appear only in hate tweets and in both normal and hate tweets. We suggest that this method can be exploited to track which topics convey hatred towards policy makers. Combining hate speech detection algorithms and visualizations, one can build a dashboard for monitoring hate speech on Twitter. The final aspect that we want to highlight is that the amount of data available, and its balance between classes, can help to improve the performance of the classifiers. In the future we plan to run experiments on domain adaptation and to collect more data for hate speech detection against policy makers.

Conflict of Interest: Three of the authors of this paper declare that they are employed by Maggioli S.p.A., a private company with a financial interest in the field of Public Administration. They also declare their involvement in the PolicyCLOUD project, which has received funding from the European Union's Horizon 2020 research and innovation programme (Grant Agreement no. 870675) and has financial interests in the subject and materials discussed in this manuscript.

Acknowledgments: The research leading to the results presented in this paper has received funding from the PolicyCLOUD project, supported by the European Union's Horizon 2020 research and innovation programme under Grant Agreement no. 870675.

Data Availability: The authors plan to make Policycorpus, the dataset presented in this manuscript, available upon request, under the conditions set by the PolicyCLOUD project. For more information, please get in touch with fabio.celli@maggioli.it.

References
- The Italian LIWC2001 dictionary
- Deep learning for hate speech detection in tweets
- RuG @ EVALITA 2018: Hate speech detection in Italian social media
- SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter
- Overview of the EVALITA 2018 hate speech detection task
- LATO: Language-independent analysis tool
- CorEA: Italian news corpus with emotions and agreement
- Long chains or stable communities? The role of emotional stability in Twitter conversations
- Multi-task learning in deep neural networks at EVALITA 2018
- What does BERT look at? An analysis of BERT's attention
- Cross-platform evaluation for Italian hate speech detection
- Hate speech detection using attention-based LSTM
- Hate me, hate me not: Hate speech detection on Facebook
- Pre-training of deep bidirectional transformers for language understanding
- "You don't understand, this is a new war!" Analysis of hate speech in news web sites' comments
- Evaluation Campaign of Natural Language Processing and Speech Tools for Italian
- Online hate speech against women: Automatic identification of misogyny and sexism on Twitter
- A long-term analysis of polarization on Twitter
- Multi-party media partisanship attention score: Estimating partisan attention of news media sources using Twitter data in the lead-up to the 2018 Italian election
- Detecting hate speech against politicians in Arabic community on social media
- Open data visualizations and analytics as tools for policy-making
- Visualizing large graphs
- Attention is not explanation
- Locate the hate: Detecting tweets against blacks
- Analytics as a service facilitating efficient data-driven public policy management
- Hate speech detection: Challenges and solutions
- A system to monitor cyberbullying based on message classification and social network analysis
- Comparing different supervised approaches to hate speech detection
- NRC emotion lexicon
- Resources and benchmark corpora for hate speech detection: A systematic review
- Hate speech annotation: Analysis of an Italian Twitter corpus
- AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets
- HatEminers: Detecting hate speech against women
- HaSpeeDe 2 @ EVALITA 2020: Overview of the EVALITA 2020 hate speech detection task
- Overview of the EVALITA 2020 second hate speech detection task (HaSpeeDe 2)
- An Italian Twitter corpus of hate speech against immigrants
- Detecting hate speech for Italian language in social media
- The risk of racial bias in hate speech detection
- The psychological meaning of words: LIWC and computerized text analysis methods
- BERT rediscovers the classical NLP pipeline
- Impact of politically biased data on hate speech classification
- Hate in the machine: Anti-black and anti-Muslim social media posts as predictors of offline racially and religiously aggravated crime