The Spread of Propaganda by Coordinated Communities on Social Media
Kristina Hristakieva, Stefano Cresci, Giovanni Da San Martino, Mauro Conti, Preslav Nakov
Date: 2021-09-27. DOI: 10.1145/3501247.3531543

Large-scale manipulations on social media have two important characteristics: (i) use of propaganda to influence others, and (ii) adoption of coordinated behavior to spread it and to amplify its impact. Despite the connection between them, these two characteristics have so far been considered in isolation. Here we aim to bridge this gap. In particular, we analyze the spread of propaganda and its interplay with coordinated behavior on a large Twitter dataset about the 2019 UK general election. We first propose and evaluate several metrics for measuring the use of propaganda on Twitter. Then, we investigate the use of propaganda by different coordinated communities that participated in the online debate. The combination of the use of propaganda and coordinated behavior allows us to uncover the authenticity and harmfulness of the different communities. Finally, we compare our measures of propaganda and coordination with automation (i.e., bot) scores and Twitter suspensions, revealing interesting trends. From a theoretical viewpoint, we introduce a methodology for analyzing several important dimensions of online behavior that are seldom conjointly considered. From a practical viewpoint, we provide new insights into authentic and inauthentic online activities during the 2019 UK general election.

Social media currently represent one of the main channels for information spread and consumption. They are increasingly used by a constantly growing part of the population to maintain active social relationships, to stay informed about socially relevant issues, and to produce content, thus giving voice to the crowds. At the same time, a large portion of online information is biased, misleading, or outright fake [15, 45]. Moreover, harmful content can be purposefully shared by malicious actors, and even by unaware users, with the aim to manipulate online audiences, to sow doubt and discord, and to increase polarization [9, 41]. The unprecedented importance of social media for information diffusion, combined with their vulnerability to organized misbehavior, sets the stage for online manipulations that can cause tremendous societal repercussions, as witnessed during the US Capitol Hill assault in January 2021 [6, 30], and with the rampaging COVID-19 vaccine misinformation [1, 19].

Despite the differences between the broad array of tactics used to carry out online manipulation, many social media campaigns share two fundamental characteristics: (i) they use propaganda to influence those targeted by the manipulation [4], and (ii) they adopt coordinated actions to amplify the spread and the outreach of the manipulation and to increase its impact [31, 42]. Given their importance for online manipulation, each of these characteristics has received scholarly attention. For example, computational linguists developed AI solutions to automatically detect the use of propaganda techniques [11, 13, 16]. Similarly, network science frameworks were proposed for detecting coordinated groups of users [33] and for measuring the extent of coordination [31]. Despite recent progress, the study of computational propaganda and coordinated behavior is still in its early stages.
As such, and in spite of the interrelationship between propaganda and coordinated behavior, these two aspects have so far been investigated in isolation. Nonetheless, their combined analysis is promising from multiple viewpoints:

• From the propaganda viewpoint, there already exist methods for detecting the use of rhetorical techniques to influence others [13]. However, there have been no studies to detect the intent to harm behind propaganda campaigns [11]. Notably, coordination between users implies a shared intent. Thus, adding coordination information to the analysis of propaganda can contribute to bridging this gap.
• From the coordination viewpoint, there already exist methods for detecting coordinated users in social media [31, 33, 47]. However, distinguishing between harmless (e.g., activists, fandoms) and harmful (e.g., botnets, trolls) coordination is still an open challenge [44]. Propaganda implies the aim to mislead and to manipulate. Thus, adding information about propaganda to the analysis of coordination can help detect harmful behavior.

Our aim is to combine techniques for the analysis of propaganda and online coordination to draw nuanced insights into (i) the spread of propaganda online, (ii) the behavior of coordinated communities, and (iii) the interplay between propaganda and coordination.

Contributions. We analyze the - so far unexplored - interplay between propaganda and coordination in the context of societal online debates. Towards this goal, we adopt a methodological approach grounded in state-of-the-art techniques for detecting propaganda in texts and for measuring coordinated behavior on social media, and we apply it to study a recent and relevant online debate on Twitter, about the 2019 UK general election. In particular, we propose and experiment with several measures for quantifying the spread of propaganda by social media users and communities. We further carry out network analysis of the coordinated online communities that participated in the electoral debate. Next, we combine our results on propaganda and coordination by comparing the spread of propaganda to the activity of coordinated communities. We also compare our results to clear signs of inauthenticity and harmfulness - namely, bot scores provided by Botometer [37] and Twitter suspensions. Our analysis provides more nuanced results compared to existing work, and it surfaces interesting patterns in the behavior of online communities that would not be visible otherwise. For instance, it allows us to clearly identify and to differentiate communities that exhibit opposite behaviors, such as (i) a malicious politically-oriented community, characterized by strongly coordinated users that are involved in spreading propaganda, and (ii) a grassroots community of activists protesting for women's rights.

Our main contributions can be summarized as follows:
• We explore the interplay between propaganda and coordination in online debates, which so far has received little attention.
• By cross-checking propaganda and coordination, we get better insights into malicious behavior, thus moving in the direction of identifying and studying coordinated inauthentic behaviors (CIB) as well as propaganda campaigns.
• Regarding malicious behavior, we draw insights into the interplay between propaganda/coordination and automation/suspensions.
• From a practical standpoint, our analysis reveals interesting and nuanced characteristics of several online communities, which were previously unknown.
Significance. Our proposed methodology and results contribute to improving our understanding of coordinated harmful behavior. Moreover, insights such as those obtained thanks to our analysis can support platform administrators in enforcing targeted moderation interventions for curbing online harms [26, 43].

Propaganda and coordinated behavior pose distinct challenges that call for different methods, such as natural language understanding for the former and network science for the latter. Thus, they have been the focus of largely disjoint efforts by different communities.

Coordinated behavior, be it authentic or not, was introduced as a concept by Facebook in 2018 and later widely adopted in studies on online manipulation. Given its recency, the computational analysis of coordinated behavior poses several challenges. Some are conceptual: What exactly is coordinated behavior? How many accounts, or how much coordination, is needed for meaningful coordinated behavior to surface? Currently, there are no agreed-upon answers, which makes computational analysis problematic. In fact, many solutions still require a great deal of manual intervention [29, 41]. In the few recent computational frameworks, coordination was defined as an exceptional similarity between a number of users. Nizzoli et al. [31] proposed a state-of-the-art pipeline organized in six analytical steps, starting with (i) selection of a set of users to investigate, (ii) selection of a measure for the similarity between users, (iii) construction of a weighted user-similarity network, (iv) network filtering, (v) coordination-aware community detection, and finally, (vi) analysis of the discovered coordinated communities. This approach is the only one so far that has proven capable of producing fine-grained estimates of the extent of coordination in the continuous [0, 1] range, rather than a binary {0, 1} classification of coordinated vs. non-coordinated communities. Examples of methods of the latter type include [28, 32, 33, 40, 47]. Similarly to [31], Pacheco et al. [33] built a weighted user-similarity network. Then, they discarded all edges whose weight is below a threshold, and clustered the remaining network to discover coordinated communities. The drawback of this method, and of similar ones [21], is the need to specify arbitrary thresholds to distinguish between coordinated and non-coordinated behavior, thus providing a binary classification of a non-binary and nuanced phenomenon. Moreover, additional arbitrariness might also arise from the network projection and filtering steps, whose choice can significantly affect coordination results [8]. Other methods do not embed a notion of coordination, but rather propose to apply community detection to weighted user-similarity networks, thus leaving the task of investigating coordinated communities for subsequent analysis [46].

Notably, in all previous work, coordination was detected or measured independently of harmfulness or authenticity. In fact, coordination does not necessarily imply malicious activities: think of online fandoms, or other grassroots initiatives, which are examples of coordinated harmless and authentic behavior. Vargas et al. [44] evaluated the capabilities of existing systems to distinguish between harmful and harmless coordination, finding unsatisfactory results and highlighting the difficulty of this task.
To this end, our results show that the combined analysis of coordination and propaganda allows us to draw insights into the harmfulness and the authenticity of online behavior, thus contributing to bridging this scientific gap.

Work on propaganda detection has focused on analyzing textual documents [3, 13, 35]. See [11] for a recent survey on computational propaganda detection. Rashkin et al. [35] developed a corpus with document-level annotations with four classes (trusted, satire, hoax, and propaganda), labeled using distant supervision: all articles from a given news outlet were assigned the label of that outlet. The news articles were collected from the English Gigaword corpus, which covers reliable news sources, as well as from seven unreliable news sources, including two propagandistic ones. They trained a model using word n-grams, and found that it performed well only on articles from sources that the system was trained on, and that the performance degraded quite substantially when evaluated on articles from unseen news sources. Barrón-Cedeño et al. [3] developed a corpus with two labels (i.e., propaganda vs. non-propaganda) and further investigated writing style and readability level. Their findings confirmed that using distant supervision, in conjunction with rich representations, might encourage the model to predict the source of the article, rather than to discriminate propaganda from non-propaganda. Habernal et al. [22, 23] also proposed a corpus with 1.3k arguments annotated with five fallacies that directly relate to propaganda techniques, including ad hominem, red herring, and irrelevant authority. A more fine-grained propaganda analysis was done by Da San Martino et al. [13], who developed a corpus of news articles annotated with the spans of use of eighteen propaganda techniques. The associated tasks asked systems to predict the spans where propaganda is used, as well as the specific technique being used, and they further tackled a sentence-level propaganda detection task. They proposed a multi-granular gated deep neural network that captures signals from the sentence level to improve the performance of the fragment-level classifier, and vice versa. Subsequently, an online demo, Prta, was made publicly available [12], and there were several shared tasks [10, 17]. A limitation of this body of work lies in the lack of methods and tools for uncovering orchestrated propaganda campaigns, rather than detecting individual posts or articles that make use of propaganda. Below, we show that our analysis of coordinated behavior contributes to reaching this goal.

The starting point for our study is the dataset from [31]. It contains 11,264,820 tweets about the 2019 UK general election, published by 1,179,659 distinct users. The tweets were collected between November 12, 2019 and December 12, 2019 (i.e., the election day) using the Twitter Streaming API. In particular, the dataset contains all tweets that use at least one of the election-related hashtags shown in Table 1. We can see in the table that the hashtags used for data collection include both partisan hashtags as well as neutral ones. The dataset further contains the tweets shared by the two main parties (Labour and Conservative) and their leaders, as well as the interactions (i.e., retweets and replies) with such tweets, as summarized in Table 2. The final dataset for this study is the combination of the data shown in Tables 1 and 2, and quoted retweets (not counted in the tables).
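For concreteness, the following minimal sketch shows how such a hashtag-based selection can be applied to raw tweet objects, assuming tweets stored as Twitter API v1.1 JSON, one object per line; the file name and the hashtag list (beyond #GE2019, which occurs in the data) are illustrative placeholders rather than the actual collection setup.

```python
import json

# Illustrative subset of election-related hashtags; the full list is in Table 1.
ELECTION_HASHTAGS = {"ge2019", "generalelection2019"}

def matches_election(tweet: dict) -> bool:
    """True if a Twitter API v1.1 tweet object carries at least one
    election-related hashtag (case-insensitive match on entities.hashtags)."""
    hashtags = {h["text"].lower()
                for h in tweet.get("entities", {}).get("hashtags", [])}
    return bool(hashtags & ELECTION_HASHTAGS)

# Hypothetical input file: one tweet JSON object per line.
with open("tweets.jsonl") as fin:
    selected = [t for t in (json.loads(line) for line in fin)
                if matches_election(t)]
```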
The dataset from [31] is publicly available for research purposes. In this work, we extend the above Twitter dataset by also collecting and analyzing the textual content of all the news articles shared during the online electoral debate. To collect data about articles, we first parse the 11M tweets, looking for URLs pointing to news outlets, blogs, or other news Web sites. Out of the entire Twitter dataset, we found 35,976 distinct articles from 3,974 Web sites, which were shared 329,482 times during the data collection period. Finally, we leverage the newspaper3k Python package to collect the textual content, together with some metadata about each shared article.

We leverage the textual content of the shared articles and tweets to detect the use of propaganda. We further measure the similarities in the users' tweeting behavior to measure coordination. Figure 1 shows our methodological approach to the analysis of coordinated harmful behavior, which has two main building blocks: (i) a method for measuring coordination, and (ii) a propaganda classifier. As shown in the figure, coordination is measured based on user activities and interactions, and results in a coordination score assigned to each user and in the identification of coordinated communities. In contrast, propaganda scores are computed from the textual content of tweets and news articles. This process assigns a propaganda score to each user. Finally, each coordinated community is analyzed in terms of the coordination and propaganda scores of its members. The methods for measuring coordination and propaganda are described below.

To measure the extent of online coordination, we followed network analysis approaches that have been recently proposed in state-of-the-art studies [31, 33, 47], which compute similarities between users and consider exceptional or unexpected similarities as a proxy for coordination. We specifically follow the approach in [31], as it is the only one that produces a coordination score rather than a binary label. For the user selection step, we constrained our analysis to superspreaders, defined as the top 1% of the users who shared the most retweets. Despite numbering only 10,782, superspreaders shared 3.9M tweets, which is 39% of the tweets and 44% of the retweets in our dataset. Previous work has shown that focusing on superspreaders is particularly relevant [34]. We measured the similarity between superspreaders in terms of co-retweets, in order to highlight users who frequently reshare the same messages. For each superspreader, we computed a TF.IDF-weighted vector of the tweet IDs that he/she retweeted. Using TF.IDF weighting discounts viral tweets by influencers and popular users, while emphasizing retweets of unpopular tweets. Then, we computed the similarity between all pairs of superspreaders as the cosine similarity between their corresponding vectors, thus obtaining a weighted undirected user-similarity network. We filtered the network by computing its multiscale backbone, which retains only statistically significant network structures [39]. Then, we applied the well-known Louvain community detection algorithm to group users into network communities. Finally, we applied network dismantling, which assigns a coordination score to each user in the network. We carried out the latter step by iteratively removing network edges and nodes based on a moving edge weight threshold.
At each iteration, we removed all edges whose raw weight was below the current threshold, as well as the nodes that, as a result, became disconnected from the largest connected component. The threshold increased at each iteration, until the network was completely dismantled, i.e., no more connected nodes remained. For each node, we assigned a coordination score equal to the threshold value that disconnected that node from the rest of the network. We normalized the coordination score in the [0, 1] range, with 1 indicating maximum coordination.

In order to assess whether an input piece of text is propagandistic, we used Proppy, a state-of-the-art propaganda detection system that achieved an F1 score of 0.83 on a reference benchmark dataset, outperforming several rival approaches [3]. It uses a maximum entropy binary classifier with L2 regularization to discriminate propagandistic vs. non-propagandistic texts. Proppy was trained on the QProp corpus, which includes 51k news articles from 94 propagandistic and from 10 non-propagandistic news sources. Proppy represents the input text as a set of features, including (i) TF.IDF-weighted n-grams, (ii) frequency of specific words from a number of lexicons coming from Wiktionary, LIWC, Wilson's subjectives, Hyland hedges, and Hooper's assertives, (iii) writing style features such as TF.IDF-weighted character 3-grams, readability level and vocabulary richness (e.g., Flesch-Kincaid grade level, Flesch reading ease, and the Gunning fog index), Type-Token Ratio (TTR), hapax legomena and dislegomena, and (iv) the NEws LAndscape (NELA) features. The latter category includes 130 content-based features collected from the existing literature, which measure different aspects of a news article, comprising sentiment, bias, morality, and complexity [24]. The lexicon features are based on the analysis of the language of propaganda and trustworthy news discussed in [35]. The encouraging performance achieved by Proppy on reference datasets, as well as its capacity to outperform competing approaches and systems, makes it a suitable system for our propaganda analysis.

Below, we first analyze the coordinated communities and discuss our results and some of their limitations. We then combine these initial results with the analysis of propaganda, and we show how our combined approach helps to overcome the limitations of previous work.

The application of the method for investigating coordination to our dataset resulted in the user-similarity network shown in Figure 2. The network is composed of seven communities of coordinated users, depicted with different colors in the figure and analytically described in Table 3 and in the rest of this subsection:

LAB: A large community of labourists that supported the Labour party and its leader Jeremy Corbyn, as well as traditional Labour themes such as healthcare and climate change.
CON: A large community of conservative users. In addition to supporting the party and its leader Boris Johnson, this community was also strongly in favor of Brexit.
TVT: A large community that included several parties, e.g., the Liberal Democrats, who teamed up with labourists against the Conservative party, a strategy dubbed tactical voting in the 2019 UK election.
SNP: A medium-sized community of supporters of the Scottish National Party (SNP). These users also supported Scottish independence from the UK and asked for a new independence referendum.
B60: A small community of "Backto60" activists.
Unlike the previous communities, these users did not represent a political party involved in the election. Instead, B60 users leveraged the electoral debate to protest against a state pension age equalisation law that unfairly affected 4M women born in the 1950s.
ASE: A small community of conservative users. Despite sharing the same political orientation, these users were separated from the users in the CON community because, rather than supporting the Conservative party, they were mainly involved in attacking the Labour party. An important narrative for ASE was the antisemitism allegations targeted at labourists and Jeremy Corbyn throughout the electoral debate.
LCH: Another small community of activists. Similarly to B60, these users were not particularly interested in the electoral debate, but rather protested against a retrospective tax called the "loan charge", which forced certain people to repay unsustainable amounts and also resulted in several suicides.

The communities that emerged from the analysis of our user-similarity network are consistent with the 2019 UK political landscape and with the results of previous work [25, 38]. Each of these communities had different goals, featured different narratives, and showed diverse degrees of coordination. In particular, we found both large and small communities, as shown in Figure 2. While the larger communities are related to the main political parties in the UK that participated in the election (e.g., LAB, CON, TVT, and SNP), the smaller ones represent other highly coordinated users who share a common goal, such as protesting activists (e.g., B60 and LCH) and political antagonists (e.g., ASE). The analysis of Table 3 also reveals a few differences in the sharing behaviors of the different communities. Overall, larger communities seem to share fewer original articles and tweets, compared to the smaller groups. Moreover, previous analysis showed that these communities are characterized by a large negative assortativity, meaning that influential users are mostly connected to ordinary ones, and vice versa [31]. These figures are indicative of top-down behaviors, where a small number of highly influential figures (e.g., the party leaders) drive the activities of the remaining members. In contrast, smaller communities seem to exhibit bottom-up behavior, characterized by grassroots activities and more content heterogeneity, as evidenced by the large percentage of original articles and tweets.

The distribution of coordination scores for users of the different communities is shown in Figure 3. Again, different behavior and characteristics emerge for the different communities, and particularly for the smaller ones. For instance, while B60 features users with diverse degrees of coordination, as shown by a relatively wide boxplot, LCH and SNP are much more homogeneous. B60 is also the community with the lowest average coordination, in contrast to LCH and, to a lesser extent, to ASE, SNP, and CON.

Discussion. Our results so far match the state of the art in the analysis of coordinated behavior [31, 33, 46]. On the one hand, this approach allows us to obtain nuanced results in terms of coordinated communities. In fact, it allows us to detect several groups of coordinated users, both large and small, thus yielding more informative results compared to coarser analyses that only focused on the two main factions involved in an online debate (e.g., right vs. left, Democrats vs. Republicans, etc.), as done in [7, 20].
Moreover, it surfaces different patterns of coordination in the network (e.g., top-down vs. bottom-up). This approach to the analysis of coordinated behavior allows us to obtain nuanced and fine-grained results, typical of studies that require a great deal of offline, manual investigation [2, 41], while still retaining the advantages of large-scale, automated analysis. On the other hand, these results do not give insights into the harmfulness of the respective coordinated communities. In other words, it is still not possible to clearly identify which communities (if any) of those shown in Figure 2 exploited coordination for tampering with the 2019 UK electoral debate on Twitter, and which instead represent neutral or well-intentioned coordinated users. Our subsequent analysis below contributes to answering this question.

Our dataset contains two sources of textual content that can potentially convey propaganda: (i) articles and (ii) tweets. Thus, the first choice for computing propaganda scores is which items to analyze: articles or tweets. All propaganda detection systems so far - including Proppy, the one we use in our analysis - were developed for the analysis of news articles [11]. However, from Table 3, we notice that our dataset features, on average, less than one original news article per user and about 32 original tweets per user. Thus, basing our propaganda scores on articles would result in sparse and unreliable estimates. Moreover, the original tweets are authored by the users themselves, unlike news articles, which are just reshared. For these reasons, tweets arguably represent a more direct and reliable input for estimating a user's use of propaganda. Nonetheless, we computed propaganda scores based on both articles and tweets, and we subsequently compared and validated each of them. The outcome of this comparison and validation allowed us to identify a suitable propaganda score to use in the remainder of our study.

For computing propaganda scores based on articles, we used Proppy with the same configuration proposed by its authors in [3]. For tweets, we made adjustments to account for the inherent differences between news articles and tweets. Specifically, several machine learning features used in the textual classifiers are influenced by document length, and tweets are obviously much shorter than news articles. Thus, we did not classify single tweets, but we grouped the original tweets (i.e., without retweets) by the same author into chunks whose length was comparable to that of the articles used to train Proppy. The grouping merged tweets in chronological order, but we did not apply any filtering based on their textual content (e.g., topic). We validated our propaganda estimates by manually inspecting a subset of the tweets classified by Proppy, which revealed meaningful and satisfactory classifications, and supported our approach for detecting propagandistic vs. non-propagandistic tweets.

Independently of the choice of analyzing articles or tweets, we obtained a propaganda score $p_u$ for user $u$ as follows:
$$ p_u = \Psi\big(\{P(d) : d \in D_u\}\big) $$
where $\Psi$ is the user-level aggregation function, $D_u$ is the set of all chunks of original tweets, or of all distinct news articles, shared by $u$, and $P(d)$ is Proppy's classification of item $d$. Finally, since we want to compare different communities, we aggregate the user scores for each community. We compute the propaganda score $p_{C_i}$ of the $i$-th community $C_i$ as follows:
$$ p_{C_i} = \Phi\big(\{p_u : u \in C_i\}\big) $$
where $\Phi$ is the community-level aggregation function.
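As an illustration, the following minimal sketch instantiates these two aggregation steps, assuming mean aggregation for both Ψ and Φ (other choices, e.g., median or max, give rise to the measures listed in Table 4); the names and the placeholder `proppy_score` callable, standing in for Proppy's classification output in [0, 1], are illustrative rather than the actual implementation.

```python
from statistics import mean
from typing import Callable, Dict, List

def user_propaganda_score(chunks: List[str],
                          proppy_score: Callable[[str], float],
                          psi: Callable[[List[float]], float] = mean) -> float:
    """p_u = Psi({P(d) for every tweet chunk (or article) d shared by user u})."""
    return psi([proppy_score(d) for d in chunks])

def community_propaganda_score(user_scores: Dict[str, float],
                               members: List[str],
                               phi: Callable[[List[float]], float] = mean) -> float:
    """p_C = Phi({p_u for every user u in community C})."""
    return phi([user_scores[u] for u in members])
```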
Different aggregation functions $\Psi$ and $\Phi$ (e.g., mean, median, max, etc.) can be used to compute $p_u$ and $p_{C_i}$, respectively. These, in addition to the choice of analyzing articles vs. tweets, result in many possible measures for computing propaganda scores. Table 4 lists some of the measures that we experimented with. In the next sections, we compare the informativeness of these measures, and we choose a suitable one for our further analysis. Independently of the measure, some communities consistently appear as more propagandistic than others. Specifically, TVT and ASE are among the communities that shared the most propaganda, while SNP, B60, and LCH shared the least. Moreover, as we discuss below, the most informative measures from Table 4 are those based on tweets.

Combining coordination and propaganda. So far, our approach provided us with three pieces of information that we can combine: (i) communities, (ii) coordination scores, and (iii) multiple propaganda scores. By combining community labels with coordination and propaganda scores, we can study the trends of propaganda as a function of coordination for each community. Let $c_{u_j}$ be the coordination score of the $j$-th user $u_j$. Then, the propaganda score for community $C_i$, as a function of coordination, is defined as
$$ p(C_i, \gamma) = \Phi\big(\{p_{u_j} : u_j \in C_i,\ c_{u_j} \geq \gamma\}\big) \qquad (1) $$
In other words, Equation (1) defines how to compute a propaganda score at different coordination thresholds $\gamma$, for each community. Therefore, we can assess whether the most coordinated users in each community were also the most propagandistic ones. In turn, this provides valuable information for assessing the harmfulness (or lack thereof) of the different communities.

Figure 4 shows examples of this analysis obtained by applying Equation (1) at different levels of coordination, for some of the propaganda measures shown in Table 4. In this figure, we use different line types and transparencies to indicate the number of users in each community at different levels of coordination. In fact, each community has a different cardinality, as reported in Table 3. Moreover, fewer users are considered when moving towards large coordination values, i.e., only the most coordinated ones. As an example, Figure 4a shows that, for each community, we always have more than 10 users (as well as the several hundreds of tweets that they shared), even at a coordination of about 1. Moreover, we always have more than 50 users and thousands of tweets when the coordination is 0.8 or lower. For propaganda scores derived from tweets, this ensures that the trends shown in the figure are not derived from a trivial number of tweets.

Choosing a suitable propaganda measure. Below, we report the results of a qualitative comparison and a quantitative evaluation of the propaganda measures from Table 4. A desirable characteristic of a propaganda measure is the capacity to distinguish propagandistic vs. non-propagandistic communities, that is, its capacity to highlight the differences between the several communities involved in the online electoral debate with respect to the use of propaganda. We evaluated the informativeness $I$ of each measure based on the differences between the propaganda trends that it produces. We quantified the difference between the propaganda trends of two communities via their linear correlation: the more negative the correlation, the more different the trends. Thus, for any given measure, we computed the average $\bar{\rho}$ of the Pearson correlations between the propaganda trends $p(C_i, \gamma)$ and $p(C_j, \gamma)$ of each possible pair of communities $C_i$ and $C_j$:
$$ \bar{\rho} = \frac{2}{N(N-1)} \sum_{i < j} \rho\big(p(C_i, \gamma),\, p(C_j, \gamma)\big) $$
where $N$ is the number of communities. Then, we computed the informativeness of a measure as $I = \frac{1 - \bar{\rho}}{2}$.
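To make the trend computation and the informativeness score concrete, here is a hedged sketch assuming mean aggregation for Φ and a fixed grid of coordination thresholds; as noted above, every community retains users (and tweets) even at the highest coordination levels, so the trends are well defined. All names are illustrative and not the actual implementation.

```python
from itertools import combinations
from statistics import mean
import numpy as np

def propaganda_trend(user_scores, user_coord, members, thresholds):
    """Equation (1) with mean aggregation: p(C, gamma) over a grid of gammas,
    restricted to the members whose coordination score is >= gamma."""
    trend = []
    for gamma in thresholds:
        eligible = [user_scores[u] for u in members if user_coord[u] >= gamma]
        trend.append(mean(eligible))  # assumes at least one eligible user per gamma
    return np.array(trend)

def informativeness(trends_by_community):
    """I = (1 - average pairwise Pearson correlation) / 2 over all community pairs."""
    corrs = [np.corrcoef(trends_by_community[a], trends_by_community[b])[0, 1]
             for a, b in combinations(trends_by_community, 2)]
    return (1.0 - mean(corrs)) / 2.0
```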
Intuitively, if a measure yields a positive correlation between the community propaganda trends, then $\bar{\rho} \approx 1$ and $I \approx 0$. This means that such a measure is not able to differentiate the behavior of the different communities, as reflected by the low informativeness. Conversely, if a measure yields a large negative correlation between the community propaganda trends, then $\bar{\rho} \approx -1$ and $I \approx 1$, meaning that the measure can differentiate the different communities. Notably, our approach is similar to, but preferable over, other alternatives for measuring informativeness, such as those based on mutual information, since the latter require additional problematic steps for estimating unknown distributions, as discussed in [27].

The last column of Table 4 reports the informativeness of each propaganda measure, with the topmost measure being the most informative one. As shown in the table, there are relatively small differences in the informativeness of the propaganda measures that we evaluated, which ranges between 0.42 and 0.55 on a 0 to 1 scale. This means that the majority of the measures yield comparable results, as is also visible from the qualitatively similar propaganda trends shown in Figures 4a and 4b. This suggests that changing the measure would not drastically alter the evaluation results. Nonetheless, the topmost measures in Table 4 are relatively better (i.e., more informative) at surfacing the differences between the investigated communities. Interestingly, the most informative measures in Table 4 are all based on the analysis of tweets, which reinforces our initial hypothesis and the results of our manual validation of the propaganda classifications of tweets. Following these preliminary results, and without loss of generality, in our subsequent analysis we adopt the topmost (i.e., most informative) measure from Table 4 for computing the propaganda scores.

Results. Figure 4a shows interesting propaganda trends for some communities. First, LCH is characterized by the lowest degree of propaganda among all communities. Similarly, B60 shows a marked decreasing propaganda trend, implying that the core users of the community (i.e., the most coordinated ones) are not engaged in propaganda. Both these findings suggest harmless behavior. In other words, LCH appears to be highly coordinated, as shown in Figure 3, but harmless. B60 features diverse degrees of coordination among its users, but is nonetheless harmless. On the contrary, other communities feature increasing propaganda trends: above all, TVT and ASE. For coordination of 0.5 and above, both show increasing levels of propaganda, which supports the hypothesis of harmful communities. For the remaining three communities (i.e., LAB, CON, and SNP), the coordination appears to be mostly unrelated to propaganda. Note that these results are overall robust to changing the measure used to compute the propaganda scores.

The above qualitative findings are further confirmed by the quantitative results reported in Table 5a. In particular, our correlation analysis shows strong, positive, and statistically significant Pearson correlations between propaganda and coordination for the TVT and the ASE communities, with $\rho = 0.813$ and $\rho = 0.742$, respectively. In contrast, B60 features a strong, negative, and statistically significant correlation of $\rho = -0.899$. The remaining results in Table 5 are not meaningful, either because of small correlations or low statistical significance (for LAB, SNP, and LCH), or because of limited variation in propaganda (for CON).
Regarding the latter, Pearson correlation measures the strength of the linear relationship between two variables, but not the extent of variation of either one, which is relevant in our analysis. The last column of Table 5 accounts for this aspect by measuring the variation in propaganda as $\Delta p = p(C_i, \gamma{=}0.9) - p(C_i, \gamma{=}0)$, for each community $C_i$. Despite featuring a marked negative and significant correlation, CON exhibits a very small variation in propaganda, with $\Delta p = -0.008$, represented in Figure 4a by a mostly flat line.

Discussion. Given the lack of ground truth on coordinated harmful vs. harmless behaviors, one way to qualitatively validate our analysis is by cross-checking our results with earlier studies and with the role of the communities in the electoral debate, as also done in previous work [31, 33]. Two communities emerged as authentic and harmless: LCH and B60. This means that, according to our proposed methodology, their activities are coordinated, but neither malicious nor deceptive. In other words, they exhibit coordinated but authentic and harmless behavior. From previous work [31] and from our analysis of coordinated communities, we find that these are groups of activists protesting against unfair taxation (LCH) and in favor of women's rights (B60). Table 6 provides a detailed look at some of the tweets from B60's highly coordinated users, confirming that their focus was to promote their cause and to encourage women to exercise their right to vote. They further endorsed the Labour leader Jeremy Corbyn, who expressed support for their initiative. Hence, our methodology correctly highlighted these activists as harmless examples of grassroots coordination.

In contrast, our analysis revealed that TVT and ASE featured characteristics related to harmful behaviors. Both are highly polarized communities with strong political motivations. Regarding TVT and its highly coordinated members, Table 6 shows that the majority of their tweets and shared articles are politically themed. Most of the time they attack Boris Johnson and the conservatives. Similarly, on the opposite side of the political spectrum, ASE's peculiarity was that of repeatedly attacking the Labour party and its leader with allegations of antisemitism. Here, our methodology highlighted aggressive communities as harmful. Lastly, LAB, CON, and SNP appear as neither markedly harmless nor harmful. This is in line with the role of these communities, since they are large communities of moderate users [31].

[Table 6: Excerpt of the activity of strongly coordinated (coordination ≥ 0.9) members of TVT and B60. While the TVT users attack Boris Johnson and Brexit, the B60 users encourage women to vote and support the WASPI (Women Against State Pension Inequality) campaign (https://www.waspi.co.uk/about-us-2/). Our method labeled all TVT tweets in the table as propagandistic, and all B60 tweets as non-propagandistic. Example B60 tweets: "WASPI woman puts Boris Johnson on spot about trust after he publicly pledged to try and sort out pension row during visit to Cheltenham"; "Now that most party manifestos have been published, join @WASPI_Campaign today at and find your nearest local group at to find out about our #GE2019 Toolkit so you can speak up for #WASPI in your local area #WASPIwomenvote"; "How to join WASPI".]

We further inspected the framing of the articles shared by the different communities.
We used the frame inventory of the Media Frames Corpus [5], and we performed automatic annotation of the frames using the Tanbih API (http://app.swaggerhub.com/apis/yifan2019/Tanbih/0.8.0#/). Tanbih is a news aggregator platform with intelligent data analysis capabilities, including the possibility to analyze articles and news outlets based on their degree of factual reporting, propagandistic content, hyper-partisanship, political bias, and stance with respect to various claims and topics [14, 18, 50]. Figure 5 shows the analysis for four relevant frames, highlighting striking differences across the frames, even within a single community (in the figure, lines may be interrupted if no tweets were shared by the coordinated users of a community, i.e., those with coordination ≥ γ, for a given frame). For example, Figure 5d shows that the Political frame for TVT evolves into propagandistic behavior as coordination increases. Conversely, for Policy Prescription, the propaganda score for the same community decreases. This suggests that the spread of propaganda is a theme-dependent phenomenon. Figure 5 also highlights that some communities deviate from the rest in terms of overall propagandistic content, such as SNP and TVT, which maintain a relatively high propaganda score of 0.45-0.60 for Public Opinion, compared to the remaining communities.

Comparisons. Here, we discuss our methodology and our results in comparison to previous work, highlighting the usefulness and the advantages of our approach. Several earlier attempts at detecting inauthentic and harmful campaigns only investigated coordination and synchronization between user accounts [33, 40, 46]. In such works, all groups of users exhibiting unexpected coordination were considered to be malicious [33]. Despite representing an initial solution to the task of detecting malicious campaigns, this approach has a number of drawbacks. For example, if applied to our dataset, it would have flagged the LCH community as malicious, due to its extreme degree of coordination, as can be seen in Figure 3. However, our nuanced analysis of propaganda and coordination revealed that the LCH users are protesting activists, a finding also confirmed by Nizzoli et al. [31]. Conversely, the TVT community features the second-lowest degree of coordination among our communities. As such, it would have been labeled as non-suspicious by previous techniques. Our analysis further revealed a strong positive correlation between propaganda and coordination for TVT users, thus uncovering their malicious intent. In summary, our results show that coordination alone does not provide enough information for assessing the real activities and intent of online communities. Instead, a methodology combining the analysis of coordination with signs of malicious intent (e.g., propaganda), such as the one proposed here, can distinguish inauthentic and harmful behavior from authentic and harmless behavior.

Our findings also confirm and extend previous results about the role of small and fringe Web communities in information disorder. Zannettou et al. [48, 49] noted that fringe, polarized, and strongly motivated communities are those that exert the most influence on the Web regarding issues such as disinformation and online abuse, despite being relatively small. In our analysis, we obtained comparable results. Indeed, the most interesting communities (i.e., those that exhibit coordinated yet markedly harmless or harmful behavior) are small and non-mainstream, such as LCH, B60, and ASE.
However, while Zannettou et al. investigated this phenomenon across Web platforms, here we show that the same also occurs within platforms.

We conclude our analysis by comparing propaganda and coordination scores to other clear signs of inauthenticity and harmfulness. We leverage scores indicating automation (i.e., botness) as a proxy for inauthenticity. For each account, we use the maximum of Botometer's English and universal scores, both provided in the [0, 1] range, as its automation score [37]. While we are aware of the limitations of current bot detectors [9], including Botometer [36], the strong interest in the role played by social bots in online manipulation campaigns motivates this analysis. Similarly, we investigate the number of accounts suspended by Twitter in each community, as a proxy for harmfulness.

Table 5 reports, in columns (b) and (c), the correlations of our propaganda scores with automation scores and with Twitter suspensions, respectively. By cross-checking strong and significant correlations against notable variations in propaganda ($\Delta p$), we highlight interesting trends. Regarding automation, the same communities that featured a strong positive correlation between propaganda and coordination - namely, TVT and ASE - are also strongly correlated with automation scores. This means that highly coordinated users in TVT and ASE are both inauthentic and harmful, further confirming our earlier results. An unexpected result is instead obtained for B60, which features a strong negative correlation between propaganda and automation. In other words, while propaganda decreases as a function of coordination, automation scores increase. Thus, coordinated B60 users could be leveraging automation as a way to boost their online actions. The propaganda and automation trends for TVT, ASE, and B60 are shown in Figure 6a. Overall, TVT appears as the most harmful community throughout the online UK electoral debate, with high propaganda, high automation, and the largest share of accounts suspended by Twitter. We further measure a strong positive correlation between propaganda and suspension trends for B60 and SNP. Since, for these communities, propaganda decreases with coordination, these positive correlations mean that Twitter suspensions also decrease, which is a sign of harmless behavior. This result is particularly relevant for B60, and it corroborates our previous findings. The trends of propaganda and suspensions for SNP, TVT, and B60 are shown in Figure 6b.

We carried out the first combined analysis of propaganda and coordination in online debates. Specifically, we applied our methodology to the 2019 UK electoral debate on Twitter, revealing (i) harmful, (ii) neutral, and (iii) well-intentioned communities that took part in the debate. Among the most harmful communities, we found "tactical voters" (TVT), who colluded against the conservatives, and a small group of political antagonists, who attacked labourists and Jeremy Corbyn with accusations of antisemitism (ASE). Among the harmless coordinated communities, we uncovered groups of activists protesting against loan taxation (LCH) and in favor of women's rights (B60). Besides providing novel and interesting insights into the communities that participated in the 2019 UK electoral debate, our results also demonstrate the need to combine the analysis of coordinated user behavior with that of intent. Our methodology contributes to distinguishing between coordinated harmful and harmless behavior, thus overcoming one of the main limitations of earlier work.
Among the future challenges along this important research direction is the construction of a reliable ground truth for coordinated harmful and harmless behavior. This endeavor would allow shifting from the current descriptive work to predictions, by training models that can detect harmful behavior. We also plan to collect and to investigate additional information about online communities, thus going beyond the analysis of coordination and propaganda. If successful, these efforts will allow a deeper understanding of coordinated online behavior, thus enabling the possibility to rapidly intervene, and ultimately to limit the spread, the influence, and the societal impact of online information disorders.

References.
Nakov. 2021. Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society
A Two-Phase Framework for Detecting Manipulation Campaigns in Social Media
Proppy: Organizing the news based on their propagandistic content
Computational Propaganda and Political Big Data: Moving Toward a More Critical Research Agenda
The Media Frames Corpus: Annotations of Frames Across Issues
Graphika, and Stanford Internet Observatory. 2021. The Long Fuse: Misinformation and the 2020 Election. Stanford Digital Repository: Election Integrity Partnership
Political polarization on Twitter
The impact of projection and backboning on network topologies
A decade of social bot detection
SemEval-2020 task 11: Detection of propaganda techniques in news articles
2020. A survey on computational propaganda detection
Prta: A system to support the analysis of propaganda techniques in the news
Fine-Grained Analysis of Propaganda in News Articles
Unsupervised user stance detection on Twitter
New Dimensions of Information Warfare
Detecting Propaganda Techniques in Memes
SemEval-2021 task 6: Detection of Persuasion Techniques in Texts and Images
2022. Fine-Grained Prediction of Political Leaning on Social Media with Unsupervised Deep Learning
Misinformation, manipulation and abuse on social media in the era of COVID-19
Reducing controversy by connecting opposing views
It takes a village to manipulate the media: coordinated link sharing behavior during
Argotario: Computational Argumentation Meets Serious Games
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices
Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape
UK Election Analysis 2019: Media, Voters and the Campaign
Evaluating the effectiveness of deplatforming as a moderation strategy on Twitter
Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data
A Synchronized Action Framework for Responsible Detection of Coordination on Social Media
The Development of Connective Action during Social Movements on Social Media
Coordinating Narratives and the Capitol Riots on Parler
Coordinated behavior on social media in 2019 UK general election
Unveiling Coordinated Groups Behind White Helmets Disinformation
Uncovering coordinated networks on social media: Methods and case studies
Searching for superspreaders of information in real-world social media
Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking
The false positive problem of automatic bot detection in social science research
Detection of novel social bots by ensembles of specialized classifiers
Brexit divides the UK, but partisanship and ideology are still key factors
Extracting the multiscale backbone of complex weighted networks
Identifying coordinated accounts on social media through hidden influence and group behaviours
Disinformation as Collaborative Work: Surfacing the Participatory Nature of Strategic Information Operations
Make Reddit Great Again: Assessing Community Effects of Moderation Interventions on r/The_Donald
On the detection of disinformation campaign activity with network analysis
Information disorder: Toward an interdisciplinary framework for research and policy making
Who's in the Gang? Revealing Coordinating Communities in Social Media
Amplifying influence through coordinated behaviour in social networks
On the origins of memes by means of fringe Web communities
The Web centipede: understanding how web communities influence each other through the lens of mainstream and alternative news sources
Tanbih: Get To Know What You Are Reading

Acknowledgments. The work is part of the Tanbih mega-project, developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of "fake news," propaganda, and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking.