key: cord-0450021-irsr9boo authors: Trujillo, Amaury; Cresci, Stefano title: Make Reddit Great Again: Assessing Community Effects of Moderation Interventions on r/The_Donald date: 2022-01-17 sha: ee496e968d1e183f5278e4723c88f0691e1e24de doc_id: 450021 cord_uid: irsr9boo

Abstract
The subreddit r/The_Donald was repeatedly denounced as a toxic and misbehaving online community, and for these reasons it faced a sequence of increasingly constraining moderation interventions by Reddit administrators. It was quarantined in June 2019, restricted in February 2020, and finally banned in June 2020, but despite precursory work on the matter, the effects of this sequence of interventions are still unclear. In this work, we follow a multidimensional causal inference approach to study data containing more than 15M posts made in a time frame of 2 years, to examine the effects of such interventions both within and outside of the subreddit. We find that the interventions had strong positive effects toward reducing the activity of problematic users both inside and outside of r/The_Donald. However, the interventions also caused an increase in toxicity and led users to share more polarized and less factual news. Additional findings of our study are that the restriction had stronger effects than the quarantine and that core users of r/The_Donald suffered stronger effects than the other users. Overall, our results provide evidence that the interventions had mixed effects and paint a nuanced picture of the consequences of community-level moderation strategies. We conclude by reflecting on the challenges of policing online platforms and by discussing implications for the design and deployment of moderation interventions.

1 Introduction
The social media and news aggregator platform Reddit is among the most popular websites of the Internet, ranking as the seventh most visited website and the third most visited social media in the United States, as of December 2021.1 The platform is organized in communities called subreddits, in which users can submit and discuss content regarding the community's shared topics and interests. Users can subscribe to subreddits of their interest to receive the latest content directly on their feeds, although the vast majority of subreddits are public and any user can participate in them, subscribed or otherwise. Reddit communities cover nearly every aspect of life, including news, sports, science, technology, religion, and a broad spectrum of social and other activities. Overall, politics is one of the most discussed topics on the platform, with several subreddits related to US politics consistently ranking among its most popular communities. The outreach of these communities is tremendous, since their content reaches millions of users on Reddit, on other online platforms (e.g., Facebook, Twitter) [66], and even the audience of traditional media [5, 60]. For these reasons, political subreddits such as r/politics, r/The_Donald, r/conservative, and others have received much scholarly attention [33, 51, 60]. In addition to platform-wide rules and policies, each subreddit sets its own guidelines to regulate the content shared and the interactions among its participants. Furthermore, unlike other platforms, users on Reddit can create and moderate their own subreddits in collaboration with others, provided that they meet a set of minimal requirements regarding account age and activity.
As such, each community presents unique characteristics and develops its own habits, participation culture, and moderation rules [60]. Occasionally, though, some communities accept and even encourage potentially aggressive and harmful behaviors, and when they repeatedly violate Reddit's policies, platform administrators (i.e., Reddit personnel) can intervene to enforce moderation at the community level. The right-leaning community of Donald Trump supporters of r/The_Donald was repeatedly denounced for toxicity, trolling, and harassment [25, 33, 43, 45]. As a result of such misbehavior, it faced a sequence of increasingly restrictive moderation interventions by Reddit administrators:
(1) Quarantine (June 26, 2019): r/The_Donald was quarantined following repeated reports for inciting violence, including threatening US public figures. A quarantined subreddit is removed from the platform's search results and from the feed of non-subscribed users, although users can still access its content if they opt in after receiving a warning message upon visiting [11]. These measures were intended to reduce the subreddit's visibility and to deter newcomers.
(2) Restriction (February 26, 2020): Administrators further restricted the subreddit and removed several of its moderators who were supporting content in violation of Reddit's policies, allowing new submissions only by approved users. Consequently, participation in the subreddit came to a complete halt within the following weeks, with the majority of users migrating to other subreddits or even to completely different platforms [33]. The existing content remained accessible for reading and commenting after the restriction.
(3) Ban (June 29, 2020): r/The_Donald was banned, together with about two thousand other subreddits, as part of Reddit's actions to enforce its policies. The ban permanently shut down the subreddit, removing it from the platform and preventing anyone from reading or creating content on it. Nevertheless, at this point the subreddit was already abandoned, with no new content posted for several weeks.
In addition to raising ethical and legal concerns as to whether online platforms should be allowed to limit the freedom of speech of their users, including that of major politicians [9, 31], the practical consequences of these and other interventions are still unclear. Obviously, platforms apply moderation interventions to reduce the spread of toxic, hateful, fake, and otherwise problematic content [35]. However, the extent to which such interventions are capable of mitigating the issues is still up for debate, even more so given that certain interventions caused effects opposite to those planned -they backfired [2, 10, 13, 23, 50]. For these reasons, a growing body of studies aims at assessing the effectiveness of social media moderation strategies. Deplatforming -that is, the permanent banning of problematic users and even of entire communities- is a frequently used (and studied) moderation intervention. Chandrasekharan et al. [12] as well as Saleem and Ruths [55] independently analyzed Reddit's bans of racist (r/CoonTown) and fat-shaming (r/fatpeoplehate) communities, finding that banning such communities significantly decreased the usage of hate speech by their former members. In a different study [11], Chandrasekharan et al.
also investigated the effects of quarantining offensive communities, finding that the intervention significantly reduced the number of new users joining the communities, but not their overall extent of misogyny and racism. Horta Ribeiro et al. analyzed the activity of the former members of banned communities when they migrated to new platforms [33]. They found that banning a community on a platform results in reduced activity on the new platform as well. The reduction in activity, however, may come at the expense of a more toxic and radical community [33]. In contrast, Jhaver et al. evaluated the long-term effects of deplatforming controversial influencers on Twitter, finding that the overall activity and toxicity levels of their supporters declined after the bans [35]. Despite these promising initial results, several questions are still left unanswered. Firstly, interventions may produce multiple effects [39]. However, existing literature almost exclusively investigated intervention effects with respect to the activity and the toxicity (or hate speech) of the affected users [11, 12, 33, 35, 55]. Other important dimensions were overlooked, such as political polarization and the degree of factual reporting [2]. As such, it is unknown whether past interventions produced any effect, be it positive or negative, in the latter dimensions. Secondly, many different types of interventions are adopted by platform administrators, but the vast majority of studies only analyzed restrictions and bans (i.e., deplatforming) [12, 33, 35, 55]. The effects (or lack thereof) of other interventions, such as quarantines, are unclear [11]. Thirdly, existing studies evaluated each intervention in isolation. However, certain interventions are enforced as part of a sequence of actions, as in the case of r/The_Donald. Finally, interventions that do not completely shut down a community (e.g., quarantines, as opposed to bans) may produce effects both within the moderated community and in other communities. This is because most users browse and interact with multiple communities. Thus, when a community suffers a moderation intervention, the behavior of the members of that community might change not only within that community, but also in the other communities in which they participate. However, in the case of quarantines, effects were primarily evaluated within the community targeted by the intervention [11], and not on the other possibly affected ones. The present study contributes to overcoming the aforementioned limitations. In particular, Reddit's quarantines are almost completely unexplored, since the majority of existing studies focused on deplatforming interventions [12, 33, 35, 55]. The only existing related study did not investigate effects with respect to the toxicity of the affected users and did not differentiate between different types of users [11]. For this reason, we ask the following starting question:
RQ1: What were the effects of the quarantine, in terms of activity and toxicity, within r/The_Donald?
Following from the previous question, we note that existing work evaluated intervention effects only in terms of activity and toxicity [11, 33]. However, interventions can have multiple effects, including possible negative side effects [2, 33]. Drawing conclusions about the effectiveness of an intervention only based on a limited set of metrics could be misleading.
To this end, measuring effects also with respect to the quality of the news articles consumed by members of a community would provide deeper insights into the effects that moderation interventions actually have [2]. We thus seek answers to the following question:
RQ2: What were the effects of the quarantine, in terms of the quality of shared news articles, within r/The_Donald?
Finally, previous work evaluated interventions in isolation. However, in the case of multiple subsequent interventions, precious insights could be drawn by evaluating the sequence of interventions so as to also compare them and identify the most effective one. Similarly, many existing works evaluated intervention effects only within the moderated community. However, while an intervention might have positive local effects, it could have overall negative global effects (e.g., on other communities, or on the platform as a whole) [15, 33]. Similarly to the previous research question, a limited scope in the analysis could lead to misleading conclusions. Evaluating effects on the other communities of the same platform allows a thorough understanding of the consequences of an intervention. Hence, we ask one final question:
RQ3: What were the effects of the sequence of interventions applied to r/The_Donald on the other communities in which r/The_Donald members participated?
1.3 Summary of Methods, Findings and Implications
Our study is based on observational Reddit data comprising more than 15M posts collected over the course of 2 years, both within r/The_Donald and in other subreddits.2 We operationalize possible intervention effects with metrics that measure user activity, the toxicity of shared comments, the degree of political polarization and factual reporting of shared news articles, and the group community proclivity. We adopt a set of causal inference methods including interrupted time series (ITS) regression analysis and Bayesian structural time series (BSTS) modeling for the analysis of the temporal variations in the measured quantities. In addition, we leverage appropriate tests to assess the statistical significance of all measured effects. Regarding the effects of the quarantine on activity and toxicity within r/The_Donald (RQ1), we find moderate positive effects toward reducing user activity, as well as strong short-term positive effects but strong long-term negative effects toward reducing toxicity. The quarantine also produced weak negative effects on the degree of factual reporting of the news articles shared within r/The_Donald (RQ2). No significant effect was found for the political polarization of the news articles. Results for RQ1 and RQ2 also reveal that core users of r/The_Donald suffered stronger effects than the other users. Finally, the analysis of the effects that the quarantine and the restriction had outside of r/The_Donald (RQ3) reveals positive effects on reducing user activity, but negative effects on toxicity and on the degree of political polarization and factual reporting of shared news articles. Results for RQ3 also reveal that the restriction caused stronger effects than the quarantine. Our results highlight that the sequence of interventions enforced on r/The_Donald had mixed effects. Overall, our results and other recent findings [33] partly question the positive judgments expressed in several previous works about the efficacy of Reddit's [11, 12, 55] and Twitter's [35] interventions.
Furthermore, our nuanced results call for renewed efforts at evaluating the possible side effects of an intervention and the temporal variations of its effects. This research also contributes to building theories and methods [31, 37] to inform platform administrators in the design and deployment of future moderation interventions.
We provide background information and a critical literature discussion on three areas that are tightly linked to our present work: (i) the rise and fall of r/The_Donald, together with the many issues that emerged therein; (ii) the moderation interventions put in place by online platforms for mitigating issues caused by problematic content and users; and (iii) the evaluation of the effects of such interventions.
Between 2015 and 2020, r/The_Donald served as an online space for supporters of the businessman and former US president Donald Trump. It was created on June 27, 2015, following the announcement of Trump's presidential campaign, and soon after the subreddit gained widespread popularity and traction among Trump enthusiasts, as well as among conservative and libertarian users [43]. At its peak, it counted almost 800K subscribers and it frequently ranked in the top-10 subreddits by activity.3 Due to their technical skills, organization, motivation, and activism [29, 36, 59], members of r/The_Donald managed to exert a strong influence on the news discussed on other social media platforms, like Twitter [11, 66]. Initially, discussions on r/The_Donald mainly focused on Trump-related news, with the vast majority of the posted content supporting his candidacy and later presidency. However, over time the subreddit slowly regressed into an alt-right bastion [33, 60] and a hub for far-right extremism [11]. The offensive nature of the content posted on r/The_Donald and the aggressive behavior of its members frequently caused considerable controversy and turmoil. Through the years, Reddit users, journalists, and scholars repeatedly denounced r/The_Donald for being toxic and violent [33, 60], racist, sexist and Islamophobic [29, 45, 54, 68], engaged in coordinated trolling and harassment [25], in strategic manipulation [59], and in the spread of conspiracy theories [43]. The archetype of r/The_Donald's member was that of a white Christian male interested in conspiracy theories, firearms, and video games, and engaged in shocking and vitriolic humor [43]. Many of the aggressive and harmful behaviors described above were in clear violation of Reddit's policies. For this reason, between 2019 and 2020 the platform administrators applied three increasingly restrictive moderation interventions to r/The_Donald. The first of such interventions (i.e., the quarantine) applied concepts of design friction [16] in order to make it more difficult for casual users to enter, and to be exposed to the content of, the subreddit [11]. Apart from the added difficulty, however, all of the content from r/The_Donald remained visible and all interactions with content and users remained possible. Conversely, the second intervention (i.e., the restriction) made it impossible for the vast majority of users to post new submissions to the subreddit and eventually resulted in a mass migration of users to a new platform [33]. In practice, the restriction doomed r/The_Donald, even before the final ban that occurred four months later. Thus, our present work is part of the ongoing stream of research that aims at assessing the precise effects of these moderation interventions.
The many ailments that currently affect online social platforms -including those described above, as well as the spread of mis- and disinformation [22, 62]; of propaganda and conspiracy theories [18, 48]; the rise of hateful, abusive, and coordinated inauthentic behavior [27, 47]; and the misbehavior of social bots and trolls [17, 67]- mandate the design and deployment of a number of moderation interventions. Many of these issues were exacerbated by the recent COVID-19 pandemic and its resulting infodemic [14, 24], which led platforms to enforce a large number of interventions, unprecedented in both scale and nature. For example, Twitter progressively deployed a number of moderation strategies ranging from adding friction to certain activities (e.g., adding a confirmation dialog when retweeting a link without having opened it) and attaching warning labels to tweets based on the presence of given keywords [65], up to deplatforming individual users [35] as well as large coordinated groups.4 Similarly, Facebook and Instagram also adopted warning labels and banned users, groups, and pages involved in coordinated inauthentic behavior [30, 47], as well as those that promoted hateful or harmful behavior.5 Pinterest blocked search results for anti-vaccination queries in an effort to curb vaccine misinformation.6 Reddit has a long history of quarantining, restricting, and banning harmful communities [11, 12, 20, 33, 55]. This brief overview of recent moderation strategies emphasizes the diversity in scope and severity of the proposed interventions. Platforms can also act at different levels of granularity: from moderating individual pieces of content or single users, to restricting entire communities, up to the point of making platform-wide changes [11]. Examples of moderation of single users are the banning of Donald Trump from both Facebook and Twitter [41] and the banning of the controversial influencers Alex Jones, Milo Yiannopoulos, and Owen Benjamin from Twitter [35]. As for community-wide interventions, Reddit has provided many recent examples when it quarantined, restricted, or banned problematic subreddits such as r/The_Donald, r/CoonTown, r/fatpeoplehate, r/TheRedPill and r/Incels [11, 12, 33, 55]. Finally, common platform-wide interventions involve attaching warning labels to all content related to potentially controversial topics (e.g., COVID-19 vaccines) [58, 65] or changing the platform's interface so as to emphasize content publishers, in order to make it easier for users to assess their credibility [23]. Regarding severity, there exist two broad families of possible moderation strategies: soft and hard interventions. Hard interventions occur when administrators remove content, users, or entire communities from a platform (e.g., via deplatforming). Given that this type of intervention has faced criticism due to censorship and free speech concerns [65], soft interventions were introduced as a way to address problematic content and users without removing them from the platform. For example, the influence of problematic content and users can be reduced by artificially limiting their visibility via demoting or shadow-banning [40]. This reduction in visibility often goes unnoticed by the users of the platform, meaning that neither the demoted user nor their audience is aware of the moderation.
In addition to this, other soft interventions might involve attaching warning labels to questionable content, thus informing those exposed to it of the potential risks, or even limiting the possibility for other users to interact (e.g., comment or re-share) with such content [58, 65]. Reddit's quarantines can be regarded as a form of the latter type of soft interventions, since they make access to problematic content more difficult, but without removing it from the platform [11]. Given the multitude of moderation interventions recently put in place, a fundamental question arises about the efficacy of such interventions for mitigating the many issues affecting online platforms. The recent body of works that evaluated Reddit's quarantines, restrictions, and bans represents the literature most related to our present study. Chandrasekharan et al. as well as Saleem and Ruths evaluated the effects of the bans that targeted r/fatpeoplehate and r/CoonTown in 2015, two subreddits whose users were known for harassment [12, 55]. Chandrasekharan et al. measured overall positive effects for the interventions. Specifically, they found that many former members of r/fatpeoplehate and r/CoonTown ceased using Reddit and that those who remained on the platform markedly decreased their hate speech usage. In addition, Saleem and Ruths found that the counteractions taken by the former r/fatpeoplehate members to circumvent the ban were short-lived and ineffective [55]. As often happens with deplatforming, members who remained on Reddit after the bans "migrated" to other subreddits [33, 46]. Those subreddits, however, saw no significant changes in hate speech usage after the interventions [12]. Nonetheless, the former members of r/CoonTown more than doubled their posting activity when they migrated to r/The_Donald [11]. Reddit's quarantining of r/The_Donald and r/TheRedPill was evaluated in [11]. The authors concluded that the quarantines made it more difficult to recruit new members to the moderated communities, but that the overall degree of misogyny and racism of their existing members remained unaffected. The previous studies shed light on (some of) the effects that Reddit's interventions had within Reddit itself. However, since such interventions caused many users to migrate to other platforms, those studies did not provide answers as to whether Reddit's bans made those problematic users "someone else's problem" [12]. Horta Ribeiro et al. aimed to answer this question in the aftermath of Reddit's 2020 deplatforming of r/The_Donald and r/Incels, whose former users migrated respectively to thedonald.win and incels.co [33]. They found that both interventions significantly decreased activity on the new platforms, reducing the number of shared posts, active users, and newcomers. However, former users of r/The_Donald showed increases in toxicity and radicalization, supporting the hypothesis that the reduction in activity may have come at the expense of a more toxic and radical community [33]. Besides Reddit, Jhaver et al. evaluated Twitter's ban of three controversial influencers. They found that the deplatforming intervention significantly reduced the number of conversations about all three individuals and that their supporters decreased their overall activity and their degree of toxicity after the intervention [35].
Much effort has also been devoted to assessing the efficacy of other interventions, such as those based on attaching warning labels to posts containing misleading political information [44, 50] and COVID-19 vaccine misinformation [58]. Mena found that users were less willing to share content with warning labels, supporting the effectiveness of this approach [44]. This result, however, conflicts with more recent research that measured higher engagement with posts carrying warning labels compared to those without [65]. Sharevski et al. also obtained contrasting results when studying the effects of countermeasures to COVID-19 misinformation on Twitter. They found that warning labels failed to reduce the perceived accuracy of misinformation tweets, but also that adding interstitial covers to such tweets before displaying them led to positive results [58]. Other contrasting results were obtained with respect to the moderation approaches based on emphasizing the source (i.e., the author or publisher) of the content shared online. In detail, Dias et al. concluded that increasing the visibility of publishers is an ineffective, and perhaps even counterproductive, way to address misinformation [23]. Similarly, some scholars designed interventions for exposing polarized users to content or users with the opposing viewpoint, as a way to reduce online polarization [28]. However, experiments carried out in the context of political polarization in the US suggested that exposing users to opposing views on social media can actually increase their political polarization, instead of reducing it [2]. The results of the studies discussed in this section surface a general consensus that deplatforming interventions lead to overall positive results, at least with respect to activity. At the same time, however, nearly all studies also reported a number of negative side effects. In light of these results, one of the most recent studies along this line of research underlined the importance of carrying out nuanced analyses in order to accurately assess the (sometimes mixed) consequences of moderation interventions applied to complex online social systems [33]. Within this scientific context, our present work extends and complements existing studies by investigating a number of unexplored aspects. Firstly, we extend previous analyses on activity and toxicity by also investigating the quality of the news articles shared and consumed (i.e., in terms of their political polarization and degree of factual reporting), important aspects that contribute to accurately understanding the consequences of moderation interventions, including their negative side effects [2]. Secondly, we compare the effects of subsequent interventions applied to r/The_Donald, thus contributing to the analysis of their relative effectiveness. Until now, such interventions were only analyzed in isolation [11, 33]. Finally, we contribute to filling the scientific gap related to the analysis of Reddit's quarantines, whose effects were so far only analyzed by [11], and only within the moderated community. Here, instead, we also evaluate the nuanced effects that quarantining r/The_Donald had on other communities on Reddit, similarly to what [33] did when moderated communities migrated to other platforms.
Our study is based on observational Reddit data comprising more than 15M posts collected over the course of two years.
In detail, our data is organized into three datasets that respectively contain: (i) all content shared within r/The_Donald in a time frame centered around the quarantine; (ii) all content shared within r/The_Donald by core users of the subreddit, in a time frame centered around the quarantine; and (iii) all content shared outside of r/The_Donald (i.e., in all other subreddits) by core users of r/The_Donald, in a large time frame that encompasses both the quarantine and the restriction. The mapping of our datasets to our research questions is shown in Table 1, together with other summary information. In detail, we use the first and second datasets to answer RQ1 and RQ2 (quarantine effects within r/The_Donald). Instead, we use the third dataset to answer RQ3 (quarantine and restriction effects outside of r/The_Donald). Posts on Reddit are primarily divided into two kinds: submissions and comments. Previous work about moderation interventions on r/The_Donald aggregated both into a single metric (e.g., daily number of posts) [33]. However, submissions and comments represent two intrinsically different activities that, in general, follow different dynamics [64]. Moreover, certain moderation interventions (e.g., the restriction suffered by r/The_Donald) are designed to have a strong impact on submissions, while leaving commenting activities unaffected. We thus conducted separate analyses for submissions and comments. As anticipated, activity in r/The_Donald came to a halt shortly after the restriction. Hence, the subreddit was already inactive when the ban occurred. Figure 1 shows the daily number of submissions and comments shared in r/The_Donald, as well as the three moderation interventions. The figure clearly shows the drop in activity after the restriction. Consequently, and in line with previous work [33], the time frame considered for our analyses is centered around the quarantine and the restriction. We describe the definitions and processes used for data collection in the following paragraphs.
Figure 2. Timeline depicting the pre-post periods used for data collection and for our causal analyses: two time windows centered around the quarantine and the restriction, plus a convenience period (Between-I) between the two interventions.
For the collection of all Reddit data used in our study, including the posts from r/The_Donald that are no longer available on Reddit because of the ban, we used the monthly archives from Pushshift, a service that provides historical Reddit data [3]. The time frame that we used for data collection and analysis is centered around the two interventions of interest. In the literature, different time frames were used: ±60 days around interventions [55], ±120 days [33], ±180 days [11, 35], and ±200 days [12]. Given that circa 245 days passed between the quarantine and the restriction, we used a time frame spanning ±210 days (30 weeks). This choice allows us to divide a given time window around an intervention into groups of ten days [11, 12] or seven days, which eases the analysis of our time series. We thus collected data from November 28, 2018 to September 23, 2020 (inclusive), in which 1.06M submissions and 12.3M comments were posted to r/The_Donald. We then further divided the daily content into two pre-post intervention periods around the quarantine (Q) and the restriction (R), plus a convenience period between the two, all of which exclude the intervention days, as illustrated in Figure 2.
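To make the resulting study windows concrete, the following minimal R sketch derives them from the intervention dates given above; the object names are ours and purely illustrative, and the Between-I bounds reflect one possible reading of the convenience period.

```r
# Study windows of +/-210 days (30 weeks) around the two interventions.
quarantine  <- as.Date("2019-06-26")
restriction <- as.Date("2020-02-26")
window      <- 210  # days

pre_q  <- c(start = quarantine - window, end = quarantine - 1)   # 2018-11-28 .. 2019-06-25
post_q <- c(start = quarantine + 1, end = quarantine + window)   # 2019-06-27 .. 2020-01-22
pre_r  <- c(start = restriction - window, end = restriction - 1) # 2019-07-31 .. 2020-02-25
post_r <- c(start = restriction + 1, end = restriction + window) # 2020-02-27 .. 2020-09-23

# One possible reading of the Between-I convenience period: the full gap
# between the two interventions, excluding the intervention days themselves.
between_i <- c(start = quarantine + 1, end = restriction - 1)
```

Note that the earliest and latest dates produced this way coincide with the overall collection span reported above (November 28, 2018 to September 23, 2020).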
The selection of representative members of a community (i.e., core users) is an inherently subjective -but sometimes inevitable [47]- process. Before the quarantine, r/The_Donald was a public space in which any registered user could post. As a result, there are many posts whose authors participated in the subreddit very sporadically, or even only once during the time frame of our study. Moreover, several users authored many posts in r/The_Donald, but only on a single day, or for a single submission or thread. Keeping in mind that interventions can have different effects depending on user characteristics, and that the interventions enforced by Reddit were targeted at the core members of r/The_Donald, we operationalized our definition of core users. In previous work, similar user-filtering steps were based on arbitrary thresholds on the amount of content that users posted over the whole considered time frame. For instance, a previous work on Reddit deplatforming only considered users who had authored at least five posts (submissions or comments) in the studied communities [12]. Another study on Twitter deplatforming identified as supporters of the controversial show host Alex Jones those users who posted at least ten times prior to his deplatforming and who had at least five tweets with a hashtag supportive of him [35]. Here, we adopt a more rigorous approach to identify core users and we define them as those users who authored at least one post (i.e., either a submission or a comment) a week, for the whole 30 weeks of the pre-quarantine period. In this way, we ensure that a core user had a minimum amount of posted content and a prolonged involvement with the community (circa seven months) before any intervention took place. Based on this definition, and after removing a few moderation bot accounts, we identified 2,239 core users of r/The_Donald, which represent only 1.12% of the 198.5K distinct post authors in the subreddit for the study time frame. Despite their small number, core users posted 37.5% of the total submissions and more than half (53.9%) of the total comments. After the selection of the appropriate study time frame (§3.1) and of the core users (§3.2), we constructed our three datasets, summarized in Table 1. The first dataset (TD) is centered around the quarantine, and contains r/The_Donald posts made by all users (982K submissions and 11.4M comments). The second dataset (CUw/iTD) is a subset of the previous one, with only the posts made by core users within r/The_Donald (233.8K submissions and 3.29M comments). The third dataset (CUw/oTD) contains all posts that core users made outside of r/The_Donald, independently of the subreddit. This last dataset encompasses both the quarantine and the restriction. In order to better inform our choice of inference methods, we also conducted an exploratory time series analysis on daily content and unique users for all of the datasets. This step allowed us to test for autocorrelation and seasonality before any intervention (i.e., in the Pre-Q period), which can cause certain causal inference methods to yield inaccurate results [8]. For autocorrelation, we applied the Durbin-Watson (DW) test from the lmtest R package. To identify seasonality, we used iterative STL (seasonal and trend decomposition using LOESS) from the forecast R package, with which we also derive seasonal strength on a scale from zero to one [34, Ch. 6]. All of the analyzed time series showed a significant positive autocorrelation, with values of the DW statistic ranging from 0.582 to 1.536.
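A minimal sketch of these two pre-intervention diagnostics, assuming a data frame `daily` with one row per Pre-Q day and a numeric count column `n_posts` (both names are illustrative); for simplicity it uses base R's stl() rather than the iterative STL of the forecast package, with seasonal strength computed as in Hyndman and Athanasopoulos.

```r
library(lmtest)  # provides dwtest()

# Durbin-Watson test for autocorrelation in the residuals of a linear trend model.
daily$t <- seq_len(nrow(daily))
dwtest(lm(n_posts ~ t, data = daily))

# STL decomposition with weekly frequency, then seasonal strength as
# 1 - Var(remainder) / Var(seasonal + remainder), bounded below by zero.
y    <- ts(daily$n_posts, frequency = 7)
comp <- stl(y, s.window = "periodic")$time.series
seasonal_strength <- max(0, 1 - var(comp[, "remainder"]) /
                              var(comp[, "seasonal"] + comp[, "remainder"]))
seasonal_strength
```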
We also identified a weekly seasonality, with a strength that ranges from modest to strong (0.181-0.764). Both general and core users of r/The_Donald were most active around mid-week and least active around the weekend during the Pre-Q period, albeit core users showed a higher weekly seasonal strength. This seasonality also indicates that a subdivision of time periods into groups of seven days is more appropriate than the ten days used in other works [11, 12].
Taking into account the content structure and nature of our datasets, we defined intervention effects, gathered additional data, and selected the corresponding descriptive and inferential statistical methods used to answer our research questions. The conceptual approach we followed is illustrated in Figure 3 and detailed in the following paragraphs. A single intervention can have multiple effects [39]. Therefore, in order to accurately assess intervention effects it is necessary to investigate possible changes in many dimensions. The first dimension we consider is user activity, which we measure through the daily number of (i) submissions and (ii) comments, as well as through the number of daily active users (DAU). However, what constitutes an active user could differ from platform to platform. In line with the definition of core users given in §3.2, we define DAU as those users who performed at least one posting activity on a given day. Analogous to our metrics for posted content, we separate daily active users depending on whether they posted on a given day at least (iii) a submission or (iv) a comment. These are thus the four metrics used in our analyses for measuring possible intervention effects related to user activity. One of the primary objectives of moderation interventions is that of improving the quality of online discussions. In the literature, the presence of abusive, hateful, disrespectful, and uncivil posts is often used as an indication of low-quality discussions [51]. For instance, several works evaluated the effectiveness of moderation interventions in terms of their capacity to reduce hate speech [11, 12, 55]. This approach, however, has a number of issues. Firstly, we lack a single and well agreed-upon definition of hate speech [19, 54]. Secondly, the detection of hate speech is often based on the presence of certain hateful words in a text. However, selecting an appropriate dictionary of hateful words is challenging and context-dependent [19, 26]. Finally, the mere presence of hateful words in a text has only limited power of expressing the degree of harassment and incivility of a discussion. For these reasons, others recently used toxicity in place of hate speech as a more general indicator of the quality of online discussions [51] and also for the evaluation of interventions [33, 35]. The Jigsaw unit at Google developed and released the Perspective API,7 a widely used public service that computes multiple toxicity-related scores for comments [53]. Among these, in our analyses we rely on the severe toxicity score, similarly to several previous works [33, 35]. Severe toxicity is defined as being «very hateful, aggressive, disrespectful [...] or otherwise very likely to make a user leave a discussion or give up on sharing their perspective», and is considered to be the most reliable and robust8 indicator of toxicity among those provided by the Perspective API [33, 51]. Based on the system's severe toxicity scoring, defined in the [0, 1] range, we compute two metrics of toxicity for Reddit comments: daily median and daily relative frequency. The first one is computed as the median of the severe toxicity scores of comments aggregated on a daily basis.
For the second metric, we first convert scores into binary labels by considering as "severely toxic" all those comments whose score is ≥ 0.5. The metric is then computed as the fraction of severely toxic comments on a daily basis. We include both metrics in our analyses because they provide slightly different information and both have been recently used in studies on online toxicity [35, 51].
Polarization. The degree of polarization or radicalization of a community is another important metric to consider when investigating the "health" of online spaces, and several interventions were specifically designed with the aim of reducing online polarization [2, 28]. Meanwhile, concerns have surfaced about the possibility that deplatforming interventions might increase the polarization of the affected users [33]. To investigate this hypothesis, we also evaluate intervention effects in terms of the changes they induce in the overall degree of political polarization of the affected users. In order to obtain a metric of political polarization, we adopt an approach based on the analysis of the news articles shared by a community of users. To this end, we leverage data from Media Bias/Fact Check (MBFC),9 a widely-used platform that provides expert-curated fact checks and audit information about a large set of US media outlets [7, 14]. Among the information available from MBFC, we use political bias as our metric of political polarization. For each news outlet that is categorized as politically inclined, MBFC provides a political bias label as a Likert-like rating with five labels based on the US political spectrum, ranging from "left" to "right". In our analyses we compare the distribution of such labels for the articles shared by a community before and after a given intervention.
Factual reporting. Grounding comments on facts and sharing a common vision of reality with those with whom we interact represent key elements for achieving healthy online conversations [4]. Because of this, scholars have long monitored the extent to which online communities share and consume news from sources known for adopting fact-based journalistic practices [7]. Here, we evaluate the extent to which users affected by a moderation intervention change their news sharing behaviors, with respect to the degree of factuality of the news sources they share. To obtain a factuality metric, we again leverage data from MBFC. Specifically, for each news outlet MBFC provides a factual reporting Likert-like rating with six labels, ranging from "very low" to "very high" factuality. Similarly to our analyses of political polarization, we use factual reporting to study intervention effects by comparing its distribution before and after a given intervention.
Proclivity. User participation in online communities can be represented not just by the number of posts or unique users (i.e., what we herein call activity), but also by the overall structural relationship between a group of users and the set of communities on a platform. That is, when a group of users -as a whole- is more inclined to participate in certain communities rather than others, then those communities are more important for that group. We refer to this concept as group community proclivity. The concept is akin to ranking user importance in social networks with centrality algorithms such as PageRank [32], but in this case we have two different kinds of nodes: users and communities.
We are thus interested in comparing the group community proclivity of core users of r/The_Donald within Reddit before and after the interventions on the subreddit. Additionally, since we carry out separate analyses by kind of content, we have two bipartite graphs (one by submissions and the other by comments) for each analyzed period. This approach allows us to compare structural changes in the user-subreddit relationship before and after a given intervention, as illustrated in Figure 4. To obtain a metric of group community proclivity, we first represent the user-subreddit relationship for a given period as a weighted undirected bipartite graph G = (U, S, E, W), where U is the set of core users, S is the set of subreddits, E is the set of edges connecting nodes in U and S, and W are the weights of the edges. An edge e_us ∈ E exists from u ∈ U to s ∈ S if u has authored content on s in the given period, with its weight w_us ∈ ℕ+ being the number of posts authored by u in s. We then adopt a ranking algorithm for bipartite graphs so as to obtain a ranking of subreddits according to their importance for the group of users. Among the available ranking algorithms, we adopted Co-HITS, a version of the well-known HITS algorithm adapted to bipartite graphs [21]. In our analyses we used the Co-HITS implementation from the birankr R package.
We now describe our choice of statistical methods for assessing intervention effects.
4.2.1 RQ1: Effects on Activity and Toxicity within r/The_Donald. We followed a quasi-experimental approach to measure the intervention effects on activity and toxicity within r/The_Donald. Based on our preliminary time series analysis (§3.3), we leverage two causal inference methods in a complementary manner to describe the trend, onset, and decay of intervention effects: interrupted time series (ITS) regression analysis, and Bayesian structural time series (BSTS) modeling. ITS regression analysis aims to establish the underlying trend of the variable of interest across a continuous sequence of observations before and after being "interrupted" by an intervention at a well-defined point in time. In its most basic form, ITS can be implemented through simple segmented linear regression models, which renders it easy to use, understand, and plot. For these reasons it was used in works on social media interventions, either as complementary to simpler methods such as difference-in-differences [12], or as the only causal inference method [11, 35]. However, specific regression families should be used in the case of non-linear trends and for particular underlying data distributions [6]. Moreover, ITS has caveats in the case of autocorrelation and seasonality, which demand that the model be adequately adjusted so as to avoid obtaining misleading estimates of effect sizes [56]. Herein we use the simple and interpretable ITS regression to visualize variations in the linear trends of the variables of interest around interventions, according to the following segmented linear model:

y_t = β0 + β1·t + β2·x_t + β3·d_t + ε_t

where y_t is the outcome variable of the time series; t is a continuous variable that indicates the time in days from the start of the observational period, with β1 indicating the trend before the intervention; x_t is a dummy variable indicating the presence (1) or absence (0) of the intervention, with β2 indicating the immediate change upon intervention; d_t is a continuous variable that indicates the days passed since the intervention occurred (from 0 to 210), with β3 indicating the change in trend after the intervention; finally, ε_t is the error term of the model.
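As an illustration, the model above can be fit to one daily metric with a plain linear regression; the data frame and column names below are ours and purely illustrative, and rows are assumed to be ordered by date.

```r
# Segmented ITS regression: y_t = b0 + b1*t + b2*x_t + b3*d_t + e_t
# `daily` is assumed to cover the Pre-Q and Post-Q windows, one row per day,
# with a `date` column and a daily count `n_posts`.
intervention <- as.Date("2019-06-26")
daily$t <- seq_len(nrow(daily))                          # days since start of observation
daily$x <- as.integer(daily$date > intervention)         # intervention dummy (0/1)
daily$d <- ifelse(daily$x == 1, cumsum(daily$x) - 1, 0)  # days since the intervention

its_fit <- lm(n_posts ~ t + x + d, data = daily)
summary(its_fit)  # b1: pre-trend, b2: immediate level change, b3: change in trend
```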
We instead use the more robust BSTS to compute effect sizes, confidence intervals, and significance. BSTS is an approach based on Bayesian statistics that uses a structural time series model to capture the trend, seasonal, and related components of a time series, together with a dynamic regression component, using Markov chain Monte Carlo (MCMC) to create counterfactual data and confidence intervals [57]. Given its flexibility to accommodate multiple sources of variation, BSTS was recently used across different domains for forecasting [61], nowcasting [49], and causal inference [42]. BSTS improves on ITS in two main aspects: it provides a fully Bayesian estimate of the effect across time, which can be updated as new information becomes available; and it uses model averaging to construct a synthetic control to model the counterfactual, which is particularly valuable for our analyses. Indeed, Reddit users are mostly free to participate in as many subreddits at the same time as they wish -and the majority of users actually do so- especially across related communities. This results in a significant overlap of users in these communities, which cannot be considered as independent. Additionally, before the quarantine r/The_Donald was one of the most active communities overall, despite its focus on a single political figure. Finding a "control" subreddit with similar characteristics would be problematic, let alone an independent one. For instance, in a work that uses control groups to study intervention effects on Reddit, Chandrasekharan et al. recognize that their method is an «involved process that included many manual and computationally intensive steps, which made it difficult for us to scale our study to analyze more quarantined subreddits» [11]. For these reasons, and despite the adoption of ITS in many recent studies that measured intervention effects [11, 12, 35], BSTS represents a better alternative for estimating effects, given the autocorrelation and weekly seasonality of our data [8]. For our analyses we leverage the BSTS implementation provided by the CausalImpact R package.
4.2.2 RQ2: Effects on the Quality of Shared News Articles within r/The_Donald. Concerning the possible changes in the quality of shared news articles in r/The_Donald, we compare the political polarization and factual reporting scores of the news outlets linked in Reddit submissions, before and after a given intervention. Considering that only circa half of the collected submissions contain a link to an external website, of which only a subset point to news outlets in the MBFC database, we aggregate these links by pre- and post-intervention periods. To probe effects in political polarization, we further aggregate data on the left and right sides of the political spectrum (excluding neutral news outlets, which are assigned the "least biased" label by MBFC). Then, we compare the differences in the distributions of political polarization scores before and after an intervention. Finally, we apply χ² tests for significance, albeit with the awareness that the partially-matched data between interventions could give unreliable results in case of small differences. For estimating effects in factual reporting, we follow a similar approach where we first aggregate data by the lower and upper ends of the factuality spectrum, and then we conduct χ² tests on the proportions.
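A hedged sketch of this comparison for political polarization follows, with made-up counts purely for illustration; the same pattern applies to factual reporting after collapsing the six MBFC labels into low- and high-factuality groups.

```r
# 2x2 contingency table of MBFC-labeled links: period x political leaning.
# The counts below are invented solely to show the shape of the test.
polarization <- matrix(c(6200, 3800,    # pre-intervention:  right, left
                         6500, 3500),   # post-intervention: right, left
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("pre", "post"), c("right", "left")))
chisq.test(polarization)
```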
4.2.3 RQ3: Effects of the Interventions outside of r/The_Donald. Answering RQ3 involves the analysis of possible effects on user activity and toxicity, for which we adopt the methodological approach already described for RQ1, as well as possible effects on the quality of shared news articles, for which we adopt the approach used for RQ2. In addition, RQ3 also involves the evaluation of effects on group community proclivity. In line with the approach used for RQ2, to assess group community proclivity we aggregate data over three periods: pre-quarantine, between-interventions, and post-restriction. Between-interventions is a convenience period that covers both post-quarantine and pre-restriction; we confirmed that it bears no significant difference with respect to the periods it substitutes, as there is less data than for the activity dimensions and the spans mostly overlap, as sketched in Figure 2. To measure changes in proclivity (i.e., changes in the rankings generated by Co-HITS) between two periods, we use the rank-biased overlap (RBO) for indefinite lists [63]. RBO is a probabilistic measure of similarity between two ranked lists based on average overlap, with weights that bias the comparison toward the top ranks; it can handle tied ranks and rankings of different lengths [63]. The latter property is important in our case, as other methods used for ranking similarity (e.g., Kendall's tau distance) only handle lists with the same items. However, some subreddits become inactive over time and new ones are created. In addition, we are particularly interested in changes among the top subreddits with the most proclivity, not in the whole lists. We compute RBO scores with the corresponding function of the gespeR R package.
Our analyses indicate that the quarantine had a significant decreasing and sustained effect on posts and DAU, more so in the case of core users. Toxicity was also subject to a significant decreasing effect, with the immediate downward effect being noticeable. However, toxicity later exhibited a more evident rising trend, reaching or even surpassing pre-intervention values by the end of the analyzed period. In the following we report our detailed results.
Table 2. BSTS results of quarantine effects on activity within r/The_Donald for all users (TD) and core users (CUw/iTD). The lower the posterior tail-area probability (pp), the higher the probability of a causal effect.
The pre-intervention trend (β1) of activity in r/The_Donald was fairly stable -albeit slightly decreasing- in terms of daily submissions, comments, and active users, as shown in Figure 5. Upon quarantine, there were activity spikes in all of these metrics that lasted circa two days, but in all cases there is a noticeable subsequent effect (β3) of decline in activity. However, the ITS immediate effect (β2) was heterogeneous in direction among the metrics. For instance, upon quarantine there was an immediate increase in submission DAU but a decrease in comment DAU, as visible in Figures 5c and 5d. At first glance, ITS seems to indicate that core users had a more marked decline in activity compared to all of the users. According to the results of the BSTS analysis, reported in Table 2, all activity metrics had significant negative relative effects (i.e., average difference between observed and predicted post-intervention values). The most important effects concerned the number of daily active core users (-23%) and the respective comments (-27%), whilst the least important were on posts made by all users (-8.5% for submissions and -10.7% for comments).
Hence, the quarantine indeed had a highly significant effect of reducing the activity of core users within r/The_Donald.
We collected toxicity scores for 3.29M comments made by core users within r/The_Donald around the quarantine, which represent 32.1% of all non-erased comments for the same time span. Based on the ITS analyses shown in Figure 6, during pre-quarantine the median severe toxicity score of core users' comments had a noticeable increasing linear trend. However, upon quarantine there was a significant immediate downward effect, with the median reaching its lowest value a few days later. Despite this drop in toxicity, the previous increasing trend resumed during post-quarantine, reaching median levels similar to those in pre-quarantine. Regarding "severely toxic" comments, there is a less marked increasing linear trend compared to the median, although both the immediate effect and the subsequent increase post-quarantine are much more evident. Additional analyses on the robustness of our results confirmed that these effects hold at different thresholds of severe toxicity used for labeling "severely toxic" comments, as well as with other Perspective toxicity scores (e.g., toxicity, insult, threat). According to the BSTS results reported in Table 3, the causal effects can be considered statistically significant (posterior tail-area probability ≈ .001), but as previously stated, there is a noticeable decay of the effects, with the relative frequency of comments classified as "severely toxic" reaching even higher levels compared to pre-quarantine. The quarantine thus had strong positive immediate effects (i.e., it greatly reduced toxicity) but also strong negative long-term effects.
Around the quarantine, the submissions made to r/The_Donald were 980K for all users and 338K for core users, of which respectively 54.2% and 42.4% have an external link. The submissions with a link to a news outlet in MBFC were 205K for all users (20.9% of total submissions) and 69.3K for core users (20.5% of total submissions). In general, news content within r/The_Donald is highly polarized and unreliable, as clearly shown in Figure 7, and slightly more so for core users. In either case, the quarantine did not have a noteworthy effect on political polarization or factuality for either group of users, except for the political polarization of all users, which saw a moderate increase in right-leaning bias. Thorough results are reported in the following. Figure 7a shows that most news articles (53%) shared within r/The_Donald came from politically biased sources. Unsurprisingly, the majority of these (62%) fall to the right of the US political spectrum, albeit with only a small share from right-center outlets (11%), as depicted in Figure 7b. News articles shared by core users were even more biased to the right than those shared by all users, with 64% and 58% respectively. Regarding political polarization after the quarantine, there was a small but significant increase of three percentage points in bias to the right for all users (χ² = 90.3; p < .001), while for core users we measured an increase of only one percentage point, which is however not statistically significant (χ² = 0.641; p = .423). Figure 7c shows that around 75% of shared news content came from outlets in the lower half of the factual reporting spectrum, with 40% of the total shared articles deemed as mostly fake (i.e., from sources that deliberately attempt to publish hoaxes and/or disinformation for profit or influence).
It should be noted, however, that outlets considered to be fake do not necessarily have the same rating within the lower accuracy levels. On average, core users shared slightly less reliable content compared to all users, but there was no significant change in factual reporting accuracy between pre- and post-quarantine (χ² = 1.0353; p = .308), whereas the factuality decrease for all users was small (-1.2 percentage points) but significant (χ² = 47.37; p < .001).
The activity and the toxicity of core users outside of r/The_Donald were differently affected by the two interventions. In the case of the quarantine, and similar to activity within r/The_Donald, the number of comments and the number of DAU for both kinds of content (i.e., submissions and comments) significantly decreased after the intervention. Submissions, however, had an upward trend which rendered the effect non-significant (i.e., there were fewer users but more submissions per user). The effects on toxicity were much weaker compared to those within r/The_Donald, with only the effect on the relative frequency of severely toxic comments being significant. Concerning the restriction, the effects on all activity metrics were significant and much stronger than those of the quarantine, except for daily comments, for which the relative effect is moderate. The effects on toxicity are less clear and more nuanced, however, especially in the longer term, as the George Floyd protests started circa 90 days after the restriction and seem to be the most likely cause of the toxicity surge that we measured in those days. We thus conducted additional analyses by narrowing the time frame of analysis, so as to exclude the protests. Results of these additional analyses indicate a change in the trend of toxicity, which passed from decreasing to increasing after the restriction. This effect is significant for the fraction of severely toxic comments but not for median severe toxicity values. Detailed results are in the following.
In general, core users' activity outside of r/The_Donald suffered an immediate and sustained decrease in the post-quarantine period (consistent with activity within the subreddit), except for the number of daily submissions, which manifested an increasing trend post-intervention. This means that the remaining active core users increased their activity in terms of submissions to other subreddits, after a slight decrease upon quarantine. Indeed, the BSTS analysis reported in Table 4 indicates that there is no significant effect for submissions after the quarantine (relative effect of -0.9%, p = .343), whilst for the other metrics we measured significant effects (p = .001), with relative effects ranging from -14.5% to -17.9%. The situation changes in the post-restriction period. On the one hand, even if the number of DAU that commented significantly decreased (relative effect of -20.6%, p = .001), the number of comments had an immediate slight increase upon restriction and a less pronounced decreasing trend (relative effect of -6.6%, p = .051), as illustrated in Figures 8b and 8d. On the other hand, the restriction had a strong and significant effect (p = .001) on both the number of submissions (-33.2%) and submission DAU (-24.6%). In part, this decrease is likely due to the migration of members of r/The_Donald to a different platform (thedonald.win) upon restriction -a discussion forum created after the quarantine and highly publicized in the subreddit during pre-restriction [33].
We collected toxicity scores for 3.19M comments made by core users outside of r/The_Donald, which is, notably, fewer than the 3.5M comments made within r/The_Donald alone by the same users during the same time frame. Regarding the quarantine, the pre-intervention periods of both the median values and the relative frequency of severely toxic comments had a moderate increasing linear trend, which upon quarantine became decreasing, as shown in Figures 9a and 9c. The BSTS results summarized in Table 5 show that the effect on the median values was not significant (p = .125), whilst it is significant for the fraction of severely toxic comments (p = .021).

Table 4. BSTS results of quarantine (Q) and restriction (R) effects on core users' activity outside of r/The_Donald. The lower the posterior tail-area probability (pp), the higher the probability of a causal effect.

Concerning the restriction, the situation is more complex, as there is a remarkable surge in toxicity during the weeks of the George Floyd protests, which started on May 26, 2020. The protests erupted after the killing of George Floyd, an unarmed Black man, by a US police officer during an arrest for a minor offense. The intensity of the protests, which compounded the uncertainty brought by the COVID-19 pandemic, had a polarizing effect on US public opinion, particularly between liberals and conservatives [52]. Our result is consistent with the difficulties encountered by Horta Ribeiro et al. in analyzing comment toxicity in r/The_Donald with a different scoring system. For these reasons, we performed additional analyses with narrowed pre-post restriction periods of ±12 weeks, thus excluding the protests. In both pre-restriction spans, median values and relative frequencies showed a decreasing linear trend, clearly visible in Figures 9b and 9d. All post-restriction series show an upward trend, except for the relative frequency of severely toxic comments in the 30-week period, which trends downward. Based on the BSTS analysis, whose results are in Table 5, post-intervention effects are significant (p ≤ .003), except for the median values during the quarantine (p = .125) and the narrowed restriction (p = .462). Finally, we argue that, unlike the quarantine, the restriction had the opposite effect to the intended one and actually increased toxicity in the short term instead of reducing it. However, we cannot be certain of the restriction's effect in the longer term because of the unusual confounder that we encountered (i.e., the George Floyd protests).
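For readers unfamiliar with interrupted time series analysis, the following is a minimal segmented-regression sketch in Python (statsmodels) that estimates a baseline trend, an immediate level change, and a post-intervention slope change over a narrowed ±12-week window, mirroring the narrowed analysis described above. The synthetic series, the dates, and the ordinary-least-squares specification are illustrative assumptions; the paper's ITS models may differ (e.g., by modeling autocorrelated errors), so this is a sketch rather than the authors' exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic daily series of the fraction of severely toxic comments,
# +/- 12 weeks around a hypothetical intervention date (illustrative data).
intervention = pd.Timestamp("2020-02-26")
days = pd.date_range(intervention - pd.Timedelta(weeks=12),
                     intervention + pd.Timedelta(weeks=12), freq="D")
t = np.arange(len(days))
post = (days >= intervention).astype(int)
y = 0.05 - 0.0001 * t + 0.0002 * post * (t - post.argmax()) \
    + rng.normal(0, 0.004, len(days))

df = pd.DataFrame({"y": y, "t": t, "post": post})
df["t_post"] = df["post"] * (df["t"] - df.loc[df["post"] == 1, "t"].min())

# Segmented regression: baseline trend (t), immediate level change (post),
# and change in slope after the intervention (t_post).
fit = smf.ols("y ~ t + post + t_post", data=df).fit()
print(fit.params)
```

Changing the 12-week span widens or narrows the analysis window, which is one way to probe sensitivity to confounders such as the protests.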
During the time frame of the study, core users made 399K submissions outside of r/The_Donald, of which 91K contain an external link (22.9%) and 49K link to outlets present in MBFC (12.37% of all submissions). Around half of the shared news links pertained to politically biased outlets. Before any intervention, core users shared fewer articles from right-biased outlets outside of r/The_Donald (51%) than within it (64%). However, upon both the quarantine and the restriction there was a significant increase in right-leaning bias in content posted in other subreddits, reaching 53% (χ² = 13.07; p < .001) and 63% (χ² = 95.8; p < .001), respectively, in the post-intervention periods, as shown in Figure 10b. We observed a similar phenomenon in the decrease of factuality of shared news after each intervention, as can be seen in Figure 10. Moreover, before any intervention, core users shared more factual content outside of r/The_Donald than within it, as visible in Figures 7c and 10c. For instance, in the pre-quarantine period the share of news articles from unreliable sources was 67% outside of r/The_Donald and 73% within.

Table 5. BSTS results of quarantine (Q) and restriction (R) effects on core users' toxicity outside of r/The_Donald. Pre-post periods span 30 weeks, except those narrowed to avoid the George Floyd protests (n.), which span 12 weeks. The lower the posterior tail-area probability (pp), the higher the probability of a causal effect.

Between interventions, the share of low-factuality news outside of r/The_Donald increased to 72% (χ² = 90.1; p < .001), and in the post-restriction period it increased further to 75% (χ² = 28.8; p < .001). This change is most likely related to the increase in the proportion of articles from fake outlets after each intervention, as shown in Figure 10a: 37.7% in the pre-quarantine period, 42% between the interventions, and 43.3% in the post-restriction period.

5.3.5 Group community proclivity.
Results of intervention effects on group community proclivity reveal different proclivity dynamics for submissions and comments, as reported in Table 6. Specifically, the rankings of the most prominent subreddits based on submissions are less similar across intervention periods than those based on comments. In particular, the similarity between the Pre-Q and Between-I periods by submissions (.329) is much lower than the respective value by comments (.658); a minimal sketch of this kind of ranking comparison is given at the end of this subsection. Thus, after the quarantine, core users showed a proclivity for a much different set of subreddits in which to post submissions, compared to comments. This phenomenon can also be seen in the changes in proclivity between the Pre-Q and Post-R periods depicted in Figure 11, in which fewer subreddits appear in the top-20 lists of both periods for submissions than for comments. Incidentally, three subreddits climbed to the top four positions for both content types during Post-R: r/conspiracy, r/Conservative, and r/trump. It should be noted that these changes in proclivity could have been influenced by changes in the number of subreddits in which core users participated. For this reason, we also investigated the effect of the interventions on the number of daily distinct subreddits. Before the quarantine, the daily average of distinct subreddits in which core users of r/The_Donald participated, as a whole, was 132.3 by submissions and 1078.9 by comments. Upon quarantine, our BSTS analysis indicates a significant (p = .001) decrease of -8.6% by submissions and -11% by comments, although both the immediate effects and the trends are only moderately downward, as visible from the ITS analysis reported in Figure 12a. Upon restriction, however, the post-intervention downward trend and the significant (p = .001) relative effects are more pronounced, especially for submissions (-33%) compared to comments (-11%), as shown in Figure 12b.
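The ranking comparison mentioned above relies on a similarity measure between ranked lists of subreddits. The exact measure is not restated here; the following Python sketch assumes a truncated rank-biased overlap in the spirit of the measure for indefinite rankings included among the references [63], with hypothetical subreddit lists and an illustrative persistence parameter.

```python
def truncated_rbo(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap between two ranked lists.

    Higher values mean more similar rankings; top ranks weigh more, with
    weights decaying geometrically with persistence `p`. This truncated
    (lower-bound) form is evaluated only down to the shorter list's depth.
    """
    depth = min(len(ranking_a), len(ranking_b))
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d])) / d
        score += (p ** (d - 1)) * overlap
    return (1 - p) * score

# Hypothetical top subreddits by submissions in two periods.
pre_q = ["The_Donald", "Conservative", "news", "politics", "worldnews"]
post_r = ["conspiracy", "Conservative", "trump", "news", "politics"]

print(round(truncated_rbo(pre_q, post_r), 3))
```

Values close to 1 indicate nearly identical rankings, while values close to 0 indicate that the prominent subreddits changed substantially between periods.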
Interestingly, after the quarantine there was a noticeable and significant decrease of core users' activity within r/The_Donald, shown in Figure 5, as well as outside of it in terms of comments and DAU for both kinds of content, as shown in Figure 8. Nevertheless, the number of distinct subreddits in which core users participated was only moderately reduced, as visible in Figure 12a, while the number of distinct submission subreddits increased, but with a very dissimilar proclivity with respect to before the intervention, especially when compared to comments. Therefore, our results suggest that after the quarantine the dwindling core users posted more and more submissions, but in different subreddits than before, as a way to elude the intervention. However, comment activity did not follow suit, as it declined within a set of subreddits with a proclivity relatively similar to that before the intervention.

Our results allow us to evaluate the effectiveness of quarantining and restricting r/The_Donald, both within the subreddit and in other parts of Reddit. In summary, the quarantine had mild positive effects with respect to reducing user activity and strong positive immediate effects in reducing toxicity within r/The_Donald. These effects were stronger for core users of r/The_Donald than for other users. However, quarantining r/The_Donald had no significant effects on news quality and, more worryingly, had strong negative long-term effects on toxicity. In particular, the degree of toxicity within r/The_Donald reverted to, and even surpassed, pre-intervention levels around 6 months after the quarantine. Regarding the effects that quarantining and restricting r/The_Donald had on other subreddits, we found positive effects towards reducing user activity. Specifically, while the quarantine produced mild effects, the restriction was very effective at reducing activity. However, both the quarantine and the restriction had overall slightly negative effects at reducing toxicity. In detail, the quarantine had small negative immediate effects and strong positive long-term effects, while the restriction had moderate negative immediate effects as well as negative long-term effects. Moreover, both interventions had marked negative effects on the quality of the shared news, which gradually became more polarized and less factual. Overall, our results support the following conclusions:
• The restriction produced stronger effects platform-wise than the quarantine.
• Core users suffered stronger effects than other users.
• Both interventions did well at reducing activity.
• At the same time, however, both interventions produced an increase in toxicity.
• Both also caused affected users to share more polarized and less factual news.
Next, we critically discuss and compare our results to the current state of knowledge concerning a few important issues. Finally, we conclude this section by discussing the main limitations of our work and by identifying valuable directions for future research and experimentation.

Longitudinal studies of the effects of moderation interventions are almost non-existent in the literature. In fact, almost all existing studies make no distinction between short-term and long-term effects. Exceptions are Chandrasekharan et al., who drew attention to the importance of verifying «whether the subreddits that come out of quarantine maintain improved discourse over long periods of time or whether they return to incendiary behavior», although without providing results [11]. Jhaver et al., instead, evaluated the effects of deplatforming influential Twitter users and claimed that the intervention had long-term positive consequences [35].
In this regard, our results suggest caution, as we found important differences between the short-term and long-term effects of the quarantine and the restriction. Such differences were striking in the case of toxicity levels within r/The_Donald after the quarantine. In that case, the strong initial positive effect started decaying a few days after the intervention, up to the point that 6 months later, toxicity had surpassed pre-intervention levels. These results relate to the existing literature in a number of ways. Firstly, Chandrasekharan et al. pointed out that particular attention should be devoted to monitoring toxic and hateful behaviors when quarantines are lifted [11], implying that removing an intervention might revert the positive effect that it originally had. Here, however, we observed a progressive reversal of the effect that began just a few days after the quarantine was enforced (not lifted). Our results thus reinforce and go beyond Chandrasekharan et al.'s concerns, suggesting that intervention effects should be continuously evaluated over time, and not only in the aftermath of major changes. In addition, our results about the temporal variation of intervention effects suggest that this issue deserves more attention in future research: it would be interesting to separately evaluate short- and long-term effects, and results of previous studies that do not make this distinction should be taken with care. Our results also provide useful guidance for policing online platforms, in that platform administrators should continuously monitor the effects of their interventions and, if needed, reiterate them when the initial advantages have decayed. Some interventions should thus not be seen as one-off remedies, but rather as continuous or recurring treatments, depending on the need.

When a community suffers a moderation intervention, some of its members migrate to other communities or platforms. This long-studied behavior is inevitable when a community is permanently closed (i.e., when it is deplatformed) [12, 33, 38, 46]. However, it has been shown that milder interventions, too, might induce a subset of users to migrate [11]. When evaluating moderation interventions it is thus important to assess changes not only within the moderated community, but also in the other communities to which users affected by the intervention might have migrated. We call the latter type of changes spillover effects. Chandrasekharan et al. evaluated spillover effects in the aftermath of the quarantines that affected r/The_Donald and r/TheRedPill [11]. We can thus compare our results about r/The_Donald's quarantine with those from [11]. Chandrasekharan et al. found evidence that the quarantine depressed the activity levels and the influx of new users in many other communities frequented by existing users of r/The_Donald. In other words, they found that not only did the quarantine reduce activity and participation within the moderated subreddit, but it also had a similar effect in other related subreddits. Our analysis confirms these results. Then, based on their findings about user activity, Chandrasekharan et al. concluded that «quarantining r/The_Donald did not spread the infection to other parts of Reddit» [11]. Our new results about the toxicity and the quality of shared news provide additional evidence for evaluating this claim.
In detail, we found that the quarantine had a small negative immediate effect, and a strong positive long-term effect, on the toxicity of core users of r/The_Donald when they commented in other subreddits. However, it also caused those users to share more low-quality information, i.e., more politically biased and less factual. Overall, our results paint a more complex and nuanced picture of the effects that quarantining r/The_Donald had on other parts of Reddit. In conclusion, our results call for additional efforts at assessing spillover effects of moderation interventions, especially for mild interventions that do not involve deplatforming.

Our work, as well as the majority of existing studies on the effectiveness of moderation interventions, investigated Reddit's quarantines, restrictions, and bans [11, 12, 33, 55]. The three interventions have different goals: quarantines deter interaction, restrictions limit content creation, and bans permanently shut down subreddits. Still, it is worth noting that while their mechanics directly affect user activity and participation, they do not directly impact other aspects, such as the habits of the moderated community or the way in which users express themselves. As such, it is safe to assume that all three aforementioned interventions were primarily designed to reduce user activity in problematic subreddits. For this specific objective (i.e., reducing activity), all existing studies support the efficacy of Reddit's interventions [11, 12, 33, 55], and our new results strongly confirm these findings. However, a single intervention is capable of affecting multiple dimensions of user behavior [39], and even interventions designed with a specific objective (e.g., reducing activity) might cause a number of side effects. Indeed, our results provide evidence that despite achieving the desired objective of reducing the activity of problematic users within and outside of r/The_Donald, the sequence of interventions also had the undesired side effects of increasing the toxicity of such problematic users and of reducing the quality of the news shared and consumed by them (i.e., shared articles became more polarized and less factual).

In the previous literature, we found mixed support for our findings, with many contrasting results. Chandrasekharan et al. evaluated the effects of banning two hateful subreddits, concluding that former members greatly reduced their hate speech and did not cause an increase in hate speech in the subreddits to which they migrated afterward [12]. In a subsequent study on two quarantine interventions [11], some of the authors of the previous study obtained different results, finding that users in quarantined subreddits did not change their use of toxic terms. They concluded that the quarantine did not serve the goal of reducing offensive posts in the moderated community, since it left this dimension essentially unaffected. Horta Ribeiro et al. reached yet another conclusion, finding a rise in toxicity following the restriction of two subreddits [33]. Our results are aligned with the findings by Horta Ribeiro et al. [33], since we both measured a significant surge in toxicity following Reddit's interventions. At the same time, however, our results question their interpretation of these findings. Horta Ribeiro et al.
explained their results in terms of the affordances of social platforms, concluding that the rise in toxicity observed when users migrated to fringe and unregulated platforms could be the «consequence of the removal of platform moderation» [33]. However, in our study we measured a similar rise in toxicity without any reduction in platform moderation, since we studied users who remained on the same platform. If anything, our users even faced stronger moderation as they migrated to other subreddits with stricter content policies and rules. An alternative explanation was proposed by Chandrasekharan et al. in the context of Reddit's quarantines [11]. They postulated that quarantines could increase the insularity [1] (i.e., polarization and radicalization) of moderated communities and push users toward more extreme positions. Our results seem to confirm this interpretation, since we measured increases in both polarization and toxicity after Reddit quarantined and restricted r/The_Donald.

The above discussion provides theoretical contributions to the understanding of the effects of moderation interventions. In addition to these, our results also suggest some practical research and policing guidelines. The presence of a multitude of consequences following a moderation intervention mandates care in drawing conclusions about an intervention's efficacy. In particular, future studies should carry out nuanced analyses to assess effects across many behavioral dimensions. In the context of health interventions, a field from which the study of online moderation interventions inherits many characteristics, patients who experience side effects of a treatment can provide feedback to their doctors. Here, we lack this valuable feedback and must therefore pay extraordinary attention to the possible unintended consequences of moderation interventions.

To date, all studies that evaluated the effects of Reddit's interventions found evidence for their effectiveness at reducing the activity of problematic users. Hence, many studies concluded that such interventions had largely positive effects [12, 33, 35, 55]. However, the effectiveness of such interventions depends on the objective that Reddit administrators had when they enforced them, which raises a conundrum. On the one hand, Reddit administrators stated that «one of the primary goals of quarantining is to compel users to rethink their behavior and reduce offensive posts» [11], which suggests that their objective was related to reducing the toxicity of problematic users. On the other hand, however, we noted (§6.3) that the mechanics of Reddit's interventions are such that they directly affect only user activity, not user toxicity. This reflection surfaces a discrepancy between the motivations stated by Reddit administrators and the moderation interventions that they deployed. Interestingly, the existence of this conceptual discrepancy explains the practical results of our study, where we measured a strong reduction in activity, counterbalanced by an overall increase in toxicity and in the sharing of biased news. More importantly, the discrepancy also suggests that current community-level interventions such as quarantines, restrictions, and bans might be misdirected and hence ineffective, or at the very least inefficient, at supporting the platform's objectives. In addition to the above misdirection issue, Reddit's community-level interventions might cause a second problem.
Indeed, interventions that reduce activity cause user migrations and, depending on the scope and severity of the intervention, a large number of users might decide to migrate to alternative platforms [33]. Platforms might therefore be disincentivized to apply such interventions, for fear of losing users and revenues [9]. As the only "solution" to this issue, the current literature has appealed to platforms' ethical principles, underlining the importance of pursuing the goal of platform moderation even at the cost of reduced revenues [35]. The two aforementioned problems highlight some of the limitations of community-level interventions, which appear ill-suited to the current moderation needs of online platforms. For the future, it would thus be advisable to design and deploy nuanced interventions that are better aligned with the objectives of online moderation. The need for better and alternative interventions is also attested by the current interest in soft moderation interventions [65], as opposed to hard interventions like deplatforming. Designing nuanced interventions also relates to Kiesler et al.'s theory of graduated sanctions [37]. The latter could contribute not only to making platforms more accountable for their moderation [11], but also to producing more targeted interventions, capable of reducing negative side effects such as user migration and loss of revenue. The design and evaluation of targeted and nuanced interventions thus represents fertile ground for future research and experimentation on online moderation.

Since the 2016 Donald Trump presidential win and the unexpected outcome of the UK Brexit referendum, social media platforms have been facing tremendous pressure to take action against issues such as misinformation and online misbehavior. Recently, the pressure heightened even more as a consequence of other dramatic events such as the George Floyd protests and the emergence of the COVID-19 infodemic. In response, platforms hastily enforced a number of interventions. As notable examples within the scope of our study, Reddit issued several changes to its policies, including those addressing the murder of George Floyd and the ensuing protests, which eventually led to the ban of r/The_Donald; the subreddit was by then already inactive, so this last intervention could be seen as a symbolic gesture. Despite appearing as reasonable solutions and serving as public evidence of the platforms' willingness to tackle the issues they contributed to create, many such interventions were devised and applied with little forethought. Post-hoc scientific analyses revealed that many interventions proposed in recent years produced mixed effects or no effects at all [10, 11, 13, 33], and some even exacerbated the problems they were trying to solve [2, 23, 50]. Pennycook and Rand concluded a 2020 op-ed in The New York Times by remarking that moderation interventions should «not just rely on common sense or intuition» but should instead be «empirically grounded». In consideration of our mixed results on the effectiveness of the sequence of interventions on r/The_Donald, we reiterate this recommendation to balance the timeliness of an intervention with its empirical soundness, motivating present and future research along this important direction.

6.6 Limitations and Future Work
6.6.1 Selection Bias.
On the one hand, our choice of focusing on the sequence of moderation interventions that targeted r/The_Donald allowed us to thoroughly analyze what was formerly the most prominent political community on Reddit. Furthermore, it allowed us to compare and discuss our results with those of several other existing studies that were so far disconnected. For example, we compared our results about the effectiveness of the quarantine and the restriction with the results from [11] and [33], respectively. We also compared our results with those of previous works that studied other deplatforming interventions [12, 35, 55]. Our work thus contributed to reconciling many of the existing studies on the topic. On the other hand, our analyses are solely focused on the moderation interventions enforced on r/The_Donald and, as such, might suffer from selection bias and lack of generality. Therefore, more research is needed in order to verify whether our findings also hold for other communities of users, which could react differently to Reddit's interventions. To this end, our study provides useful guidelines for future research aimed at assessing more precisely the consequences of online moderation strategies.

6.6.2 Exogenous Causes.
An inherent limitation when estimating causal effects with observational data is the susceptibility of the measured quantities to possible exogenous causes (i.e., confounders). Our study, as well as many others [11, 12, 33, 35], is affected by this limitation. In fact, our analysis did not explicitly consider possible exogenous events that might have contributed to causing some of the effects that we measured. The most noteworthy of such events is the murder of George Floyd, which occurred on May 25, 2020 and lies within our post-restriction time window. As such, results about the effects of the restriction might be influenced by this event, as visible in Figure 9 and as already noted in [33]. To circumvent this limitation, we repeated the ITS analysis of toxicity changes caused by the restriction, considering only data up to the killing of George Floyd. The results of this additional analysis confirm our initial findings, in that the restriction caused a marked increase in toxicity. In other words, our conclusions are robust to this exogenous event. Nevertheless, for the future it is advisable to adopt causal inference methodologies that can account for, at least, known and straightforward exogenous causes. In this regard, the BSTS method that we adopted to provide our quantitative results allows known exogenous events to be modeled [8], and thus represents a favorable methodology for future causal analyses of the effects of online moderation interventions.

6.6.3 Multidimensionality of Effects.
When elaborating on the possible consequences of exposure to fake news, Lazer et al. concluded that it might have «many potential pathways of influence, from increasing cynicism and apathy to encouraging extremism», in addition to the obvious consequence of affecting political preferences [39]. In other words, fake news might have many and diverse effects spread across multiple dimensions of user behavior and ideology. The same can be said about the interventions that platforms put in place to counter fake news and the other ailments that affect online social spaces. Driven by this consideration, in this work we carried out a nuanced analysis and moved beyond existing studies that almost solely investigated changes in activity and toxicity (or hate speech) [11, 12, 33, 35, 55].
We did this by including several important dimensions and metrics that had so far been overlooked. Among them are the analysis of the quality of shared news articles, which in turn provides insights into the degree of political polarization and the factual reporting of the news consumed by the analyzed communities, and the analysis of group community proclivity. Nonetheless, the interventions investigated in our work might have affected many additional dimensions of user behavior and ideology. For example, there might have been changes in user stances toward certain relevant (e.g., political) topics, in their trust in authoritative institutions, and in other emotional, social, and psychological dimensions. Because of this, our results likely provide only a partial view of the full extent of the effects caused by the interventions issued on r/The_Donald. For the future, it is thus important to progressively make a larger set of dimensions measurable and to investigate intervention effects along these dimensions as well.

6.6.4 User-level Effects.
Our study considers effects aggregated at the community level, like most of the works on the same subject [11, 12, 33, 35]. Indeed, aggregated community-level effects are perhaps the most natural and direct way to evaluate community-level interventions (e.g., the ban of an entire community). Nonetheless, a community-level post-intervention effect is the combination of many and potentially diverse user-level effects. Hence, the aggregated community-level effect might be weakly representative of the underlying behavior of individuals or smaller user groups. For example, Saleem and Ruths measured user-level changes in comment activity and subreddit participation that resulted from the ban of two problematic communities on Reddit [55], showing very diverse user-level effects. Based on these considerations, it would be interesting to evaluate user-level effects for a larger set of moderation interventions. In particular, for future work we aim to assess whether certain community-level effects are the result of homogeneous or heterogeneous user behavior. In the latter case, it would be interesting to investigate which user-level characteristics determine effect diversity, and thus study the possibility of preemptively identifying users likely to significantly deviate from intervention expectations. In either case, understanding effects at the user level might increase our chances of designing and deploying more effective interventions.

We carried out a multidimensional causal analysis of the sequence of moderation interventions enforced on r/The_Donald. Our results paint a nuanced picture of the effects of such interventions and support the following take-away messages: (i) the restriction produced stronger effects platform-wise than the quarantine, (ii) core users of r/The_Donald suffered stronger effects than other users, (iii) both the quarantine and the restriction significantly reduced user activity, however (iv) both also caused an increase in toxicity and (v) caused users to share more polarized and less factual news. We conclude that the sequence of interventions had mixed effects. For the future, it will be important to advance the understanding and the development of moderation interventions, so as to obtain tools capable of achieving the objectives of online moderation with minimal side effects.
REFERENCES
[1] Communal quirks and circlejerks: A taxonomy of processes contributing to insularity in online communities
[2] Exposure to opposing views on social media can increase political polarization
[3] The Pushshift Reddit dataset
[4] Network propaganda: Manipulation, disinformation, and radicalization in American politics
[5] Study: Breitbart-led right-wing media ecosystem altered broader media agenda
[6] Interrupted time series regression for the evaluation of public health interventions: a tutorial
[7] Influence of fake news in Twitter during the 2016 US presidential election
[8] Inferring causal impact using Bayesian structural time-series models
[9] Are You Sure You Want to View This Community? Exploring the Ethics of Reddit's Quarantine Practice
[10] #thyghgapp: Instagram content moderation and lexical variation in pro-eating disorder communities
[11] Quarantined! Examining the effects of a community-wide moderation intervention on Reddit
[12] You can't stay here: The efficacy of Reddit's 2015 ban examined through hate speech
[13] How community feedback shapes user behavior
[14] 2020. The COVID-19 social media infodemic
[15] Reddit quarantined: Can changing platform affordances reduce hateful material online?
[16] Design frictions for mindful interactions: The case for microboundaries
[17] A decade of social bot detection
[18] A survey on computational propaganda detection
[19] Automated hate speech detection and the problem of offensive language
[20] 2021. r/WatchRedditDie and the politics of Reddit's bans and quarantines
[21] A generalized co-hits algorithm and its application to bipartite graphs
[22] New Dimensions of Information Warfare
[23] Emphasizing publishers does not effectively reduce susceptibility to misinformation on social media
[24] 2020. Misinformation, manipulation and abuse on social media in the era of COVID-19
[25] Mobilizing the Trump train: Understanding collective action in a political trolling community
[26] How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?
[27] Large scale crowdsourcing and characterization of Twitter abusive behavior
[28] Reducing controversy by connecting opposing views
[29] Upvoting extremism: Collective identity formation and the extreme right on Reddit
[30] It takes a village to manipulate the media: Coordinated link sharing behavior during 2018 and 2019 Italian elections. Information
[31] Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media
[32] Identifying key users in online social networks: A pagerank based approach
[33] Do platform migrations compromise content moderation? Evidence from r/The_Donald and r/Incels
[34] Forecasting: principles and practice
[35] Evaluating the effectiveness of deplatforming as a moderation strategy on Twitter
[36] Populist supporters on Reddit: A comparison of content and behavioral patterns within publics of supporters of Donald Trump and Hillary Clinton
[37] Building successful online communities: Evidence-based social design
[38] Understanding user migration patterns in social media
[39] The science of fake news
[40] Setting the Record Straighter on Shadow Banning
[41] Political Polarization and Platform Migration: A Study of Parler and Twitter Usage by United States of America Congress Members
[42] Causal impact analysis for app releases in Google Play
[43] Roots of Trumpism: Homophily and social feedback in Donald Trump support on Reddit
[44] Cleaning up social media: The effect of warning labels on likelihood of sharing false news on Facebook
[45] "And We Will Fight for Our Race!" A Measurement Study of Genetic Testing Conversations on Reddit and 4chan
[46] User migration in online social networks: A case study on Reddit during a period of community unrest
[47] Coordinated behavior on social media in 2019 UK General Election
[48] Savvas Zannettou, and Gianluca Stringhini. 2021. Soros, Child Sacrifices, and 5G: Understanding the Spread of Conspiracy Theories on Web Communities
[49] Nowcasting unemployment rates with google searches: Evidence from the visegrad group countries
[50] The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings
[51] Quick, community-specific learning: How distinctive toxicity norms are maintained in political subreddits
[52] The opinion-mobilizing effect of social protest against police violence: Evidence from the 2020 George Floyd protests
[53] The fabrics of machine moderation: Studying the technical, normative, and organizational structure of Perspective API
[54] Assessing the extent and types of hate speech in fringe communities: A case study of alt-right communities on 8chan, 4chan, and Reddit
[55] The aftermath of disbanding an online hateful community
[56] Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: a guide for evaluating large-scale health interventions
[57] Predicting the present with Bayesian structural time series
[58] Misinformation Warnings: Twitter's Soft Moderation Effects on COVID-19 Vaccine Belief Echoes
[59] Gaming Reddit's algorithm: r/The_Donald, amplification, and the rhetoric of sorting
[60] A characterization of political communities on Reddit
[61] Picking the winner(s): Forecasting elections in multiparty systems
[62] Information disorder: Toward an interdisciplinary framework for research and policy making
[63] A similarity measure for indefinite rankings
[64] An exploration of submissions and discussions in social news: Mining collective intelligence of Reddit
[65] "I Won the Election!": An Empirical Analysis of Soft Moderation Interventions on Twitter
[66] The Web centipede: Understanding how Web communities influence each other through the lens of mainstream and alternative news sources
[67] Who let the trolls out? Towards understanding state-sponsored trolls
[68] A quantitative approach to understanding online antisemitism