Personalized Interventions for Online Moderation
Stefano Cresci, Amaury Trujillo, Tiziano Fagni
2022-05-19 · DOI: 10.1145/3511095.3536369

Abstract

Current online moderation follows a one-size-fits-all approach, where each intervention is applied in the same way to all users. This naïve approach is challenged by established socio-behavioral theories and by recent empirical results that showed the limited effectiveness of such interventions. We propose a paradigm shift in online moderation by moving towards a personalized and user-centered approach. Our multidisciplinary vision combines state-of-the-art theories and practices from diverse fields, such as computer science, sociology, and psychology, to design personalized moderation interventions (PMIs). In outlining the path leading to the next generation of moderation interventions, we also discuss the most prominent challenges introduced by such a disruptive change.

Nowadays, social media play a pivotal role in shaping public opinion. Online users constantly read, create, and share an ever-growing amount of content, and social media have now overtaken the more traditional media, especially among the younger generations. On the one hand, this shift created new opportunities for civic engagement and for democratizing access to information [25]. On the other hand, this freedom also gave rise to multiple online harms, such as misinformation, polarization, and toxic and hateful speech [5]. The consequences of such harms are not limited to online platforms, but also affect the offline world, as demonstrated by recent political riots [20] and by the decreased confidence in vaccines [17]. For this reason, since Donald Trump's 2016 presidential win and the UK Brexit referendum, platforms have faced tremendous public and governmental pressure to take action against online harms. Recent dramatic events such as the COVID-19 infodemic and the Russian-Ukrainian conflict increased this pressure even more.

Platforms responded to the growing pressure by hastily deploying a number of moderation interventions, that is, actions taken to enforce content policies and rules. For example, Twitter, Facebook, and Instagram attached warning labels to disputed posts [28] and banned users, groups, and pages that misbehaved [13]. Pinterest blocked search results for anti-vaccination queries, and Reddit quarantined and banned toxic communities [11, 22, 24]. However, despite appearing as reasonable solutions and serving as public evidence of the platforms' willingness to tackle the issues they contributed to create, these interventions were designed and applied with little forethought. Recent studies measured limited or no effects at all [3, 11, 24] and showed that some interventions even exacerbated the very issues they aimed to solve [1, 6, 21, 29]. Overall, the design of moderation interventions has received limited scholarly attention, and progress has been sought through trial and error rather than through a rigorous scientific approach.
Pennycook and Rand concluded an op-ed in The New York Times by remarking that moderation interventions should «not just rely on common sense or intuition», but that they should instead be «empirically grounded». Science-based online moderation is still in its infancy. Until now, online moderation has always followed a one-size-fits-all approach, where each intervention is applied in the same way to all users. As a recent example, when Twitter sent warning messages to dissuade users from posting toxic tweets, all users received the exact same message [15]. However, results on user reactions to moderation interventions showed that different users react in different ways to the same intervention, depending on their individual characteristics [22, 24]. Theories from the social, psychological, and behavioral sciences support these empirical results [7] and posit that the efficacy of an intervention depends on individual and contextual characteristics [27]. In line with this literature, but in contrast to the platforms' objectives, the interventions recently applied by Twitter and Reddit against toxic users caused a subset of users to become even more toxic and radicalized [11, 15, 24]. Overall, the existing literature across multiple disciplines opposes the current one-size-fits-all approach to online moderation and instead suggests that interventions should be tailored to the individual characteristics of the users. In other words, the current naïve approach to online moderation neglects individual differences and the advantages of personalization.

Indeed, personalization has already proved valuable in several online domains, such as advertising, music and video streaming services, and the improvement of health-related behavior via apps [12, 26]. Taking inspiration from medicine, a field with which online moderation shares many commonalities, we observe that current generic interventions are deployed as a sort of universal cure to treat the ailments of all online users. A more virtuous example is offered by personalized medicine, where patients affected by a disease receive personalized treatment based on their condition, individual characteristics, environment, and behavior.

The time is ripe for a disruptive change in the approach to online moderation. Motivated by the aforementioned empirical results showing the inadequacy of current generic interventions, by the socio-behavioral theories that support the application of personalized interventions, and by the proven benefits of personalization in many domains, we envision a paradigm shift towards personalized moderation interventions (PMIs). As depicted in Figure 1, PMIs are tailored to the individual characteristics of the users to whom they are applied, which allows maximizing the desired effects of moderation while minimizing possible undesired side effects. To fulfill the vision of PMIs, existing knowledge and methods from several disciplines and scientific communities must be combined in order to produce the new knowledge and methods needed to develop PMIs. As illustrated in Figure 2, the scientific areas most involved in this process include computer science, sociology, psychology, and statistics. In the following, we outline the contributions that each of these areas can provide to the design and development of PMIs.

The most effective way to induce a behavioral change in a user, which is the goal of moderation interventions, depends on the personal characteristics and context of the user [7]. In this regard, sociology and psychology can provide the theories and knowledge needed to profile users and to design persuasive PMIs. Each user could be described in terms of their social and personality profile [14], including their social vulnerabilities [27]. The former characteristics are linked to the emergence of online misbehavior [16], while the latter are linked to the effectiveness of a given intervention for a given user [2, 10]. Knowledge of such individual characteristics is instrumental to the development of effective PMIs.

Successful personalization then requires accurate user models, powerful analytics, and a significant degree of automation to achieve scalability. Here, many areas of computer science can provide massive contributions to the development of PMIs. For example, the application of theories and practices from human-computer interaction (HCI) can drive the process of user modeling. In addition, HCI can contribute to designing the way in which moderation interventions are tailored and presented to users [4], while machine and deep learning (ML/DL) techniques can be leveraged to obtain accurate and scalable user representations. For instance, ML/DL can be used to infer personal characteristics of users from their publicly available data, such as account information, posting or browsing history, and online social relationships [19].
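As a concrete illustration of this inference step, the following is a minimal sketch in Python that represents each user by the average embedding of their posts and regresses personality trait scores from it. The encoder choice, the existence of a trait-annotated training corpus, and all variable names are assumptions made for the sake of the example, not part of the original proposal.

```python
# Minimal sketch: inferring personality traits from a user's public posts.
# Assumes a hypothetical annotated corpus of (posts, Big Five scores) pairs.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def user_embedding(posts: list[str]) -> np.ndarray:
    """Represent a user as the mean embedding of their posts."""
    return encoder.encode(posts).mean(axis=0)

def fit_trait_model(train_posts, train_traits):
    """train_posts: per-user lists of posts; train_traits: (n_users, 5) trait scores.
    Both are placeholders for a real annotated corpus."""
    X = np.stack([user_embedding(posts) for posts in train_posts])
    return MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, train_traits)

def predict_traits(model, posts):
    """Predict the five trait scores for a previously unseen user."""
    return model.predict(user_embedding(posts).reshape(1, -1))[0]
```

In practice, such trait estimates would be only one component of the user model, alongside the account, behavioral, and relational signals mentioned above.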
Furthermore, ML/DL can also be profitably applied to design multimodal interventions that combine personalized counter-narratives, generated with natural language processing (NLP) techniques, with images [23].

In addition, PMIs require matching each user with a favorable intervention. This step too can be carried out by resorting to ML/DL techniques, and it introduces a new task: estimating the most effective moderation intervention for any given user. The novelty of the task mandates the development of new datasets, sensible baselines, and novel methodologies. At the same time, however, its similarity to traditional tasks in the area of recommender systems implies that established techniques from that area will likely represent good initial solutions for the development of PMIs [18].
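To make the analogy with recommender systems concrete, the sketch below casts intervention matching as a contextual recommendation problem with epsilon-greedy exploration. The intervention catalog, user features, and logged outcomes are hypothetical placeholders; this is one plausible framing of the task, not a method prescribed by the literature cited above.

```python
# Minimal sketch: matching users to moderation interventions, framed as a
# contextual recommendation task. All names and data here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

INTERVENTIONS = ["warning_label", "empathetic_counterspeech", "temporary_mute"]

class InterventionRecommender:
    """One outcome model per intervention, with epsilon-greedy exploration."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.models: dict[str, LogisticRegression] = {}

    def fit(self, logs):
        """logs: iterable of (user_features, intervention, improved) triples,
        where `improved` is 1 if the user's behavior improved afterwards."""
        grouped = {name: ([], []) for name in INTERVENTIONS}
        for features, name, improved in logs:
            grouped[name][0].append(features)
            grouped[name][1].append(improved)
        for name, (X, y) in grouped.items():
            # Assumes each intervention has both logged successes and failures.
            self.models[name] = LogisticRegression().fit(np.array(X), np.array(y))

    def recommend(self, user_features) -> str:
        """Return the intervention with the highest predicted success probability."""
        if not self.models or np.random.rand() < self.epsilon:
            return str(np.random.choice(INTERVENTIONS))   # explore
        scores = {name: model.predict_proba([user_features])[0, 1]
                  for name, model in self.models.items()}
        return max(scores, key=scores.get)                # exploit
```

The exploration step matters because a deployed recommender would otherwise never gather outcome data for interventions it currently ranks low, which is one reason contextual bandit formulations are a natural fit for this task.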
Finally, validating PMIs and assessing their improvement over generic interventions requires going beyond mere correlations and associations by following rigorous causal inference approaches. These could be applied to draw conclusions from survey experiments, or from simulations and field experiments on real online platforms. Recently, many such statistical techniques have been successfully adopted to estimate the effects of generic interventions, for instance techniques designed to detect causal effects in time series data following a given event [3, 11, 24]. In the future, the same or similar techniques could be adapted to evaluate the effects of PMIs.
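One widely used technique of this kind is interrupted time series analysis. The minimal sketch below, run on synthetic data for illustration, fits a segmented regression around a hypothetical intervention date; the simulated signal, effect size, and variable names are assumptions of the example.

```python
# Minimal sketch: interrupted time series estimate of an intervention's effect
# on a behavioral signal (e.g., daily toxicity of a moderated user).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_days, t0 = 120, 60                       # t0: day the intervention is applied
t = np.arange(n_days)
post = (t >= t0).astype(float)             # indicator: 1 after the intervention
# Synthetic daily toxicity with a genuine post-intervention level drop of 0.3.
y = 1.0 + 0.002 * t - 0.3 * post + rng.normal(0, 0.1, n_days)

# Segmented regression: baseline trend, level change, and slope change.
X = sm.add_constant(np.column_stack([t, post, post * (t - t0)]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # the coefficient on `post` estimates the immediate effect,
                   # valid only under the usual ITS assumptions (no concurrent
                   # shocks, correctly specified trend, etc.)
```

Such quasi-experimental designs are attractive precisely because field experiments on live platforms are hard to run, although, as discussed next, they remain vulnerable to confounders and exogenous events.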
In addition to cleverly combining existing knowledge and developing new knowledge, PMIs will also require solving a number of open challenges. First and foremost, deploying PMIs entails solving several ethical challenges regarding the use of personal data for user modeling, the right to explanation, and the fairness of automated moderation mechanisms, which must scale while remaining grounded in sound theory from social and personality psychology. Overcoming these challenges will involve developing agreed-upon ethical standards, as well as adopting privacy-preserving computational techniques (e.g., federated learning, anonymization).

Motivating the interventions and guaranteeing fair and unbiased decisions will also require adopting best practices in explainable and fair recommendation, opening up the opportunity to improve platform transparency and accountability, two areas in which online platforms are being harshly criticized. Some technical and methodological challenges also limit our capacity to accurately estimate the effects of moderation interventions, such as the difficulty of accounting for confounders and possible exogenous causes [11, 24]. In addition, validating new interventions such as PMIs involves performing extensive field experiments on online platforms, which typically cannot be carried out without the participation of the platforms themselves; more effort is thus needed to strengthen collaborations between platforms and scholars.

Finally, online moderation also carries important philosophical challenges. The present proposal embodies the vision of developing the theoretical and technological tools that will enable PMIs. However, the effectiveness of online moderation depends only in part on the availability of powerful and accurate technological tools. Most importantly, it depends on the strategic goals and the regulatory context of the platforms that perform (or do not perform) the moderation. As remarked by Gayo-Avello, the paramount goal of online platforms is to «commoditize and monetize individual communication», and their commitment to scientifically sound and thorough moderation can only exist to the extent that it does not «affect their investors or the laws under which they operate» [8]. Furthermore, the effects that PMIs will have on the safety and reliability of online platforms will depend on the use that humans make of them, as always happens when new technologies are introduced. As such, some actors might use PMIs to manipulate rather than to persuade, or to censor and silence rather than to support plurality of opinions and free speech [9]. Overcoming some of these challenges will probably require social, cultural, and regulatory changes, in addition to mere technological advancement.

Personalized moderation interventions (PMIs) promise to transform online moderation by shifting from a coarse-grained, platform-centered approach to a fine-grained, user-centered one. By taking into account the peculiar traits and individual characteristics of users, PMIs will enable nuanced and effective interventions. Despite this promising outlook, the challenges along this research direction are manifold. Solving them will require combined endeavors from multiple interrelated scientific communities, which we call on to join the effort.

References

[1] Exposure to opposing views on social media can increase political polarization.
[2] Artificial intelligence against hate: Intervention reducing verbal aggression in the social network environment.
[3] Quarantined! Examining the effects of a community-wide moderation intervention on Reddit.
[4] Design frictions for mindful interactions: The case for microboundaries.
[5] New Dimensions of Information Warfare.
[6] Emphasizing publishers does not effectively reduce susceptibility to misinformation on social media.
[7] Study conspiracy theories with compassion.
[8] Social media, democracy, and democratization.
[9] Social media won't free us.
[10] Empathy-based counterspeech can reduce racist hate speech in a social media field experiment.
[11] Do platform migrations compromise content moderation? Evidence from r/The_Donald and r/Incels.
[12] Music personalization at Spotify.
[13] Evaluating the effectiveness of deplatforming as a moderation strategy on Twitter.
[14] My tweets bring all the traits to the yard: Predicting personality and relational traits in Online Social Networks (Kafetsios, Dimitriadis, and Vakali, 2022).
[15] Reconsidering Tweets: Intervening During Tweet Creation Decreases Offensive Content.
[16] 'I did it for the LULZ': How the dark personality predicts online disinhibition and aggressive online behavior in adolescence.
[17] Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA.
[18] Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning.
[19] Recent trends in deep learning based personality detection.
[20] The long fuse: Misinformation and the 2020 election.
[21] The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings.
[22] The aftermath of disbanding an online hateful community.
[23] Generating Counter Narratives against Online Hate Speech: Data and Strategies.
[24] Make Reddit Great Again: Assessing Community Effects of Moderation Interventions on r/The_Donald.
[25] From liberation to turmoil: Social media and democracy.
[26] Just-in-the-moment adaptive interventions (JITAI): A meta-analytical review.
[27] Individual differences in susceptibility to online influence: A theoretical review.
[28] "I Won the Election!": An Empirical Analysis of Soft Moderation Interventions on Twitter.
[29] Debunking in a world of tribes.