Meaningful Context, a Red Flag, or Both? Users' Preferences for Enhanced Misinformation Warnings on Twitter
Filipo Sharevski, Amy Devine, Emma Pieroni, Peter Jacnim
2022-05-02
Warning users about misinformation on social media is not a simple usability task. Soft moderation has to balance between debunking falsehoods and avoiding moderation bias while preserving the social media consumption flow. Platforms thus employ minimally distinguishable warning tags with generic text under suspected misinformation content. This approach resulted in an unfavorable outcome where the warnings "backfired" and users believed the misinformation more, not less. In response, we developed enhancements to the misinformation warnings where users are advised on the context of the information hazard and exposed to standard warning iconography. We ran an A/B evaluation against Twitter's original warning tags in a 337-participant usability study. The majority of the participants preferred the enhancements as a nudge toward recognizing and avoiding misinformation. The enhanced warning tags were most favored by the politically left-leaning and, to a lesser degree, moderate participants, but they also appealed to roughly a third of the right-leaning participants. The education level was the only demographic factor shaping participants' preferences. We use our findings to propose user-tailored improvements in the soft moderation of misinformation on social media.
Warnings and secure user behavior seem to have a perennially fraught relationship, despite the rich mediation of usability [2, 16], interaction/visual design [19], and behavioral insights [59, 81]. It is understandable that the complexity of this problem requires patience and eventual alignment between the security literacy of the average user and the pace with which new security hazards are introduced into users' daily life [17, 28]. Usable security has, for one, made noticeable advancements in warnings that users do actually heed in conformance with the security recommendations: avoiding phishy websites and questionable attachments [57], skipping unencrypted communication [75], warming up to multi-factor authentication [40], and following up on system updates [43]. Advancements such as adaptive strategies for getting accustomed to warnings and security advice also help users transition to acceptably secure behavior [22, 29]. What actually is a bit difficult to understand is why, despite these advancements in usable security, warnings about misinformation on social media have made little progress in fostering desirable security behavior [69]. One could argue that the nature of the security hazard differs between the two settings - traditional programmatic security is far more complex to grasp than picking up on a casual post that links the COVID-19 vaccines with infertility - and that makes designing misinformation warnings an entirely different challenge. True, one-size-fits-all won't work here because yesterday it was the elections [32, 84], today it is COVID-19 and QAnon [6, 49], and who knows what alternative narratives will emerge tomorrow. Embracing this predicament as a challenge in a usable security context has been sporadic so far, with the focus largely placed on mapping the "sources of misinformation" [13]. Misinformation sources won't go away.
They existed long ago, learned to adapt and thrive in new information environments, and so long as the Internet evolves, they will too [58]. In the context of social media, these sources generate misinformation content that includes all false or inaccurate information such as: disinformation, fake news, rumors, conspiracy theories, hoaxes, trolling, urban legends, and spam [83]. It took some time for mainstream platforms to acknowledge that they had a serious problem on their hands when misinformation started piling up [45]. They responded with warnings which conformed to the aesthetics of their interfaces and with language presumably appearing as unbiased and non-judgmental to users with diverse perspectives [72]. But this so-called "soft moderation" was applied halfheartedly, turning the warnings into hazards themselves: users started believing the misinformation more, not less, when a warning was explicitly appended to it [10, 50]. The design of the warnings thus requires adapting the approach to retain their usability in various misinformation scenarios while avoiding a "backfire effect" [71]. To help this effort, we developed enhancements to the misinformation warning tags used by Twitter and evaluated them with a sample of 337 regular users. These enhancements address two elements that mainly contribute to the aforementioned predicament of soft moderation: meaningful context for the intentionally spread misinformation [83] and sufficiently potent interruption of the regular social media consumption flow [15, 23]. Therefore, we formulated the warnings' text to fit the scenario surrounding a misinformation tweet and introduced red flag watermarks as a characteristic iconography of the visual frictions that users encounter in every aspect of their daily life [11]. The results of an A/B evaluation study against the warning tags currently employed on Twitter show that the majority of users do welcome the usable security enhancements. The added meaningful context was praised for helping participants avoid, ignore, and skip misinformation "right away." The red flag watermarks were lauded for their "attention-grabbing" effect. Expectedly, there were also groups of users that leveraged this opportunity to express their protest against soft moderation as a form of forceful opposition to opinion-forming by Twitter as a self-appointed truth authority. Therefore, we analyzed the sentiment the warning tags incited and found that the enhancements did tilt the overall sentiment toward more positive from the status quo of the original Twitter warnings. Sentiment often reflects users' political leanings and is shaped by the structure of their demographic identity [38, 74]. Our results suggest that the left-leaning participants overwhelmingly welcomed the meaningful context, and the moderates and right-leaning joined them in lauding both the context and the red flag watermarks. Users' age, gender, and race/ethnicity did not factor into the sentiment in a significant measure; only the education level did. While users with either a high school education/GED or a college diploma were evenly split between preferring the original warnings and their enhanced counterparts, the users with some college education overwhelmingly preferred the latter. All but one of the users with a post-graduate level of education were entirely in favor of both the context and the red flag watermarks. Scope and contribution of this work.
With this work we aim to materialize the wealth of usable security cues, nudges, and advice in a social media environment toward curbing misinformation. Our contributions, respectively, are:
• The first A/B evaluation of enhanced social media warnings providing meaningful context and introducing visual design frictions in interacting with misinformation;
• Analysis of users' sentiment toward soft moderation in general and enhanced warning tags in particular from a political and demographic perspective;
• Basis and recommendations for user-tailored adaptations of soft moderation toward mindful and safe interaction on social media.
Following this introduction, we delve into the current state of misinformation warnings on social media in Section 2. We then elaborate on our usable security enhancement approach in Section 4. Section 5 provides the results of our A/B evaluation study and sentiment analysis. We discuss the results in Section 6 and provide our recommendations for the future of soft moderation before we conclude the paper in Section 7. Warnings on social media usually come in two main forms: (i) interstitial covers, which obscure the misinformation and require users to click through to see the information; or (ii) trustworthiness tags, which appear under the content and do not interrupt the user or compel action [36]. The former are more suitable for sensitive content where the exposure to the hazards should be avoided in the first place, and the latter are usually applied to disputed or unverified content where the decision whether it is of a misinformation nature or not is left to the user. But the COVID-19 "infodemic" demanded all hands on deck for soft moderation, and so mainstream social media platforms applied both warning variants to warn users of misleading and harmful COVID-19 information [45, 62]. Evidence suggests that only the interstitial covers, but not the trustworthiness tags, make users heed the warnings of misinformation [64, 65, 69, 84]. It is thus tempting to simply discard the trustworthiness tags and only use interstitial covers. However, the interstitial covers do require additional clicks to get to the content in question, which could make users avoid the content but leave them with a feeling that the social media platform is overtly imposing, "biased," "punitive," or "restrictive of free speech" [33, 64]. The trustworthiness tags might be more usable and mitigate the overt intrusion by blending with the visual aesthetic of the platform's interface (e.g. same colors, fonts, and obscure text), but they do run into other problems. Next to the "backfiring effect" [10, 15, 48, 74], the tags could desensitize users to soft moderation when applied too frequently or create an "illusory truth effect" [51]. The absence of the tags in some scenarios might even create an "implied truth effect" and lead users to deem any misinformation content they encounter as credible and accurate [50]. Other factors also contribute to these negative effects, for example users' political affiliations and demographic identities. When trustworthiness tags directly challenged political falsehoods, they had the intended effect on Democrats but the opposite effect (i.e. they "backfired") on Republicans [74]. In the context of the COVID-19 pandemic, the tags resulted in a "belief echo," manifested as skepticism of adequate COVID-19 immunization particularly among Republicans and Independents [69].
Older users and those with a level of education corresponding to higher analytical thinking usually succumb less to these negative effects [38]. Another factor is the asymmetrical nature of soft moderation: the mere exposure to misinformation often generates a strong and automatic affective response, but the warning itself may not generate a response of an equal and opposite magnitude [23]. This is because the trustworthiness tags often lack meaning, have ambiguous wording, or ask users to find context themselves, which is cognitively demanding and time-consuming [15]. Therefore, a natural step toward minimizing the said negative effects is enhancing the trustworthiness tags to counter this asymmetry while keeping the appeal relevant for users of all ages, analytical prowess, and political leanings [38]. The trustworthiness tags applied by Twitter make an interesting case of usable security interventions. Appended under suspected misinformation, this brand of tags warns after a user is exposed to the potentially harmful content [62]. Choosing to warn a user after the fact goes somewhat against the practice of using warning screens in browsers that come before a user gets a chance to visit a questionable website [19] (this effect is achieved with the interstitial covers, but they are verbose and disruptive of the natural social media consumption flow [7]). One could argue that the after-the-fact notification is chosen to counter "habituation," or the diminished response to repetitions of the same warning screens, or perhaps to break the effect of "generalization" that might occur when habituation to these screens carries over to novel security interventions that look like the warning tags [79]. Seemingly designed to camouflage themselves amongst the existing interface features, the warning tags are blue and not red in color, they do not obscure the suspected misinformation tweet, nor do they occur predictably like the warning screens shown every time an Internet browser cannot verify the visited website's certificate (the tweets in question have to be fact-checked, if not automatically flagged [32]). Twitter's warning tags might compare to the lock icons at the beginning of a URL bar in a browser indicating a "secure" connection [56]. Besides the habituation and generalization, the lock icons are confusing and don't convey the threat to the users in the first place, so proposals have been made to pair the usable security iconography with words when possible [20]. Thus, it seems reasonable to pair an exclamation mark with a generic short text for warning users about misinformation tweets. But both the icon and the text are colored in the specific Twitter blue and fail to provide contrast to attract users' attention like the lock icons do, with either red for "insecure" or green for "secure" browsing (alternatively, a display of a locked/broken golden lock or a strike-through of the word "HTTPS"). Deliberately avoiding contrast makes it easier for users to overlook, ignore, or simply mistrust the warning tags as honest security aids [40]. Perhaps pairing the generic warning text with a link to a Twitter-curated page or external trusted source containing additional information on the claims made within a suspected tweet could compensate for the lack of contrast. Often, with a one-liner, users are offered, for example, to "get the facts on the COVID-19 vaccine," "learn why health officials say vaccines are safe for most people," or "learn how the voting by mail is safe and secure" [62].
The Fear of Missing Out (FOMO) aside [4], the warning text in fact advises users to contextualize the tweet themselves on the particular (mis)information topic. Users, unfortunately, rarely heed this advice and largely refuse to investigate any (mis)information further [24]. Security advice is not entirely anathema to users, particularly when it comes to their online security hygiene [55]. So it is not unreasonable to expect that users might heed the suggestion brought forth, brandished in a warning tag, if the advice itself provides a meaningful context for a particular topic of contention on Twitter without asking users to follow a link (which conflicts with Twitter's own idea of curating "more accounts, and less links" in users' feeds [7]). Balancing for comprehensibility, we developed enhanced warning tags that provide meaningful context in regard to (1) fabricated facts; and (2) improbable interpretations of facts. The enhancement choice follows the misinformation front put forth by Twitter and allowed us to conduct an A/B usability evaluation against the current warning tags applied to misinformation hazards. The enhanced warnings, in their tag-only variant, incorporate catchy acronyms as frictions intended to grab users' attention in the absence of contrast [11]. We paired the text-only warning tags with a hitherto ignored usable security intervention when it comes to misinformation: red flags as watermarks over suspected misinformation tweets. The tag-and-watermark variant provided an option for us to also test users' receptivity to warnings that incorporate contrast (red), gestalt iconography for general warnings (flag), and actionable advice for inspection (watermark). The choice of a red flag was made after an extensive deliberation concerning warning design [12, 82], warning cognition (automatic or System I; deliberate or System II) [44], and user experience design [19]. We decided against a smaller red flag as smaller labeling symbols were ignored on social media; e.g. Facebook used a small red box on the left with an exclamation mark, which was either ignored or made users believe the flagged post more, not less [61]. We decided to use red and not other color flags because a "red flag" is a common signal of oncoming danger and requires users to switch from System I to System II of cognition. Green usually signals "no danger" while orange or yellow signal "caution" but are often processed by System I cognition [60]. Red also has the highest "perceived hazardousness" on the color palette [82]. The first text-only warning tag is shown in Figure 1a. We crafted a tweet based on [37], tagged it for fabricated facts, and presented it under a generic name and username, and without a profile image, to avoid any threat to the validity of our A/B evaluation. Instead of advising the users to "get the facts about the COVID-19 vaccine" [62], we coined a catchy, yet familiar acronym: SPAM, or Strange, Potentially Adverse Misinformation. With SPAM we wanted to see if we could contextualize the tweet's content with an analogy to an already meaningful aspect of spam email, something most Twitter users have experience with [8]. We did break the one-liner rule for the warning text, but we opted for a minor engagement pain for a major gain in increased attention and warning adherence behavior. Our warning text following the SPAM acronym read: "If this was an email, this would have ended up in your spam folder."
The overarching idea with the SPAM warning was to harness the "availability" and "recognition" heuristics characteristic of the Twitter consumption flow [1, 47]. Misinformation and fabricated facts are not always spam or vice versa, but they nonetheless align on the actionable outcome: ignore it, delete it, or take it with a grain of salt [52], which we argue is preferable compared to the "backfiring effect" of the generic warning tags [10, 69]. The upgraded SPAM warning tag with a 50% transparency red flag watermark over the entire tweet is shown in Figure 1b. The "upgrade" bolsters the warning tag context along the same lines of "availability" and "recognition" heuristics by invoking the well-known analogy between red flags and calls for attention. We opted for a watermark, and not a replacement of the exclamation point inline in the warning tag, to avoid confusion with the red flag emoji frequently used on social media. The watermarking, centered and scaled over the entire tweet area, follows the paradigm for misinformation flagging proposed in [71], with a midpoint transparency to create a non-negligible design friction for anyone attempting to read the tweet. By this choice, we wanted to stretch the overall text-and-flag warning throughout the suspected misinformation tweet and not only after it. The second set of warning tags is shown in Figure 2a and Figure 2b for the text-only and text-and-flag variants, respectively. Here we crafted a tweet, based on [46], containing an improbable interpretation of facts, keeping the engagement and posting structure in a similar order. In this case, we chose to provide a meaningful choice of context when tweets attempt to "spin" facts as a refined way of promulgating misinformation [18]. This practice, for example, earned Representative Marjorie Taylor Greene a permanent ban from Twitter [3]. Since we want to draw users' attention to such practices, we decided to ask whether they would consider such tweets For Facts' Sake, or FFS, if not for anything else. We deliberately selected the acronym FFS to blend with the characteristic communication on Twitter that utilizes "compact language" due to the tweets' length restriction [86]. The FFS warning tag was intended to provoke a pause in the "recognition" heuristics since there are multiple meanings associated with this acronym. We were aware that this might cause brief confusion, but nonetheless proceeded, since we wanted to explore if a brief confusion followed by contextual advice would suffice for users to refrain from taking the improbable interpretations of facts at face value. We utilized the growing evidence on "design frictions" purposefully created to disrupt mindless automatic interactions, prompting moments of reflection [11]. The brief confusion is promptly resolved by the following warning text advising users that "In this tweet, facts are missing, out of context, manipulated, or missing a source." To gauge the limits of the warnings-as-friction, the red flag watermark provides another stimulus to capitalize on, by seeing what works as a resolution against the questionable content: incomplete factual presentation [80], lack of contextual consistency [34], overt factual manipulation [68], or obscure factual provenance [30]. The evaluation of the enhanced warning tags was intended to gauge a preferential approach to soft moderation as well as to understand the underpinning reasoning for its acceptance (or lack thereof).
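The tag-and-watermark stimuli described above are, in essence, a semi-transparent red flag composited over the rendered tweet. For illustration only, the following minimal Python sketch shows one way such a mock-up could be produced with Pillow; the filenames tweet.png and flag.png are hypothetical placeholders, not artifacts from the study, and this is not the pipeline used to create the actual stimuli.

```python
# Minimal sketch: composite a 50%-transparent red flag watermark over a
# tweet screenshot, roughly mirroring the tag-and-watermark variant.
# The input files "tweet.png" and "flag.png" are hypothetical.
from PIL import Image

def add_flag_watermark(tweet_path: str, flag_path: str, out_path: str,
                       opacity: float = 0.5) -> None:
    tweet = Image.open(tweet_path).convert("RGBA")
    flag = Image.open(flag_path).convert("RGBA")

    # Scale the flag to cover the tweet area while keeping its aspect ratio.
    scale = min(tweet.width / flag.width, tweet.height / flag.height)
    flag = flag.resize((int(flag.width * scale), int(flag.height * scale)))

    # Reduce the flag's alpha channel to the requested opacity (0.5 = 50%).
    alpha = flag.getchannel("A").point(lambda a: int(a * opacity))
    flag.putalpha(alpha)

    # Center the watermark over the tweet and composite the two layers.
    overlay = Image.new("RGBA", tweet.size, (0, 0, 0, 0))
    offset = ((tweet.width - flag.width) // 2, (tweet.height - flag.height) // 2)
    overlay.paste(flag, offset, flag)
    Image.alpha_composite(tweet, overlay).save(out_path)

add_flag_watermark("tweet.png", "flag.png", "tweet_flagged.png")
```

Lowering or raising the opacity argument in such a sketch would also be one way to prototype the confidence-dependent transparency variations discussed later in the paper.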
A/B testing is a regular practice in usable security studies that informs the design of interface affordances, cues, and frictions [25, 63, 67]. Building on the exposure to contextual warning tags, a qualitative inquiry into how they fare on the misinformation front is important because the soft moderation employed by social media in general, and Twitter in particular, has so far yielded far from desirable results [39]. Users often materialize their identity and political personas within social media and Twitter discourse [26, 70]; therefore, we also investigated how this materialization shapes the preferences for our proposed soft moderation nudges. Based on this argumentation, the resulting research questions were:
• RQ1: What are the preferences of Twitter users for the SPAM and the FFS enhanced misinformation warning tags in both the text-only and text-and-flag variants?
• RQ2: How effective are the SPAM and the FFS enhancements in dispelling fabricated facts and improbable interpretations of facts?
• RQ3: What is the relationship between the Twitter users' preferences for the enhanced misinformation warning tags and users' political leanings?
• RQ4: What is the relationship between the Twitter users' preferences for the enhanced misinformation warning tags and users' demographic identities (race/ethnicity, level of education, gender identity, age)?
Our study was approved by our Institutional Review Board (IRB) before any research activities began. Subsequently, we set out to sample a population of regular Twitter users from the United States, 18 years or older, through Amazon Mechanical Turk. Both reputation and attention checks were included to prevent input from bots and poor responses. The survey took around 20 minutes and participants were compensated at the standard participation rate ($18 per hour). Participants were randomly assigned to the A/B evaluation of either the text-only or the text-and-flag enhanced warning variants. The survey was anonymous and allowed users to skip any question they were uncomfortable answering. We refrained from exposing the participants to similar stimuli to prevent generalization and to obtain a direct comparison with the original warning tags on Twitter. We also randomized the order of each of the SPAM and FFS text-only and text-and-flag segments for each participant. We selected the content of the tweets to be of relevance to the participants so they could meaningfully engage with the tweet's content and see a clear relationship between the tweet and the warning tag (i.e. to prevent arbitrary and irrelevant responses). The two COVID-19 related tweets represent the main target of Twitter's soft moderation front during the execution of the study (November 2021 - January 2022) [76]. We selected one misleading tweet by Nate Silver [46], and wrote a second one based on a common piece of vaccine misinformation [37]. To account for accessibility, we provided alternative text describing each of the tweets and interventions we used to avoid visual misinterpretation. We assumed participants understood the Twitter interface, the tweets, and the warning tags. Participants first indicated the reasons they usually come to Twitter for. Next, each participant was asked to indicate if they had encountered warning tags and what the content and the warnings were about.
We were aware that not every participant might have been exposed to warning tags, so we included a small training segment where we created exposure to the concept of soft moderation with generic warning tags. The pre-exposure training, shown in the Appendix, was used to ensure a baseline understanding of content moderation among the participants, i.e. that Twitter uses content indicators for various types of content (misinformation, sensitive content, graphic content, etc.). The training was general and referred only to "content indicators," without any references to "misinformation," to avoid any potential impact on user responses. Participants then were asked to evaluate each of the enhancements in comparison to the original tag ("Get the facts about COVID-19") [62]. Participants were next asked if seeing an enhanced warning tag would influence their dismissal of the tweet, or of tweets on the same contested topic, as misinformation. Finally, we collected participants' political leanings, race/ethnicity, level of education, gender identity, and age. The qualitative responses were coded and categorized with respect to the preference and the justification for it. These categories later helped perform a chi-square statistical analysis (χ²) of the relationships between the preferences and participants' political leanings as well as their demographic identities. We performed a basic exploratory analysis of the preferences and justifications to uncover the aspects in which the enhanced warning tags fare well (or vice versa) as usable security nudges against misinformation. For each of the justifications in the open-ended questions we performed a sentiment analysis using the Valence Aware Dictionary for Sentiment Reasoning (VADER) [14, 31, 35]. VADER yields a compound score between -1 for a very negative piece of text and 1 for a very positive one. We also used a Linguistic Inquiry and Word Count (LIWC) analysis to qualify the sentiment expressions in the responses with respect to clout and tone [73]. Each one ranges between 0 and 100, with scores close to 0 indicating less confidence and weak argumentation (clout) or negative emotions (tone). Finally, we performed a Correspondence Analysis (CA) on a contingency table with rows of adjectives/verbs as keywords and the justification text as columns. The CA projects the variance in justifications onto two dimensions using a weighted singular value decomposition [27]. In CA, the further away the keywords are from the origin of the plot, the more discriminating they are, and smaller angles between a pro/against preference and a keyword (connected through the origin) indicate an association of the two. In our case, the two axes correspond to the justifications' keywords (y-axis) and the participants' pro/against preferences (x-axis); a minimal illustrative sketch of the VADER scoring and chi-square testing steps is given below. After the consolidation and consistency checks, a total of 337 participants completed the study, with 176 in the text-only and 161 in the text-and-flag warning tag groups, respectively. Users indicated that communication was the most frequent factor for coming to Twitter (85.4%), followed by entertainment/cultural awareness (71.8%), news (63.5%), politics (46.5%), and health (26.7%). Around every third participant (32.9%) had encountered some form of a warning tag as part of Twitter's soft moderation effort in general. The distribution of participants per their self-reported political leanings was: 147 (43.6%) left-leaning, 96 (28.5%) moderate, 61 (18.1%) right-leaning, and 33 (9.8%) apolitical.
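The following Python sketch illustrates the two quantitative building blocks referenced above: VADER compound scoring of an open-ended justification and a Pearson chi-square test on a preference-by-leaning contingency table. It is a hypothetical, minimal example; the texts and counts are invented for illustration and are not the study data.

```python
# Illustrative sketch of the quantitative steps described above: VADER
# compound sentiment scoring and a chi-square test of independence.
# The example texts and counts below are invented, not the study data.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from scipy.stats import chi2_contingency

# 1) VADER compound score in [-1, 1] for each open-ended justification.
analyzer = SentimentIntensityAnalyzer()
justifications = [
    "The explanation is a valid one, and makes sense in context.",
    "I'd rather not see Twitter's judgement on what is misinformation.",
]
for text in justifications:
    compound = analyzer.polarity_scores(text)["compound"]
    print(f"{compound:+.3f}  {text}")

# 2) Chi-square test of preference (rows) vs. political leaning (columns)
#    on a toy contingency table of participant counts.
toy_table = [
    [45, 25, 12, 8],   # preferred the enhanced tag (left, moderate, right, apolitical)
    [20, 28, 24, 10],  # preferred the original tag
]
chi2, p, dof, expected = chi2_contingency(toy_table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")  # dof = 3 here, as in the reported tests
```

The same chi-square pattern extends to the education, age, gender, and race/ethnicity breakdowns reported later in the results.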
With respect to race and ethnicity, 247 (73.3%) identified as White, 29 (8.6%) as Black or African American, 42 (12.5%) as Asian, and 12 (3.6%) as Latinx. To ensure consistency in the analysis and validity of the results, each of the open-ended responses in the survey was coded independently by three researchers. The codebook was simple and included a coding of the preference expressed in the A/B evaluation as well as codes for the preference justification quotes from the participants. Fleiss's kappa, as a measure of inter-coder agreement, was 0.960 on average with a 0.878 lower bound for the 95% confidence interval, which indicates an "excellent" inter-coder agreement overall. 5.1.1 A/B Evaluation. The breakdown of preferences for both variants of the SPAM warning tag is given in Table 1. In the text-only variant, more than half of the participants who preferred the original warning tags explicitly echoed a protest against Twitter's intrusion into contested matters such as COVID-19 vaccination. Verbosity and confusion were cited by roughly one out of five participants as a preference against the SPAM. The same number of participants didn't provide any justification. A small number of participants judged the SPAM tag as misaligned with Twitter's aesthetic and, therefore, illegitimate. Neither of the text-only warning tags was the choice of 12.6% of the participants. The SPAM text-only warning tag (Figure 1a) received the highest preference (46.3%). The meaningful context provided by the extended security advice was welcomed by 43.2% of them, indicating that "The SPAM explanation is a valid one, and makes sense in the context of the tweet's content." The on-point warning of questionable content was cited by 36.8% in preference of "a direct misinformation label right there without having to dig further into it." One tenth of the pro-SPAM participants found the acronym and the text catchy, cheeky, and positively attention-grabbing. Reluctance to follow the links in the original tag variant was cited by 6% of the participants. Only 4% didn't provide any justification. The pairing of the red flag with the SPAM warning tag was seen as either too distracting or an indicator of Twitter's intrusion into the way content should be consumed. The preference against the text-and-flag SPAM tag was expressed in terms of "visual clutter that makes the tweet more difficult to read," a "Doomsday level of importance," or a "symbol of political hate." The pro text-and-flag SPAM tag participants welcomed the attention-grabbing of the red flag, suggesting that "the flag gets your attention; the text tells you it is misinformation - I tend skim when reading twitter posts and the other one is not as noticeable." The enhanced context and the on-point warning for misinformation were preferred because "the flag reinforces the positive information that the tweet is spam." Sentiment Analysis. The sentiment analysis of the preferences for the SPAM warning tags is shown in Figure 3. The violin plots show a multimodal distribution of sentiments where the original warning tag received an equal number of positive sentiments for being "simple and straightforward" as well as negative sentiments from those who would "rather not see Twitter's judgement on whether something is misinformation or not." The justifications showed low confidence (clout = 26.15) but positive emotions (tone = 60.65).
For the text-only SPAM tag, the positive sentiment outweighs the negative one, which captures justifications indicating that "'B' does a better job letting you know that the tweet's information is bad," with a bit more confidence (clout = 32.59) and emotions on par with the original (tone = 62.74). The introduction of the red flag in the SPAM warning tag apparently induced more negative sentiment when justifying the choice for the original warning tag. The justifications were a bit more convincing (clout = 30.91) but the emotions were highly negative (tone = 7.61). The red flag increased the positive sentiment for the text-and-flag SPAM warning tag, with the most confidence of all justifications (clout = 34.11) and positive emotions (tone = 55.52). While the participants that chose neither "A" nor "B" were evenly distributed in sentiment in the text-only variant, the negative sentiment was dominant in the text-and-flag variant. With both being very low on confidence and high on negative emotions, the introduction of the flag might have exacerbated the feelings against soft moderation for some of these participants. Additionally, we performed a CA to review the adjectives used in the explanations of user preferences for the SPAM warning tags. In the first component on the x-axis, which accounts for 56.57% of the inertia in the justifications, all but three keywords show values larger than zero. This suggests a bit more consistency in the way that the preferences for both the "A" and "B" text-only options were worded. Put simply, the predicative/comparative adjectives "more," "clear," and "better" were associated with the text-only SPAM tag, while "own" and "true" were associated with the original warning. The tendency for the former is a praise of the enhancements themselves while the latter hints at a general contempt for soft moderation on Twitter. The less discriminating "wrong" and "false" echo a similar sentiment by the neither-"A"-nor-"B" participants in the text-only variant. The remaining keywords show values less than zero, indicating that the adjectives used to justify those selections were generally less consistent, outside of the trend of expressing the preferences for the text-and-flag SPAM with the keywords "obvious" and "red." The A/B evaluation only obtained the preference for the SPAM warning tags without explicitly asking the participants to consider the security advice as applied to the tweets containing fabricated facts. To see if the SPAM warning tags actually work, we asked the participants to indicate if the tags helped them dispel the fabricated facts in the example tweet. The results shown in Figure 5 indicate that the SPAM warning tags don't have to be the users' best choice in order to work. Roughly half of the ones that preferred the original warning tag commented that the text-only variant "helped them understand the meaning of the tweet in a broader context." In the text-and-flag variant, participants found the warning tags helpful too, rationalizing that "Twitter should just remove the whole post in general if it comes to a big red flag watermark." Even some of the neutral participants noted that the warning tag was reassuring on the inaccuracy of the content. Overall, 62% of the participants indicated that the SPAM warning tags worked for them with the desired effect of dispelling the fabricated facts about the COVID-19 vaccines. Political Leanings. The COVID-19 pandemic didn't escape deep politicization, and that naturally was reflected on Twitter [49].
We were interested, therefore, to see if participants' preferences are affected by their political leanings. For both SPAM warning tag variants, as indicated in Table 5, the Pearson's chi-square test yielded a statistically significant relationship between their choices and where they stand on the political spectrum: χ²(3) = 24.934, p = .000* and χ²(3) = 24.611, p = .000*, respectively. The original tags are appealing to the left-leaning with a 1:1 ratio to the moderate and a 2:1 ratio to the right-leaning participants. The text-only SPAM variant has these ratios increased to 4.5:1. Here, the neither-"A"-nor-"B" participants are uniformly distributed. For the original warning tag, the introduction of the flag tipped the left-leaning to a 1:1.3 ratio with the moderate and a 1.3:1 ratio with the right-leaning participants. Left-leaning preferences for the text-and-flag SPAM warning tag were 1.56:1 with the moderates, but 4.875:1 with the right-leaning participants. The moderate and right-leaning were the most present among the neither-"A"-nor-"B" choices in the text-and-flag variant. Overall, the context is useful for the left-leaning and moderate participants the most, with a considerable portion of the moderates and right-leaning preferring a minimum of intervention and distraction from Twitter. Demographic Identities. The demographic identities, as the earlier evidence suggests [69, 84], factor into the way (mis)information is consumed on Twitter. Our analysis didn't find any significant relationship between the demographic identities and the preferences except between the education level and the text-and-flag SPAM variant: χ²(3) = 17.328, p = .008*. The enhanced tag, as Table 3 reveals, roughly evenly splits the high school/GED and college graduates' preferences but almost entirely earns the preferences of the ones with a post-graduate degree. It also does so with a 3:1 ratio for the participants with some college education. Turning to the FFS warning tags, verbosity and confusion were the reasons for almost two thirds of the participants to dislike the text-only FFS warning tag. Roughly one third disliked it because of an anti-soft-moderation stance, and one tenth provided no justification. The meaningful context provided by the FFS text-only warning tag (Figure 2a) was welcomed by almost 70% of the participants "because it doesn't just say that the tweet is disputed, it mentions the various ways that the tweet is incorrect." One out of ten participants liked the FFS text-only warning tag because of the "assertive statement as opposed to just one word 'disputed' in 'A'. 'B' is more specific." A small number deemed the acronym "funny/witty" and 14.5% simply just liked the FFS security advice. The preference against the text-and-flag FFS tag was again expressed in terms of distraction by more than half of the participants preferring the original tag. A third of them cited contempt for Twitter's decision to patronize users about how to interpret facts. A bit more than one tenth of the pro-original-warning-tag participants didn't provide a justification. In roughly half of the cases, the participants pro the text-and-flag FFS liked the attention-grabbing effect of the red flag, noting that they "like that the red flag is big; You can see right away there is a problem with the tweet." The context (34.2%) and the on-point warning that the tweet is a form of misinformation (11%) were preferred because "knowing that something is missing context is more informative than knowing it's disputed; Everything is disputed by someone."
Only 6.9% didn't provide a justification pro the text-and-flag FFS warning tag. Sentiment Analysis. The sentiment analysis of the preferences for the FFS warning tags is shown in Figure 6. As the violin plots demonstrate, the original warning tag received roughly an equal number of positive sentiments for being "simple and straightforward and it doesn't try to make a judgment of the tweet" as well as negative sentiments that "the red watermarking is overkill regardless of placement and size." The justifications again showed low confidence (clout = 21.15) but positive emotions (tone = 67.72). For the text-only FFS tag, the positive sentiment further outweighs the negative one, which praises the tag's way of "explaining why the facts are probably being used in a misleading way." The praises show twice as much confidence as the ones for the original warning tag (clout = 51.93) and more positive emotions (tone = 72.32). The red flag in the FFS warning tag again caused a shift toward more negative sentiment, as it was cast as "condescending" and "too distracting". The confidence plummeted in response to the text-and-flag variant (clout = 19.3) with the emotions remaining negative (tone = 31.61). The positive sentiment is prevalent among the pro FFS text-and-flag tag participants, who yielded somewhat better justifications (clout = 30.95) and expressed more positive emotions (tone = 60.52). The red flag again tilted the balanced sentiment of the neutral participants in the text-only variant toward a more negative one in the text-and-flag variant. The CA for the FFS A/B evaluation is plotted in Figure 7. For brevity, we used verbs as keywords here as the adjectives showed very similar dimensionality to the SPAM case (and vice versa). Here, the first component on the x-axis, accounting for 48.67% of the variance in justifications, shows the responses in order of preference from left (least popular) to right (most popular). Verbs used in explanations for some of the less popular choices include "disputed" and "seems," which both are terms that indicate more ambiguity in the truth (Option "A" in the text-only comparison), and "know," which indicated more confidence (Option "A" in the text-and-flag comparison). Justifications for the more popular responses include the verbs "prefer" and "like," which suggests approval of both FFS variants rather than a dislike for the original text-only warning. The keyword most closely associated with the text-only FFS warning tag is "tells," which is an appreciation of the informal yet meaningful context conveyed. The y-axis, accounting for 27.54% of the variance, shows the Option "B" preferences centered around the origin as an indicator of higher consensus among the pro-FFS participants. Figure 8 shows that the detailed context provided through the FFS security advice is even more potent in dispelling improbable interpretations of facts. Overall, 68% of the participants indicated that the FFS warning tags worked for them, which is a 6% increase from the dispelling rate for the SPAM warning tags. Roughly half of the participants preferring the original tag conceded that the FFS warning tags in both variants are helpful in discrediting the manipulative tweet. A small but noticeable increase in the dispelling effect is also present for the neither-"A"-nor-"B" participants compared to the SPAM warning tags. Similarly, the participants preferring both FFS tags were slightly more receptive to the dispelling effect than in the SPAM case. Political Leanings.
The preferences for both FFS warning tag variants, as indicated in Table 5, were related with a statistical significance to participants' political leanings: χ²(3) = 27.732, p = .000* and χ²(3) = 36.483, p = .000*, respectively. The original tags are appealing to the left-leaning participants with a 1:1 ratio to the moderate ones and with a 2.5:1 ratio to the right-leaning participants. The text-only FFS variant has these ratios increased to 3.6:1 between the left-leaning and the moderate participants and 4.8:1 between the left-leaning and the right-leaning participants. Unlike the SPAM variants, here the neither-"A"-nor-"B" participants are dominantly right-leaning, with a 2:1 ratio to both the left-leaning and moderate participants. The introduction of the flag again kept the balance between the left-leaning and moderate participants, but increased the ratio to almost 2:1 to the right-leaning ones that preferred the original warning tag. The left-leaning preferences for the text-and-flag FFS warning tag were in a 1.65:1 ratio with the moderates, but in an overwhelming 5.42:1 ratio with the right-leaning participants. The right-leaning again dominate the neither-"A"-nor-"B" preferences for the FFS text-and-flag variant. Compared to the SPAM case, the extended FFS context is even more useful for the left-leaning participants. The moderates are roughly evenly split, but the right-leaning participants show a more salient anti-soft-moderation preference when exposed to the FFS warning tags. Demographic Identities. Same as before, only the level of education mattered when it comes to the preferences. The Pearson's chi-square test revealed a significant relationship in this case with χ²(3) = 17.773, p = .007*. As Table 6 reveals, the high school/GED and the college graduates are slightly more in preference for the original tags, with a considerable dismissal of soft moderation altogether by the college graduates. The participants with some college-level education prefer the FFS text-and-flag variant over the original tag at a 2:1 ratio, and over no preference at a 3:1 ratio. The biggest difference is for the participants with a post-graduate education level: they are almost entirely in favor of the FFS way of warning against improbable interpretations, manipulation, or selective choice of facts. In this study we were motivated to bring soft moderation closer to users' everyday experiences while minimizing imposition, which, as witnessed, often backfires [71]. We distinguished between a need for context when the hazard comes from the fabrication of facts and when the hazard comes from the interpretation of facts in a rather improbable way. In the first case, we were careful to avoid the perception trap of "correction of feelings, not falsehoods" [41] and used an analogy with spam emails. We did so because users, by now, can recognize spam when they see it [9] and accept that spam filtering, performed by email providers, works well [53]. Understanding this, we wanted to regain the trust in the provider - Twitter in the case of the warning tags - and signal an absence of bias or judgment in their action [42]. With this in mind, the SPAM warning tag shows a very promising step toward unified interpretation and increased trust in soft moderation (only related to COVID-19 misinformation, for now).
If support from the left-leaning participants was already hinted at by previous studies on soft moderation, it was nonetheless strongly reinforced in both the text-only variant ("...it tells participants, rather quickly, that the tweet is garbage") and the text-and-flag variant ("the red flag will alert me before I even read any of it"). Moderates were evenly split, expectedly, but reassured that the text-only variant "really tells you more of what is going on" while the text-and-flag variant "gives more specifics and is thus tougher to refute". In significant numbers, right-leaning participants made it clear that the text-only variant "seems more appropriate because it's far more specific; the original tag feels more like an ad and nothing that I didn't already know," and praised the text-and-flag variant as "a large visual cue that's hard to ignore and will bring attention to the idea that something is going on with this information". More promising evidence for the SPAM approach is the support across all levels of education without distinction of age, gender, or race/ethnicity. In the text-and-flag variant, only 10% of the participants with only a high school education/GED disliked both the "A" and "B" options while the rest gave equal support of 40% for each option. Even though the participants with college diplomas tilted toward option "A" (a relative difference of 8%), the group unequivocally acknowledged that the text-and-flag variant "is more clear and strong, and tells you exactly what is incorrect". After all, the SPAM tags helped more than 60% of all the participants to dispel the fabricated facts about COVID-19 vaccine side effects. In the second case, we wanted to avoid authoritative imposition and thus worded the warning not to personify senior public health experts, usually responsible for interpretation of facts [77]. We also opted for a "bold" acronym choice to lure users' attention to the text of the warning tag, for a moment, instead of to the warning tag as soft moderation. Once "hooked," the cost of reading the entire warning tag text was less than that of avoiding it, as the derivation of new meanings for acronyms and words is a pragmatic way of conveying context on social media - take for example the hashtags on Twitter [66]. The text wasn't asking the user to "get facts" or "learn more"; instead, it gave several convincing options for users themselves to pick why the context is fitting to the possible misinformation tweet [54]. The FFS tag did just that and succeeded. Left-leaning participants liked that the text-only variant "gives real reasons why this tweet is suspect" and moderates seconded that the FFS's context "goes more in depth and makes you more alert to the tweet". Right-leaning participants confirmed our idea to avoid any relationship to an imposing authority: "The context in 'B' is better because Facebook came out saying that most if not all of their fact checkers don't check for facts, they just do it on opinion base. I'm sure Twitter does the same". The consensus across the participants of all political leanings that the red flag watermark "really draws more attention", led by the left-leaning ones, supports the potency of the FFS acronym as the "hook" entirely absent in the current soft moderation on Twitter. The FFS text-and-flag variant appealed almost entirely to all participants with a post-graduate level of education.
Interestingly, they were concerned not just for themselves but also for other users on Twitter and misinformation in general, noting that "it's important people really pick up on the fact this information might be misleading". So were the participants with a college degree, even though they again tilted toward the original tag "A": "It seems like option 'B' would help resolve the problems that false news or fake profiles create". The participants with some college experience, in favor of the FFS tag, pointed out that a "disputed facts" warning is less informative than a "missing facts" warning. The support from the participants with only a high school education/GED underlined the essential usability of the warning itself: "It makes it known that something is up with this post and I shouldn't trust it 100% without doing more research". Overall, the FFS achieved a 68% effectiveness in dispelling an improbable interpretation of COVID-19 related facts among all participants. We did observe, albeit anecdotally, the backfire effect in 1.48% of the participants' responses (all politically right-leaning). One participant, who was pro enhanced warnings, even provided a testimony of the backfire effect: "I have seen some of my crazy friends of mine where they think if Twitter disputes it, then it makes it ever more correct". The warning tags in the original option were blamed to "lead you to the lying, paid, 'fact checkers'"; the enhanced warning tags were dismissed because they "force an opinion on you and suppress a side that has been more accurate than the CDC and Fauci so far since COVID". A few participants even declared that the warning tags "makes them leave Twitter entirely", perhaps repelled by Twitter's sweeping COVID-19 misleading information policy from December 2021 [78]. The contempt for Twitter's soft moderation was made clear in the responses of a considerable group of participants, stating that "Twitter is not a medical expert". Roughly half of those preferring the original warning tags over the SPAM variants cited Twitter's intrusion into opinion formation, framing their choice as the "lesser of the two evils". This fell to a third of the Option "A" supporters in the case of the FFS, but considering that around 15% of the overall participants did not have a preference for either of the options, soft moderation still has a lot to do to appear unbiased and non-judgmental to users with diverse perspectives [72]. Our results reveal several aspects worth considering for improving the appeal of soft moderation among Twitter users. There is no doubt that the meaningful context is useful, but it runs the risk of being avoided due to verbosity/confusion. In the SPAM case, a possible variation would be keeping just the acronym with a bit of text rewording, for example "SPAM: Content like this usually ends up in spam folders." This variation avoids the words "strange," "adverse," and "misinformation" while indirectly hinting that the content should be handled at the user's discretion. Plus, it becomes a one-liner warning that appears more as a suggestion than an opinion voiced by Twitter, which several of our participants complained about. Because the warning plays on the experience with spam emails, we also think it's worth testing email iconography inline with the warning, as shown in Figure 9. In this example, we borrowed the icon from Gmail's spam folder, but certainly could use any hexagram with an exclamation mark that provides contrast.
This could also be an alternative to the red flag watermark, to avoid participants recoiling from the sudden splash of red while still having an attention-grabbing effect. Similarly, it could address the concerns about the illegitimacy of the enhancement cited by some of the participants. The context conveyed by the FFS tags was well received, but some participants, expectedly, expressed concerns about the "catchy" nature of the acronym. It is therefore worth testing dropping the acronym altogether or replacing it with simply the word "facts," as shown in Figure 10. Here, the preceding iconography changes to a question mark, retaining the element of "hook" we envisioned in the first FFS variant. The following text is essentially the same, blending the acronym into a warning that looks more "professional," as the participants expected. The red flag watermark, the results confirm, produces the desired attention-grabbing effect. However, adaptations could be made here too. Participants commented on the size and the transparency, so variations could include testing options where these two variables are determined by the level of confidence of the fact checkers or the engagement the tweet attracts over time, as suggested in [71]. The watermark display could also vary based on a particular user's content preferences, e.g. one group of users could see a red flag and another could see the words "red flag" as a watermark. Twitter already uses this approach to suggest ads and content in users' feeds [7]. Ethical concerns do arise when dealing with misinformation, or allegedly harmful information, within the pluralistic social media population. The tension between impartiality, profitability, and social responsibility of the platforms might not always ensure that misinformation is dealt with using consistent soft moderation criteria. With the honest, yet inevitable, false positives/negatives, the proposed enhancements - if applied - might be seen as unfair at best or simply harmful at worst. We therefore are open to democratic participation in the design that allows for remediation of concerns in such instances. Soft moderation, at least in our view, is a form of honest communicative action rather than an authoritative and absolute determination of truth, and as such, beneficial to all Twitter users without discrimination [5]. We are aware that facts change, become irrelevant, or are refuted over time, so a retroactive application should also be considered to enable versatile soft moderation to the best of our (and Twitter's) abilities. We note several limitations of our study, which could be addressed in future work. The size of the sample could be enlarged to obtain as varied a Twitter population as possible. We used only two examples of misinformation on COVID-19, which is a limitation stemming both from restricted financial resources and the limited attention span of participants [36]. An extended, or perhaps longitudinal, study that incorporates more COVID-19 misinformation instances over time could not just help generalize our findings, but also reveal important behavioral patterns in dealing with soft moderation. Also, it could help with an A/B evaluation of warning tags pertaining to other contested topics such as elections [84]. Participants were exposed to a generic formatting of the tweets emphasizing the content and the warning tags. In reality, misinformation could come from individual accounts, influencers, or accounts controlled by nefarious actors [85].
Misinformation is often amplified by social bots, and appears in users' feeds next to other posts and ads, with a variable degree of visual interference [21]. All of these aspects could influence the preferences for or against soft moderation. Controlling for them will require a study executed in partnership with Twitter where the enhancements are tested with selected users on the live platform. Such a test could not just capture the preferences of regular Twitter users but also help closely observe the "backfiring," "implied truth," and "illusory truth" effects. We didn't explicitly test for these in our study, but it is important to track how misinformation itself materializes in individual Twitter consumption. Our A/B evaluation is limited by the current formatting of the original warning tags on Twitter [62]. If Twitter chooses to reformat the tags, eliminate the links, or place them elsewhere, the enhancements also should change and the results might not hold for these new conditions. This paper conveys the first extensive A/B evaluation of enhancements for misinformation warnings on Twitter. Providing users with meaningful context and attention-grabbing iconography, our results suggest, does help users recognize and contain COVID-19 misinformation. We weren't poised to solve the predicament of soft moderation in one shot; rather, the goal was to utilize the usable security body of knowledge to trace a path toward "inoculation" against information hazards on social media.
REFERENCES
Nudges for Privacy and Security: Understanding and Assisting Users' Choices Online
Alice in Warningland: A Large-Scale Field Study of Browser Security Warning Effectiveness
Twitter Permanently Suspends Marjorie Taylor Greene's Account
How Can Social Networks Design Trigger Fear of Missing Out
Communicative actions we live by: The problem with fact-checking, tagging or flagging fake news - the case of Facebook
Analyzing QAnon on Twitter in Context of US Elections 2020: Analysis of User Messages and Profiles Using VADER and BERT Topic Modeling
More Accounts, Fewer Links: How Algorithmic Curation Impacts Media Exposure in Twitter Timelines
Spam: A Shadow History of the Internet
Quantifying Phishing Susceptibility for Detection and Behavior Decisions
Real solutions for fake news? Measuring the effectiveness of general warnings and fact-check tags in reducing belief in false stories on social media
Design Frictions for Mindful Interactions: The Case for Microboundaries
Does color of warnings affect risk perception? International
Misinformation and Disinformation in the Era of COVID-19: The Role of Primary Information Sources and the Development of Attitudes Toward Vaccination
Explicit warnings reduce but do not eliminate the continued influence of misinformation
You've Been Warned: An Empirical Study of the Effectiveness of Web Browser Phishing Warnings
Why Do They Do What They Do?: A Study of What Motivates Users to (Not) Follow Computer Security Advice
A functional analysis of disinformation
Improving SSL Warnings: Comprehension and Adherence
Rethinking Connection Security Indicators
The Rise of Social Bots
Do or Do Not, There Is No Try: User Engagement May Not Improve Security Outcomes. USENIX Association
"Just Say No" is not enough: Affirmation versus negation training and the reduction of automatic stereotype activation
Fake News on Facebook and Twitter: Investigating How People (Don't) Investigate
Is FIDO2 the Kingslayer of User Authentication? A Comparative Usability Study of FIDO2 Passwordless Authentication
Computing Political Preference among Twitter Followers
Correspondence analysis in practice
So Long, and No Thanks for the Externalities: The Rational Rejection of Security Advice by Users
"Taking out the Trash": Why Security Behavior Change Requires Intentional Forgetting
Identifying Disinformation Websites Using Infrastructure Features
VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text
TrollHunter2020: Real-Time Detection of Trolling Narratives on Twitter During the 2020 U.S. Elections
Evaluating the Effectiveness of Deplatforming as a Moderation Strategy on Twitter
Linguistic Signals under Misinformation and Fact-Checking: Evidence from User Comments on Social Media
Girls Rule, Boys Drool: Extracting Semantic and Affective Stereotypes from Twitter
Adapting Security Warnings to Counter Online Disinformation
The COVID vaccine totally makes your boobs bigger and grows your pp AT LEAST 3 inches. WHO CAN CONFIRM? We have to spread awareness and the truth about these vaccines!
Countering Fake News: A Comparison of Possible Solutions Regarding User Acceptance and Effectiveness
"If HTTPS Were Secure, I Wouldn't Need 2FA" - End User and Administrator Mental Models of HTTPS
The 'post-truth' world, misinformation, and information literacy: A perspective from cognitive science. Informed societies - Why information literacy matters for citizenship
You're definitely wrong, maybe: Correction style has minimal effect on corrections of misinformation online
"They Keep Coming Back Like Zombies": Improving Software Updating Interfaces
Appealing to sense and sensibility: System 1 and system 2 interventions for fake news on social media
Addressing Hoaxes and Fake News
If nearly half of *vaccinated* people are "avoiding other people as much as possible" then public health and media messaging about the risks COVID poses to vaccinated people has been badly miscalibrated
Heuristic Evaluation of User Interfaces
When corrections fail: The persistence of political misperceptions
Parlermonium: A Data-Driven UX Design Evaluation of the Parler Platform
The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings
Prior exposure increases perceived accuracy of fake news
Examining the Demand for Spam: Who Clicks?
I Think They're Trying to Tell Me Something: Advice Sources and Selection for Digital Security
A Comprehensive Quality Evaluation of Security and Privacy Advice on the Web
An Experience Sampling Study of User Reactions to Browser Warnings in the Field
An investigation of phishing awareness and education over time: When and how to best remind users
Active measures: The secret history of disinformation and political warfare. Farrar, Straus and Giroux
Mental models of domain names and URLs
Warning Research: An Integrative Perspective
Fake news game confers psychological resistance against online misinformation
Updating our approach to misleading information
A Comparative Usability Study of Key Management in Secure Email
Encounters with Visual Misinformation and Labels Across Platforms: An Interview and Diary Study to Inform Ecosystem Approaches to Misinformation Interventions
Twitter flagged Donald Trump's tweets with election misinformation: They continued to spread both on and off the platform
The pragmatics of hashtags: Inference and conversational style on Twitter
"I Don't See Why I Would Ever Want to Use It": Analyzing the Usability of Popular Smartphone Password Managers
An exploratory study of COVID-19 misinformation on Twitter
Misinformation warnings: Twitter's soft moderation effects on COVID-19 vaccine belief echoes
(Mis)perceptions and Engagement on Twitter: COVID-19 Vaccine Rumors on Efficacy and Mass Immunization Effort
VoxPop: An Experimental Social Media Platform for Calibrated (Mis)Information Discourse
Designing Against Misinformation
The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods
Belief echoes: The persistent effects of corrected misinformation
A Usability Evaluation of Let's Encrypt and Certbot: Usable Security Done Right
Tracking Viral Misinformation. The New York Times, 2022
Strategies to combat medical misinformation on social media
COVID-19 misleading information policy
The Fog of Warnings: How Non-essential Notifications Blur with Security Warnings
Fact-Checking: A Meta-Analysis of What Works and for Whom
Folk Models of Home Computer Security
Research-based guidelines for warning design and evaluation
Misinformation in Social Media: Definition, Manipulation, and Detection. SIGKDD Explorations
"I Won the Election!": An Empirical Analysis of Soft Moderation Interventions on Twitter. arXiv:2101.07183v1
The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans
The State-of-the-Art in Twitter Sentiment Analysis: A Review and Benchmark Evaluation

A content indicator is defined as a label assigned by Twitter under a tweet, rendered in blue font and preceded by an exclamation mark, as shown in Figure 11. Content indicators could be assigned for various types of content, such as misinformation, sensitive content, graphic content, etc.
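To make the notion of a content indicator concrete, the following is a minimal, illustrative data-model sketch in Python. It is not Twitter's actual implementation; the type names, fields, and example label text are our own assumptions based on the definition above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndicatorType(Enum):
    """Kinds of content a platform might label (per the definition above)."""
    MISINFORMATION = "misinformation"
    SENSITIVE = "sensitive content"
    GRAPHIC = "graphic content"


@dataclass
class ContentIndicator:
    """A label rendered under a tweet in blue font, preceded by an exclamation mark."""
    indicator_type: IndicatorType
    text: str                       # contextual warning copy shown to the user (hypothetical)
    link_url: Optional[str] = None  # optional "learn more" link, if the platform keeps links


# Hypothetical example of an enhanced, context-bearing warning tag:
vaccine_tag = ContentIndicator(
    indicator_type=IndicatorType.MISINFORMATION,
    text="Facts: COVID-19 vaccines authorized in the US went through clinical trials.",
)
```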
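Similarly, the watermark parametrization discussed earlier, tying the red flag's size and transparency to fact-checker confidence and accumulated engagement, and varying the displayed variant per user, can be sketched as follows. This is an illustration under our own assumptions: the weighting, the caps, and the watermark_params and watermark_variant helpers are hypothetical and not part of any Twitter API.

```python
import hashlib
import math


def watermark_params(confidence: float, engagement: int,
                     min_opacity: float = 0.15, max_opacity: float = 0.6) -> dict:
    """Map fact-checker confidence (0..1) and cumulative engagement to
    watermark opacity and scale. Weights and caps are illustrative only."""
    confidence = max(0.0, min(1.0, confidence))
    # Log-scale engagement so viral tweets do not immediately saturate the watermark.
    engagement_factor = min(1.0, math.log10(engagement + 1) / 6)  # ~1.0 at one million interactions
    weight = 0.7 * confidence + 0.3 * engagement_factor
    return {
        "opacity": round(min_opacity + weight * (max_opacity - min_opacity), 3),
        "scale": round(0.5 + 0.5 * weight, 3),  # 50% to 100% of the tweet card width
    }


def watermark_variant(user_id: str) -> str:
    """Deterministically bucket users into the icon vs. word-mark variant of the watermark."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "red-flag-icon" if int(digest, 16) % 2 == 0 else "red-flag-words"


# Example: a tag with 0.9 fact-checker confidence and 12,000 interactions.
print(watermark_params(confidence=0.9, engagement=12_000), watermark_variant("user-42"))
```

In an actual deployment, such parameters would of course be tuned and evaluated with users rather than fixed a priori.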