key: cord-0488986-ce7suqux authors: Shi, Jialiang; Ghasiya, Piyush; Sasahara, Kazutoshi title: Psycho-linguistic differences among competing vaccination communities on social media date: 2021-11-09 journal: nan DOI: nan sha: 67f4777b7ecaa5d9b0cd1d07018046bb8502d329 doc_id: 488986 cord_uid: ce7suqux

The significance of social media in disseminating noteworthy information on topics such as health, politics, and the economy is indisputable. During the COVID-19 pandemic, anti-vaxxers have used social media to distribute fake news and anxiety-provoking information about the vaccine, which may harm the public. Here, we characterize the psycho-linguistic features of anti-vaxxers on the online social network Twitter. To this end, we collected COVID-19 related tweets from February 2020 to June 2021 to analyse vaccination stance, linguistic features, and social network characteristics. Our results demonstrate that, compared to pro-vaxxers, anti-vaxxers tend to show more negative emotions, more narrative thinking, and worse moral tendencies. This study can advance our understanding of the online anti-vaccination movement and can be critical for social media management and policy action during and after the pandemic.

Opposition to vaccines, or anti-vaccinationism, is an old phenomenon with roots traced back to the 1850s. Much of the anti-vaccine sentiment of the era was laid out by John Gibbs when he published a booklet against the British Government's Vaccination Act of 1853, which required mandatory vaccination for all infants over four months old [1]. Anti-vaccine behaviour and attitudes have several sources, such as conspiratorial beliefs, disgust, individualism, and hierarchical worldviews [2]. While early anti-vaccine activists distributed pamphlets and organized rallies, the anti-vaxxers of the 21st century have access to Facebook, Twitter, and other social media platforms to spread their views globally.
Due to vaccination concerning the evolution of anti-vaxxers. Therefore, we measured mass vaccination effects on emotions in pro- and anti-vaccine groups to get hints on vaccine operations. Answering these questions and understanding the whole picture of anti-vaxxers helps us develop countermeasures against the online anti-vaccine movement. As detailed later, our results indicate that we need to pay attention to the apparent negative tendency of anti-vaxxers in language expressions and to the fact that their network structure reflects resolute beliefs.

Various social media platforms work as conduits in the circulation and amplification of fake news [12]. Health-related misinformation and fake news are also a burgeoning research topic among social scientists and medical research professionals. In this line of research, a recent study by Suarez-Lledo et al. investigated health misinformation on social media and found that 'vaccine' is the fastest spreading topic on Twitter [13]. YouTube is another platform on which anti-vaccine narratives are often broadcast. Lahouti et al. investigated YouTube videos to understand anti-vaccine sentiments in France, where vaccine hesitancy is high, and found that anti-vaxxers are very active on YouTube [14]. Different demographic characteristics and living conditions also play an important role in anti-vaccine views and beliefs. Lyu et al. showed these differences in their analysis of Twitter data [15]. Cultural differences also play a critical role in developing pro- or anti-vaccine views. A study by Luo et al. observed that the difference between the topics of anti-vaxxers in China and the United States is caused by cultural distinctions between the two countries [16]. In India, anti-vaccination views are more likely to stem from health concerns and fear of allergic reactions [17].
In addition to differences in content and geography, curiosity to understand what kind of arguments anti-vaxxers give on Facebook has also influenced research. Wawrzuta et al. analysed Polish media fan pages and found that the COVID-19 anti-vaccine movement has new arguments, such as the vaccine not being properly tested. However, the classic argument -not trusting the government -also remains popular [18] . Nuzhath et al. analysed Twitter data to understand the prominent topics of discussion among anti-vaxxers in Bangladesh and found that misinformation, vaccine safety and effectiveness, conspiracy theories, and mistrust in government are some of the main topics [19] . Jamison et al. also found that both pro-and anti-vaxxers are spreading less reliable information or claims on social media and suggested that while all research focus is on bad actors (anti-vaxxers) to understand the anti-vaccine movement, good actors (pro-vaxxers) also play a role in the spread of the 'infodemic' [20] . LIWC is a useful tool in investigating psycho-linguistic features. Mitra et al. utilized LIWC to understand anti-vaccination attitudes in social media and found that anti-vaxxers tend to be influenced by conspiracy theories [21] . Linguistic differences often accompany network differences. Johnson et al. utilized network analysis to understand the evolution of pro-and antivaccine communities [22] . One significant finding of their research is related to undecided individuals. Their finding challenges the current thinking that undecided individuals are a passive background population in the battle of 'hearts and minds'. Germani and Biller-Andorno show that, compared to pro-vaxxers, anti-vaxxers on Twitter have a high number of influencers and these influencers lead the anti-vaxxers discussion [23] . They also showed that before the suspension of his Twitter account, Donald Trump was the main driver of anti-vaccine misinformation on Twitter. 
Lastly, Menon and Carley characterized COVID-19 misinformation communities on Twitter [9]. Their analysis suggested that a large majority of misinformed users may be anti-vaxxers. Further, their socio-linguistic analysis also showed that informed users (who spread true information) use more narrative thinking than misinformed users (who spread misinformation).

Social and behavioural scientists have long worked to understand the moral basis of people's judgment. Voluminous research and a theoretical framework laid the foundation for Moral Foundations Theory (MFT) [24, 25]. MFT works on the assumption that there are five major moral foundations: (1) 'Care/Harm', which focuses on not harming others and protecting the vulnerable; (2) 'Fairness/Cheating', which assumes equivalent exchange without cheating to be good; (3) 'Loyalty/Betrayal', which concerns a collective entity instead of individuals; (4) 'Authority/Subversion', which postulates respect for authority, resulting in maintaining the hierarchy; and (5) 'Sanctity/Degradation', which involves a feeling of disgust caused by the impure. MFT has also been used to understand why morality varies across cultures yet still shows similarities and recurrent themes [24]. In the context of vaccine hesitancy, past research also shows that core morality influences people's attitudes toward vaccination [26]. For example, 'Liberty' is likely implicated in the decision not to vaccinate a child [27]; endorsement of the foundations of Purity and Liberty is associated with vaccine hesitancy [10, 11].

The Moral Foundations Dictionary (MFD) is often used to quantify and understand the extent to which moral foundations are expressed in a text [8]. We used the original version of the MFD for this study. There are also candidate moral foundations related to MFT, such as 'Liberty/Oppression' [28]. However, since the original MFD does not include this dimension, we do not discuss it in this study.
Although several aspects of anti-vaccine communities have been reported by a series of studies (such as those previously mentioned), the psycho-linguistic features of anti-vaccine posts that may increase vaccine hesitancy, especially in the context of the COVID-19 pandemic, remain unclear and understudied. This knowledge is critical to reduce vaccine mis/dis-information and achieve herd immunity towards a post-pandemic era. Therefore, we investigate psycho-linguistic properties of anti-vaxxers in terms of the above-mentioned research questions.

To obtain longitudinal data of social media posts, we used the Twitter Search API to collect COVID-19 related tweets, replies, and retweets (RTs) with keywords such as 'covid' and 'covid-19' from February 20, 2020 to June 30, 2021. In this research, we focused only on English-language content. We then filtered vaccine-related tweets from the English dataset for our analyses, using the keywords 'vaccination', 'vaccines', 'vaccine', 'vaccinated', 'vaccineoutside', 'vaccinate', 'vaccinologist', 'vacciner', and 'coronavirusvaccine'. In total we collected 11,395,103 retweets, 11,395,103 tweets, 465,037 replies and 3,781,447 unique users. Our data and code are available online (https://osf.io/FSM23/).

Previous studies have found that retweet networks combined with community detection are a useful way to reveal communication patterns among communities [29]. Therefore, we followed the same approach to classify pro- and anti-vaxxers on Twitter. We first constructed a retweet network from the vaccine-related retweets. A retweet network on Twitter can be defined as a directed weighted graph, where nodes and edges represent users and retweet transmissions, respectively. In our research, the direction from one node to another represents a user retweeting another user's post. The weight represents the number of times the user retweeted another user's post.
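The retweet network defined above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' pipeline: the input format (pairs of retweeting user and original author), the function names, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_retweet_network(retweets):
    """Build a directed weighted retweet network.

    `retweets` is an iterable of (retweeter, original_author) pairs; this
    input format is an assumption made for the sketch. The result maps each
    directed edge (retweeter -> original_author) to its weight, i.e. the
    number of times that user retweeted that author.
    """
    edges = Counter()
    for retweeter, author in retweets:
        edges[(retweeter, author)] += 1
    return edges


def indegree(edges):
    """Count, for each author, how many distinct users retweeted them."""
    deg = defaultdict(int)
    for (_retweeter, author) in edges:
        deg[author] += 1
    return dict(deg)


# Toy data, invented for illustration only.
sample = [
    ("alice", "carol"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]
network = build_retweet_network(sample)
in_deg = indegree(network)
```

In this toy example the edge from `alice` to `carol` carries weight 2, while `carol` has an indegree of 2 because two different users retweeted her.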
We used the Louvain algorithm, a standard algorithm for community detection [30], to find clusters including pro- and anti-vaxxers. We then applied the k-core decomposition (k = 1) and retained nodes whose indegree (i.e., the number of retweets by different users) was greater than 20 in order to focus on significantly influential users. The ForceAtlas2 layout [31] was used to visualize the structure of the resulting clusters. For these processes we used the Gephi software [32]. By manually checking the top 10 popular users in each cluster (i.e., looking at users' profiles, tweets, and retweeted contents), we determined which clusters represent pro-vaxxers, anti-vaxxers, and others.

The retweet network was also quantified using the following standard measures to illuminate its structural features, in addition to the number of nodes (users) and links (unique retweet relations):

• Network density: the ratio of actual connections to potential connections.
• Clustering coefficient: the degree to which nodes in a network tend to form triangles or 'clusters'.
• Average distance: the average minimum number of connections to be crossed from any arbitrary node to any other.

For these measurements, we converted our retweet network to an undirected graph because these measures are defined for undirected graphs.

LIWC is a standard tool in social psychology for computerized text analysis [7]. Given a text, LIWC can quantify emotions, thinking style, and social concerns by counting the dictionary words registered in LIWC. The psychological expressions in a text can also reveal critical aspects. Table 1 shows all the LIWC categories and subcategories used in this study. LIWC has been used in many studies, such as those on sentiment analysis and social relationships, to evaluate the impact of psychological expression [33]. We used LIWC 2015 to investigate attitudinal and linguistic differences between pro- and anti-vaxxers.
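The dictionary-counting idea behind this kind of analysis can be illustrated with a toy example. The sketch below is not LIWC itself: the real LIWC 2015 dictionary is proprietary and far larger, the category names are hypothetical, and expressing each score as a percentage of all words in the text is an assumption of this sketch. The six example words are the positive and negative emotion words the paper itself lists.

```python
import re

# Toy dictionary for illustration only; category names are hypothetical and
# the word lists are just the six example words quoted in the paper.
TOY_DICTIONARY = {
    "positive_emotion": {"love", "nice", "sweet"},
    "negative_emotion": {"hurt", "ugly", "nasty"},
}


def category_scores(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the percentage of words in `text` that match."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(word in vocab for word in words) / len(words)
        for category, vocab in dictionary.items()
    }


scores = category_scores("Such a nice day, but needles hurt and look nasty")
```

The example sentence has ten words, one of which is in the positive list and two in the negative list, so the scores are 10.0 and 20.0 respectively. A real dictionary would also need stem matching and many more categories.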
LIWC is especially useful for measuring the sentiment and thinking styles in a tweet. According to [34], analytical thinking and narrative thinking are often in opposition, which may characterize pro- and anti-vaxxers. Narrative thinking is identifying conceptual categories and organizing them in hierarchical ways [34], which can be linked to the frequent use of pronouns and function words [35] and the less frequent use of analytic categories in LIWC. Thus, we compare LIWC scores for analytic, pronouns, and function words between pro- and anti-vaxxers to determine whether they are analytic thinkers or narrative thinkers.

As mentioned in Section 2, Graham et al. developed the Moral Foundations Dictionary (MFD), which consists of 156 words and 168 word stems, to quantify the frequency of words referring to virtues and vices associated with each moral foundation [8]. They attempted to understand moral tendencies among liberals and conservatives and found that liberals consistently showed greater endorsement and use of the Care and Fairness foundations compared to the other three foundations, whereas conservatives endorsed and used the five foundations more equally. The Japanese version of the MFD was used to reveal that a trade-off between the Fairness and Authority foundations plays a key role in the online communication of Japanese users on Twitter [36]. We also used the MFD in this study to measure the five moral foundations as additional psycho-linguistic features for pro- and anti-vaccine clusters on Twitter.

The above-mentioned processes resulted in a retweet network based on vaccine-related tweets (77,934 nodes and 1,999,164 edges). In this network, we identified six main clusters: 1) Left-wing (32.0%), 2) Pro-vaxxers (22.2%), 3) Right-wing (13.0%), 4) Anti-vaxxers (11.7%), 5) India-related (5.7%), and 6) Canada-related (3.6%). These clusters are shown in Fig. 1.
We noticed apparent differences in the content posted by the top 10 users selected by indegree (i.e., the number of retweets by different users) in each cluster. Our examinations identified the second largest cluster as a pro-vaccination group and the fourth largest cluster as an anti-vaccination group. Table 2 shows two example tweets each for pro-vaxxers and anti-vaxxers. Here we can see that anti-vaxxers intimated the Bill Gates conspiracy theory (top) and distrust of the government (bottom). In addition, we observed that some groups have a clear political orientation, such as the first and third largest groups. In the COVID-19 pandemic, vaccine operation is an important political issue; thus, polemical clusters might emerge together with pro- and anti-vaccine clusters. Two other groups turned out to be country-related (i.e., India and Canada) and concern vaccine strategies in these countries. Because our interest is in the psycho-linguistic features of anti-vaccine groups to gain insights into countermeasures, we hereafter restricted our analysis to the pro- and anti-vaccine groups (clusters 2 and 4).

The network measures for the entire retweet network and the pro- and anti-vax clusters are summarised in Table 3, which reveals that the number of pro-vaxxers is larger than that of anti-vaxxers. However, anti-vaxxers are more densely connected according to the network density and clustering coefficient values. In addition, the indegree distributions for the pro- and anti-vaccine groups are heavy-tailed (Fig. 2). This implies the existence of influential accounts in both groups, which is often the case in spreading phenomena.

Furthermore, we compared the network characteristics of these two communities before and after the start of mass vaccination (i.e., December 2020). As shown in Table 4, the overall difference between the two periods is minute; only network density decreased significantly.
Looking at the two groups separately, the size (nodes and links) of the pro-vaxxer community increased but its network density decreased in the after period. Conversely, the size of the anti-vaxxer community decreased, but its network density increased in the after period. We can infer from these results that although the pro-vaccine community grew once mass vaccination began, it became sparser than before. In contrast, although the anti-vaccine community is comparatively small and lost members once vaccination started, it became more tightly knit.

We found that among the top 10 users in these clusters, six accounts belonging to anti-vaxxers have been banned by Twitter, while none of the pro-vaxxer accounts have been banned. This observation supports the reliability of our classification of pro- and anti-vaxxers.

After identifying pro- and anti-vaccine communities, we compared the two in terms of psycho-linguistic features, including emotion. Specifically, we selected categories such as analytical thinking, affective processes, and personal concerns (see Table 1). In the following, we used the independent t-test to compare the average scores of anti- and pro-vaxxers. For all tests, the confidence level is 95%. The confidence intervals were computed from bootstrapped samples: we randomly sampled 45,809 tweets and 254,660 replies and repeated this 10,000 times for statistical evaluation. Additionally, to examine the differences between tweets and replies, we list the scores of the two communities separately.

Affective processes include several emotional words in LIWC. Examples of positive emotion words are 'love', 'nice', and 'sweet', while words such as 'hurt', 'ugly', and 'nasty' are seen as negative. Figure 3 shows that, overall, anti-vaxxers expressed more affect (positive and negative combined) than pro-vaxxers in both tweets and replies.
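The group comparisons reported in this section rest on bootstrapped means and an independent t-test. The sketch below shows one way such a comparison could be computed with the Python standard library. It is an assumption-laden illustration rather than the authors' code: the percentile bootstrap, the Welch (unequal-variance) form of the t-statistic, the function names, and the toy scores are all choices made here, since the paper does not specify them.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=1000, sample_size=None, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    k = sample_size or len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=k)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


# Hypothetical per-post scores, invented for illustration only.
anti_scores = [2.1, 3.4, 2.9, 3.8, 3.1]
pro_scores = [1.2, 2.0, 1.7, 2.4, 1.9]

anti_ci = bootstrap_ci(anti_scores)
pro_ci = bootstrap_ci(pro_scores)
t_value = welch_t(anti_scores, pro_scores)
```

A positive `t_value` here means the first group has the higher mean. Converting the statistic to a p-value requires the t-distribution, which a statistics library such as SciPy provides.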
If we consider positive and negative affect separately, we find that in both tweets and replies, anti-vaxxers show higher negative and lower positive emotions, while the opposite is true for pro-vaxxers.

Mass vaccination for COVID-19, which was expected to reduce negative emotions in both pro- and anti-vaccination groups, started in December 2020 [37]. However, Figure 4 shows that negative emotions instead increased from the period before mass vaccination (Period 1: February 2020 to November 2020) to the period after it (Period 2: December 2020 to June 2021). In addition, for both replies and tweets, anti-vaxxers showed higher negative emotions than pro-vaxxers during Period 1. This trend intensified in Period 2, when anti-vaxxers again expressed more negative emotion than pro-vaxxers. These results suggest that anti-vaxxers propagated their anti-vaccination beliefs more passionately and emotionally even after the start of mass vaccination [38].

There are subcategories of personal concerns in LIWC. In Fig. 6, the frequencies of words used in personal concerns by anti- and pro-vaxxers are visualized as word clouds. We found that the three subcategories most used by pro-vaxxers are money (black), religion (orange), and leisure (blue), while anti-vaxxers showed a higher usage of death (purple) and work (green).

Comprehending the thought process (whether analytical or narrative thinking) of both pro- and anti-vaxxers can also provide critical information that can be utilized to create a mitigation strategy. In both replies and tweets, we found that anti-vaxxers used more function words and pronouns, and had a lower analytic score, than pro-vaxxers. Recall that higher function word and pronoun scores together with a lower analytic score represent narrative thinking. This suggests that anti-vaxxers use more narrative thinking than pro-vaxxers (Fig. 5). Similarly, we found that replies showed higher usage of function words and pronouns and a lower analytic score than tweets.
This result also indicates that the trend of narrative thinking can be stronger in replies (a targeted message). Furthermore, we compared the analytic thinking tendency in replies among anti-vaxxers with that in replies between anti- and pro-vaxxers. It turns out that anti-vaxxers were in the narrative mode when replying to pro-vaxxers (Fig. 7).

Figure 5: LIWC scores for analytical, total function words, and total pronouns on average for anti- and pro-vaxxers. Differences between anti- and pro-groups are all significant (independent t-test, p < 0.001).

Figure 6: Word clouds for pro- and anti-vaxxers. Colours correspond to different LIWC subcategories: green for work, blue for leisure, red for home, black for money, orange for religion, and purple for death.

Morality is another psycho-linguistic feature. It is important for explaining the process of making social judgements [39]. As explained, we used the MFD to assess moral tendencies among pro- and anti-vaxxers. The comparison of the average scores of both groups is shown in Table 5. We can see that anti-vaxxers show higher scores in the Vice moral foundations (e.g., Harm, Cheating, Betrayal, Subversion, Degradation), indicating that they tend to use morally violating words. In the Virtue moral foundations, by contrast, pro-vaxxers show higher scores in Care, Fairness, Loyalty and Sanctity, indicating that they expressed moral content. Taken together, anti-vaxxers frequently used immoral language in their posts, thereby distributing anxiety-provoking (fake) news and messages.

Figure 7: LIWC scores for analytical, total function words, and total pronouns in replies between anti-vaxxers and between anti- and pro-vaxxers. Differences are all significant (independent t-test, p < 0.001).

We have shown that anti-vaxxers used more affective process words and negative words, while pro-vaxxers used more positive words (Fig. 3).
Other research has also shown that the strategy of anti-vax groups involved strong emotions with highly toxic and negative words during the early COVID-19 pandemic [40, 41]. Thus, our finding is consistent with previous results. In addition, we showed that anti-vaxxers tend to use vice moral language in all five moral foundations, whereas pro-vaxxers exhibit the opposite tendency. These findings are the answer to RQ1.

We have seen positive effects globally since mass vaccination started. How did anti-vax communities change their interactions on Twitter after mass vaccination? Our results showed that, compared to pro-vaxxers, anti-vaxxers expressed even higher negative emotions after mass vaccination, which suggests their firmness (Fig. 4). These results are the answer to RQ2. Strangely, similar results were observed in pro-vaxxers, although we expected pro-vaxxers to become more emotionally positive. These results cannot be explained by the 'backfire effect', where exposure to opposing views can lead to increased commitment to preexisting beliefs [42]. Therefore, further investigation is needed to unveil this emotional shift.

In addition, we found that anti-vaxxers show more narrative thinking (higher scores for total function words and total pronouns) and a lower score in analytic thinking, as shown in Fig. 5. Unlike analytic thinking, narrative thinking tends to be more personal, and rumours spread more easily [43]. However, a previous study showed that informed groups (pro-vaxxers) have more narrative thinking [9], which is contrary to our results, but that study was conducted on a small dataset covering a short period (less than a month). We used larger and more longitudinal data, and our results show that narrative thinking is more dominant among anti-vaxxers, especially during the COVID-19 pandemic.
As mentioned above, anti-vaxxers showed negative moral tendencies (i.e., usage of vice words) based on the MFD, which is consistent with anti-vaxxers expressing more negative emotions because vice words are correlated with negative emotions [44]. Fig. 6 shows that anti-vaxxers use more 'death' words, a subcategory belonging to personal concerns. This result suggests that anti-vaxxers may be more concerned about death caused by vaccination.

Because replying can reach beyond follower-followee relations, reply patterns on Twitter can also provide valuable information, and previous studies have provided such evidence (e.g., [45], [46], [47]). We also investigated reply patterns to understand the psycho-linguistic features of Twitter conversations on vaccination. We found that replies showed more narrative thinking (Fig. 5), because replying is an immediate reaction and thus less inclined toward analytic thinking.

These findings provide useful implications for spreading credible content from pro-vaxxers while reducing exposure to misinformation and anxiety-provoking posts from anti-vaxxers. Because anti-vaxxers' posts are typically charged with higher negativity and vice morality, algorithms can detect their harmful content by measuring psycho-linguistic features. The SNS system can then hide such posts until users agree to view them, avoiding unnecessary exposure. At the same time, psycho-linguistic features as well as network structures help us identify, in near real-time, those pro-vaxxers who spontaneously transmit trustworthy vaccine information. Thus, we may leverage such information for social fact-checking to oppose anti-vaccine narratives at scale.

Several previous studies have shown that Twitter and other social media platforms are not representative of the general population [48, 49, 50]. Even though our findings are statistically significant, we should be aware of the gap between online social networks and reality.
Besides this, the focus of our study is only Twitter, but there are several other popular social media platforms such as Facebook, Reddit, and Instagram. Analysis of vaccination communities on these platforms can also bring forward critical insights. In our future study, we will focus on the comparative analysis of vaccination communities on Twitter, Facebook, Instagram, and Reddit. Comparison between different languages and cultures is also an important future direction (e.g., [41]).

In this study, we quantified psycho-linguistic differences among competing vaccination communities on Twitter during the COVID-19 pandemic. Based on the differences in linguistic usage, we found that anti-vaxxers tend to show more negative emotions, more narrative thinking, and more immoral tendencies. In terms of network differences, anti-vaxxers showed a tighter network structure and a strengthening of their anti-vaccine beliefs. The mass vaccination of people during the COVID-19 pandemic shows that these vaccines are working. However, even this news does not deter anti-vaxxers from their beliefs; rather, it strengthens their negative emotions. This vicious circle needs to be broken if we want to immunize all high-risk people in the world and achieve herd immunity, while preventing the online anti-vaccine movement. Our results provide key insights for developing countermeasures against the online anti-vaccine movement, at both the individual and societal levels.

This work was supported by JST, CREST Grant Number JPMJCR20D3, Japan.

None.

Professor in Graduate School of Informatics, Nagoya University. Since 2020, he has been an Associate Professor in the School of Environment and Society, Tokyo Institute of Technology. His research interests are computational social science and social innovation.

We would like to thank the members of CREST projects (JPMJCR20D3 and JPMJCR17A4) for fruitful discussions.

[1] Jonathan M. Berman, Anti-vaxxers: How to challenge a misinformed movement
[2] The psychological roots of anti-vaccination attitudes: A 24-nation investigation
[3] A global database of COVID-19 vaccinations
[4] Anti-vaccine movement could undermine efforts to end coronavirus pandemic, researchers warn
[5] An analysis of the human papilloma virus vaccine debate on MySpace blogs
[6] The online anti-vaccine movement in the age of COVID-19
[7] The development and psychometric properties of LIWC2015
[8] Liberals and conservatives rely on different sets of moral foundations
[9] Characterizing COVID-19 misinformation communities using a novel Twitter dataset
[10] Association of moral values with vaccine hesitancy
[11] Using moral foundations in government communication to reduce vaccine hesitancy
[12] Fake news
[13] Prevalence of health misinformation on social media: Systematic review
[14] Spread of vaccine hesitancy in France: What about YouTube?
[15] Social media study of public opinions on potential COVID-19 vaccines: Informing dissent, disparities, and dissemination
[16] Exploring public perceptions of the COVID-19 vaccine online from a cultural perspective: Semantic network analysis of two social media platforms in the United States and China
[17] Analyzing the attitude of Indian citizens towards COVID-19 vaccine: A text analytics study
[18] What arguments against COVID-19 vaccines run on Facebook in Poland: Content analysis of comments
[19] COVID-19 vaccination hesitancy, misinformation and conspiracy theories on social media: A content analysis of Twitter data
[20] Not just conspiracy theories: Vaccine opponents and proponents add to the COVID-19 'infodemic' on Twitter
[21] Understanding anti-vaccination attitudes in social media
[22] The online competition between pro- and anti-vaccination views
[23] The anti-vaccination infodemic on social media: A behavioral analysis
[24] Moral foundations theory: The pragmatic validity of moral pluralism
[25] Atlas of moral psychology
[26] Shifting liberal and conservative attitudes using moral foundations theory
[27] No jab, no pay and vaccine refusal in Australia: The jury is out
[28] The Righteous Mind: Why Good People Are Divided by Politics and Religion. Vintage
[29] Retweet networks of the European Parliament: Evaluation of the community structure
[30] Fast unfolding of communities in large networks
[31] ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software
[32] Gephi: An open source software for exploring and manipulating networks
[33] The psychological meaning of words: LIWC and computerized text analysis methods
[34] How the candidates are thinking: Analytic versus narrative thinking styles
[35] When small words foretell academic success: The case of college admissions essays
[36] Appraisal of the fairness moral foundation predicts the language use involving moral issues on Twitter among Japanese
[37] Coronavirus pandemic (COVID-19)
[38] Exploiting emotions for fake news detection on social media
[39] Morality-based assertion and homophily on social media: A cultural comparison between English and Japanese languages
[40] The strategy behind anti-vaxxers' reply behavior on social media
[41] Aggressive behaviour of anti-vaxxers and their toxic replies in English and Japanese
[42] When corrections fail: The persistence of political misperceptions
[43] Automated fake news detection using linguistic analysis and machine learning
[44] An algorithm for estimating human emotions by using semantic information of words
[45] Twitter makes it worse: Political journalists, gendered echo chambers, and the amplification of gender bias
[46] Twitter reciprocal reply networks exhibit assortativity with respect to happiness
[47] Reply trees in Twitter: Data analysis and branching process models
[48] Understanding the demographics of Twitter users
[49] Social media update 2016
[50] Twitter and Facebook are not representative of the general population: Political attitudes and demographics of British social media users