title: Flipping Stance: Social Influence on Bot's and Non Bot's COVID Vaccine Stance
authors: Ng, Lynnette Hui Xian; Carley, Kathleen
date: 2021-06-21

Social influence characterizes the change of opinions in a complex social environment, incorporating an individual's past stances and the impact of interpersonal influence through the social network. In this work, we observe stance changes towards the coronavirus vaccine on Twitter from April 2020 to May 2021, where 1% of the agents exhibit stance-flipping behavior, of which 53.7% are identified bots. We then propose a novel social influence model to characterize the change in stance of agents. This model combines an agent's and his neighbors' past tweets with the overall network structure to produce a stance score. In our experiments, the model achieves 86% accuracy. In our analysis, bot agents require less social influence to flip stances, and a larger proportion of bots flip.

Social influence characterizes the change of opinions in a complex social environment, incorporating an individual's static conditions (past posts) and the impact of interpersonal influence from his social network [5]. Previous influence studies on social media include the construction of an influence locality model that predicts retweet behavior using features such as personal attributes, number of followers/followees and reciprocal following relationships [22], and the modelling of how majority opinions influence an individual's opinion on Twitter through Markov state transitions [14]. In particular, Xia and Liu observed that an individual's conformity to social influence and initial level of susceptibility are crucial to vaccination stance [20].

The 2020 coronavirus pandemic brought the world to a standstill, and researchers scrambled to develop a vaccine that would ease the pandemic. Public opinion about vaccination has long fallen into two main polarizing camps, pro-vaccine and anti-vaccine. These camps are fondly termed "pro-vaxxers" and "anti-vaxxers", characterized by their stance towards vaccination. Prior work analyzing the polarizing vaccination debate on social media [9, 18] found that the two groups exhibit different online behavior in terms of the vaccines discussed, reach and network structure [6]. In general, the two camps interact mostly within separate echo-chambers, each with the same type of content [16]. Other works investigate further aspects of the debate: the personas of state-sponsored actors and their effect on polarization [19, 21], and the higher activity of bots in spreading anti-vaccine messages [3].

In combining social influence and stance detection, previous work ranked Twitter users by social influence in the polarizing Brexit debate [7], while others explored changes in the neighborhood overlap of Twitter agents' stances [11]. Bringing this combination one step further is the observation of stance changes between decided and undecided participants in a debate setting, where linguistic factors and audience factors are combined to predict whether an undecided audience member will take a stand [13]. Political studies have also examined the "flip-flopping" of stance in US electoral politics, treating it as an indicator of a candidate's conviction on issues raised in presidential debates [12] and in other settings such as gun control [2].

Contributions.
In this work, we aim to bridge the gap between analyzing the polarizing vaccination debate and identifying individuals susceptible to stance changes due to social influence. We study the network and linguistic factors that influence a Twitter agent to flip his stance and propose a novel stance-flipping prediction model that utilizes social influence. Our model successfully predicts 86% of the agents whose stance flips in the context of COVID vaccinations. We then analyze the response of bots and non-bots to social influence, observing that bots have less conviction and flip even with fewer opposite-stance neighbors than non-bots. This furthers misinformation research in identifying agents that are susceptible to changing their opinions.

We collected Twitter data surrounding the COVID pandemic via the Twitter REST API, querying the hashtag #coronavirus daily from 1 April 2020 to 10 May 2021. We began data collection after the Pfizer-BioNTech vaccine entered development (in March 2020). We filtered the data to tweets that talk about vaccines, keeping the tweets that have the sub-phrase "vaccine" in one of their hashtags. Additionally, as we are specifically looking for agents that flip stances, we disregard agents that have only one tweet in the dataset. Finally, we have 679,235 agents and more than 1.3 million tweets. To describe the tweets further, we labelled them in terms of their stance and linguistic cues. We then combined each agent's tweets to label agents in terms of their overall stance, network centrality and mean linguistic cues.

Bot Annotation. We annotated the data by performing bot-probability annotation using the BotHunter algorithm at the 0.70 threshold level [1]. BotHunter extracts account-level metadata and classifies agents using a supervised random forest method through a multi-tiered approach, with each tier making use of more features. For each user agent, BotHunter provides a probability that the account is inorganic; a probability over 70% indicates the agent is likely to be a bot. We also annotate users that self-identify as bots by having the word "bot" in their username, e.g. "coronaupdatebot".

Stance Labelling. We manually inspected all vaccine-related hashtags in the dataset and classified them into pro- and anti-vaccine hashtags. We left out generic hashtags like "#vaccine" and "#covidvaccine" as they do not express a stance. The list is in Appendix 7.2. We use a network-based stance propagation algorithm which models a user-hashtag bipartite graph and propagates the stance labels between the two parts, providing a label and a confidence value for all tweets and agents [10]. We further filter the tweets to those with a defined pro-/anti-stance and their authoring agents.

Linguistic Annotation. Language gives us insight into an agent's thoughts and emotions [15], and we infer these measures by characterizing linguistic cues. To characterize the messages of both groups, we use the Netmapper software (http://netanomics.com/netmapper/) to count the frequency of key lexical categories, including abusive, absolutist, positive and negative terms. This builds on psycholinguistic theory associating particular words and expressions with behavioural, cognitive and emotional states [17]. We also include the tweet count as one of an agent's endogenous variables, as an indication of how expressive the agent is.
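As a concrete illustration of the collection and annotation steps above, the following is a minimal Python sketch, assuming the collected tweets and agent records live in pandas DataFrames with hypothetical column names (hashtags, user_id, tweet_id, username, bothunter_probability); it is not the pipeline used in this work, only a sketch of the filtering and bot-flagging logic it describes.

```python
import pandas as pd

def filter_vaccine_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    """Keep tweets whose hashtags contain the sub-phrase 'vaccine',
    then drop agents that have only a single such tweet."""
    has_vaccine_tag = tweets["hashtags"].apply(
        lambda tags: any("vaccine" in tag.lower() for tag in tags))
    vaccine_tweets = tweets[has_vaccine_tag]
    tweets_per_agent = vaccine_tweets.groupby("user_id")["tweet_id"].transform("count")
    return vaccine_tweets[tweets_per_agent > 1]

def annotate_bots(agents: pd.DataFrame, threshold: float = 0.70) -> pd.Series:
    """Flag an agent as a bot if its BotHunter probability exceeds the
    0.70 threshold or its username self-identifies as a bot."""
    likely_bot = agents["bothunter_probability"] > threshold
    self_identified = agents["username"].str.lower().str.contains("bot", regex=False)
    return likely_bot | self_identified
```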
Network Annotation. We measure how an agent is influenced by his neighbors by characterizing social network variables. We used the ORA network analysis tool to analyze the network interactions and spread between the agents [4]. We calculated an agent's global centrality values with respect to all the other agents in the entire dataset. The centrality values are: number of followers, eigenvector centrality, total degree centrality, betweenness centrality, super friends and super spreaders. These variables provide an indication of how connected and influential the agents are in the network.

Agent Annotation. Finally, we annotate each agent with his corresponding stance, linguistic and network values, as agents are the focus of the model. We labelled each agent's stance as the stance of his final collected tweet. We additionally kept a chronological history of each agent's stance, which we used to identify agent stance flipping. We take the mean of the linguistic cues of each agent's tweets as the agent's overall linguistic cues. Agents' network values are annotated using the values generated from the network annotation, which represent each agent's global centrality.

In this section, we build a social influence model to evaluate whether an agent will flip his stance towards the vaccine. In our social influence model, we describe the formation of a stance towards the coronavirus vaccine (pro-vaccine or anti-vaccine) in terms of an agent's static variables and the interpersonal influences from other agents in the network. We describe an agent's stance in terms of variables and the process linking them. Specifically, an agent's stance towards the vaccine depends on his previous stances, the linguistic cues of his tweets, and his neighbors' information.

Agent stance. We define agent stances with the following model: $s = X\beta$, in which $s$ is an agent's stance outcome score, $X$ is a $1 \times k$ matrix of scores on the endogenous and exogenous variables of the agent, and $\beta$ is a $k \times 1$ vector of coefficients giving the effect of each variable. In our study, we used agents' linguistic cues as endogenous variables and network values as exogenous variables. Since we analyze only a subset of the variables that might affect an agent's stance, we partition the $X$ and $\beta$ matrices; that is, the equation is modified to represent the observations and coefficients of the observed subset of variables, as in Equation 1.
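As an illustrative reading of this formulation, the short sketch below computes a stance outcome score as the weighted sum of the observed linguistic (endogenous) and network (exogenous) variables; the variable names, values and partitioning are hypothetical, chosen only to show the shape of the computation.

```python
import numpy as np

def stance_outcome_score(x_linguistic: np.ndarray,
                         x_network: np.ndarray,
                         beta_linguistic: np.ndarray,
                         beta_network: np.ndarray) -> float:
    """Weighted sum of the observed endogenous (linguistic) and exogenous
    (network) variables, i.e. s = X.beta restricted to the observed subset."""
    return float(x_linguistic @ beta_linguistic + x_network @ beta_network)

# Example with three linguistic cues, two centrality values and
# illustrative coefficients.
s = stance_outcome_score(np.array([0.2, 0.1, 0.4]), np.array([0.05, 0.3]),
                         np.array([0.5, 0.1, 0.2]), np.array([0.8, 0.4]))
```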
The Base Influence Model. The base influence model estimates the impact of an agent's past tweets and the influence from his neighbors on his stance. Neighbors are other agents that have communicated with the agent in focus. The opinions of these neighbors, or "peers" in the social influence model, have a direct effect on an individual's opinions. On Twitter, this means a reply, retweet or mention by either the neighboring agent or the agent in focus. Equation 2 represents the base influence of stance upon an agent by his neighbors: $I$ is an agent's influence stance outcome score, comprising the sum of the stances of the neighbors in the agent's network. A first-degree neighbor is a node one hop away from the agent, a second-degree neighbor is a node two hops away, and so forth. Based on the number of hops away from the agent, the influence of a node's stance on the agent decreases by a scalar multiple such that each neighbor in that hop contributes an equal influence on the agent in focus. This concept is borrowed from Katz centrality.

For the first degree, each of the $n_1$ first-degree neighbours contributes $\frac{1}{n_1}$ influence on the agent; this is further reduced by a scalar multiple for the second-degree neighbors, and so on. Figure 1a illustrates how neighbors are weighted based on their distance to the agent in focus, for neighbors up to the second degree. The influence weight of each neighbor is a function of the number of neighbors the node has and its distance to the agent. We calculated the influence weights of each neighbor as the number of hops from the agent increases, for 20,000 agents. The results are plotted in Figure 1b; by the elbow rule, the optimal number of hops away from an agent node is two. The influence per neighbor decays exponentially and tends to 0 as the number of hops increases. As such, our stance-flipping prediction model considers only the influence of first- and second-degree neighbors. The model that considers only first-degree neighbor influence is the Base Model, presented in Equation 3. Equation 4 extends the base model to evaluate the effect of adding the influence of second-degree neighbors. We enhance this base model by adding three mechanisms: stance strength, connection and reciprocity.

Stance Strength. The first mechanism we add is the effect of stance strength on an agent's outcome score: $s = s \cdot w \cdot v$, where $w$ is a scalar representing the agent's stance strength and $v$ its importance. Stance strength alludes to the fact that the more an agent expresses a stance, the stronger he believes in it. It is defined as the proportion of the agent's expressed stances that match his final stance, multiplied by the variable importance value $v$. With this mechanism, neighbors' stances are calculated similarly.

Connection. Connection is the proportion of neighbors that support an agent's stance: $C = \frac{\#\,\text{neighbors with the same stance}}{\#\,\text{neighbors}}$ (6). Connection represents opinion similarity between the agent and his neighbors, which lends strength to the stance an agent expresses. Collectively, agent (and neighboring-agent) stances are enhanced with this mechanism: $s = s \cdot C \cdot v$.

Reciprocity. Reciprocity is the two-way interaction between two agents. The higher the reciprocity value, the closer the agents are in friendship, leading to a higher influence on the agent. This mechanism modifies the stance score of a neighboring agent: $s_{\text{neighbor}} = s \cdot w \cdot v + r$, where $r$ is the reciprocity value. The stance score of the agent in focus remains the same.

Deflection Score. We define a deflection score $D$ in Equation 8, which characterizes the difference between the score of the agent's own stance and the influence from the variables. The agent will flip his stance if $D \geq \theta$. For the base model, we set $\theta$ at 10% of the number of agents. For the models with second-degree neighbor information, $\theta$ is 1% of the total number of agents, reflecting the proportion of agents that flip stances in the overall dataset.
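To make these mechanisms concrete, the following is a minimal Python sketch, under assumptions: stances are coded as +1 (pro-vaccine) and -1 (anti-vaccine), the hop attenuation is taken as 1/hop on top of the per-neighbor weight, and the stance-strength, connection and reciprocity terms follow one plausible reading of the definitions above. The names and exact functional form are illustrative and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Neighbor:
    stance: int              # +1 pro-vaccine, -1 anti-vaccine
    hop: int                 # 1 = first-degree, 2 = second-degree
    stance_strength: float   # share of this neighbor's tweets expressing its final stance
    reciprocity: float       # strength of two-way interaction with the agent in focus

@dataclass
class Agent:
    stance: int
    stance_strength: float
    neighbors: List[Neighbor] = field(default_factory=list)

def influence_score(agent: Agent, importance: float = 1.0) -> float:
    """Hop-weighted sum of neighbor stances, scaled by the stance-strength,
    connection and reciprocity mechanisms (one reading of the model above)."""
    if not agent.neighbors:
        return 0.0
    # Connection: proportion of neighbors sharing the agent's current stance.
    connection = sum(n.stance == agent.stance for n in agent.neighbors) / len(agent.neighbors)

    # Group neighbors by hop so that every neighbor within a hop carries equal weight.
    by_hop: Dict[int, List[Neighbor]] = {}
    for n in agent.neighbors:
        by_hop.setdefault(n.hop, []).append(n)

    total = 0.0
    for hop, members in sorted(by_hop.items()):
        weight = 1.0 / (hop * len(members))  # per-neighbor weight, attenuated for farther hops
        for n in members:
            # Neighbor's signed stance, strengthened by its own conviction, the
            # connection term, and (additively) the reciprocity of the tie.
            total += weight * n.stance * (n.stance_strength * importance * connection
                                          + n.reciprocity)
    return total

def flips_stance(agent: Agent, threshold: float, importance: float = 1.0) -> bool:
    """Deflection = gap between the agent's own (strength-weighted) stance score
    and the aggregate neighborhood influence; the agent flips when it reaches
    the threshold (the role played by Equation 8)."""
    own_score = agent.stance * agent.stance_strength * importance
    deflection = abs(own_score - influence_score(agent, importance))
    return deflection >= threshold
```

In this sketch a single threshold plays the role of $\theta$; in the model above the threshold differs between the base model and the models with second-degree neighbor information.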
We need to determine the coefficients $\beta^*$ of the variables in the model. To do so, we performed a binary classification task with a decision tree model using the Python sklearn library. We run this decision tree across the entire dataset to collectively determine feature importances. The task uses all the defined linguistic and network variables to classify whether an agent flips or not. We performed five-fold cross-validation with an 80-20 train-test split. To account for the large class imbalance, we used stratified sampling, which ensures that both the train and test sets contain both types of agents. We then extracted the feature importances from the decision tree model, which are used as the variable importance matrix $\beta^*$.

We apply the social influence model to our dataset to predict stance flips. We only investigate agents who have more than one tweet, so that changes in vaccine stance are possible. For these agents, we leave out each agent's last stance and use the collected historical data to predict the final stance. However, in the collected historical data we do include agents that have only one tweet, as they contribute influence to the agents in focus. We progressively add mechanisms to the base model, studying the effect each variable has on improving the model. We measure the macro-F1 score to account for the unbalanced dataset. We then analyze the social influence on, and tendency to flip of, the two classes of agents: bots and non-bots.

Across the dataset, we collected 679k agents with 1.3 million tweets. The vaccine-related tweets were mostly in English, French and Spanish. 32% of the agents are classified as bots and 1.6% of the agents self-identify as bots. The proportions of stances for tweets and for agents in the dataset are about the same: 90% pro-vaccine and 10% anti-vaccine. In total, only 1% of the agents exhibited stance-flipping behavior. Most agents flipped from pro-vaccine to anti-vaccine. Table 1 shows two examples of agents that flipped from pro-vaccine to anti-vaccine stances. These are original messages, i.e. the messages were written by the agents themselves and are not retweets, quoted tweets or replies.

Based on a five-fold cross-validation run of a decision tree, the most important features are: (a) linguistic variables: number of tweets, average word and sentence length, and reading difficulty; (b) network variables: number of followers, eigenvector centrality, super spreaders and betweenness centrality. These importance values are used as the coefficients $\beta^*$ in the social influence model. The importance scores of each endogenous and exogenous variable of the agents are reported in Table 4 in the Appendix.

We perform incremental experimental runs on our dataset, each run adding a mechanism to the model. The results are presented in Table 2. Our final stance-flipping model outperforms all the other models with an accuracy score of 86%, showing that a combination of all the identified factors is important to the influence of agent stances. An ablation analysis in which we removed either the network or the linguistic variables from the base model shows a low prediction score of around 0.17%, indicating that both linguistic and network variables contribute to the success of the model's predictions. Our baseline decision tree model performs at 37% accuracy. Our base prediction model, which takes in only first-degree neighbor information, performs similarly to the decision tree model. The accuracy increases by 11% with the addition of information from second-degree neighbors, indicating the importance of indirect influence on an agent's stance. While there is a slight 5% increase in accuracy with the addition of connection, the accuracy increases markedly, by 11%, with the addition of reciprocal ties, showing that the stronger the tie between two agents, the stronger the influence mechanism.

Of the agents that flip their stances, 53.7% are identified as bots by the BotHunter algorithm. 6.6% of the overall bot population flip stances, while only 2.7% of non-bot agents flip stances.
Bots are also easier to predict, resulting in a higher accuracy score than for non-bot agents. We show an example of an identified bot agent that flipped stance in Table 3. This account repeats a message from the anti-vaccine camp several times before repeating a message from the pro-vaccine camp. The bot population has a lower deflection score than the non-bot population and the overall population average, as visualised in Figure 4. The histogram of deflection scores of non-bots is shifted to the right, showing that they generally form more interactions with other agents (connections/reciprocal ties) and hold their stances with more conviction.

Figures 2 and 3 show positive results for bot and non-bot agents respectively: agents that are predicted to flip according to the social influence model and whose final stance is indeed a flipped stance. In general, we observe that agents that are detected to flip have a very strong network influence of the opposite stance, emphasizing the importance of peer effects, where connected agents have a strong influence over an agent's opinion. Compared to the bot agents that flip stances, the graphs of non-bot agents that flip (Figure 3) are typically very sparse and connected to one or two other large clusters. Self-declared bot accounts, with the word "bot" in their user names, typically have large deflection scores that fall in the 95th percentile of the deflection-score distribution, and 5% of these agents flip stances.

Figures 5a and 5b plot the deflection scores against the number of first- and second-degree neighbors that hold the opposite stance. Bot agents flip stances with fewer opposite-stance neighbors than non-bot agents, i.e. a smaller number of first- and second-degree neighbors with stances opposite to the agent's current stance. Non-bot agents are harder to predict, as there is no clear distinction between the deflection scores of agents that flip and those that do not.

In this paper, we constructed a social influence model to predict whether an agent on Twitter will change his stance towards coronavirus vaccination. The model was built incrementally from the base influence model, which estimates the impact of an agent's past tweets and his neighbors on his stance; additional mechanisms of stance strength, connection and reciprocity were then added. We also further investigate whether social influence differs between bot and non-bot accounts.

Table 1 (excerpt): tweets from Agent 2, who flipped from a pro-vaccine to an anti-vaccine stance. Earlier pro-vaccine tweets: "If you're vaccine hesitant, just a reminder: COVID-19 is not remotely human hesitant #COVID19 #VaccinesWork"; "When a business has a 20 times return on investments u push for it the best u can #business #VaccinesWork #covid19". Later anti-vaccine tweets: "There is no way a vaccine can be dmonstrated to be safe and ready before year's end. I would love to be wrong on this #CovidVaccine #VaccinePassport #COVIDIOTS #covid19"; "why risk your precious health on a trial vacc for a disease with over 97% recovery".

In our estimate of linguistic variable importance, the variables word length, sentence length and the Flesch-Kincaid reading difficulty score relate to the readability of the tweets. Tweets that are easier to read catch other agents' attention better. We observe that more importance is placed on second- and third-person pronouns than on first-person pronouns.
Pronouns highlight the attention of the author [8]: second-person pronouns like "you" directly address the reader and pull him closer to the author; first-person pronouns like "we" signify that the author is embedded within a social relationship, making for a more inclusive conversation; third-person pronouns like "she/he" express the opinions of others as an identity distinct from the author. In our estimate of network variable importance, we identify that eigenvector and betweenness centrality have high variable importance. These two measures signify the influence an agent has in a network, based respectively on connections to influential agents and on information flow. An agent's position in the social network is one of the key factors in influencing others.

Our results contribute to the understanding of the factors that influence an agent's stance on online social media: a combination of network and linguistic variables is crucial in predicting an agent's future stance. An agent is deeply influenced by the opinions of the network of neighbors around him, as observed from the increase in accuracy after the addition of second-degree neighbor information and reciprocal ties. In addition, one's conviction towards a stance plays an important role in flipping behavior. In our model, this is represented by stance strength, which is an indication of how easily an agent can be influenced.

In our contrast analysis of bot and non-bot agents, we observe that bot agents have less conviction and that a larger proportion of them flip stances (6.6%). Bot accounts flip even with fewer neighbors of the opposite stance. In contrast, non-bot accounts have more conviction, a smaller proportion of them (2.7%) flip stances, and they require more neighbors of the opposite stance to flip. For accounts that declare themselves as bots by the use of the word "bot" in their account name, the proportion of agents flipping stances is five times higher than the population proportion. Agents in this group that flip stances tend to have large deflection scores, signifying that they mix with communities that are predominantly different from their original stance. Bot accounts typically repeat the same message from one stance several times before switching stance and repeating another message. While we intuitively expect mis/disinformation bots to hold firm to their stance and not be impacted by influence from neighbors, we postulate that bot agents easily flip their stances to match their neighbors' stances, possibly to fit into their surrounding network so that their future tweets have a higher chance of being viewed by the network. This calls for further investigation from a longitudinal perspective. Stance flips of bot agents are also easier to predict, as observed from the higher accuracy score compared to the non-bot group. This, together with bots requiring a lower number of opposite-stance neighbors to flip, suggests that bots are more prone to flipping their stances.

Several limitations nuance the conclusions of this work. Users with extreme opinions are typically more vocal on social media, suggesting caution in extrapolating our findings. The list of pro-vaccine and anti-vaccine hashtags also needs to be continually updated as new hashtags emerge in social media lingo.
Nonetheless, we hope that our work provides an understanding of how to characterize agents who flip their stances on Twitter, and an insight into the difference in the influence that bot and non-bot agents require to change their stance. In future work, we hope to incorporate a priori assumptions about content, such as an agent's personal values, into our stance-flipping model, and to experiment with a graph neural network model.

In this study, we observe stance changes towards the COVID vaccine on Twitter from April 2020 to May 2021, where 1% of the agents exhibit stance-flipping behavior. To predict stance changes, we propose a novel model of stance dynamics in the Twitter social network which integrates linguistic information from an agent's past tweets and interpersonal influence from the agent's network connections. The model predicts whether an agent will flip his stance with 86% accuracy. In a contrast analysis between bot and non-bot agents, we identify that a larger proportion of bot agents flip and that they flip even with fewer neighbors of the opposite stance, signifying that the social influence required for bot agents to change their stance can be less than that required for non-bot agents.

References
[1] Bot-hunter: a tiered approach to detecting & characterizing automated activity on Twitter
[2] Guns and votes
[3] Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate
[4] ORA & NetMapper
[5] Social influence and opinions
[6] Asymmetric participation of defenders and critics of vaccines to debates on French-speaking Twitter
[7] Stance and influence of Twitter users regarding the Brexit referendum
[8] Pronoun Use Reflects Standings in Social Hierarchies
[9] Semantic network analysis of vaccine sentiment in online social media
[10] Social media analytics for stance mining: a multi-modal approach with weak supervision
[11] Extracting Graph Topological Information and Users' Opinion
[12] On 'flip-flopping': Branded stance-taking in US electoral politics
[13] Persuasion of the Undecided: Language vs. the Listener
[14] Emotion dynamics of public opinions on Twitter
[15] Psychological aspects of natural language use: Our words, our selves
[16] Polarization of the vaccination debate on Facebook
[17] The psychological meaning of words: LIWC and computerized text analysis methods
[18] Communities of shared interests and cognitive bridges: the case of the anti-vaccination movement on Twitter
[19] Russian Twitter Accounts and the Partisan Polarization of Vaccine Discourse
[20] A Computational Approach to Characterizing the Impact of Social Influence on Individuals' Vaccination Decision Making
[21] Examining Emergent Communities and Social Bots Within the Polarized Online Vaccination Debate in Twitter
[22] Who influenced you? Predicting retweet via social influence locality

Appendix 7.2. Anti-vaccine hashtags: NoVaccine4Me, TrudeauVaccineContractsLie, vaccinesKill, VaccinesArePoison, VaccinesAreNotCures, NotAVaccine, jesusisvaccine, HALTtheVaccines, NoVaccines, dontgetthevaccine, VaccinesHarm, TeamNoVaccine, TheVaccineIsTheVirus, fakeVaccines, killervaccine, lethalvaccine, CoronaVaccineFail, vaccineBioweapon, saynotovaccines, Fuckvaccines, fuckyourvaccines, vaccinedeath, JustSayMoToVaccines, NoCOVIDVAccineMandate, NoMandatoryVaccines, GoHomeLeaveOurVaccinesAlone, antivaccine, anti_vaccine, stopcovidvaccine, CovidVaccineHesitancy, VaccinesCanKill, Antivaccines, notovaccine