key: cord-0590182-ubdcgjyp authors: Valensise, Carlo M.; Cinelli, Matteo; Nadini, Matthieu; Galeazzi, Alessandro; Peruzzi, Antonio; Etta, Gabriele; Zollo, Fabiana; Baronchelli, Andrea; Quattrociocchi, Walter title: Lack of evidence for correlation between COVID-19 infodemic and vaccine acceptance date: 2021-07-16 journal: nan DOI: nan sha: 0eca986cea37b502694eb91c5ec46eb3c269b0d7 doc_id: 590182 cord_uid: ubdcgjyp How information consumption affects behaviour is an open and widely debated research question. A popular hypothesis states that the so-called infodemic has a substantial impact on orienting individual decisions. A competing hypothesis stresses that exposure to vast amounts of even contradictory information has little effect on personal choices. The COVID-19 pandemic offered an opportunity to investigate this relationship, analysing the interplay between COVID-19 related information circulation and the propensity of users to get vaccinated. We analyse the vaccine infodemics on Twitter and Facebook by looking at 146M contents produced by 20M accounts between 1 January 2020 and 30 April 2021. We find that vaccine-related news triggered huge interest through social media, affecting attention patterns and the modality in which information was spreading. However, we observe that such a tumultuous information landscape translated only in minimal variations in overall vaccine acceptance as measured by Facebook's daily COVID-19 Trends and Impact Survey (previously known as COVID-19 World Symptoms Survey) on a sample of 1.6M users. Notably, the observation period includes the European Medicines Agency (EMA) investigations over blood clots cases potentially related to vaccinations, a series of events that could have eroded trust in vaccination campaigns. We conclude the paper by investigating the numerical correlation between various infodemics indices and vaccine acceptance, observing strong compatibility with a null model. 
This finding supports the hypothesis that altered information consumption patterns are not a reliable predictor of collective behavioural change. Instead, wider attention on social media seems to result in polarisation, with the vaccine-prone and the vaccine-hesitant maintaining their positions.
Social media platforms have radically changed how we access information and are often used as a proxy to understand social mood and opinion trends. Users online tend to acquire information adhering to their beliefs and ignore information dissenting from their views [6, 7, 8]. This process, combined with the unprecedented amount of information available online, has fostered the emergence of groups of like-minded individuals framing and reinforcing a shared narrative (i.e., echo chambers) [9, 10, 11, 12, 13]. Furthermore, recent studies provided evidence for the effect of feed algorithms in boosting the polarisation of social dynamics [14, 15]. Such a scenario may be considered a fertile environment for the spread of misinformation, and a potential threat to democracies [16, 17]. However, the effect of the interplay between online information diffusion and offline user behaviour is still an open scientific question, with fundamental societal implications, especially during a crisis like the ongoing pandemic [1, 18, 19, 20].
Indeed, the World Health Organisation raised concerns about the effects on global health of the so-called "infodemic", defined as an "overabundance of information - some accurate and some not - that occurs during an epidemic" [21, 22]. Two main hypotheses compete in accounting for the impact of information consumption on human behaviour. The first states that the infodemic has a substantial impact on orienting individual decisions [1, 2]. The second view stresses that exposure to vast amounts of even contradictory information has little effect on personal choices [3, 4]. In this work we investigate the possible connection between online information and offline behaviour, by looking at the possible correlation between the COVID-19 infodemic and vaccine acceptance, as measured by the intention to get a vaccine. First, we analyse the news diet of 20M unique users on Facebook and Twitter over 16 months (from 1 January 2020 to 30 April 2021) to investigate the relationship between online discussions about COVID-19 vaccines and the offline vaccine acceptance rate in the six European countries (Denmark, France, Germany, Italy, Spain, and United Kingdom) that were most involved in the investigation performed by the European Medicines Agency (EMA) into cases of blood clots that occurred after some vaccinations. To measure how the online debate may reverberate in offline intentions, we consider Facebook's daily COVID-19 Trends and Impact Survey [5] on a sample of 1.6M users. Considering the overall observation period, we find that vaccine announcements massively triggered users' engagement on social media. However, we do not observe significant variations in the vaccine acceptance rate during the same period. Second, to further investigate this lack of evidence for a correlation between infodemic and vaccine acceptance, we focus on the effects of the temporary suspension of the AstraZeneca (now Vaxzevria) vaccine issued by several EU countries and EMA.
In this case too, only minimal variation in the vaccine acceptance curves was observed after the information related to this event started circulating. Finally, we extend our analysis to 43 countries worldwide and corroborate our findings by testing the correlation between vaccine acceptance curves and several infodemic indices, as measured by the COVID-19 Infodemic Observatory [23].
We analyse the social media debate around vaccine-related topics on Facebook and Twitter, collecting a large corpus of posts selected via keyword search (see Methods). First, we perform topic modelling on the dataset, employing a Deep Learning based approach (see Methods). In this way, we can assign posts to different arguments while studying their evolution over time. Figure 1 reports the most debated topics. [Figure 1 keyword labels: covid pfizer coronavirus vaccine cases dose deaths people anti total vaccine covid dose received pandemic second today people doses vaccination shot vaccine free season year protect influenza health getting important immune boost immunity healthy exercise health body help boosting fitness astrazeneca vaccine covid questions oxford health join vaccines blood clots covid vaccine county appointments older vaccination eligible clinic health] Consistent with the keyword search, most of the keywords are related to the vaccines and the general vaccination campaign. "Pfizer vax," "Vax campaign," "AstraZeneca Vax" and "COVID-19 appointments" share a similar trend, as they all start increasing around November 2020, which coincides with the disclosure of Pfizer [24] and AstraZeneca [25] vaccine efficacy statistics. The debate about "Seasonal flu vax" peaked around autumn 2020, while "Boosting immunity" shows a roughly constant growth rate. Next, we consider the social media traffic of different post categories: Link, Photo, Status, Video. In the case of Facebook (Figure 1C), we observe an increase in the number of links after the announcement of Pfizer's efficacy on 18 November 2020.
This increase led the Link category to overtake Photo after the AstraZeneca efficacy announcement on 23 November 2020, which marked a shift in the most frequent type of content posted. A similar dynamic can be observed on Twitter (Figure 1D). Since external links usually point to news or scientific articles (see Figure 6 of SI), we may argue that the increase in link circulation on social media right after the vaccine announcements signals a sudden information void that was promptly filled [22]. This behaviour may reflect the urge of people to inform themselves about vaccines, whose safety has always been a matter of discussion and one of the main arguments used by the anti-vax community [26].
We now focus on a specific event of paramount relevance during the European vaccination campaign. On 10 March 2021, the European Medicines Agency (EMA) reported a rare incidence of blood clots after vaccine inoculation, which induced many European countries to implement a temporary suspension of AstraZeneca [27, 28]. The event triggered a substantial increase in traffic on social media, as shown in Figure 2. Specifically, we report the temporal evolution of the social debate (measured through the cumulative number of posts) for the first four officially approved vaccines (Pfizer, AstraZeneca, Moderna, and Johnson&Johnson) on Twitter (first row) and Facebook (second row). The social debate around the selected COVID-19 vaccines reaches an increasingly wide audience, as shown by the number of unique users involved on Twitter (third row) and Facebook (fourth row). The volume of posts and users sharply increases in correspondence with the suspension date of the AstraZeneca vaccine for every nation and on both platforms. In contrast, the other three vaccines show only moderate growth.
We now investigate the effect of these intense online debates on people's actual intentions, as measured by proxies of vaccine acceptance.
To do so, we leverage data from the COVID-19 Trends and Impact Survey [5, 29], part of the broader Facebook Data for Good project. In this survey, a sample of Facebook users is asked several questions about behaviours and concerns related to COVID-19 on a daily basis. We consider positive answers about vaccination intention from 44 countries (see Figure 8 respondents per day for South Korea. The response to this survey is a valid proxy of the offline users' behaviour [30, 31] and a good indicator of vaccine acceptance at the country level. For each country, we perform a linear fit of the vaccine acceptance rate time series (from 23 January to 30 April 2021), i.e. the percentage of users willing to get vaccinated. Relatively flat trends result for Denmark, France, Germany, Italy and Spain, while the United Kingdom shows a slightly decreasing trend (see Figure 3A). In general, the average slope computed for the whole set of 44 countries (see Figure 3C) is $(1.6 \pm 8.1) \cdot 10^{-2}$ %/day, thus being compatible with close-to-zero values. Furthermore, by performing a t-test on the distribution of slopes we fail to reject the (null) hypothesis that the average slope is zero (t = 1.33, p = 0.19). In Figure 4 we focus on the vaccine acceptance rate around the EMA announcement (9 March 2021) about blood clot cases that initially occurred in certain EU countries [27]. The impasse began after 9 March and lasted until 18 March 2021, when EMA declared that the benefits of AstraZeneca vaccination outweigh the risks [28]. In the six considered countries the uncertainty about the side effects of vaccinations translated into a sudden drop of a few percentage points (3-4%). Thus, despite the great echo generated over the web (see Figure 2) by the news about AstraZeneca vaccinations, we do not observe any substantial variation in vaccination intentions.
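The per-country linear fit and the t-test on the resulting slopes can be illustrated with a minimal sketch. It assumes ordinary least squares against the day index and a one-sample t statistic against zero; the function names and the toy acceptance series are hypothetical and are not the survey data. The p-value is omitted because it requires the Student t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def ols_slope(series):
    """Least-squares slope of a daily series against day index 0..n-1.

    The result is expressed in units of the series per day (here, %/day).
    """
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(series))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean(values) == mu0 (n - 1 degrees of freedom)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical toy data: acceptance rate (%) per day for three countries.
acceptance = {
    "A": [70.0, 70.5, 69.8, 70.2, 70.1],
    "B": [60.0, 59.5, 59.9, 59.2, 59.0],
    "C": [80.0, 80.4, 80.1, 80.9, 80.6],
}

slopes = {country: ols_slope(s) for country, s in acceptance.items()}
t_stat = one_sample_t(list(slopes.values()))
print(slopes, t_stat)
```

With real data, each series would span 23 January to 30 April 2021 and the t statistic would be compared against a t distribution with 43 degrees of freedom (44 countries).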
To further investigate the possible link between online infodemics and vaccine intentions, we consider the quantitative infodemic indices developed and measured by the COVID-19 Infodemics Observatory [23], and quantify their correlation with the vaccine acceptance rates described earlier. We consider the Infodemic Risk Index (IRI), the Dynamic Infodemic Risk Index (DIRI), and the number of Tweets. In particular, the IRI is "estimated indirectly on the basis of the number of followers of users who tweeted, retweeted or quoted unreliable news about COVID-19", the DIRI is "estimated directly from users' online endorsement and engagement to evaluate at which rate an user interacts with online messages pointing to potentially unreliable sources of misinformation or disinformation about COVID-19" [23], while the number of Tweets is a simple count. As shown in Figure 5, we find that the observed correlations are compatible with a null model in which the relationship between infodemic indices and vaccine acceptance rates is randomised (i.e., the infodemic index of country X is paired with the vaccine acceptance curve of a randomly extracted country Y, and then the correlation between these two signals is measured, see Methods). In fact, we note that (on average) 66%, 93% and 99% of the observed correlations fall within one, two and three standard deviations of the normal distributions deriving from the null model (further details are reported in Table 4 and Figure 9 of SI). This finding corroborates the hypothesis of a weak or absent link between exposure to an overabundance of information on social media platforms and offline behaviours.
Our investigation shows that vaccine acceptance has been mostly stable since vaccines were first released, in stark contrast with the major changes in vaccine-related information production and consumption observed on social media platforms over the same period. Vaccines for COVID-19 were approved for emergency use.
The information landscape became unavoidably turbulent and subject to continuous updates of scientific evidence from many information providers. Great concern was expressed about the possible impact of misinformation that, leveraging the absence of an established and scientifically shared consensus about vaccines, could limit the effectiveness of vaccination campaigns. More generally, an open research question is whether infodemics might orient people's intentions or behaviours. Our results suggest that, at least for COVID-19 vaccinations, this may not be the case at the aggregate level, and that the evidence is more consistent with the concept of echo chambers enhancing conservatism of opinions. This result highlights a crucial methodological aspect, namely that studying the phenomenon only through the lens of online social media might lead to misleading conclusions. Indeed, the abrupt increase in information consumption does not seem to match the almost steady longitudinal behaviour of the acceptance rates.
It is important to highlight the limitations of our study. First, we considered only two social media platforms (Facebook and Twitter) to measure information spreading. As social media platforms differ widely in their structure, audience and content moderation policy, further investigations are required to reveal possibly different correlation scenarios. Second, we considered only some, albeit prominent, definitions of infodemics. Further possibilities exist, and future studies will be able to assess the consistency of our findings across definitions [32]. Third, we considered a single survey about vaccine acceptance. However, to the best of our knowledge, the COVID-19 Trends and Impact Survey is the only one that has enough granularity to observe immediate reactions to EMA's investigation.
For instance, publicly accessible data from the MIT COVID-19 survey [33] provide only 19 data points over nine months, while another available survey involves only low and medium income countries [34]. Finally, another potential limitation comes from the timescale of our investigation. Since building cognitive frames and vaccine hesitancy could be slow processes, the analysis of short time periods might be unable to capture long-term effects. However, COVID-19 vaccines were announced in late 2020 and by mid 2021 the vaccination campaign was mature in the considered countries, hence short-term effects (or lack thereof) are crucial in this case. From a broader perspective, it is clear that further studies should investigate under which circumstances the link between online information consumption and offline behaviour weakens or strengthens.
Data from Facebook and Twitter were collected for the period from 1 January 2020 to 30 April 2021. Posts from Facebook were collected using CrowdTangle [35], a Facebook-owned tool that tracks interactions on public content from Facebook pages and groups and verified profiles, not including paid advertising. The keyword search was performed by searching for all the possible inflections of the following terms: "immune", "dose", "vaccine", and "pharma". The search was then extended to include vaccines and related brands. The aforementioned keywords and their inflections were then translated into the native languages of the six countries that were taken into account. Consistently with the queries performed on CrowdTangle, we collected data from Twitter by means of a full-archive historic search within the v2 endpoint and academic research product track. The full list of keywords for the two social media platforms is reported in Tables 1 and 2 of SI.
To perform topic modelling we followed the procedure illustrated in [36].
For each element of a corpus comprising several posts (including tweets and Facebook posts) we computed an embedding, i.e. a vector representation, through BERT [37], a state-of-the-art Natural Language Processing engine. After encoding the corpus (with a number N of elements) we obtain a matrix T of size N × B representing our corpus, where B is the dimension of the embedding representation. We leveraged the pre-trained model "paraphrase-mpnet-base-v2", yielding an embedding of size B = 768. The encoding of the sentences was obtained through the sBERT package [38]. Next, to extract the leading topics from the encoded corpus, we applied the HDBSCAN [39] clustering algorithm to the rows of matrix T (after reducing the dimension of the embedding space to 5 via the UMAP algorithm [40]). Once all the posts were clustered, we obtained a collection of documents, each corresponding to a given topic. To extract the most relevant words, we computed the tf-idf statistics and selected the words with the highest score. The described procedure was applied to 500,000 randomly sampled posts, so as to fit our computational resources.
We performed the following statistical analysis to assess the significance of the correlations observed in real data. For N = 43 countries (we excluded Taiwan from the analysis since infodemic indicators were unavailable for this country) we consider the time series of IRI, DIRI and #Tweets and the vaccine acceptance rate; each time series spans from 23 January to 30 April 2021. Let us consider the case of IRI for simplicity. First, we compute the Spearman correlation coefficient for each country, obtaining an empirical distribution with average $\mu_e$ and standard deviation $\sigma_e$. Next, we compute the correlation for all possible pairs of distinct countries, obtaining $N(N-1) = 1806$ correlation values.
For instance, while in the empirical case the IRI of Denmark is compared against the vaccine acceptance rate of Denmark, in the randomisation process we compare the IRI of Denmark with the vaccine acceptance rate of every other country except Denmark, and we iterate this comparison over all countries. We obtain a null Gaussian-like distribution parametrised by $\mu_n$, $\sigma_n$. Next, we standardise correlation values x (both randomised and empirical) computing
$$z = \frac{x - \mu_n}{\sigma_n}.$$
We also repeat the same procedure computing the Pearson correlation coefficient between the three infodemic indicators and the vaccine acceptance rate. The distribution of z is reported in Figure 5.
Experimental evidence for a scalable accuracy-nudge intervention. Psychological Science, 31(7):770-780, 2020.
[4] Katrin Schmelz. Enforcement may crowd out voluntary support for COVID-19 policies, especially where trust in government is weak and in a liberal society. Proceedings of the National Academy of Sciences, 118(1):e2016385118, 2020.
Table 1: Vaccine-related keywords. List of vaccine-related terms used for the 6 EU countries under consideration to collect data both from Twitter and Facebook. Terms ending with an asterisk were expanded to obtain a set of meaningful terms sharing the same root. The extended list of Facebook keywords is visible in Table 2.
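The randomisation test described in the Methods (per-country Spearman correlations, a null distribution built from all ordered pairs of distinct countries, and standardisation against that null) can be sketched as follows. This is an illustrative sketch only: it assumes the standardised value is z = (x - mu_n) / sigma_n with sigma_n taken as the sample standard deviation, all function names are hypothetical, and the toy series are random numbers rather than the IRI or survey data.

```python
import math
import random
import statistics


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def null_model_z(index, acceptance):
    """Return z-scores of the empirical correlations and the null sample size.

    Empirical: index of country c vs acceptance of the same country c.
    Null: index of country a vs acceptance of every other country b != a.
    """
    countries = sorted(index)
    empirical = [spearman(index[c], acceptance[c]) for c in countries]
    null = [
        spearman(index[a], acceptance[b])
        for a in countries
        for b in countries
        if a != b
    ]
    mu_n = statistics.fmean(null)
    sigma_n = statistics.stdev(null)  # assumption: sample standard deviation
    return [(x - mu_n) / sigma_n for x in empirical], len(null)


def fraction_within(z_scores, k):
    """Share of z-scores within k standard deviations of the null mean."""
    return sum(abs(z) <= k for z in z_scores) / len(z_scores)


# Hypothetical toy data: 5 countries, 20 days of random values each.
rng = random.Random(0)
names = ["A", "B", "C", "D", "E"]
toy_index = {c: [rng.gauss(0, 1) for _ in range(20)] for c in names}
toy_acceptance = {c: [rng.gauss(70, 2) for _ in range(20)] for c in names}

z_emp, n_null = null_model_z(toy_index, toy_acceptance)
print(n_null, [fraction_within(z_emp, k) for k in (1, 2, 3)])
```

With N = 43 countries the null sample would contain 43 × 42 = 1806 correlations, matching the count given in the text. Swapping `spearman` for `pearson` inside `null_model_z` gives the Pearson variant of the same procedure.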
Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA
Vaccine misinformation and social media
Fighting COVID-19 misinformation on social media
Exposure to ideologically diverse news and opinion on Facebook
The spreading of misinformation online
The political blogosphere and the 2004 US election: divided they blog
Filter bubbles, echo chambers, and online news consumption
Echo chambers: Emotional contagion and group polarization on Facebook
Information gerrymandering and undemocratic decisions
Echo chambers on social media: A systematic review of the literature
Positive algorithmic bias cannot stop fragmentation in homophilic networks
The echo chamber effect on social media
The spread of true and false news online
The psychology of fake news
How to fight an infodemic
The COVID-19 social media infodemic
The COVID-19 infodemic: Twitter versus Facebook
Framework for managing the COVID-19 infodemic: methods and results of an online, crowdsourced WHO technical consultation
The COVID-19 infodemics observatory
Pfizer and BioNTech conclude phase 3 study of COVID-19 vaccine candidate, meeting all primary efficacy endpoints
AZD1222 vaccine met primary efficacy endpoint in preventing COVID-19
The online competition between pro- and antivaccination views
COVID-19 vaccine AstraZeneca: PRAC preliminary view suggests no specific issue with batch used in Austria
COVID-19 vaccine AstraZeneca: PRAC investigating cases of thromboembolic events - vaccine's benefits currently still outweigh risks
Weights and methodology brief for the COVID-19 symptom survey by
The missing season: The impacts of the COVID-19 pandemic on influenza
Household COVID-19 risk and in-person schooling
Autopsy of a metaphor: The origins, use and blind spots of the 'infodemic'
Global survey on COVID-19 beliefs, behaviors, and norms
COVID-19 vaccine acceptance and hesitancy in low- and middle-income countries
CrowdTangle Team. Crowdtangle. 2020. Facebook
BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics
Pre-training of deep bidirectional transformers for language understanding
Sentence-BERT: Sentence embeddings using siamese BERT-networks
hdbscan: Hierarchical density based clustering
Solid line: Evolution of the number of respondents for Countries with more than 500 average daily responses. The considered time window ranges from 23
The authors acknowledge the 100683EPID Project "Global Health Security Academic Research Coalition" SCH-00001-3391. Walter Quattrociocchi wants to thank Michele Secci for advice and inspiration.