key: cord-1052449-vwkcte5z
authors: Marcec, Robert; Likic, Robert
title: Using Twitter for sentiment analysis towards AstraZeneca/Oxford, Pfizer/BioNTech and Moderna COVID-19 vaccines
date: 2021-08-08
journal: Postgrad Med J
DOI: 10.1136/postgradmedj-2021-140685
sha: df901e584955cf6ad3996b3276dc9edb1d84ff5a
doc_id: 1052449
cord_uid: vwkcte5z

INTRODUCTION: A worldwide vaccination campaign is underway to bring an end to the SARS-CoV-2 pandemic; however, its success relies heavily on the actual willingness of individuals to get vaccinated. Social media platforms such as Twitter may prove to be a valuable source of information on the attitudes and sentiment towards SARS-CoV-2 vaccination that can be tracked almost instantaneously.

MATERIALS AND METHODS: The Twitter academic Application Programming Interface was used to retrieve all English-language tweets mentioning AstraZeneca/Oxford, Pfizer/BioNTech and Moderna vaccines in 4 months from 1 December 2020 to 31 March 2021. Sentiment analysis was performed using the AFINN lexicon to calculate the daily average sentiment of tweets which was evaluated longitudinally and comparatively for each vaccine throughout the 4 months.

RESULTS: A total of 701 891 tweets have been retrieved and included in the daily sentiment analysis. The sentiment regarding Pfizer and Moderna vaccines appeared positive and stable throughout the 4 months, with no significant differences in sentiment between the months. In contrast, the sentiment regarding the AstraZeneca/Oxford vaccine seems to be decreasing over time, with a significant decrease when comparing December with March (p<0.0000000001, mean difference=−0.746, 95% CI=−0.915 to −0.577).

CONCLUSION: Lexicon-based Twitter sentiment analysis is a valuable and easily implemented tool to track the sentiment regarding SARS-CoV-2 vaccines.
It is worrisome that the sentiment regarding the AstraZeneca/Oxford vaccine appears to be turning negative over time, as this may boost hesitancy rates towards this specific SARS-CoV-2 vaccine.

The WHO officially proclaimed SARS-CoV-2 a public health emergency of international concern on 30 January 2020. As of 18 April 2021, more than 140 million cases and 3 million deaths have been reported worldwide. 1 To control the pandemic, several vaccines have been developed and approved in record time; the first to get approved for widespread use was the Pfizer/BioNTech vaccine, which was authorised for use in the UK on 2 December 2020, less than 1 year after the declaration of the pandemic. Currently, several vaccines are approved worldwide; however, the Western world relies mostly on messenger RNA (mRNA) vaccines developed by Pfizer/BioNTech and Moderna, as well as on the ChAdOx1 vaccine from AstraZeneca/Oxford. A worldwide vaccination campaign is underway to bring about an end to the pandemic, but the success of such a campaign relies heavily on the actual willingness of individuals to get vaccinated. According to our prior work, it seems that a significant proportion of European countries could face difficulty in reaching adequate immunisation levels and will need to conduct interventions to increase the willingness of their populations to get vaccinated. 2 Planning such interventions could be difficult, as public attitudes towards vaccines can change in response to recent events and even differ between different COVID-19 vaccines.
Traditionally, when planning such interventions, surveys would be used to gather data on vaccination hesitancy; however, although surveying remains a valuable tool for information gathering, its implementation is often costly and time-consuming, while the results only provide a static representation of the real situation, making this method impractical for tracking dynamic variables such as the attitude towards COVID-19 vaccines in real time. In contrast, social media platforms such as Twitter may prove to be a valuable source of information that can be tracked and evaluated almost instantaneously. The idea of using social media as a source of information in pandemic times is not new, as Twitter has already been used to conduct an infodemiology study of the 2009 H1N1 outbreak. 3 In the context of the COVID-19 pandemic, Twitter has so far been used in several studies to identify users' emerging concerns, 4-7 misinformation spread 8 and general sentiment. 9 10 Studies exploring the attitudes of Twitter users towards COVID-19 vaccination seem to be very few and were focused on COVID-19 vaccination in general. 11-13 To the best of our knowledge, this is the first article assessing specific sentiment towards Pfizer/BioNTech, AstraZeneca/Oxford and Moderna vaccines, as well as events that shaped it over time.

The Twitter academic API (Application Programming Interface) was accessed using the R (V.4.0.5) programming language with the function 'historical_search()' 14 on 2 April 2021. Three separate searches were conducted, one for each vaccine of interest: AstraZeneca/Oxford, Pfizer/BioNTech and Moderna. All English-language tweets posted in the time frame from 1 December 2020 to 31 March 2021 that corresponded to the search phrase were retrieved and included in the sentiment analysis.
The search phrase for the AstraZeneca/Oxford vaccine was '"AstraZeneca vaccine" OR "Oxford-AstraZeneca vaccine" OR "Oxford vaccine"'; for the Pfizer/BioNTech vaccine it was '"Pfizer vaccine" OR "Pfizer-BioNTech vaccine" OR "Pfizer/BioNTech vaccine"'; and for the Moderna vaccine it was '"Moderna vaccine"'. Retweets were not retrieved or analysed. In our sentiment analysis, we used the AFINN lexicon, 15 a tool specifically designed for sentiment analysis of microblog posts such as tweets. The lexicon contains 2477 words, each given a value from −5 (highly negative) to +5 (highly positive). Using the tidytext package, the text of the retrieved tweets was tokenised into words with the 'unnest_tokens()' function and merged with the AFINN lexicon, from which the average daily sentiment was calculated and graphically represented for each vaccine. Dated Google searches were conducted to identify potential events and news reports that had a temporal and likely causal relationship with changes in the average daily sentiment.

Statistical analysis was conducted in the R programming language to compare changes in the sentiment for each vaccine over time, and also to compare the vaccines with each other in each month. Non-parametric Kruskal-Wallis and post hoc Games-Howell tests were used due to the non-normal distribution of the data in some months (Shapiro-Wilk test p<0.05), and a p value of less than 0.05 was considered statistically significant.

Twitter approved the study and granted access to its academic API, which was used to retrieve the tweets. All retrieved tweets are part of the public domain and are publicly available, so no ethics review was necessary. Nonetheless, the authors adhered to the highest ethical principles in dealing with the retrieved data; no individual tweets as such were analysed or in any way displayed in this article.
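The lexicon-based scoring step described in the methods can be illustrated with a short sketch. The study itself used R with the tidytext package; the version below is a Python stand-in, the lexicon is a toy placeholder rather than the real AFINN word list, and averaging over matched words (rather than per tweet) is an assumption, since the text does not specify the exact averaging unit.

```python
import re
from collections import defaultdict

# Toy stand-in for the AFINN lexicon: word -> integer score in [-5, 5].
# The words and values here are illustrative assumptions, not the real lexicon.
TOY_LEXICON = {"great": 3, "good": 3, "safe": 1, "bad": -3, "worried": -3}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def daily_average_sentiment(tweets, lexicon):
    """Average the lexicon scores of all matched words per day.

    tweets: iterable of (date_string, tweet_text) pairs.
    Returns a dict mapping each date to its mean score. Days on which
    no word matched the lexicon are left out of the result.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, text in tweets:
        for word in tokenize(text):
            if word in lexicon:  # equivalent to an inner join with the lexicon
                totals[day] += lexicon[word]
                counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


# Hypothetical example tweets (not taken from the study data).
example_tweets = [
    ("2020-12-01", "Great news, the vaccine is safe"),
    ("2020-12-01", "Worried about side effects"),
    ("2020-12-02", "Bad day"),
]
example_result = daily_average_sentiment(example_tweets, TOY_LEXICON)
```

In this toy example, three lexicon words match on the first day (scores 3, 1 and −3), giving a daily average of 1/3, while the single match on the second day gives −3.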
Although a large number of tweets was retrieved, after calculating the average daily sentiment, all identifiable information, as well as individual tweet text, was deleted. If necessary, the data may easily be retrieved again following the methods described earlier in the text.

A total of 701 891 tweets were retrieved and included in the daily sentiment analysis: 47.48% (n=333 234) mentioning the Pfizer/BioNTech vaccine, 36.75% (n=257 920) mentioning the AstraZeneca/Oxford vaccine and 15.78% (n=110 737) mentioning the Moderna vaccine. The country of origin was known for 19.79% (n=138 891) of tweets, and this is visually represented in figure 1. Most tweets with a known country of origin came from English-speaking countries: USA (22.44%, n=31 168), UK (13.46%, n=18 690), Canada (10.17%, n=14 126), India (8.95%, n=12 429), Australia (6.31%, n=8764), Ireland (5.25%, n=7289) and Nigeria (4.15%, n=5763).

The AstraZeneca/Oxford vaccine was mentioned in 257 920 tweets that were retrieved and analysed. The daily number of tweets is shown in figure 2A, and substantial increases in the daily number of tweets can be seen on 30 December 2020, the day on which the AstraZeneca/Oxford vaccine was approved in the UK, as well as throughout March 2021, corresponding to the reports of postvaccination thrombotic side effects. Figure 2B shows the daily average sentiment and events/news reports that correlate with spikes/changes in the sentiment. As can be seen from the trend line (blue) in figure 2B, the average daily sentiment of tweets mentioning the AstraZeneca/Oxford vaccine seems to be in a downward trend.

A total of 110 737 tweets mentioning the Moderna vaccine were retrieved and analysed.
The daily number of tweets can be seen in figure 4A, and a substantial increase in the daily number of tweets can be observed around 18 December 2020, which corresponds to the approval of the Moderna vaccine in the USA, and on 25 January 2021, the day when the Moderna company announced that its vaccine retained its neutralising activity against emerging UK and South African SARS-CoV-2 variants. Figure 4B shows the daily average sentiment and identifies events/news reports that could correlate with spikes/changes in the sentiment. As can be seen by the trend line in figure 4B and the statistical analysis results in table 1 and figure 3B, the sentiment of tweets mentioning the Moderna vaccine seems to be holding positive and stable, with no statistically significant differences between the ensuing months or when comparing December with March (p=0.986).

The Pfizer/BioNTech vaccine was mentioned in 333 234 tweets that have been retrieved and analysed. The daily number of tweets is shown in figure 5A, and a substantial increase in the daily number of tweets can be seen around 2 December 2020, which would correspond to the approval of the Pfizer/BioNTech vaccine in the UK, as well as around 9 December 2020 when the Medicines and Healthcare products Regulatory Agency (MHRA) issued its anaphylaxis warning. Figure 5B shows the daily average sentiment and identifies events/news reports that would correlate with spikes/changes in the sentiment. As can be seen by the trend line in figure 5B and the statistical analysis results in table 1 and figure 3C, the sentiment of tweets mentioning the Pfizer/BioNTech vaccine also seems to be remaining positive and stable, with no statistically significant difference between the subsequent months or when comparing December with March (p=1).

A comparison of the three vaccines for each month can be seen in figure 3D-G and table 2.
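Each monthly comparison applies the Kruskal-Wallis test named in the methods to the three vaccines. As a rough illustration of what that test computes, the sketch below derives the H statistic from pooled ranks. It is written in Python rather than the R used in the study, it assumes the compared values are daily average sentiment scores, it omits the tie correction and the Games-Howell post hoc step, and the p value helper is valid only for exactly three groups (2 degrees of freedom).

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)


def p_value_three_groups(h):
    """Chi-square upper tail for 2 degrees of freedom, i.e. three groups only."""
    return math.exp(-h / 2)


# Hypothetical daily averages for three vaccines (not the study data).
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p_val = p_value_three_groups(h_stat)
```

For real analyses, an established statistics library would be preferable, since it also handles tie correction and small-sample behaviour.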
In December, the sentiment regarding the AstraZeneca/Oxford vaccine was higher than those of the Moderna (p=0.003, mean difference=0.325, 95% CI=0.0986 to 0.552) or the Pfizer/BioNTech (p<0.0000001, mean difference=0.475, 95% CI=0.300 to 0.650) vaccine, whereas there were no significant differences between the sentiments of Moderna and Pfizer/BioNTech vaccines (p=0.287) (figure 3D). In January 2021, no significant difference between the two mRNA vaccines (p=1) or Moderna and AstraZeneca/Oxford (p=0.166) vaccines was observed, but the sentiment of the AstraZeneca/Oxford vaccine was significantly higher than that of the Pfizer/BioNTech vaccine (p=0.007, mean difference=0.199, 95% CI=0.0470 to 0.352) (figure 3E). In February, no significant difference between any of the vaccines was observed (figure 3F), whereas in March the sentiment regarding the Moderna vaccine was significantly higher than that of the AstraZeneca/Oxford (p<0.00000000001, mean difference=0.450, 95% CI=0.324 to 0.576) and Pfizer/BioNTech (p<0.001, mean difference=0.185, 95% CI=0.0771 to 0.293) vaccines, while the Pfizer/BioNTech vaccine had a higher sentiment than the AstraZeneca/Oxford vaccine (figure 3G).

Using a simple, inexpensive and elegant lexicon-based method, our Twitter sentiment analysis has produced relevant results regarding the sentiment towards AstraZeneca/Oxford, Moderna and Pfizer/BioNTech COVID-19 vaccines in 4 months. We have also identified a number of events/news reports that may help explain some of the sentiment changes. Moreover, the temporal correlation of the events with the sentiment gives a degree of validation to the results of this study.
Comparing the sentiment between the three COVID-19 vaccines, our results indicate that the sentiment regarding Pfizer/BioNTech and Moderna vaccines remained positive and stable throughout the 4 months, whereas that of the AstraZeneca/Oxford vaccine seems to be decreasing in positivity, reaching a slightly negative average in March 2021, most likely due to the thrombotic thrombocytopenia reports, but possibly also owing to the negative publicity caused by the supply issues AstraZeneca faced in the European Union. Pfizer/BioNTech and Moderna vaccines also experienced periods or spikes of negative sentiment corresponding to the reports of postvaccination anaphylaxis reactions; however, it seems that this did not have a long-term impact on the sentiment. The decrease in sentiment between December and March regarding the AstraZeneca/Oxford vaccine is worrisome, as it may indicate that the vaccine is now generally negatively perceived, which may increase vaccine hesitancy and lead to refusal to vaccinate with this specific vaccine. Although the European Medicines Agency (EMA) statement regarding the still favourable risk-benefit ratio of the AstraZeneca/Oxford vaccine had a positive impact on the sentiment, its impact seems to have been short-lived. One has to wonder whether the sentiment regarding the AstraZeneca/Oxford vaccine might have been more positively impacted if the message of a favourable risk-benefit ratio had been conveyed more clearly and convincingly and/or if the EU countries had not experienced supply problems with the AstraZeneca/Oxford vaccine.

Interestingly, the sentiment regarding the AstraZeneca/Oxford vaccine was higher in December 2020 than that of Pfizer/BioNTech and Moderna vaccines. A possible explanation is that the AstraZeneca/Oxford vaccine was perceived as 'safer' due to the use of an adenoviral vector platform, which the public deemed a more 'tried and true' technology in comparison with the 'new' mRNA platform.
Furthermore, the fact that AstraZeneca/Oxford committed to providing the vaccine to low-income and middle-income countries on a not-for-profit basis during the pandemic through the WHO COVAX programme likely also positively impacted the vaccine's perception. In contrast, the sentiment regarding Pfizer/BioNTech and Moderna vaccines was impacted by the anaphylaxis reports occurring mostly in December 2020.

In addition, it seems that an important factor affecting the sentiment concerning all three vaccines is their efficacy against new and emerging SARS-CoV-2 variants. Interest in this topic among the Twitter community seems to be so strong that we were able to identify two preprints 16 17 which were widely reported on by the news sites and which described the efficacy of the Pfizer/BioNTech vaccine against emerging coronavirus variants, thereby resulting in a significantly positive sentiment increase. Although the preprints had a positive impact and were eventually published in prestigious journals, 18 19 it may be worrisome that non-peer-reviewed preprints, where scientific accuracy at that point in time was still questionable, could have had such a significant impact on the sentiment towards vaccines.

The importance of a fully transparent approach to all the data and potential questions raised by EMA and other regulatory agencies is also highlighted in the identified events. The publishing of full data regarding the Moderna vaccine by EMA and Health Canada resulted in an increase in the positivity of the sentiment towards this vaccine. In contrast, the leakage of data demonstrating that EMA questioned the stability of mRNA in the Pfizer vaccine, published in the BMJ on 10 March 2021, 20 led to a decrease in the sentiment, which could perhaps have been avoided if the data and questions raised had been made publicly available from the start.
Our study may serve as a proof of concept demonstrating that, using a simply implemented method, it is possible to track the sentiment towards vaccines almost in real time, allowing for the identification of events that shape it on a global or country-specific level, especially in the English-speaking countries with a relatively large number of Twitter users (although the AFINN lexicon is also available in Danish and Swedish and may relatively easily be translated into other languages as well). Such insight may prove valuable in enabling the planning and implementation of healthcare interventions aimed at increasing the uptake of COVID-19 vaccines and fighting vaccine hesitancy, and it may also serve to estimate the potential impact of such interventions.

One of the limitations of our study is that it is questionable whether the results of our analysis, and Twitter users as such, are representative of the general English-speaking population or of any country-specific population. Twitter is predominantly used by the scientific community, which may mean that scientific studies are more often shared and discussed and have a higher impact among Twitter users than among the general population. Also, one should be aware of the possibility of confounding events; an example would be a news report of a pharmacist destroying 500 Moderna vaccine doses, which caused a significant spike in negative sentiment in our analysis (caused in part also by the postvaccination anaphylaxis reports occurring at the same time). The negative sentiment around this event was not aimed at the vaccine as such, but rather at the loss of valuable vaccine doses. In addition, it is questionable whether the analysis of only English-language tweets could have had a confounding effect on the results of this study.

Lexicon-based Twitter sentiment analysis is a valuable and easily implemented tool to track the sentiment regarding COVID-19 vaccines.
High vaccine uptake is paramount for ending the pandemic, while identification of events that impact the sentiment around vaccines also allows for better planning and implementation of specific interventions. Finally, it is worrisome that the sentiment regarding the AstraZeneca/Oxford vaccine appears to be decreasing in positivity over time. In March 2021, it was on average negative, and if this trend continues, it may boost hesitancy rates towards this specific COVID-19 vaccine.

Acknowledgements The authors would like to express their gratitude to Twitter for approving access to their academic API for this project (APP ID 20445325) and to Associate Professor Joel W. Wood for writing and making his historical_search() function publicly available on GitHub.

Contributors Both authors contributed equally to the conceptualisation and writing of this article.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Map disclaimer The inclusion of any map (including the depiction of any boundaries therein) or of any geographic or locational reference does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; internally peer reviewed.

► Lexicon-based Twitter sentiment analysis is an elegant method with which it is possible to track the sentiment towards approved COVID-19 vaccines almost in real time, allowing for the identification of events that shape it on a global or country-specific level.
► While sentiment regarding Pfizer and Moderna vaccines appeared positive and stable, the sentiment regarding the AstraZeneca/Oxford vaccine seemed to have become negative, with a significant drop when comparing December 2020 with March 2021 (p<0.05, mean difference=−0.746, 95% CI=−0.915 to −0.577).
► Twitter message sentiment analysis may prove valuable in enabling planning and implementation of healthcare interventions aimed at increasing the uptake of COVID-19 vaccines.

► Machine learning and deep learning methods can also be used for the sentiment analysis of social media regarding SARS-CoV-2 vaccines; thus, several different approaches could be used and compared with the lexicon-based Twitter sentiment analysis model results as a baseline.
► Evaluate the impact of bots posting misinformation and thereby influencing social media sentiment towards vaccination.
► How can social media sentiment analysis regarding vaccine hesitancy be used for successful healthcare interventions and pandemic control?

What is already known on the subject
► Planning health interventions can be difficult, as public attitudes towards vaccination change in response to recent events and differ between different COVID-19 vaccines.
► Traditionally, surveys were used to gather data on vaccination hesitancy; however, although surveying remains a valuable tool for information gathering, its implementation is often costly and time-consuming.
► Static representation of the real situation obtained by means of surveys on vaccination hesitancy makes this method impractical for tracking dynamic variables such as attitudes towards COVID-19 vaccines in real time.

Will vaccination refusal prolong the war on SARS-CoV-2?
Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak
What are people concerned about during the pandemic? Detecting evolving topics about COVID-19 from Twitter
Health, psychosocial, and social issues emanating from the COVID-19 pandemic based on social media comments: text mining and thematic analysis approach
Twitter-based analysis reveals differential COVID-19 concerns across areas with socioeconomic disparities
Tracking COVID-19 discourse on Twitter in North America: infodemiology study using topic modeling and aspect-based sentiment analysis
'Thought I'd share first': an analysis of COVID-19 conspiracy theories and misinformation spread on Twitter
Examination of community sentiment dynamics due to COVID-19 pandemic: a case study from a state in Australia
Dashboard of sentiment in Austrian social media during COVID-19
Artificial intelligence-enabled analysis of public attitudes on Facebook and Twitter toward COVID-19 vaccines in the United Kingdom and the United States: observational study
COVID-19 vaccine hesitancy in Canada: content analysis of Tweets using the theoretical domains framework
Twitter speaks: an analysis of Australian Twitter users' topics and sentiments about COVID-19 vaccination using machine learning
A new ANEW: evaluation of a word list for sentiment analysis in microblogs
Neutralization of N501Y mutant SARS-CoV-2 by BNT162b2 vaccine-elicited sera
Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera
Neutralization of SARS-CoV-2 lineage B.1.1.7 pseudovirus by BNT162b2 vaccine-elicited human sera
Neutralization of SARS-CoV-2 spike 69/70 deletion, E484K and N501Y variants by BNT162b2 vaccine-elicited sera
The EMA covid-19 data leak, and what it tells us about mRNA instability

Data availability statement Data are available upon reasonable request. Data may be obtained from a third party and are not publicly available. Data may be obtained from Twitter and are publicly available.
Aggregated data may be obtained from Twitter upon approval and are not publicly available.

This article is made freely available for use in accordance with BMJ's website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained.

Robert Marcec http://orcid.org/0000-0002-8750-2083
Robert Likic http://orcid.org/0000-0003-1413-4862