key: cord-0877067-exohzq2s
title: Dynamic assessment of the COVID-19 vaccine acceptance leveraging social media data
authors: Li, Lingyao; Zhou, Jiayan; Ma, Zihui; Bensi, Michelle T.; Hall, Molly A.; Baecher, Gregory B.
date: 2022-03-21
journal: J Biomed Inform
DOI: 10.1016/j.jbi.2022.104054
sha: fc64b41966cb1222bd0e9e6d20c3da929195812d
doc_id: 877067
cord_uid: exohzq2s

Vaccination is the most effective way to provide long-lasting immunity against viral infection; thus, rapid assessment of vaccine acceptance is a pressing challenge for health authorities. Prior studies have applied survey techniques to investigate vaccine acceptance, but these may be slow and expensive. This study investigates 29 million vaccine-related tweets from August 8, 2020 to April 19, 2021 and proposes a social media-based approach that derives a vaccine acceptance index (VAI) to quantify Twitter users' opinions on COVID-19 vaccination. This index is calculated from opinion classifications identified with the aid of natural language processing techniques and provides a quantitative metric of the level of vaccine acceptance across different geographic scales in the U.S. The VAI is easily calculated from the numbers of positive and negative tweets posted by a specific user or group of users; it can be compiled for regions such as counties or states to provide geospatial information, and it can be tracked over time to assess changes in vaccine acceptance as related to trends in the media and politics. At the national level, the analysis showed that the VAI moved from negative to positive in 2020 and remained steady after January 2021. Through exploratory analysis of state- and county-level data, reliable assessments of the VAI against subsequent vaccination rates could be made for counties with at least 30 users. The paper discusses the information characteristics that enable consistent estimation of the VAI. The findings support the use of social media to understand opinions and offer a timely and cost-effective way to assess vaccine acceptance.

The coronavirus disease 2019 (COVID-19) is a novel communicable disease. Along with nonpharmaceutical interventions to reduce human-to-human transmission, the COVID-19 vaccine is the primary strategy to help contain infection, decrease morbidity, and reduce loss of life. The U.S. began its first vaccine distribution on December 14, 2020, and more than 172.2 million Americans (52.4% of the population) were fully vaccinated as of August 26, 2021 [1]. However, vaccine hesitancy has been an obstacle to reaching herd immunity [2]. Government agencies and health authorities need information to assess and respond to public sentiment about vaccination, both acceptance and hesitancy. For simplicity, the term acceptance is used to encompass both acceptance and hesitancy. Prior studies have estimated vaccine hesitancy by asking respondents about their willingness to receive vaccines [3]-[5]. Early survey-based studies used general wording that focused on certain groups to formulate an assessment of adolescent and childhood vaccine hesitancy [6], [7]. With a deeper understanding of vaccine hesitancy, recent studies have integrated more aspects [4], [5] and collected responses through a variety of techniques, such as online tools [8], telephone interviews [9], and systematic literature reviews [10].
Among these survey-based models, the "3C" (confidence, complacency, convenience) model and the Working Group Determinants of Vaccine Hesitancy Matrix have been widely applied, both of which were developed by the Strategic Advisory Group of Experts (SAGE) at the World Health Organization (World Health Organization (WHO) SAGE, 2014). The "3C" model focuses on measuring hesitancy through the "3C's" lens and identifies barriers within the community. In contrast, the Vaccine Hesitancy Matrix emphasizes external factors (e.g., culture, society, environment, and health aspects) derived from literature reviews and expert opinions. Other commonly applied indices include, for high-income countries, the Global Vaccine Confidence Index [12] and the Vaccine Hesitancy Scale [13]; and, for low- and middle-income countries, the Caregiver Vaccine Acceptance Scale [14].

In the context of COVID-19, recent studies have demonstrated the usefulness of survey tools to measure the acceptance of vaccination [15], [16] and to understand its driving factors. In particular, the U.S. Centers for Disease Control and Prevention (CDC) applied the Household Pulse Survey (HPS) to evaluate vaccine hesitancy [17]. They used survey results to estimate state-level hesitancy rates and then applied a "downscaling" method to predict county-level hesitancy rates based on the Census Bureau's 2019 American Community Survey (ACS) [17]. The final result for each county is an estimate of the percentage of the population that may be vaccine-hesitant. Survey-based methods have also been applied to assist the assessment of vaccine acceptance [5], [18]-[20]. These may target specific groups (e.g., parents [7], [21], healthcare providers [22]) and specific places. Therefore, these methods may provide limited insights on broad public perceptions of vaccine acceptance at different geographical scales. Developing survey-based indices for wide application may demand long-term investigation and a large effort to collect representative responses.

Collecting information from social media platforms, in contrast, can cover large geographical areas at comparatively low cost. Although these data may be of lower fidelity than survey responses, they can be obtained almost instantaneously. Throughout the COVID-19 pandemic, social media have been a forum for vaccine-related public discourse where opinions regarding the COVID-19 vaccines circulate rapidly [23]. Opinions expressed on social media provide data resources that can be leveraged to rapidly assess vaccine acceptance. Existing studies using social media data to assess opinions related to vaccines mainly evaluate vaccine-related content through semantic analysis or sentiment analysis [24]-[26]. They have captured online opinions (e.g., pro- or anti-vaccine campaigns) to help health organizations more readily recognize prevalent vaccination messages [27]-[29]. Other studies have discussed the role of social media in enhancing vaccine confidence [30], [31]. For example, Rosen et al. [31] found a positive correlation between social media engagement and HPV vaccine awareness. The study suggested that enhancing the connection between health organizations and the public through social media can help disseminate vaccine information strategically. Studies that attempt to estimate vaccine acceptance using social media data are less common than survey studies. Piedrahita-Valdés et al.
[32] applied sentiment analysis to estimate global vaccine hesitancy using Twitter data for all types of vaccines from 2011 to 2019. They found that the percentage of neutral tweets showed a decreasing tendency, while the percentages of positive and negative tweets increased over time. Johnson et al. [33] investigated 100 million Facebook users expressing opinions regarding vaccination to study the evolution of pro- and anti-vaccine clusters from February to October 2019. To classify the clusters, they manually reviewed Facebook pages and identified whether each page was part of a pro- or anti-vaccination cluster. The study found that anti-vaccination clusters became highly entangled and dominant within Facebook networks.

The broad set of studies seeking to leverage social media has demonstrated the potential of these data to investigate online messages regarding vaccine acceptance. However, most prior studies have applied sentiment and other textual analyses to indicate vaccine acceptance qualitatively, not to estimate acceptance quantitatively across different geographical regions. Further, there has been little discussion of the reliability of social media data in assessing vaccine acceptance. To fill these research gaps, the present study explores the utility of social media data to provide a rapid indication of COVID-19 vaccine acceptance. The primary objectives are to: (1) explore whether social media is a reliable indicator of vaccine acceptance, as validated against later vaccination rates in the context of COVID-19; and (2) discuss the specific circumstances in which social media data generate a reliable estimate of vaccine acceptance (e.g., the length of time over which data are collected, the number of users included, and the granularity of the analysis). The study addresses these two objectives via the analysis of Twitter data related to COVID-19 vaccines in the U.S. Specifically, the study proposes a vaccine acceptance index (VAI) to quantify online opinions towards vaccination. The study includes temporal and spatial analyses to validate the reliability of conclusions from social media data against vaccination rates published by official channels. The intent is to monitor vaccine acceptance across different geographical areas with rapidity, broad spatial coverage, and cost-effectiveness.

The concepts of acceptance, hesitancy, and uptake appear across a breadth of vaccine literature. While the specific definitions used in a study may differ, the present study uses the following definitions. Vaccine acceptance is taken as the individual or group decision to accept or refuse when presented with an opportunity to vaccinate [34]. Vaccine hesitancy refers to situations where people have doubts or concerns toward vaccination, without referring to actual vaccine receipt [34]. Vaccine uptake refers to the proportion of a population that has received a specific vaccine. Unlike vaccine uptake, which is an objective measure, vaccine acceptance and vaccine hesitancy consider people's willingness to accept or refuse when presented the opportunity to vaccinate [35].

The research framework is illustrated in Fig. 1. It starts with data preparation, which involves collecting English-language tweets containing the keyword "vaccine" and posted from August 9, 2020, to April 18, 2021. For consistency, this search word choice was maintained throughout the study window even as additional words and phrases came into use later in the study period.
For example, the word "vaccination" might also be contained in tweets expressing people's opinions towards COVID-19 vaccine acceptance. However, "vaccination" is typically used to refer to receiving or administering the vaccine, and the COVID-19 vaccination effort did not begin until December 14, 2020 [1] (approximately halfway through the study window). To develop the training set, the 15,000 most frequently occurring tweets were sampled from the tweet pool. These were manually labeled (i.e., classified into one of three categories) by the authors, as documented in Section 2.2. Then, a random sample of 2,000 unique tweets (distinct from the training tweet samples) was selected and manually labeled (using the same labeling criteria as applied to the training samples) to build the testing set. Following the creation of the training and testing sets, a text augmentation technique was used to balance the training set, generating a larger augmented dataset with an approximately balanced sample distribution across the three label categories, as described in Section 2.2.4.

Text classification pipelines were developed to classify a tweet into one of three categories of opinion toward the COVID-19 vaccine: positive, negative, or unrelated. The construction of the text classification pipelines is described in Section 2.3. The initial training set and the augmented training set were used to train nine candidate text classification pipelines (including two sentiment tools). Recall, Precision, F1-score, and Accuracy (defined in Section 2.3.4) were used to measure the performance of each model on the testing set. The best-performing pipeline was applied to the entire dataset to label all available tweets. Then, a vaccine acceptance index (VAI) was developed to measure the online level of vaccine acceptance in a geographic area during a particular period (Section 2.4). The VAI for a particular region requires information regarding the location of individual Twitter users; Section 2.5 explains the process of location identification.

The Twitter Standard Search API [36] was used with the search term "vaccine" and geocode "39.8, -95.583, 2500 km" [37] to scrape vaccine-related tweets in the date range from August 9, 2020 to April 18, 2021 over the geographic area covered by the geocode. The geocode parameter is specified as "latitude, longitude, radius" and returns tweets by users located within a radius of 2,500 km of the point (39.8, -95.583). The study selected this date range to begin after the announcement of the first COVID-19 vaccine by the Russian government on August 8, 2020, and to end before U.S. mass vaccination began on April 19, 2021. The search word "vaccine" returned tweets that mentioned the COVID-19 vaccine, but some fraction of the returned tweets was not related to COVID-19 (e.g., HPV vaccine, flu vaccine); these were labeled as unrelated. The downloaded tweets included original tweets, mentions (i.e., a tweet that quotes another user's username), replies (i.e., a comment on another user's tweet), and retweets (i.e., repostings of a tweet) that contained the keyword "vaccine." The geographic search range was restricted to continental North America with the given geocode to investigate the U.S. public's response to the COVID-19 vaccine. The resulting dataset contains 29,213,502 English-language tweets.
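For illustration, a minimal collection sketch is given below. It uses the Tweepy library (the paper does not name the client library used); the credentials are placeholders and the item cap is for demonstration only. Note that the standard search endpoint only covers roughly the most recent week of tweets, so a collection of this scale would have to run continuously over the study window.

```python
# Minimal sketch: querying the Twitter Standard Search API for vaccine-related
# tweets within a 2,500 km radius of the continental U.S. centroid.
# Assumes the Tweepy library and valid API credentials (placeholders below).
import tweepy

auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET"
)
api = tweepy.API(auth, wait_on_rate_limit=True)

# "latitude, longitude, radius" as described in Section 2.1
GEOCODE = "39.8,-95.583,2500km"

collected = []
for tweet in tweepy.Cursor(
    api.search_tweets,          # Standard Search API (v1.1) endpoint
    q="vaccine",                # single keyword kept for consistency
    geocode=GEOCODE,
    lang="en",
    count=100,                  # maximum tweets per page
    tweet_mode="extended",      # return full (untruncated) tweet text
).items(1000):                  # small cap for illustration; the study collected ~29M tweets
    collected.append(
        {"id": tweet.id, "created_at": tweet.created_at, "text": tweet.full_text}
    )
```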
To classify users' opinions towards the COVID-19 vaccine, the 15,000 most frequently occurring tweets were selected (i.e., the 15,000 unique tweets retweeted the most), and each tweet was manually classified as "positive (class 1)," "negative (class -1)," or "unrelated (class 0)" with respect to the COVID-19 vaccine. Sentiment analysis was not used to classify the tweets because it was judged that a sentiment score might not represent the user's opinion in this research context. For example, the tweet "Rich people did not experience the same pandemic as working-class people and now they get the vaccine first. It's actually twisted" is identified as negative using sentiment analysis, but it does not necessarily imply that this user was negative about the COVID-19 vaccine. A prior study also reveals the difference between sentiment and opinion mining in text [38]. The criteria for manual labeling are based on the CDC strategy to reinforce confidence in COVID-19 vaccines [39]. Table 1 illustrates the specific labeling criteria and examples. Tweets discussing the vaccine passport (the second example in the class "unrelated" in Table 1) were labeled as unrelated (class 0), given that a Twitter user could oppose the vaccine passport while accepting the vaccine.

[Table 1 example tweets: "Rich people did not experience the same pandemic as working class people and now they get the vaccine first. It's actually twisted." / "No vaccine passport. It doesn't get much more dystopian than being required to show your 'health papers' wherever you go." / "Even with a vaccine, I'll still be wearing a mask and continuing cleaning and social distancing. I ain't had a cold or nothing since March."]

The manual classification process considered the 15,000 unique tweets that were retweeted the most, but subsequent analysis incorporated classifications for all retweets, given that online users who retweeted a tweet were presumed to share the same opinion as the original tweet. Each tweet in the training set was labeled by two annotators. When a tweet received the same label from both annotators, that label was taken as final. When there was a disagreement, the label was finalized through a short discussion between the annotators. As a result, 13,624 tweets (90.8% of the training data) received the same label from both annotators, and the rate of disagreement between the two annotators was 9.1% ((15,000 - 13,624)/15,000). The manual labeling process resulted in a training set of 4,092 positive tweets (class 1), 1,783 negative tweets (class -1), and 9,125 unrelated tweets (class 0). Next, 2,000 unique tweets (distinct from the training set) were randomly selected, and the same labeling criteria were used to build the testing set. The rate of disagreement for the testing set was 8.1% ((2,000 - 1,837)/2,000), and the testing set contained 492 positive tweets (class 1), 224 negative tweets (class -1), and 1,284 unrelated tweets (class 0). The training and testing sets were used for building and assessing the performance of a text classification model. In particular, the manually labeled samples in the testing set were considered the "ground-truth" labels used to measure the classification performance of the different candidate models.

Before text classification, several steps were applied to clean the tweets. First, short URLs, user names (@username), the retweet prefix (RT @username), digits, emojis, and punctuation were removed. Next, each tweet was tokenized into a list of separate words and characters. The tokenized words were then converted to their base forms through lemmatization. The Natural Language Toolkit (NLTK) Python package was used to complete the text cleaning [40].
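A minimal sketch of these pre-processing steps, using regular expressions together with NLTK for tokenization and lemmatization, is given below; the function name and the exact patterns are illustrative rather than the authors' code.

```python
# Minimal sketch of the tweet pre-processing described in Section 2.2:
# strip URLs, @usernames, the "RT @user" prefix, digits, emojis, and punctuation,
# then tokenize and lemmatize with NLTK.
import re
import string

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "punkt_tab", "wordnet", "omw-1.4"):
    nltk.download(resource, quiet=True)

lemmatizer = WordNetLemmatizer()

def clean_tweet(text: str) -> str:
    text = re.sub(r"^RT\s+@\w+:?", " ", text)          # retweet prefix
    text = re.sub(r"http\S+|www\.\S+", " ", text)       # short URLs
    text = re.sub(r"@\w+", " ", text)                   # user names
    text = re.sub(r"\d+", " ", text)                    # digits
    text = text.encode("ascii", "ignore").decode()      # crude emoji removal (drops non-ASCII)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = word_tokenize(text.lower())                # tokenize
    lemmas = [lemmatizer.lemmatize(tok) for tok in tokens]
    return " ".join(lemmas)

print(clean_tweet("RT @user: Vaccines rolling out https://t.co/xyz 100%!"))
```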
Imbalance in the training set (9,125 class 0 tweets, 4,092 class 1 tweets, and 1,783 class -1 tweets) might pose an issue for the text classification process in that the trained model could classify a random tweet as class 0 and still obtain a high testing accuracy. As a result, the trained model might not produce adequate performance for the minority classes. There are two practical methods to tackle this issue: downsampling and upsampling. Downsampling balances classes by training on a small subset of the majority class samples, while upsampling balances classes by increasing the size of the minority class samples. Since downsampling might lead to a loss of information about "unrelated" tweets, upsampling was used to balance the training samples. The easy data augmentation (EDA) technique was selected to perform upsampling since it does not require a model pre-trained on an external dataset [41]. The EDA technique uses four operations to increase the text sample size: 1) synonym replacement (random words in a text are replaced by a synonym), 2) random insertion (random words are inserted into the text), 3) random swap (two words in the text are randomly swapped), and 4) random deletion (words in the text are randomly deleted) [41].

Sentiment classification, although different from opinion classification, was included to test whether sentiment could credibly stand in for opinion classification. Sentiment classification tools use natural language processing (NLP) to analyze conversations and classify text into a positive, negative, or neutral emotional category. This study applied two popular Python-based sentiment tools, TextBlob and VADER (Valence Aware Dictionary and sEntiment Reasoner). TextBlob is a free Python library for processing textual data that provides a simple and convenient API for sentiment analysis [42]. VADER is an open-source, lexicon-based sentiment analysis tool released under the MIT license and attuned to sentiment expressed in social media [43].

Text vectorization converts text into a numeric vector or matrix. Two text vectorization approaches were applied: Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding methods. TF-IDF is a term weighting method used for text similarity, text classification, and information retrieval [44]. The key formula of TF-IDF is

w_{i,j} = tf_{i,j} \times \log(N / df_i),

in which w_{i,j} denotes the weight of word i in tweet j, tf_{i,j} represents the frequency of word i in tweet j, N is the total number of tweets, and df_i is the number of tweets in which word i appears. Although TF-IDF does not capture word position or semantic similarity, it is an efficient algorithm for matching words in a query to documents [45]. Due to its simplicity and fast computation, TF-IDF is useful when dealing with a large set of textual data. Unlike TF-IDF, word embedding techniques can help capture semantic meanings [46], which may be used to compute text similarity and perform text classification. Word embedding methods convert each word into a vector of real numbers. Two popular pre-trained word embedding methods were applied in this study: FastText [47] and Global Vectors (GloVe) [48]. The FastText embeddings applied in this research were pre-trained on Wikipedia with 300 dimensions [47]. The GloVe embeddings were pre-trained on Twitter content, with each word mapped to a 300-dimensional vector [53].
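For reference, the two sentiment baselines expose one-line scoring interfaces, as sketched below; the thresholds used to map continuous scores to positive/negative/neutral classes are illustrative choices, not values from the paper.

```python
# Minimal sketch of the TextBlob and VADER sentiment baselines described above.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

tweet = "Even with a vaccine, I'll still be wearing a mask."

polarity = TextBlob(tweet).sentiment.polarity                                # in [-1, 1]
compound = SentimentIntensityAnalyzer().polarity_scores(tweet)["compound"]   # in [-1, 1]

def to_class(score: float, threshold: float = 0.05) -> int:
    """Map a continuous sentiment score to 1 (positive), -1 (negative), or 0 (neutral)."""
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0

print("TextBlob:", polarity, to_class(polarity))
print("VADER:", compound, to_class(compound))
```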
Multi-class classification involves more than two classes; in this study, the target domain contains three classes: class 1 (positive), class -1 (negative), and class 0 (unrelated). After each tweet was converted to a numeric representation using the TF-IDF method or a word embedding method, different machine learning classifiers were applied to classify each tweet into one of the three classes. The classifiers used in this study were Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), and Long Short-Term Memory (LSTM). These are well-established machine learning classifiers; the first five can be used in conjunction with the TF-IDF text vectorization method. Pipelines using these classifiers combined with the TF-IDF technique were constructed with the Scikit-learn Python library [50]. The TF-IDF method converts a tweet into a single vector of features, whereas word embedding techniques convert each word in a tweet into an n-dimensional vector based on the pre-trained dataset. Thus, each tweet sample is represented as a matrix, with each column denoting a word vector. Given that the input to the classification model is then a matrix, deep learning techniques (e.g., neural networks) are usually required for such classification tasks. In this study, the LSTM model was applied for text classification in conjunction with word embeddings, implemented with the Keras Python library [51]. Ultimately, two sentiment-based classifiers (TextBlob and VADER), five machine-learning classifiers based on TF-IDF vectors (TF-IDF + DT, TF-IDF + RF, TF-IDF + NB, TF-IDF + SVM, and TF-IDF + LR), and two deep-learning classifiers (FastText + LSTM and GloVe + LSTM) were considered as candidate models. The nine candidate models were trained using the manually classified training data (with and without upsampling), and performance was assessed using the "unseen" 2,000 tweets contained in the testing set.

Each of these machine learning classifiers has hyperparameters that control the learning process. The grid search method was applied to tune the hyperparameters and to search for the best model architecture. The parameter ranges of the grid search for the different classifiers are presented in Table 2. Precision, Recall, and F1-score were used to assess the classification performance on the testing data. Precision measures the fraction of true positives among the cases a model predicts as positive, Recall measures the fraction of true positives among all relevant cases, and the F1-score combines Recall and Precision into a single measure of test accuracy [52]. For a given class, they are defined as

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = (2 \times Precision \times Recall) / (Precision + Recall),

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.

Using grid search, the hyperparameters listed in Table 2 were tuned to achieve an ideal architecture for each machine learning classifier. The best hyperparameter combination for each classifier was selected mainly based on F1-scores and testing accuracy. The performance of these classification models with the best-tuned hyperparameters on the testing dataset is shown in Table 3 for both the original and upsampled training sets. The TF-IDF + RF pipeline trained on the augmented training set offered the best performance overall and was therefore applied to the whole dataset in subsequent steps.
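As an illustration of one candidate pipeline, the sketch below combines a TF-IDF vectorizer with a Random Forest classifier in Scikit-learn and tunes it by grid search; the in-line dataset and the parameter values are placeholders for demonstration, not the data or the hyperparameter grid of Table 2.

```python
# Minimal sketch of a TF-IDF + Random Forest pipeline tuned with grid search,
# following the construction described in Section 2.3.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# Placeholder tweets with labels in {1: positive, -1: negative, 0: unrelated}
train_texts = [
    "got my covid vaccine today feeling grateful",
    "so relieved the vaccine is finally here",
    "i do not trust this rushed vaccine at all",
    "never getting that experimental vaccine",
    "flu vaccine reminder at the pharmacy",
    "hpv vaccine clinic open on friday",
]
train_labels = [1, 1, -1, -1, 0, 0]
test_texts = ["the vaccine gives me hope", "rushed vaccine no thanks"]
test_labels = [1, -1]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),                      # text vectorization
    ("clf", RandomForestClassifier(random_state=42)),  # multi-class classifier
])

param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__n_estimators": [50, 100],
}

# Grid search over hyperparameters, scored with the macro-averaged F1-score
search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=2)
search.fit(train_texts, train_labels)

print(search.best_params_)
print(classification_report(test_labels, search.predict(test_texts), zero_division=0))
```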
The performance of some models improved while that of others declined with the augmented training set, according to the F1-scores. One possible explanation is that the text augmentation added more samples to balance the classes but also introduced more information into the training process (e.g., more words after synonym replacement, and more training samples with the same classification). However, with balanced samples, most models showed an improvement on class 1 and class -1. In particular, the performance of some models (built with NB and RF) for class -1 greatly improved with sample balance, as demonstrated by higher F1-scores. Second, pipelines using word embedding techniques were not consistently superior to those built on TF-IDF, even though word-embedding methods are often considered more advanced. In applications such as those in this study, it is crucial to discriminate between words with similar meanings, and the word embedding techniques may carry too much hidden information for the model to exploit. Third, the performance of the sentiment classification tools was barely satisfactory, as demonstrated by lower F1-scores and testing accuracy. As discussed, when mining opinions from social media data, sentiment scores may not sufficiently reflect users' opinions, especially when complex context (e.g., irony, metaphor) is involved.

The selected classification model (TF-IDF + RF trained on the upsampled training set) was applied to the 29,213,502 tweets contained in the full dataset. A classification was assigned to each tweet and used to compute a user-based vaccine acceptance measure. The following index was proposed to measure vaccine acceptance in a geographic area during a particular period. Vaccine acceptance was defined for a specific Twitter user u during time period t as

VAI_{u,t} = (n^{+}_{u,t} - n^{-}_{u,t}) / (n^{+}_{u,t} + n^{-}_{u,t}),

where n^{+}_{u,t} and n^{-}_{u,t} denote the numbers of positive and negative tweets posted by user u during period t. The VAI of a geographic area over the period is the average of this user-level acceptance across all users assigned to that area.

The regional vaccine acceptance index requires information regarding the location of individual Twitter users. There are two types of information that can be leveraged to identify a user's location: (1) geolocation information (latitude/longitude) posted as part of a public tweet, and (2) a registration location (i.e., home location) provided as part of the user's public profile. While geolocation tied to a tweet provides the most robust location information, only a small proportion of tweets in the dataset (18,806 out of 29,213,502, or 0.06% of tweets) were associated with latitude and longitude. This does not provide sufficient data to reveal the temporal and spatial patterns of interest. However, Twitter allows users to register home locations in their user profiles, and more than half of the tweets in the dataset (16,334,206 out of 29,213,502, or 55.9%) had registration locations indicating either a U.S. state or city. Therefore, registration location was used to ensure sufficient data for temporal and spatial analysis. To identify state and county information from registration profiles, a simple word mapping was applied based on a list of U.S. locations (cities, counties, state names, and state codes) obtained from SimpleMaps.com [53]. For example, the registration locations "California, USA," "CA," and "California" all indicate the State of California. Other registration locations may contain more specific location information, such as city or county; for example, the locations "Los Angeles, CA," "New York City," "Orlando," and "Houston, TX" contain city information. Location information was used to identify the user's profile state and, if possible, the user's profile county. If a registration location contained only city information, it was mapped to its corresponding county and state. For example, if a user's registration location is "Seattle," the county for this location was identified as "King County" and the state as "Washington."
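A minimal sketch of this word-mapping step is shown below. The in-line table stands in for the SimpleMaps.com city list (its column names and populations are illustrative approximations), and the matching rules are a simplified rendering of the procedure described here, not the authors' code.

```python
# Minimal sketch of the registration-location matching described in Section 2.5.
import pandas as pd

# Illustrative stand-in for the SimpleMaps-style list of U.S. cities
cities = pd.DataFrame(
    [
        ("Seattle", "WA", "Washington", "King", 737_000),
        ("Portland", "OR", "Oregon", "Multnomah", 653_000),
        ("Portland", "ME", "Maine", "Cumberland", 68_000),
        ("Los Angeles", "CA", "California", "Los Angeles", 3_899_000),
    ],
    columns=["city", "state_id", "state_name", "county_name", "population"],
)

# Keep the most populous match for ambiguous city names (e.g., "Portland")
cities = cities.sort_values("population", ascending=False).drop_duplicates("city")
state_by_code = dict(zip(cities["state_id"], cities["state_name"]))
state_names = set(cities["state_name"])
city_lookup = cities.set_index(cities["city"].str.lower())[["county_name", "state_name"]]

def match_location(profile_location: str):
    """Infer (county, state) from a free-text registration location."""
    if not isinstance(profile_location, str) or not profile_location.strip():
        return None, None
    parts = [p.strip() for p in profile_location.split(",") if p.strip()]
    # Prefer a city match (which also gives the county); scan later tokens first,
    # mirroring the "last location wins" rule for inputs such as "LA and NYC"
    for token in reversed(parts):
        key = token.lower()
        if key in city_lookup.index:
            row = city_lookup.loc[key]
            return row["county_name"], row["state_name"]
    # Otherwise fall back to a state code or state name
    for token in reversed(parts):
        if token.upper() in state_by_code:
            return None, state_by_code[token.upper()]
        if token.title() in state_names:
            return None, token.title()
    return None, None

print(match_location("Seattle"))          # ('King', 'Washington')
print(match_location("Portland"))         # ('Multnomah', 'Oregon') -- most populous match
print(match_location("Los Angeles, CA"))  # ('Los Angeles', 'California')
```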
Using this approach, a Python script was developed by the authors; 16,334,206 records containing state information were extracted for nation-level and state-level analysis, and 8,885,549 records containing county information were extracted for county-level analysis. Some limitations of this approach are noted. First, it cannot identify locations that are not present in the list of U.S. locations obtained from SimpleMaps.com, such as "Times Square" or "Long Island." Second, it returns only the last location when a user's profile contains multiple locations; for example, if the registration location includes two locations such as "LA and NYC," the method identifies only the last location, "NYC," and maps it to the county and state information "New York, NY." Third, it cannot capture locality qualifiers in users' registration locations, such as "near," "between," "north to," and "close to." Last, the approach returns the county and state of the most populous matching location when a user's input contains only a city name that may refer to multiple places in the U.S. For example, a user's input of "Portland" can refer to a city in Oregon, Maine, Colorado, Texas, Tennessee, or Indiana; the approach returns the county and state information "Multnomah, Oregon" because Portland, Oregon has the largest population among these cities.

The volume of positive- and negative-classified tweets over time is shown in Fig. 2a. Fig. 2b shows the national VAI computed each day and as a 7-day rolling average. For the daily values (blue line in Fig. 2b), a clear rising trend is observed during the study period. This characterization appears consistent with survey results on 7,420 American adults indicating that vaccine hesitancy declined by 10.8% from October 2020 to March 2021 [15]. It is also consistent with national poll results reported by Nguyen et al. [16], which showed that the intent to receive COVID-19 vaccination increased from 39.4% to 49.1% among adults and across all priority groups from September to December 2020. In addition to providing insights on overall trends, the VAI provides insights regarding public reactions to vaccine-related events. Given that Twitter users responded to news events by posting personal opinions and event observations, the index is likely to reflect events and the availability of vaccine information (e.g., clinical trial results regarding vaccine safety and efficacy). Fig. 3 shows the same daily time series with notable events annotated. For example, the national VAI moved from the negative to the positive range, and the volume of positive tweets increased, in early November; Pfizer published its trial efficacy results on November 9, 2020. Before this date, overall opinion on vaccine acceptance fluctuated between negative and neutral. Other notable events or opinions potentially contributing to changes in vaccine acceptance might have included: (1) scientists' worries that the government put political pressure on the FDA to approve the vaccine without referencing the clinical data, and (2) the U.S. decision not to join the WHO's COVID-19 vaccine initiative [54]. After the trial data became available from Pfizer, the VAI increased to 0.8 and remained consistently high through the end of the study period (April 19, 2021).
Near the end of the study period, a significant negative spike was observed in Fig. 3, which coincides temporally with, and thus may be associated with, the pause of the Johnson & Johnson vaccine following reports of blood clots on April 13, 2021 [55]. Other events potentially contributing to the evolution of vaccine acceptance include positive results from vaccine clinical trials, the FDA's emergency use authorization [56], vaccinations of frontline healthcare workers [57], and quick vaccine distribution across the U.S. [58]. These interactions between the Twitter-derived national VAI and such events suggest that social media could serve as an effective tool to reflect online opinions and support near real-time tracking of vaccine acceptance.

The relationship between the VAI and later vaccination rates was explored at the state level. As described in Section 3.4, tweets were extracted for locations at which state-level information could be obtained. Tweets were grouped based on the extracted locations to calculate the state-level index as defined in Section 2.4. The 7-day VAI in each state was compared with the national-level trend (Fig. 4a). In each subplot, the grey dotted line is the rolling weekly national trend, and the blue and yellow lines show the state-level trend, with the color denoting whether the state-level VAI is above (blue) or below (yellow) the national VAI. Detrended state-level 7-day VAIs were obtained by subtracting the national VAI, which provides a clearer view of the changes in each state. In both Fig. 4a and Fig. 4b, the x-axis represents the study period, and the y-axis represents the VAI (Fig. 4a) or the difference in VAI (state VAI minus national VAI) (Fig. 4b). States on the East Coast and selected states in the Midwest exceeded the national index most of the time, while states in the South were generally lower than the national level. Some states (e.g., SD, IA, AR, TN, MS, AL, and GA) had a higher acceptance index at the beginning but dropped afterward. Other states (e.g., WA, OR, CA, WI, and MT) were more negative in the first few months but later turned positive.

Fig. 4. 7-day rolling average of the state-level VAI (a) compared with the national VAI (the y-axis is the VAI value), and (b) detrended from the national VAI (the y-axis is the difference between the state-level and national VAI); in both panels the x-axis spans the full study period.

Correlations were calculated between the Twitter-derived VAI and the cumulative vaccination rate [1] at the state level. This aimed to: (1) measure the potential value of using the social media-derived index to understand future vaccination rates at the state level, and (2) determine the conditions under which the Twitter-derived VAI shows a strong correlation with vaccination rates. Fig. 5a-c shows the correlation coefficients calculated between the daily (Fig. 5a), 7-day rolling average (Fig. 5b), and 30-day rolling average (Fig. 5c) state-level VAI and the corresponding states' vaccination rates on June 22, 2021. The data for vaccination rates were obtained from the CDC COVID-19 Data Tracker [1]. The regression line between time and the correlation coefficient is shown by the dotted trend line (with the associated values for the whole regression line annotated on the graphs). As expected, the VAI calculated using more recent data shows a stronger positive correlation with the June vaccination rates. It was further observed that calculating the VAI over a more extended time (monthly rolling average) generates a more stable relationship.
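As an illustration of this state-level analysis, the sketch below computes a 30-day VAI per state (user-level acceptance averaged across users, following the definition in Section 2.4) and correlates it with later vaccination rates. All data, column names ("date", "user_id", "state", "label"), and dates below are placeholders for illustration, not the study's data.

```python
# Illustration: 30-day state-level VAI correlated with later vaccination rates.
import pandas as pd
from scipy.stats import pearsonr

tweets = pd.DataFrame({
    "date":    pd.to_datetime(["2021-03-25", "2021-04-01", "2021-04-02", "2021-04-10",
                               "2021-04-11", "2021-04-12", "2021-03-30", "2021-04-05"]),
    "user_id": [1, 1, 2, 3, 3, 4, 5, 5],
    "state":   ["MD", "MD", "MD", "TX", "TX", "TX", "CA", "CA"],
    "label":   [1, -1, 1, -1, -1, 0, 1, 1],   # 1 positive, -1 negative, 0 unrelated
})

# Restrict to a 30-day window and to related (positive/negative) tweets
window = tweets[(tweets["date"] >= "2021-03-20") & (tweets["date"] <= "2021-04-18")]
window = window[window["label"] != 0]

# User-level acceptance: (positive - negative) / (positive + negative)
per_user = (
    window.groupby(["state", "user_id"])["label"]
    .agg(pos=lambda s: (s == 1).sum(), neg=lambda s: (s == -1).sum())
    .assign(vai=lambda d: (d["pos"] - d["neg"]) / (d["pos"] + d["neg"]))
)

# State-level VAI: average of the user-level acceptance within each state
state_vai = per_user.groupby("state")["vai"].mean()

# Correlate with placeholder cumulative vaccination rates on a later date
vax_rates = pd.Series({"MD": 0.55, "TX": 0.42, "CA": 0.52})
common = state_vai.index.intersection(vax_rates.index)
r, p = pearsonr(state_vai.loc[common], vax_rates.loc[common])
print(f"Pearson r = {r:.2f} (p = {p:.2f}) across {len(common)} states")
```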
The correlation is higher after the first vaccine distribution, which occurred on December 14, 2020, as indicated by the yellow boxes in Fig. 5c. This suggests that, at the state level, online opinions on vaccine acceptance expressed after the initial vaccine rollout are more strongly related to later vaccination rates. To further explore the temporal change in the correlation between vaccination rates and the 30-day VAI, the correlation analysis was repeated using vaccination rates on three later dates.

County-level VAI was computed as the average of the acceptance levels of all users from that county in a given period, where the county assigned to a particular user was determined from profile information. However, increased spatial discretization means that fewer data were available for each spatial unit (county). Fig. 6a presents the distribution of Twitter users by county. Fig. 6b shows the county-level VAI computed across the period after the first vaccine distribution (December 14, 2020, to April 18, 2021) for counties with more than 30 users. Fig. 6c presents county-level vaccination rates; vaccination rates for Texas and selected counties in California were not available. The change in correlation with the number of users in a county was examined under two scenarios: (1) using all the data in the study period, and (2) using only data after the first vaccine distribution. The results are presented in Fig. 7a and Fig. 7b for scenarios 1 and 2, respectively. Compared to the VAI, the survey-based downscaling method shows a higher correlation. Still, the survey-based approach requires a larger commitment of time and resources and does not offer the same ability to track vaccine hesitancy over a long period or to observe effects such as the impact of newsworthy events on opinions.

While vaccination is the most successful public health intervention to contain infectious diseases, strategies are needed to rapidly assess vaccine acceptance. Survey-based methods have been applied to this end, but they can be slow and costly; this study instead proposes a Twitter-derived vaccine acceptance index. At the national level, the analysis shows a rising trend in the index, which appears consistent with survey studies indicating that vaccine hesitancy in the U.S. declined from October 2020 to March 2021 [15]. The study further explores the correlation of the VAI with vaccination rates at the state and county levels. From a theoretical perspective, the present study contributes to research focusing on information extraction from social media data. First, it provides insights into natural language processing tools for performing multi-class classification tasks on social media data. In this context, the TF-IDF method may be as effective as more advanced word embedding methods, possibly because word embedding methods may not discriminate minor distinctions among tweets or may bring too much hidden information into the training process (e.g., because each word is converted into a multi-dimensional vector). Second, this study indicates differences between sentiment classification and opinion mining. A few studies have used sentiment analysis to assess vaccine hesitancy because sentiment can be a good indicator of overall emotions about vaccine confidence [32]. However, sentiment classification may introduce bias into opinion extraction in applications such as this study, as supported by the comparatively low testing accuracy of the sentiment classification tools relative to the text classification models. Third, the research proposes a vaccine acceptance index based on Twitter users' opinions, which is a quantitative approach to measuring online opinion.
While this study is presented as a COVID-19 vaccine case study, the approach is applicable to similar issues and can be generalized as a framework for topics of medical and public health relevance. For example, during the COVID-19 pandemic, online opinions on wearing masks and the implementation of lockdowns could be investigated using a similar research framework. It can also be generalized to study acceptance of the flu and HPV vaccines or opinions on other important healthcare events or policies. From a practical perspective, this study provides an approach for the rapid assessment of vaccine acceptance. At the national level, this Twitter-based index provides an instrument to track online opinions on vaccine acceptance over time. The national-level analysis also shows that the index responds to the dynamics of notable events and the availability of vaccine information (e.g., clinical trial results regarding vaccine safety and efficacy). At the state level, through the exploration of correlation with later vaccination rates, the study has shown the potential of social media to assess vaccine acceptance. Observations are made regarding the conditions under which correlations are high: for example, data obtained after the first vaccination rollout yield higher positive correlations, and using a longer period of data collection (e.g., one month) for computing the VAI yields a higher correlation with later state-level vaccination rates. At the county level, since social media data may be biased towards higher-population (urbanized) areas, the study explores the effect of sample size on the computed correlation between the index and county-level vaccination rates. The correlation analysis suggests that correlations stabilized when only counties with data available from at least 30 users were considered.

The social media approach can benefit government agencies and health officials in the following ways. First, the study reveals that the use of social media data can contribute additional insights on public reactions to government decisions on the vaccine; such insights can be obtained readily via social media channels. Government agencies can also refer to social media analysis to assist decision-making related to the mechanisms for releasing information. Second, the study indicates conditions for using social media data for a granular assessment of vaccine acceptance. For example, calculating the state-level and county-level index with data after the first vaccine distribution can generate a more reliable analysis, and the county-level assessment stabilizes when considering counties with more than 30 users. Third, this approach can help assess vaccine acceptance and help decision-makers provide vaccine education in places where people feel hesitant to receive the vaccine.

Biases associated with this social media approach are worthy of note. First, relying on social media data can help eliminate the bias of those choosing to participate in a survey study, but it retains the bias of those choosing to post a public comment about the vaccine. As described in the county-level analysis, previous studies have demonstrated that social media data are likely to be biased at more granular levels of estimation (e.g., social media users are younger and from more urbanized areas). These intrinsic characteristics of social media users might affect the estimates, and this limitation can be difficult to overcome when gathering social media data [59], [60].
Two processes during model development could also introduce bias. First, this study selected the keyword "vaccine" to scrape tweets, mainly to ensure consistency throughout the study period. However, other keywords, such as "vaccination" or common vaccine-related misspellings and slang (e.g., "vax"), may also contain helpful information about vaccine acceptance. Second, manual classifications of tweets (including the "ground-truth" samples in the testing set) were based on the authors' interpretations of the COVID-19 vaccine acceptance expressed in each tweet, and the authors might incorrectly identify the opinion a user expressed in a tweet.

When this social media-based index is applied to investigate vaccine acceptance, several limitations need to be addressed. First, this study used a general list of U.S. states and cities to match users' registration locations. Given that a Twitter user's registration location could differ from the location a tweet was sent from, there is no guarantee that the extracted information reflects a user's actual location. Some location descriptors were not considered in our analysis; for example, a user may enter the location "Times Square," which suggests a place in New York City, but this situation is not handled in this study. Future work will consider richer location descriptors and integrate them with the current pipeline to identify locations (e.g., city and county). Second, although the simpler TF-IDF technique obtained reasonable accuracy in this study, it may not be suitable for situations where tweet data differ from the training tweet data, especially when the training set is small. TF-IDF is a bag-of-words method that converts each word into a number (a word count weighted by the IDF score). If the training set does not include many of the words that appear in the whole dataset, the TF-IDF technique may not generate satisfactory performance. However, word embedding may also fail to generate competent performance if it is not pre-trained on health-related corpora or cannot capture minor distinctions of words between tweets. Future work will focus on the application of neural network architectures with text vectorization techniques to improve performance, such as Bidirectional Encoder Representations from Transformers (BERT). Third, the analysis completed in this study uses English tweets and English-based NLP techniques. When the VAI is applied to estimate vaccine acceptance in non-English-speaking countries, tweets written in other languages need to be collected to train the model, and, correspondingly, text classification models need to be developed using NLP techniques for those languages. Moreover, as previously mentioned, social media data may overrepresent the young, educated, and urbanized population, so the VAI may not be suitable for areas where few people use social media to communicate. Therefore, future work will also consider using NLP and machine learning tools trained in other languages and integrating data from surveys or models (e.g., the CDC HPS model [17]) to complement the social media estimation. Ongoing and future work could build on the insights and approaches described in this study.
For example, an extension of this study could evaluate the impacts of campaigns or interventions by state or federal public and private sectors by tracking the dissemination patterns (e.g., retweets, mentions) of tweets from state-, federal-, and certain private-affiliated accounts. This can help inform how vaccine campaigns affect the online community. As another example, the proposed VAI could be examined hours to days after interventions are applied to assess their near real-time impacts on vaccine acceptance. Another extension of this study could integrate data from other social media platforms and compare the VAI computed from different social media data; this can help reveal the characteristics of different social media users when focusing on policies or events of medical and health relevance. Other future work could explore the relationship of online vaccine acceptance with socioeconomic factors (e.g., education level, gender, age, employment rate) and other COVID-19 factors (e.g., death rate, hospitalization rate). This social media approach may provide valuable insights into the geographical dynamics of vaccine acceptance and aid health professionals in conducting vaccine interventions, such as vaccine education. Last, future work could consider the generalizability of this research framework by applying it to acceptance of other types of vaccines and at broader geographical scales, such as acceptance of the HPV and flu vaccines in different countries.

The study investigated 29 million vaccine-related tweets from August 8, 2020 to April 19, 2021, and proposes a vaccine acceptance index (VAI) to assess COVID-19 vaccine acceptance. To classify opinions regarding vaccine acceptance, 17,000 sampled tweets were manually labeled, and different pipelines were developed with the aid of NLP techniques. Significant findings include: (1) the national VAI moved from negative to positive in 2020 and remained steady after January 2021, consistent with survey results; (2) the East Coast of the U.S. and selected Midwest states exceeded the national index most of the time, while states in the South were lower than the national level; (3) calculating the VAI over a more extended time and based on data after the first vaccine distribution generated a more reliable analysis; and (4) calculating the county-level VAI only for counties with data available from at least 30 users ensured a more consistent analysis. These findings demonstrate the usefulness of social media data for the dynamic assessment of vaccine acceptance and provide insights into the reliability of using social media data. The proposed social media approach offers the potential for fast and cost-effective assessment of vaccine acceptance across large geographical scales.
Nation Faces 'Hand-to-Hand Combat' to Get Reluctant Americans Vaccinated -The New York Times Attitudes Toward a Potential SARS-CoV-2 Vaccine Measuring vaccine hesitancy: The development of a survey tool A survey instrument for measuring vaccine acceptance The Vaccination Confidence Scale: A brief measure of parents' vaccination beliefs Development of a survey to identify vaccine-hesitant parents Likelihood of COVID-19 vaccination by subgroups across the US: postelection trends and disparities Acceptance of the COVID-19 vaccine based on the health belief model: A population-based survey in Hong Kong Systematic Literature Review on the Spread of Health-related Misinformation on Social Media Report of the SAGE Working Group on Vaccine Hesitancy The State of Vaccine Confidence 2016: Global Insights Through a 67-Country Survey The vaccine hesitancy scale: Psychometric properties and validation Development of a valid and reliable scale to assess parents' beliefs and attitudes about childhood vaccines and their association with vaccination uptake and delay in Ghana Public Trust and Willingness to Vaccinate Against COVID-19 in the US From COVID-19 vaccination intent, perceptions, and reasons for not vaccinating among groups prioritized for early vaccination Vaccine Hesitancy for COVID-19 Willingness to vaccinate against COVID-19 in the US: Longitudinal evidence from a nationally representative sample of adults from Validation of the Vaccination Confidence Scale: A Brief Measure to Identify Parents at Risk for Refusing Adolescent Vaccines Determinants of COVID-19 vaccine acceptance in the US Parental vaccine hesitancy in Italy -Results from a national survey Exploring vaccine hesitancy among healthcare providers in the United Arab Emirates: a qualitative study Social media study of public opinions on potential COVID-19 vaccines: informing dissent, disparities, and dissemination Prospective associations of regional social media messages with attitudes and actual vaccination: A big data and survey study of the influenza vaccine in the United States Understanding the messages and motivation of vaccine hesitant or refusing social media influencers Social media and HPV vaccination: Unsolicited public comments on a Facebook post by the Western Cape Department of Health provide insights into determinants of vaccine hesitancy in South Africa Chinese social media suggest decreased vaccine acceptance in China: An observational study on Weibo following the 2018 Changchun Changsheng vaccine incident Semantic network analysis of vaccine sentiment in online social media Vaccines for pregnant women…?! Absurd' -Mapping maternal vaccination discourse and stance on social media over six months Social media use and influenza vaccine uptake among White and African American adults Social media engagement association with human papillomavirus and vaccine awareness and perceptions: Results from the 2017 US Health Information National Trends Survey Vaccine Hesitancy on Social Media: Sentiment Analysis from The online competition between pro-and anti-vaccination views Words matter: Vaccine hesitancy, vaccine demand, vaccine confidence, herd immunity and mandatory vaccination Vaccine Hesitancy, Acceptance, and Anti-Vaccination: Trends and Future Prospects for Public Health Standard search API | Docs | Twitter Developer Platform Computational Vision and Bio-Inspired Computing ICCVBIC 2019. ICCVBIC: International Conference On Computational Vision and Bio Inspired Computing Are they different? 
affect, feeling, emotion, sentiment, and opinion detection in text Building Confidence in COVID-19 Vaccines | CDC Natural Language Processing with Python EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks TextBlob: Simplified Text Processing VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text Mining of massive datasets Text Classification Algorithms: A Survey Text Classification Algorithms: A Survey Enriching Word Vectors with Subword Information Glove: Global Vectors for Word Representation GloVe: Global vectors for word representation Scikit-learn: Machine Learning in Python Classification evaluation US Cities Database | Simplemaps.com US won't join global coronavirus vaccine initiative Johns Hopkins Medicine FDA Agrees to EUA for COVID-19 Vaccine from Pfizer, BioNTech Starts Vaccine Rollout as High-Risk Health Care Workers Go First United Begins Flying Pfizer's Covid-19 Vaccine Understanding the Political Representativeness of Twitter Users Twitter and Facebook are not representative of the general population: Political attitudes and demographics of British social media users

Author contributions: conceptualization, data curation, formal analysis, visualization, writing, review & editing; conceptualization, supervision, writing, review & editing; project administration, writing, review & editing.

Declaration of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work.

The authors thank Sanggyu Lee from the Department of Civil and Environmental Engineering at the University of Maryland for his assistance in tweet labeling. The authors also thank two anonymous reviewers whose suggestions helped improve and clarify this paper.

Supplemental data for this article can be accessed at: https://github.com/leon1219/covid19_vaccine. Restrictions apply to the availability of Twitter data, which are used under license for the current study, so specific user information and tweet messages are not publicly available. The tweet IDs are available from this published dataset, but the original tweets are available only with the permission of Twitter Inc.

Highlights
- Leverages social media data to assess COVID-19 vaccine acceptance in the U.S.
- Proposes a vaccine acceptance index (VAI) to quantify online expression of opinion
- The VAI is calculated from the number of positive and negative tweets posted by a specific user or group of users
- The VAI is compiled for regions such as counties or states to provide geospatial information
- The VAI is tracked over time to assess changes in vaccine acceptance as related to trends in the media and politics