key: cord-0233884-5dghmdah
authors: Malahy, Sean; Sun, Mimi; Spangler, Keith; Leibler, Jessica; Lane, Kevin; Bavadekar, Shailesh; Kamath, Chaitanya; Kumok, Akim; Sun, Yuantong; Gupta, Jai; Griffith, Tague; Boulanger, Adam; Young, Mark; Stanton, Charlotte; Mayer, Yael; Smith, Karen; Shekel, Tomer; Chou, Katherine; Corrado, Greg; Levy, Jonathan; Szpiro, Adam; Gabrilovich, Evgeniy; Wellenius, Gregory A
title: Vaccine Search Patterns Provide Insights into Vaccination Intent
date: 2021-11-22
sha: 4df81adf4f7996d2bd1124769fbfcb080ffc9912
doc_id: 233884
cord_uid: 5dghmdah

Despite an ample supply of COVID-19 vaccines, the proportion of fully vaccinated individuals remains suboptimal across much of the US. Rapid vaccination of additional people will prevent new infections among both the unvaccinated and the vaccinated, thus saving lives. With the rapid rollout of vaccination efforts this year, the internet has become a dominant source of information about COVID-19 vaccines, their safety and efficacy, and their availability. We sought to evaluate whether trends in internet searches related to COVID-19 vaccination, as reflected by Google's COVID-19 Vaccination Search Insights (VSI) index, could be used as a marker of population-level interest in receiving a vaccination. We found that between January and August of 2021: 1) Google's weekly VSI index was associated with the number of new vaccinations administered in the subsequent three weeks, and 2) the average VSI index in earlier months was strongly correlated (up to r = 0.89) with vaccination rates in later months. Given these results, we illustrate an approach by which data on search interest may be combined with other available data to inform local public health outreach and vaccination efforts. These results suggest that the VSI index may be useful as a leading indicator of population-level interest in, or intent to obtain, a COVID-19 vaccine, especially early in vaccine deployment efforts.
These results may be relevant to current efforts to administer COVID-19 vaccines to unvaccinated individuals, to newly eligible children, and to those eligible to receive a booster shot. More broadly, these results highlight the opportunities for anonymized and aggregated internet search data, available in near real-time, to inform the response to public health emergencies.

With the rapid rollout of vaccination efforts this year, the internet has become a dominant source of information (and misinformation 3) about COVID-19 vaccines, their safety and efficacy, and their availability. Prior studies have shown that anonymized and aggregated internet search data can be used to predict the occurrence of Lyme disease and outbreaks of influenza; to nowcast COVID-19 cases, hospitalizations, and deaths; and to identify food establishments that would benefit from food safety inspections to limit the further spread of foodborne illness. [4] [5] [6] [7] Internet search patterns may similarly provide novel insights to inform public health efforts to increase vaccination uptake, but this hypothesis has not been examined in detail. Internet searches related to COVID-19 vaccines began rising in January 2021 and rose further starting in March. 8 Early evidence suggests that higher internet search activity (aggregated to the state level) is associated with higher rates of vaccination in that state. 9 Google recently began publishing the COVID-19 Vaccination Search Insights (VSI) index, a publicly available dataset showing trends in Google searches related to COVID-19 vaccination from January 2021 through the present.
We sought to evaluate whether patterns in internet searches across locations and over time could be used as a marker of population-level interest in receiving a vaccination and, if so, to explore how public health officials might use this information to identify geographic areas with particularly high amenability to vaccination despite low uptake.

We obtained publicly available data on the relative volume of searches related to COVID-19 vaccination from Google's COVID-19 Vaccination Search Insights (VSI) (https://google-research.github.io/vaccination-search-insights). The main VSI index reflects the weekly proportion of searches in a given geographic area that are related to COVID-19 vaccination, indicating overall search interest in the topic. It is calculated as the weekly number of relevant searches within a given geographic region, normalized by the overall search activity in that region. This normalization allows comparisons across locations and within the same location over time. The anonymization process for the VSI data is based on differential privacy and is documented in detail elsewhere. 10 Importantly, the anonymization process limits how much any one user can contribute to the VSI index, so that repeated searches by scientists, medical professionals, and other experts have limited impact on the overall index. We obtained and examined VSI data at the state, county, and ZIP code levels for the weeks beginning January 11, 2021 through September 6, 2021. We used multiple data sources to estimate the number of vaccinations administered within each US state, county, or ZIP code, since no single data source provides vaccination data at all three spatial scales for the entire period of interest.
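The actual VSI anonymization procedure is described in the documentation cited above (reference 10) and is not reproduced here. As a generic, textbook illustration of the two ideas mentioned (capping each user's contribution, then adding calibrated noise), the sketch below applies the Laplace mechanism to per-region counts. Everything in it is a hypothetical assumption for illustration: the function names, the per-user cap `max_per_user`, the privacy parameter `epsilon`, and treating the per-region normalizer `total_searches` as public. It is not Google's pipeline.

```python
import random
from collections import Counter


def bounded_counts(search_log, max_per_user=1):
    """Count vaccine-related searches per region, keeping at most
    `max_per_user` matching searches from any one user (contribution bounding).

    `search_log` is an iterable of (user_id, region, is_vaccine_query) tuples.
    """
    kept_per_user = Counter()
    counts = Counter()
    for user_id, region, is_vaccine_query in search_log:
        if is_vaccine_query and kept_per_user[user_id] < max_per_user:
            kept_per_user[user_id] += 1
            counts[region] += 1
    return counts


def laplace(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_search_share(search_log, total_searches, epsilon=1.0,
                       max_per_user=1, seed=None):
    """Noisy share of each region's searches that are vaccine-related.

    Removing one user changes the vector of bounded counts by at most
    `max_per_user` in L1 norm, so Laplace noise with scale
    max_per_user / epsilon makes the released counts epsilon-differentially
    private. `total_searches` (the per-region normalizer) is treated as
    public here, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    counts = bounded_counts(search_log, max_per_user)
    scale = max_per_user / epsilon
    return {
        # Clamping at zero is post-processing and does not weaken privacy.
        region: max(0.0, counts[region] + laplace(scale, rng)) / total
        for region, total in total_searches.items()
    }
```

Because the noise scale in this sketch does not depend on how many people live in a region, the relative error is largest where relevant-search counts are small, which mirrors the expectation that a differentially private index is noisier in less populous areas.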
We obtained state-level data from Our World in Data (OWID), 11 which reports a daily measure of the cumulative number of individuals in each state who have received 1 or more vaccine doses, for the weeks beginning January 11, 2021 through September 6, 2021, inclusive. We imputed values for days with missing data by assuming a linear trend from the closest previous date without missing data to the closest following date without missing data. We obtained daily county-level data from the CDC for the same period (https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh). As with the state-level data, we imputed values for days with missing data by assuming a linear trend. The CDC does not report daily county-level data for Hawaii or Texas; thus, these states are excluded from the county-level time-series analyses. We aggregated daily estimates of cumulative vaccinations to a weekly time scale to match the weekly frequency at which the VSI index is available. We estimated the number of individuals receiving a first dose in a given week as the week-to-week difference in the weekly cumulative vaccination estimates in each county. We excluded from analyses the first week of data in each county, which has no preceding week to difference against, and replaced any negative differences with zeros. Data on the number of vaccinations by ZIP code are not publicly available from the CDC website but are reported directly by some US states. Accordingly, we conducted additional analyses at the ZIP code level in two example states that make such data publicly available: California and Texas.
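The daily-to-weekly processing described above can be sketched in Python (the authors worked in R; this is not their code). The sketch assumes one record per day with `None` for missing values and Monday-start weeks (the week start dates in the text, such as January 11 and September 6, 2021, are Mondays). The text does not say how days within a week were combined, so summarizing each week by its last available cumulative value is an assumption, as are all function names.

```python
from datetime import date, timedelta
from typing import Optional

Daily = list[tuple[date, Optional[float]]]


def impute_linear(daily: Daily) -> Daily:
    """Fill missing cumulative counts by linear interpolation.

    Each gap is filled along a straight line from the closest earlier observed
    value to the closest later observed value. Gaps at the start or end of the
    series (no observed neighbor on one side) are left as None.
    """
    values = [v for _, v in daily]
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return [(d, v) for (d, _), v in zip(daily, filled)]


def weekly_cumulative(daily: Daily) -> list[tuple[date, float]]:
    """Collapse daily cumulative counts to Monday-start weeks.

    Assumption: a week is summarized by its last non-missing value. Input
    must be sorted by date so that later days overwrite earlier ones.
    """
    weeks: dict[date, float] = {}
    for d, v in daily:
        if v is not None:
            weeks[d - timedelta(days=d.weekday())] = v
    return sorted(weeks.items())


def weekly_new_doses(weekly: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Week-to-week difference of cumulative counts.

    The first week has no predecessor and is dropped; negative differences
    (which arise if a cumulative series is revised downward) become zero.
    """
    return [
        (week, max(0.0, cur - prev))
        for (_, prev), (week, cur) in zip(weekly, weekly[1:])
    ]
```

Running the three functions in order (impute, collapse to weeks, difference) yields weekly first-dose counts per county, with the first week already removed.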
The California Department of Public Health provides vaccination rates by ZIP Code Tabulation Area (ZCTA, an approximation of ZIP codes produced by the US Census Bureau) (https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code), while the Texas Department of State Health Services provides vaccination rates by ZIP code (https://dshs.texas.gov/coronavirus/TexasCOVID19VaccinesbyZIP.xlsx). We converted the Texas ZIP code data to ZCTAs using the latest ZIP-to-ZCTA crosswalk available from the Uniform Data System (UDS) Mapper (https://udsmapper.org/wp-content/uploads/2020/09/Zip_to_zcta_crosswalk_2020.xlsx) so that vaccination data could be linked with sociodemographic information, which is available for ZCTAs rather than ZIP codes. We additionally obtained data on the cumulative number of vaccinations given by county as of August 11, 2021 from the Texas Department of State Health Services (https://dshs.texas.gov/coronavirus/AdditionalData.aspx). We obtained estimates of the age distribution of the population in each county and state from the 2015-2019 American Community Survey (ACS) 5-year estimates, the most recent available (https://data.census.gov/cedsci/table?q=United%20States%20age%20sex&tid=ACSST5Y2019.S0101). The ACS provides estimates of population size within specific age ranges (e.g., ages 5-9 years, 10-14 years, etc.). We estimated the number of individuals aged 12 years or older (and thus eligible for vaccination) as the total population of a given region minus the number of people under 10, minus 40% of the population aged 10-14 years; ages 10 and 11 are two of the five single years of age in that band, so this assumes a roughly uniform distribution of ages within it. We additionally obtained 2019 5-year ZCTA-level ACS estimates of the age distribution from the IPUMS NHGIS database.
12 The variables selected included information on population proportions by age (percent 65 years or older and percent under 18 years of age), race (percent American Indian or Alaska Native, percent Asian, percent Black or African American, percent Native Hawaiian or Pacific Islander, and percent White), ethnicity (percent Hispanic or Latino [hereafter percent Latinx]), socioeconomic status (percent with a college degree or higher, median household income), and health and healthcare (percent without health insurance and percent of persons with a disability).

In a first analysis, we assessed the degree to which search interest in a given week relates to the number of first doses per 100,000 population aged 12 years or older (individuals eligible for vaccination) in the following 1 to 3 weeks. Specifically, for each week from January 11, 2021 through August 22, 2021, we fit a linear mixed-effects regression model to estimate the change in the number of first-dose vaccinations administered over the following 3 weeks per 100,000 eligible individuals in a given county associated with a 10-unit increase in the VSI index. We included a random intercept for state to account for the correlation induced by the nesting of counties within states. In each week, analyses are restricted to counties with data on both the VSI index and vaccinations; the number of such counties ranges from 1,226 to 2,686 (Supplemental Table 1).

We next assessed whether search interest early in the US vaccine rollout predicts the cumulative proportion of the population vaccinated at a later date. Specifically, we estimated the Pearson correlation coefficient between the mean VSI index in each state or county in each month and the percent of the eligible population that had received at least 1 vaccine dose in subsequent months. We calculated the monthly mean VSI index based on the month of the first day of each week of data.
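The quantities used in these two analyses can be sketched in Python. This is a hypothetical illustration rather than the authors' R code: the function and argument names are invented, reading "the following 3 weeks" as weeks t+1 through t+3 (excluding the week of the search itself) is an assumption, and the mixed-effects model itself is not reproduced. The eligible-population helper applies the 40% rule for the 10-14 age band; for instance, a county with 1,000 residents, of whom 60 are under 5, 60 are aged 5-9, and 50 are aged 10-14, has an estimated 1,000 - 60 - 60 - 0.4 × 50 = 860 eligible residents. On Python 3.10 or later, `statistics.correlation(xs, ys)` computes the same coefficient as `pearson_r` below.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean


def eligible_population(total, under_5, age_5_9, age_10_14):
    """Residents aged 12+: everyone minus those under 10 minus 40% of ages 10-14."""
    return total - under_5 - age_5_9 - 0.4 * age_10_14


def next_weeks_rate(weekly_doses, week, eligible, horizon=3):
    """First doses in the `horizon` weeks after index `week`, per 100,000
    eligible residents (the regression outcome); None if the window runs
    past the end of the data."""
    window = weekly_doses[week + 1 : week + 1 + horizon]
    if len(window) < horizon:
        return None
    return sum(window) / eligible * 100_000


def monthly_mean_vsi(weekly_vsi: dict[date, float]) -> dict[tuple[int, int], float]:
    """Average weekly VSI values by (year, month) of each week's first day,
    so a week is credited entirely to the month in which it starts."""
    by_month = defaultdict(list)
    for week_start, value in weekly_vsi.items():
        by_month[(week_start.year, week_start.month)].append(value)
    return {month: mean(vals) for month, vals in sorted(by_month.items())}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(vsi_by_region, coverage_by_region, month):
    """Correlate each region's mean VSI in `month` (a (year, month) key) with
    its later coverage (% of eligible residents with at least one dose).
    Regions missing either value are skipped."""
    regions = [r for r in vsi_by_region
               if month in vsi_by_region[r] and r in coverage_by_region]
    return pearson_r([vsi_by_region[r][month] for r in regions],
                     [coverage_by_region[r] for r in regions])
```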
For example, the VSI index reported for the week of August 30, 2021 contributes to the monthly average for August rather than September.

To visualize how vaccine search interest spatially intersects with vaccination rates, we created bivariate choropleth maps that display the spatial overlap between the VSI index and vaccination rates (the percent of the population aged 12 years or older who have received at least one vaccine dose). This mapping technique makes it easy to see locations that are relatively high in one variable and low in the other, as well as locations that are high or low on both measures. As illustrative examples, we performed separate analyses for California and Texas (using county-level data) and for the Los Angeles and Dallas/Fort Worth metropolitan areas (using ZCTA-level data). We chose these locations based on the availability of ZCTA-level vaccination data and the heterogeneity in vaccination rates. In each analysis, we cross-classified locations by the top and bottom tertiles of each variable, yielding a 2x2 grid of county or ZCTA groupings: "high-high" (locations in the top tertile [67th to 100th percentile] of state- or city-specific vaccination rates and the top tertile of search interest), "low-low" (locations in the bottom tertile [0th to 33rd percentile] of both variables), and "high-low" or "low-high" (locations in the top tertile of one variable but the bottom tertile of the other). Locations in which one or both variables were missing or in the middle tertile were classified as "not in high/low or N/A." For these analyses we used the VSI index for the week beginning August 9, 2021. Texas maps were made using vaccination data current as of August 11, 2021, and California maps using vaccination data current as of August 16, 2021.
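The tertile cross-classification behind the bivariate maps can be sketched as follows. Several details here are assumptions not specified in the text: cut points come from Python's `statistics.quantiles` with its default method, values tied with a cut point go to the outer tertile, and labels are written search-first (so "high-low" means high search interest but low vaccination). Function names are hypothetical; the maps themselves were made in ArcGIS Pro.

```python
from statistics import quantiles

NA = "not in high/low or N/A"


def tertile(value: float, cut_low: float, cut_high: float) -> str:
    """Return 'low' (bottom third), 'high' (top third), or 'mid'."""
    if value <= cut_low:
        return "low"
    if value >= cut_high:
        return "high"
    return "mid"


def classify_locations(vsi: dict[str, float], vax: dict[str, float]) -> dict[str, str]:
    """Label each location from tertiles of search interest and vaccination.

    `vsi` maps location IDs to the VSI index; `vax` maps them to the percent of
    residents aged 12+ with at least one dose. Cut points are computed within
    the study area passed in (e.g., one state or one metro area), so the
    classification is relative to that area.
    """
    vsi_low, vsi_high = quantiles(vsi.values(), n=3)
    vax_low, vax_high = quantiles(vax.values(), n=3)
    labels = {}
    for loc in vsi.keys() | vax.keys():
        if loc not in vsi or loc not in vax:
            labels[loc] = NA  # missing one or both variables
            continue
        s = tertile(vsi[loc], vsi_low, vsi_high)
        v = tertile(vax[loc], vax_low, vax_high)
        labels[loc] = NA if "mid" in (s, v) else f"{s}-{v}"
    return labels
```

Computing cut points separately for each state or metro area, as in this sketch, is what makes the categories "state- or city-specific": a ZCTA labeled "high-low" in Dallas/Fort Worth is high or low relative to other ZCTAs in that area, not nationally.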
Finally, we report the sociodemographic characteristics of the groups of counties and ZCTAs defined by the intersection of tertiles of the VSI index and vaccination rates, based on the ACS data described above. Statistical analyses were conducted in R (version 4.1.1) 13 and maps were created using ArcGIS Pro 2.8.3 (© Esri, Redlands, CA).

Google search interest in COVID-19 vaccines has varied considerably across locations (Figure 1) and over time (Figure 2). In late February 2021, search interest was highest in the Northeastern US, Florida, and the West Coast (Figure 1A). In contrast, in mid-July, search interest was highest in Michigan, Missouri, Arkansas, and Louisiana (Figure 1B). The time trends for each US state show that interest was generally high through mid-April, then declined and remained low until late July (Figure 2A). Concurrently, there was a rapid increase in the number of first-dose vaccinations administered through approximately the end of April, followed by a sustained but slower rate of first-dose vaccinations through the end of September. [...] (Table 1; see Supplemental Table 2 for full statistical details). For example, the correlation between the VSI index in February 2021 and the proportion of the eligible population who had received at least one vaccine dose 6 months later was 0.71 (Figure 4). The strongest correlation was between the VSI index in April and the cumulative vaccination proportion 1-2 months later (r ≈ 0.89). Results were similar, though correlations were weaker, when we considered metrics at the county rather than the state level (Supplemental Table 3 and Supplemental Figure 2). We repeated this analysis at a finer spatial scale for the Dallas/Fort Worth metropolitan area using metrics at the ZCTA level rather than the county level (Figure 6).
We identified 15 ZCTAs that were simultaneously in the highest tertile of the VSI index and the lowest tertile of vaccination proportion (Figure 6C), again perhaps suggesting that higher rates of vaccination could be achieved in these areas specifically through improved access. An additional 31 ZCTAs were in the lowest tertile of both the VSI index and vaccination proportion, again potentially suggesting that improved outreach and engagement around vaccination could be beneficial. We additionally compared the sociodemographic characteristics of ZCTAs classified by the intersection of the VSI index and vaccination proportion (Table 2).

We performed analogous analyses for counties across the state of California (Supplemental Figure 3) and ZCTAs across the Los Angeles metropolitan area (Supplemental Figure 4 and Supplemental Table 4). As in Texas, the analysis in California highlights counties and ZCTAs where vaccination rates are low but search interest is high, suggesting that higher rates of vaccination may be achieved in these areas through improved access. Also as in Texas, it highlights counties and ZCTAs where vaccination rates and search interest are both low, perhaps indicating a need for novel approaches to engagement.

Despite an ample supply of COVID-19 vaccines in the US, the proportion of the population that has been fully vaccinated remains insufficient across much of the country. 14, 15 We evaluated whether Google's VSI index might serve as a useful tool to identify communities with relatively higher interest in vaccination where focused public health efforts might be most successful. We [...]

Although availability was a primary barrier to vaccine access in the spring of 2021, it is likely no longer the predominant barrier to COVID-19 vaccination today, even though some vaccine deserts remain (https://covid19vaccineallocation.org).
16 However, even in areas where vaccine availability is very high, the VSI index may identify locations where people are interested in being vaccinated but where other barriers to access (such as lack of transportation to vaccination sites, or insufficient time off work to get vaccinated or to recover from potential side effects) could represent a significant deterrent to vaccination. If so, communities with high values of the VSI index and low vaccination rates relative to the larger area may benefit from targeted interventions to make vaccination more convenient or otherwise remove barriers to access. Moreover, the VSI index could be combined with data on vaccine distribution points (e.g., the Vaccine Equity Planner, https://vaccineplanner.org) to gain additional insights into potential barriers to access. Beyond availability, access, and convenience, Betsch et al. 17 [...]

These findings need to be interpreted in the context of several important limitations. First, the VSI index is based on anonymized and aggregated search activity and as such provides a measure of average search activity. As with any average, the VSI index may mask important heterogeneity within any given community. Second, the differential privacy algorithms applied to the search data 10 to ensure user privacy increase the variance in the data (i.e., reduce the signal-to-noise ratio), particularly in areas with smaller populations. Thus, we expect the VSI index to be more robust in counties and ZIP codes with relatively large populations. Third [...]

Supplemental Figure 1: To assess whether the relationship between search interest and first-dose vaccinations varied by state in April and July, we estimated a series of linear models using generalized estimating equations in which the number of new vaccinations in the following 3 weeks (per 100,000 eligible individuals) was regressed on the VSI index (in 10s of units) and week. A separate model was estimated for each state in April and July.
Data were clustered at the county level, and an independence correlation structure was specified. Estimates reflect how many first-dose vaccinations (per 100,000 eligible individuals) were associated with a 10-unit change in the VSI index per week and county. Error bars reflect ±2 robust SEs. State names are colored by quintile of their April point estimate to facilitate comparing rank order between April and July.

Supplemental Table 2: Pearson correlation coefficient between the monthly average VSI index within each US state (N = 50) and the percentage of the eligible state population having received at least one dose of a COVID-19 vaccine at varying monthly lags.

1. COVID-19-associated hospitalizations among vaccinated and unvaccinated adults ≥18 years - COVID-NET, 13 states
2. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status - 13
3. Misinformation of COVID-19 on the Internet: Infodemiology Study
4. Machine-learned epidemiology: real-time detection of foodborne illness at scale
5. Lymelight: forecasting Lyme disease risk using web search data
6. A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
7. What can digital disease detection learn from (an external revision to) Google Flu Trends?
8. Vaccine hesitancy and anti-vaccination in the time of COVID-19: A Google Trends analysis
9. COVID-19 internet vaccination information and vaccine administration: evidence from the United States
10. Google COVID-19 Vaccination Search Insights: Anonymization Process Description
11. A global database of COVID-19 vaccinations
12. IPUMS National Historical Geographic Information System: Version
13. R: A language and environment for statistical computing
14. Towards achieving a vaccine-derived herd immunity threshold for COVID-19 in the U
15. & CMMID COVID-19 Working Group. The potential for vaccination-induced herd immunity against the SARS-CoV-2 B.1.1.7 variant
16. What are COVID-19 vaccine deserts? Why are they dangerous?
17. Beyond confidence: Development of a measure assessing the 5C psychological antecedents of vaccination

We are grateful to Elżbieta Brzóz, Bruno Delmonte, Tetiana Kedzierska, Jan Machowski, Don Metzler, Sarah Montgomery-Taylor, Arti Patankar, and Chris Scott for their help and advice.

Dr. Wellenius serves as a consultant to Google, LLC (Mountain View, CA). This work was funded in part by an unrestricted gift from Google, LLC to Boston University School of Public Health. In addition, Google, LLC provided support in the form of salaries for employees. Several of the study authors were also involved in the development and launch of Google's Vaccination Search Insights. Google did not have any additional role in the study design, data collection and analysis, or preparation of the manuscript. All manuscripts co-authored by employees of Google are reviewed prior to journal submission to ensure that they meet Google's standards.