key: cord-0633658-xohsfvqi authors: Martins, Leonardo; Medeiros, Marcelo C. title: The Impacts of Mobility on Covid-19 Dynamics: Using Soft and Hard Data date: 2021-10-01 journal: nan DOI: nan sha: 1f22c17b22010b7e6547b30255ac87cfb22ae994 doc_id: 633658 cord_uid: xohsfvqi

This paper evaluates how changes in mobility have affected the spread of Covid-19 infections over 2020-2021. Identifying a "clean" causal relation is not an easy task, however, owing to the large number of non-observable (behavioral) effects. We suggest using Google Trends and News-based indexes as controls for some of these behavioral effects, and we find that a 1% increase in residential mobility (i.e. a reduction in overall mobility) significantly reduces both Covid-19 cases (by at least 3.02% at a one-month horizon) and deaths (by at least 2.43% at the two-week horizon) over the 2020-2021 sample. We also evaluate the effects of mobility on the spread of Covid-19 in the restricted sample (2020 only), when vaccines were not available. The effects of reduced mobility on cases and deaths remain observable in the restricted sample (with similar magnitudes for residential mobility) and are cumulatively larger, as the effect of restricting workplace mobility also turns out to be significant: a 1% decrease in workplace mobility reduces cases by around 1% and deaths by around 2%.

The Covid-19 pandemic has created a new dynamic in terms of social behavior. Its impacts on society spread across all fields, from psychological effects on individuals (as in Kontoangelos et al. (2020)) to economic effects on countries, as examined in International Monetary Fund (IMF) publications (e.g. Deb et al. (2020)). In this paper we focus on one particular aspect of the spread of the virus: the impact of mobility restrictions on Covid-19 cases and deaths.
We use a panel dataset at the municipal level in Brazil to measure the short-term impacts of reductions in mobility on the dynamics of Covid-19. Throughout the first year of the pandemic (2020), many countries adopted circulation restrictions with the objective of reducing the spread of the disease in the population. However, the degree of restriction varied widely across countries: while New Zealand imposed a strict, centralized lockdown strategy, Brazil imposed only decentralized mobility restrictions. In this scenario, evaluating the causal effect of mobility on the proliferation of infections is a challenging task, as we cannot assign regions to circulation restrictions in a purely randomized way and then compare outcomes. There is the additional complication of mapping behavioral (non-observable) variables (such as the use of masks, social distancing and the adoption of better hygiene measures, among many others) that may affect both mobility and infection levels, generating an omitted-variable bias. Moreover, when infection rates are high, people tend to comply more with restrictive measures, generating a simultaneity bias. This paper analyzes empirically the effects of circulation restrictions on the spread of infections, without resorting to epidemiological models or to a theoretical formulation of individual behavior.

The literature on the mobility problem has focused mainly on two points: (i) prediction of the impact of lockdown policies; (ii) evaluation of mobility restrictions in terms of Covid-19 cases and deaths. There is also a third branch that focuses on the effects of Covid-19 on mobility (i.e. the converse of our causal identification), for which we only provide some references. The first, "prediction", group focuses on the impact of lockdown policies on the spread of Covid-19.
Inside this group, we also make two distinctions: (i) synthetic controls; (ii) alternative sources of data and models. On the synthetic control subgroup, we point out Carneiro et al. (2020) who adopted an Artificial Counterfactual (ArCo) approach to assess the impacts of the short-run evolution of number of cases (and deaths) in the US. The prediction suggests that, in absence of the restriction measures, the number of cases would be two times larger than observed. On the same line of using synthetic controls, we point out Bayat et al. (2020) , recurring to a synthetic control methodology to analyze the effects of lockdown measures and the potential impact of those policies on the development of the herd immunity. On the other subgroup, we focus on studies that used either machine learning models to assess the non-linearities intrinsic to the projection problem (as in Said et al. (2020) ) or the ones who have used alternative sources of data, as Google Mobility (Gerlee et al. (2021)) or large-scale mobility data from telecommunication providers (Schwabe et al. (2021) and Vespe et al. (2021) ). These last papers are somehow related to the alternative data sources that we have adopted for controlling the behavioral channel that is not directly measured by conventional variables. The second group constitutes a larger share of the empirical work and is based on different methodologies to assess the effects of mobility restrictions directly on the evolution of the infection. Based on the availability of Covid-19 infection data, the models are predominantly analyzed in a panel of weekly cases and deaths (to reduce noise effects on daily published data) between or within countries. In terms of methodology, Liu et al. (2021) suggest the usage of dynamic panel data model to generate forecasts for panel data to capture the inertial elements that affect the infection situation. 
The authors opt to model the growth rate of infections, assuming that this variable can be represented by fluctuations around a downward-sloping deterministic trend (with a break). This is also the case in Huang (2020), who models the growth rate using counterfactual analysis and finds that social distancing interventions reduce the weekly growth rate by 9.8% and deaths by 7.0% at the state level in the United States. Another example of panel estimation is Chen et al. (2020), who use a cross-country panel to evaluate each non-pharmaceutical intervention in terms of reducing the reproduction number. For Brazilian data, Resende and Maciel (2021) explore a panel-data regression for São Paulo municipalities using labor market dynamics, medical infrastructure and government transfers as controls. The authors find that a 1% increase in social distancing reduces infections by 4.14% within a week and deaths by 2.8% after two weeks. Also in terms of country-specific effects of mobility restrictions, Vespe et al. (2021) evaluate the effects of mobility restrictions in Italy using mobile network operator data and electricity consumption data to assess the impacts of the Covid-19 wave on the "three-tier" system. Similarly, Barboza et al. (2021)

There are also other studies that analyze the effects of Covid-19 on mobility, such as Engle et al. (2020), and the effects of non-pharmaceutical interventions on the spread of Covid-19, such as Kong and Prinz (2020), as well as country-specific analyses of mobility after Covid-19: Batty et al. (2021) for London, Janiak et al. (2021) for Chile and Benítez et al. (2020) for Latin American countries.
The approach we adopt belongs to the second (causal) category, with some data elements of the prediction group, as we aim to identify how mobility (even in the absence of a strict lockdown) affected the evolution of infections within a causal framework. We focus on modeling the growth rate of Covid-19 cases and deaths, as in Liu et al. (2021) and Huang (2020). Our results are in line with Resende and Maciel (2021), but our approach differs in three main ways: (i) we model growth rates instead of the total number of infections (this avoids the non-stationarity of the infection series caused by the highly inertial behavior of the evolution of Covid-19); (ii) we use national data instead of focusing on a single state; (iii) we adopt soft-data variables to control for non-observable behavioral actions. In terms of modeling, we build a weekly panel for all Brazilian municipalities to evaluate how effectively mobility restrictions affect the evolution of the pandemic. Our paper contributes to the ongoing literature on the causal identification of mobility effects based on panel data by adding unstructured data (Google Trends and News-indexes) to generate proxies for non-observable behavioral variables that affect the spread of Covid-19. Our sample covers May 2020 to August 2021 (significantly longer than in the studies that focused on the effects of mobility), and we also conduct a sub-sample analysis restricted to the pre-vaccination period to evaluate locally the effects of circulation restrictions. Estimation results suggest that increasing residential mobility (i.e. reducing overall mobility) significantly reduces the number of cases (by 6.19% in the first week, declining to a 3.02% reduction at four weeks) and deaths (by 2.47% at the shortest horizon, growing to 6.51% at one month).
For the sub-sample period (2020 only), the effects of residential mobility are similar to those of the full-sample analysis, but they add to the effect of workplace mobility on cases (deaths): increasing workplace mobility results in an increase in both cases (about 1%) and deaths (about 2%) over the reference horizon. The results are robust to the set of mobility variables included in the model, the geographical aggregation of cases, the vaccination campaign variables and Dynamic Panel specifications.

The remainder of this paper is structured in four additional sections. The next section describes the identification strategy, the Directed Acyclic Graph (DAG) approach, the fixed-effects and dynamic-panel models and the unstructured data created to proxy for non-observable behavioral effects. The third section describes the data. The fourth section presents the estimation results for both the full sample (2020 and 2021) and the pre-vaccination period (2020 sub-sample). The last section concludes.

We motivate our identification strategy using a Directed Acyclic Graph (DAG) approach, following Elwert (2013). In Appendix A we provide an introduction to the concept of DAGs. In our specification, we want to estimate the impact of mobility on cases (deaths) due to Covid-19, represented by the coefficient β. However, many confounding factors may affect mobility (or even the infection situation), generating an omitted-variable bias. To overcome this issue, we carefully specify some of those factors, following our hypotheses about the causal relations between the variables. Prevention measures such as wearing masks, washing hands and using hand sanitizer may affect infection through individual behavior. However, this variable (denoted B from now on) is non-observable.
Therefore, we should include control variables that capture part of this effect. We use Google Trends searches (gt-series) and news (n-index), both regarding Covid-19 prevention behavior, to capture this omitted effect, represented by the coefficients γ_b and η_b, respectively. We also include lagged Covid-19 cases (deaths) to capture lagged effects through the behavioral channel, inducing a lag structure in the spread of Covid-19 over time. In Figure 1 we plot the DAG representing the causal relations between the variables that we adopt to identify our model. We structure the channels represented in the DAG of Figure 1 as follows: we consider that vaccination may affect the number of cases (deaths), mobility and individual behavior (measured through Google Trends and news proxies based on Covid-19 related keywords and searches). We also consider that the gt-series and the n-index affect only mobility and do not affect the spread of the disease through direct channels. Therefore, mobility is mainly determined by vaccination and individual behavior (measured by the gt-series and the n-index), whereas the spread of Covid-19 is determined by mobility, individual behavior and vaccination.

However, if we analyze only the first year of the pandemic (2020), the vaccination variable becomes innocuous (as vaccination only started in Brazil in 2021). Therefore, all the channels that relate vaccination to mobility measures, Google Trends searches and Covid-19 related news disappear from the DAG. The result is the DAG presented in Figure 2. Note that the identification of this model is essentially a reduced form of the full-sample model. Having motivated the causal relation that we aim to identify, the econometric specification follows directly.
As we deal with a weekly panel of municipalities, we resort to the functional form in Equation (1), where β_0, β, φ, γ, η, ν, α_j and δ_t are parameters to be estimated. Note that γ = (γ_g, γ_b) and η = (η_g, η_b) represent the gt-series and n-index channels through general Covid-19 related searches or news (g) and through proxies for non-observable behavioral effects (b), respectively. The index j ∈ {1, ..., 5570} denotes municipalities, t ∈ {1, ..., 67} is a weekly time index and m ∈ {1, 2, 3, 4} is the lag imposed on the regressors of the model 1. Also, α_j is an individual fixed effect, whereas δ_t is a time fixed effect. The constant β_0 represents a common trend trajectory for all municipalities 2.

1 The lag structure is important to capture the delayed effects of the model's variables on the spread of Covid-19.
2 All estimations include an intercept due to the normalization adopted by Stata: instead of setting the intercept equal to zero, the program imposes a normalization on the estimated fixed effects.

In terms of estimation, we apply a within transformation with respect to each municipality and include time dummies to capture heterogeneous effects over time. As a robustness check, we also estimate the model using a Dynamic Panel structure (Arellano-Bond transformation with four lags) to account for the effects induced by the inclusion of lagged dependent variables, considering the inertial behavioral channel of the spread. Finally, the coefficients associated with mobility and vaccination represent elasticities, i.e. they relate a one percent change in mobility (vaccination) to the effect on the growth rate of Covid-19 cases (deaths). The coefficients associated with the gt-series and the n-index represent semi-elasticities, as they relate one additional search (or news item) to the associated effect on the growth rate of the spread of Covid-19.
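The within transformation with time dummies described above can be illustrated with a minimal sketch. It is not the authors' Stata code: the panel is synthetic and noise-free, the names `two_way_demean`, `within_slope`, `alpha`, `delta` and `TRUE_BETA` are hypothetical, and only a single regressor (a stand-in for lagged mobility) is used. For a balanced panel, subtracting municipality means and week means (and adding back the grand mean) is equivalent to including municipality fixed effects and a full set of week dummies.

```python
from collections import defaultdict


def two_way_demean(rows, col):
    """Remove unit means and week means from one column (balanced panel assumed)."""
    unit_vals, week_vals = defaultdict(list), defaultdict(list)
    for r in rows:
        unit_vals[r["unit"]].append(r[col])
        week_vals[r["week"]].append(r[col])
    unit_mean = {u: sum(v) / len(v) for u, v in unit_vals.items()}
    week_mean = {t: sum(v) / len(v) for t, v in week_vals.items()}
    grand_mean = sum(r[col] for r in rows) / len(rows)
    return [
        r[col] - unit_mean[r["unit"]] - week_mean[r["week"]] + grand_mean
        for r in rows
    ]


def within_slope(rows, x_col, y_col):
    """OLS slope of the demeaned outcome on the demeaned regressor."""
    x = two_way_demean(rows, x_col)
    y = two_way_demean(rows, y_col)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Synthetic panel: 4 hypothetical municipalities observed over 5 weeks.
TRUE_BETA = -3.0                       # made-up mobility coefficient
alpha = [0.5, -1.0, 2.0, 0.0]          # made-up municipality effects
delta = [0.1, 0.3, -0.2, 0.0, 0.4]     # made-up week effects
panel = []
for j in range(4):
    for t in range(5):
        x = (j + 1) * (t + 1)          # stand-in for lagged mobility
        y = alpha[j] + delta[t] + TRUE_BETA * x
        panel.append({"unit": j, "week": t, "x": x, "y": y})

beta_hat = within_slope(panel, "x", "y")
print(round(beta_hat, 6))
```

Because the synthetic outcome contains no noise, the two sets of fixed effects are removed exactly and the slope equals the made-up coefficient; with real data the same transformation would be applied to every regressor before estimation.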
Two main types of data (labeled "hard data" and "soft data" from now on) are used to estimate our model. The main distinction is that the first type comes in an objective, highly organized format, whereas the second type consists of subjective data that may (e.g. Google Trends) or may not (e.g. News-Index) have been organized beforehand. Soft data relies on (unstructured) text-related data summarized through (structured) counting measures.

The first set of hard indicators is the number of Covid-19 cases and deaths, which constitute our dependent variables. These have been extracted from SRAG data 3 available at the OpenDataSUS website 4. The main difference between the construction of the Covid-19 cases and deaths series lies in the filtering date: for the number of cases we use the date of first symptoms as the reference for aggregation, whereas for the number of deaths we use the date of death (obit) as the reference. In both cases we construct series at the municipality level based on the residence area and on the notification area of the Covid-19 cases (deaths). This distinction produces different aggregations, as each choice embeds a different hypothesis about the contamination process. In Figure 3 we compare the effects of different types of date aggregation for the number of cases and deaths.

3 SRAG is the acronym for "Vigilância de Síndrome Respiratória Aguda Grave" (surveillance of severe acute respiratory syndrome), which consolidates all data related to severe acute respiratory syndrome in Brazil, including Covid-19, for 2020 and 2021. Note that SRAG data only cover patients who were actually admitted to hospital due to a respiratory syndrome and therefore do not represent mild cases.
4 OpenDataSUS is an initiative of the Ministry of Health of Brazil.

The next set of hard indicators comprises the mobility measures that are the object of interest in our estimations.
These variables are broken down into six mobility categories (workplace, residential, parks, transit, grocery and retail) 5 and expressed as percentage changes relative to the 5-week baseline period of Jan 3 - Feb 6, 2020. We extract mobility data from Google's Covid-19 Community Mobility Reports.

Regarding controls, we start by extracting vaccination hard data on timing and immunization type (first or second dose), also from the OpenDataSUS website, which provides data at the individual level. As a robustness check, we also collect vaccination data from the SRAG data set. The two data sets differ in that the first covers the national vaccination campaign, whereas the second covers only individuals who appear in the SRAG records. Both types of vaccination variables are consolidated at weekly frequency at the municipality level.

The other two variables, Google Trends and the News-Index (gt-series and n-index), constitute our set of soft controls. Google Trends data have been extracted using the Google Trends API and reveal the number of searches for a given topic over a given period. News-Index data have been generated from news collected from G1 6, which has in-depth coverage of Covid-19 in Brazil 7. These two sets of controls rely on a subjective selection of search terms and keywords, which needs to be specified in order to generate the data series for our estimates. The formulation of the indexes and the categories/keywords are presented in Appendix B.

5 Residential mobility is measured as time spent at the place of residence, whereas the other five categories are measured as the number of visitors.
6 G1 is a local newspaper that belongs to Grupo Globo.
7 Covid-19 related news alone amounts to about 142,697 items for the period May 1, 2020 - August 1, 2021. The advantage of G1 news is that it is divided into sub-regions, i.e. the news data is location-stamped at the state level.
Finally, we build a weekly panel at the municipality level 8, starting on May 3, 2020 and ending on August 1, 2021 9. Tables 1 and 2 report the descriptive statistics and the correlation matrix, respectively, for all series used in our estimates.

The estimation results are divided into two sub-sections. The first analyzes the effects of mobility on Covid-19 cases (deaths) over the full sample (2020-2021), whereas the second restricts our model to the 2020 sub-sample. In both cases, restrictions on mobility tend to have a relevant impact on the evolution of the Covid-19 infection rate.

To estimate Equation (1), we need to choose the number of lags m ∈ {1, 2, 3, 4} used in our specification. For Covid-19 cases, it is known that the average time until the first symptom is about five days (see Cintra and Fontinele (2020)). Therefore, within this 5-day window, individuals may spread the virus without knowing that they are infected. For deaths, we computed the median number of days from first symptom to death, shown in Figure 4. The results suggest that, in the SRAG sample, the median is about seventeen days (third week), while the minimum is about eight days (second week) and the maximum is twenty-two days (fourth week). As we consider weekly windows, the horizon that we consider is one to four weeks (a month) after infection for cases and two to four weeks (a month) for deaths.
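The mapping from delays measured in days to the weekly lags m can be checked with a short sketch. The helper `days_to_week` is a hypothetical name, and the rule it applies (a delay of d days falls in weekly window ceil(d / 7)) is an assumption that happens to reproduce the week labels quoted in the text.

```python
import math


def days_to_week(days):
    """Return the 1-based weekly window that contains a delay given in days."""
    if days <= 0:
        raise ValueError("delay must be a positive number of days")
    return math.ceil(days / 7)


# Delays quoted in the text, in days.
delays = {
    "infection to first symptom (cases)": 5,
    "minimum first symptom to death": 8,
    "median first symptom to death": 17,
    "maximum first symptom to death": 22,
}

weeks = {label: days_to_week(d) for label, d in delays.items()}
for label, week in weeks.items():
    print(f"{label}: week {week}")
```

Under this rule the death delays span weeks two to four, which matches the choice of m from 2 to 4 for deaths, while the five-day incubation window falls in week one, the first lag considered for cases.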
Regarding mobility, Table 1 reveals that only workplace mobility is available for all municipalities covered by Google's mobility reports. After workplace, residential mobility is also available for a large share of municipalities. This is not the case for the other four mobility measures, which implies a higher probability of measurement error and of observations missing in a non-random way. We therefore choose to run the regressions only on the first two measures to obtain a cleaner specification. We also present a full model with all six mobility measures, and our results are robust to this inclusion.

8 Only the soft-data controls (Google Trends and our News-Index) are available at the state level, whereas all other variables are available at the more granular municipality level.
9 Two important points: (i) we start our reference panel in May 2020 and end it in August 2021 to avoid problems with data at the tails of the sample and with delayed publication of government statistics, i.e. vintages that have not been updated; (ii) the panel covers full weeks, each starting on Monday and ending on Sunday of the same week.

The estimation results are displayed in Table 3, based on fixed-effects estimations for Covid-19 cases (columns 1 to 4) and deaths (columns 5 to 8). In each column, we report the number of lags m, growing from m = 1 to m = 4 weeks for cases and from m = 2 to m = 4 for deaths. The coefficients of interest are those on residential and workplace mobility, as these measures have the highest number of observations across municipalities. The first important result is that increasing residential mobility (i.e. decreasing overall mobility) reduces the growth rate of both infections and deaths throughout the sample. The impact on Covid-19 cases is largest in the first reference week, at 6.19%, and slowly decreases over the 4-week window, reaching 3.02% after one month.
Regarding Covid-19 deaths, we also observe that reducing mobility lowers the growth rate of deaths. However, this effect grows from 2.47% to 6.51% at the one-month horizon. If we assume that these effects can be combined week by week (which overestimates the overall effect), the upper bound on the overall effect of a 1% decrease in mobility is a reduction of 20.83% in cases and of 14.35% in deaths, both at a one-month horizon.

The vaccination, gt-series and n-index controls play an important role in capturing the effects suggested by the DAGs. The same holds for the inertial behavior, which seems to be captured by the coefficients on the lagged dependent variables. The overall R-squared is at least 18% for cases and 23% for deaths (higher explanatory power), and the F-test rejects the null hypothesis that all coefficients are zero in all estimations. All lagged dependent variables display a negative inertial effect on the evolution of both cases and deaths.

In Appendix C we display first-stage estimations with the objective of validating the DAG channels presented in Figure 1 and the assumptions laid out in Section 2. Table 5 reports estimates of mobility regressed on the gt-series, the n-index and the lagged dependent variables; we find a high R-squared, and the F-test rejects the null. Table 6 regresses cases (deaths) on the vaccination campaign, finding mostly negative coefficients, as expected. Table 7 relates mobility to the national vaccination campaign and to SRAG vaccination, finding a high R-squared. It is important to note that the SRAG data has higher explanatory power but only considers vaccination of individuals inside the SRAG data set, which is restrictive in terms of causal determination. We also display a regression with all six mobility measures in Appendix D.
The results presented in Table 8 suggest some ambiguity: transit and grocery seem to have a negative impact on the growth rate of cases and deaths. We do not take these estimated coefficients as pure effects because: (i) observations are lacking for smaller municipalities; (ii) the effects are smaller than 0.5%, too small to be taken into account. Additionally, Appendix D displays three other estimation outputs. Table 9 uses the notification area instead of the place of residence when aggregating Covid-19 cases (deaths). The results are similar to those found using the residence area. Table 10 uses SRAG vaccination data (log of vaccines for SRAG) instead of national campaign vaccination data (which comprises a first- and second-dose breakdown), with similar results. Finally, Table 11 uses a Dynamic-Panel (Arellano-Bond with four lags) methodology (in line with Liu et al. (2021)) to estimate recursively the impact of the lagged dependent variable; the residential mobility estimates still have a negative sign, with decaying (increasing) effects for cases (deaths).

This section estimates the impact of mobility during the first year of the Covid-19 pandemic only. By limiting the sample to 2020, we can redefine the causal relation that we aim to identify, as vaccination only started in 2021. This results in the DAG presented in Figure 2, in which we remove the vaccination node and focus solely on mobility-related variables. The estimation results are presented in Table 4, and two effects are directly observable: (i) the negative sign associated with residential mobility is still present; (ii) workplace mobility plays an important role in explaining the number of cases (deaths) due to the spread of Covid-19. The effect of reducing mobility on the growth rate of cases is still on the order of 5.04% at the first-week horizon, decreasing to 4.08% at one month.
For deaths, this effect grows from 3.31% at two weeks to 6.25% after a month. The effects of increasing workplace mobility, however, tend to increase over the estimation window for the growth rate of both cases and deaths, oscillating positively around 1-2%. Repeating the exercise of combining the effects to obtain an upper bound, a 1% increase in residential mobility generates a similar 20.73% and 16.36% reduction in cases and deaths, respectively. In addition, a 1% decrease in workplace mobility can generate a combined 4.8% and 6.07% decrease in cases and deaths, respectively. In Appendix D we also display Dynamic-Panel estimates for the 2020 sub-sample, and the coefficients remain of magnitudes compatible with those in Table 4.

In this paper we aimed to develop an empirical framework to address questions about the causal effects of mobility restrictions on Covid-19 infections. Using a DAG approach, we developed causal channels that required the design of "soft-data" proxies to capture non-observable effects of individual behavior, such as prevention measures. The methodology adopted to estimate the effects is in line with the panel-data estimates of Liu et al. (2021) and Huang (2020), with similar magnitudes for the mobility elasticities: a 1% reduction in mobility (through an increase in residential mobility) reduces cases and deaths at a one-month horizon in both the 2020-2021 sample and the 2020-only sample. The combined upper-bound effect is around a 20% reduction in cases and a 15% reduction in deaths.

Our analysis has some limitations arising from potential non-linearities in the effect of vaccination on mobility: vaccines can induce higher or lower mobility depending on the overall vaccination campaign or even on the level of Covid-19 infection.
We therefore avoid interpreting vaccination-related coefficients, as they can be misleading. The same applies to the other mobility coefficients, owing to the intrinsic measurement error in those variables. Finally, the paper suggests that restricting mobility reduces the number of cases and deaths, a result that is robust across samples and methodologies. The objective of developing a causal framework based on DAGs is supported by the estimation outputs, and the mobility effects are determined solely by the causal hypotheses.

Notes to Table 3: Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. Results for Fixed-Effects estimation over the 2020 and 2021 sample, aggregating cases (deaths) by residence and using overall vaccination from the OpenDataSUS data set. R^2 denotes the R-squared, R^2 overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Notes to Table 4: Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. Results for Fixed-Effects estimation for 2020 only, aggregating cases (deaths) by residence. R^2 denotes the R-squared, R^2 overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Figure 1: The complete version of the DAG shows how mobility (X) interacts with vaccination (V), Google Trends (G_b), News (N_b) and behavioral effects (B). Each arrow suggests a connection between two variables. Following Elwert (2013), only missing arrows encode assumptions about causal relations. The circle around the behavioral variable (B) denotes a non-observable variable, while Y_lag denotes a 4-dimensional vector of lagged observations of Covid-19 cases (deaths). Also, Google Trends general searches (G_g) and general News (N_g) are related to vaccination (V) and mobility (X).
Figure 2: The reduced version of the DAG (2020 only) shuts down the vaccination channel that affects mobility, behavior and cases (deaths). This transformation implies a cleaner identification of the mobility effect, but with local validity only, not extending to the subsequent second wave of Covid-19.

Figure 3: Comparison between the first-symptom date (cases) or the date of death (deaths) and the notification date, on a daily basis. The notification-date series is much more volatile than the effective series for cases (deaths). For the date of death, the series almost coincides with its 7-day rolling window.

Figure 4: The plot shows the median number of days from first symptom to death for patients in the SRAG data over the period May 2020 to August 2021. We also plot a 7-day rolling window (R.W.) as a smoother and the overall median (17 days). This time series is important for defining the number of lags to consider in the estimation.

The use of DAGs in causal inference has its roots in Pearl (1995) and Pearl (2009), providing causal interpretation based on the relations between variables. In our estimation, the principal objective is to identify the causal effect of mobility on Covid-19 cases and deaths, i.e. to isolate causal from non-causal associations. As described by Elwert (2013), DAGs are a powerful tool for deciding which control variables we should and should not include to "achieve identification". The main idea is to draw a clear and objective graph that encodes the main causal relations we aim to describe in our model. In short, following Elwert (2013), a DAG is read through three elements: (i) variables, represented by nodes; (ii) arrows, suggesting possible direct causal impacts; (iii) missing arrows, encoding "sharp assumptions" about the absence of causal effects.
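The three elements (nodes, arrows and missing arrows) can be made concrete with a small sketch. The edge list below is one simplified reading of the channels described in Section 2, not a reproduction of Figure 1: it omits the lagged outcomes and the split between general and behavioral proxies, and the names `EDGES`, `is_acyclic`, `missing_arrows` and `drop_node` are hypothetical.

```python
# Nodes: V vaccination, B behavior (unobserved), G Google Trends proxy,
# N news proxy, X mobility, Y cases (deaths). Arrows are an assumed reading.
EDGES = {
    "V": ["X", "Y", "B"],   # vaccination affects mobility, outcomes and behavior
    "B": ["Y", "G", "N"],   # behavior affects outcomes and is proxied by G and N
    "G": ["X"],             # proxies are assumed to affect mobility only
    "N": ["X"],
    "X": ["Y"],             # the effect of interest
    "Y": [],
}


def is_acyclic(edges):
    """Kahn's algorithm: True if all nodes can be ordered topologically."""
    indegree = {node: 0 for node in edges}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in edges[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited == len(edges)


def missing_arrows(edges):
    """Ordered pairs (a, b) with no direct arrow from a to b."""
    return {(a, b) for a in edges for b in edges if a != b and b not in edges[a]}


def drop_node(edges, node):
    """Remove a node and every arrow pointing to it (e.g. vaccination in 2020)."""
    return {a: [c for c in ch if c != node] for a, ch in edges.items() if a != node}


dag_2020 = drop_node(EDGES, "V")
print(is_acyclic(EDGES), is_acyclic(dag_2020))
print(("G", "Y") in missing_arrows(EDGES))
```

In this reading, the absence of a direct arrow from the proxies G and N to Y is exactly the kind of "sharp assumption" that a missing arrow encodes, and dropping the vaccination node gives the reduced graph used for the 2020 sub-sample.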
Ideally, if we were able to observe mobility in a way that is not affected by external elements, i.e. exogenously (e.g. a randomized version of mobility), we would retrieve the causal effect of mobility on Covid-19 variables directly. This representation corresponds to the graph in Figure 5.

[Figure 5: DAG representing the causal chain between mobility and cases (deaths): $X \to Y$, with effect $\beta$.]

As we do not observe this "artificial" measure of mobility, we cannot retrieve a causal effect merely by regressing one of these two variables on the other. We should therefore include potential control variables that are related to the mobility measures in order to remove the omitted-variable bias. As an example, take vaccination as a potential control: this variable affects both mobility and the number of cases and deaths. Its inclusion as a regressor (estimating $\eta$) is therefore necessary to correctly identify the causal effect of mobility, i.e. without vaccination we would bias the estimation of $\beta$ through the channel of the $\eta$ relation. This results in the DAG of Figure 6.

[Figure 6: DAG representing the causal chain between vaccination, mobility and cases (deaths).]

Proceeding in the same fashion, we identify potential variables that are useful controls in our identification strategy, such as Google Trends searches and news related to Covid-19. The result is the complete DAG that we present in Section 2.

To generate our soft data controls, we need to specify the Google Trends (gt-series) search terms chosen by individuals and the keywords used to build the News-Index (n-index). The objective is to be concise and precise when selecting the terms, in order to capture a general overview of the Covid-19 situation at the individual level that may affect both mobility and the number of cases (deaths), according to the DAG proposed in Figures 1 and 2. We also aim for compatibility between the search terms and the keywords used for the two controls.
There are two main categories for both the gt and n-series, encoding general effects (g) and behavioral effects (b). To capture the general effects (g) of the Covid-19 pandemic, we selected words that refer to: (i) Covid-19 related terms; (ii) fake news terms; (iii) vaccination terms. The first topic was chosen to capture the overall trend of the number of infections over time. The second topic allows for a direct impact of the spread of fake news on the evolution of Covid-19. The third topic relates to the vaccination campaign, which is needed to control correctly for the evolution of infections. To capture the effects of the behavioral category (b), we selected prevention related terms. This category should embed individual behavior (not observed by hard data indicators) that may affect the evolution of infections over time. As behavior is a non-observable variable, the absence of such terms would bias the estimated effects of mobility on cases (deaths).

Formally, as the sets of search words and keywords used to create the gt and n-series are equal, we define the indexes without the gt or n marker. Recall that we have panel data (at the state level and weekly frequency) for each topic. Therefore, each observation is indexed by a state marker $j \in \{1, \cdots, 27\}$ and a time stamp $t \in \{t_0, \cdots, T\}$. For each gt and n-series of controls, we have a vector $g_{(j,t)} = (g_{1,(j,t)}, g_{2,(j,t)}, g_{3,(j,t)}, g_{4,(j,t)})$ that represents the four categories above. For each category $g_{i,(j,t)}$, with $i \in \{1, 2, 3, 4\}$, there are $n_{i,(j,t)}$ associated search words (for gt) or keywords (for n), denoted $g_{i,w,(j,t)}$. We thus generate the indexes in the following manner: for each $i \in \{1, 2, 3, 4\}$, $j \in \{1, \cdots, 27\}$ and $t \in \{t_0, \cdots, T\}$. As an example, consider the News-Index (n) regarding prevention related terms, i.e. $i = 4$. In this case, we have $n_{4,(j,t)} = 13$ and $w$ denotes a certain term (e.g. $w$ = mascara).
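Continuing with the prevention category, the indexing scheme can be illustrated with a short sketch. The aggregation formula itself is not reproduced in this text, so the code below assumes a plain sum of keyword counts within each category, state and week; the authors' definition may differ (for instance, an average or a normalized count). The data values, the state and week labels, the second keyword and the function name `build_index` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword counts g_{i,w,(j,t)}:
# (category i, keyword w, state j, week t) -> number of occurrences.
counts = {
    (4, "mascara", "SP", "2020-W20"): 12,
    (4, "alcool em gel", "SP", "2020-W20"): 5,  # illustrative second keyword
    (4, "mascara", "RJ", "2020-W20"): 7,
    (1, "covid", "SP", "2020-W20"): 30,
}


def build_index(keyword_counts):
    """Aggregate keyword counts into one value per (category, state, week).

    Assumption: the index is the plain sum over keywords w of g_{i,w,(j,t)}.
    """
    index = defaultdict(int)
    for (category, _keyword, state, week), count in keyword_counts.items():
        index[(category, state, week)] += count
    return dict(index)


index = build_index(counts)
print(index[(4, "SP", "2020-W20")])
```

Keeping the keyword in the key of the raw counts and dropping it only at aggregation time mirrors the notation in the text, where $w$ indexes terms inside category $i$ and disappears from the final index.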
Therefore, $g_{4,\text{mascara},(j,t)}$ denotes the number of occurrences of "mascara", which belongs to the prevention related terms, for each state $j$ and each week $t$. Finally, the News-Index for prevention related terms is given by:

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample, aggregating cases (deaths) by notification area. The first-stage estimation aims to validate the causal relations suggested by the DAG in Figure 1. R² denotes the R-squared, R² adj the adjusted R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample. We regress Covid-19 cases by residence area on the evolution of Covid-19 vaccination, also by residence area. The first-stage estimation aims to validate the causal relations suggested by the DAG in Figure 1. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample. We regress Covid-19 deaths by residence area on the evolution of Covid-19 vaccination, also by residence area. The first-stage estimation aims to validate the causal relations suggested by the DAG in Figure 1. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.
Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample, aggregating cases (deaths) by residence area and using overall vaccination from the OpenData SUS data set. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample, aggregating cases (deaths) by notification area and using overall vaccination from the OpenData SUS data set. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for fixed-effects estimation over the 2020 and 2021 sample, aggregating cases (deaths) by notification area and using vaccination data from the SRAG data set. In this case, the vaccination series covers only individuals who had been vaccinated but were admitted to hospital due to Covid-19 infection. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for dynamic-panel estimation (Arellano-Bond using four lags) over the 2020 and 2021 sample, aggregating cases (deaths) by residence area and using overall vaccination from the OpenData SUS data set. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.
Standard errors in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001. (a) Results for dynamic-panel estimation (Arellano-Bond using four lags) over the restricted 2020 sample, aggregating cases (deaths) by residence area and using overall vaccination from the OpenData SUS data set. R² denotes the R-squared, R² overall the overall R-squared, N the total number of observations used in the estimation and p the p-value associated with the F-test.

Measuring economic policy uncertainty. The Quarterly Journal of Economics.
The role of mobility and sanitary measures on Covid-19 in Costa Rica.
London in lockdown: Mobility in the pandemic city.
Synthetic control, synthetic interventions, and Covid-19 spread: Exploring the impact of lockdown measures and herd immunity.
Responses to Covid-19 in five Latin American countries.
Lockdown effects in US states: an artificial counterfactual approach.
What works to control Covid-19? Econometric analysis of a cross-country panel.
Estimative of real number of infections by Covid-19 in Brazil and possible scenarios.
The economic effects of Covid-19 containment measures.
Graphical causal models. In Handbook of Causal Analysis for Social Research.
Staying at home: mobility effects of Covid-19. Available at SSRN 3565703.
Predicting regional Covid-19 hospital admissions in Sweden using mobility data.
How effective is social distancing? Available at SSRN 3680321.
Covid-19 contagion, economic activity and business reopening protocols.
Disentangling policy effects using proxy data: Which shutdown policies affected unemployment during the Covid-19 pandemic.
Mental health effects of Covid-19 pandemia: a review of clinical and psychological traits.
Panel forecasts of country-level Covid-19 infections.
Causal diagrams for empirical research.
Social distancing and Covid-19: Some evidence at the municipality level in Brazil. Available at SSRN 3881417.
A deep-learning model for evaluating and predicting the impact of lockdown policies on Covid-19 cases.
Predicting Covid-19 spread from large-scale mobility data.
Mobility and economic impact of Covid-19 restrictions in Italy using mobile network operator data.