key: cord-0276523-w5cab1ye
authors: Spiliopoulos, L.
title: On the effectiveness of COVID-19 restrictions and lockdowns: Pan metron ariston
date: 2021-07-07
journal: nan
DOI: 10.1101/2021.07.06.21260077
sha: d7cad73ccd00538f797186097f6c40935f89c268
doc_id: 276523
cord_uid: w5cab1ye

I examine the dynamics of confirmed case (and death) growth rates conditional on different levels of severity in implemented NPIs, the mobility of citizens and other non restrictive policies. To account for the endogeneity of many of these variables, and the possibility of correlated latent (unobservable) country characteristics, I estimate a four structural model of the evolution of case growth rates, death growth rates, average changes in mobility and the determination of the severity of NPIs. There are strongly decreasing returns to the stringency of NPIs, especially for extreme lockdowns, as no significant improvement in the main outcome measures is found beyond NPIs corresponding to a Stringency Index range of 51-60 for cases and 41-50 for deaths. A non-restrictive policy of extensive and open testing has half of the impact on pandemic dynamics as the optimal NPIs, with none of the associated social and economic costs resulting from the latter. Decreases in mobility were found to increase, rather than decrease case growth rates, consistent with arguments that within-household transmission, resulting from spending more time at residences due to mobility restrictions, may outweigh the benefits of reduced community transmission. Vaccinations led to a fall in case and death growth rates, however the effect size must be re-evaluated when more data becomes available. Governments conditioned policy choice on recent pandemic dynamics, and were found to de-escalate the associated stringency of implemented NPIs more cautiously than in their escalation, i.e., policy mixes exhibited significant hysteresis. Finally, at least 90% of the maximum effectiveness of NPIs can be achieved by policies with an average Stringency index of 31-40, without restricting internal movement or imposing stay at home measures, and only recommending (not enforcing) closures on workplaces and schools, accompanied by public informational campaigns. Consequently, the positive effects on case and death growth rates of voluntary behavioral changes in response to beliefs about the severity of the pandemic, generally trumped those arising from mandatory behavioral restrictions. The exception being more stringent mandatory restrictions on gatherings and international movement, which were found to be effective. The findings suggest that further work should be directed at re-evaluating the effectiveness of NPIs, particularly towards empirically determining the optimal policy mix and associated stringency of individual NPIs.

The ancient Greek dictum pan metron ariston asserts that moderation is best, implying that extreme measures, regardless of which end of the spectrum they are located, are unlikely to be the best policy. With regards to the SARSCoV-2 pandemic, the one extreme, of no intervention and restrictions was ruled out by most governments early on. The other extreme, of severe restrictions and even complete lockdowns, has not proven to be a bête noire for many governments. Severe lockdowns were imposed during the rst wave of the pandemic, when uncertainty regarding the transmission and mortality of Covid-19 was maximal.

Such an approach can be justied as a maxmin reaction in the face of uncertain events, minimizing the worst possible outcome. As we learn more about Covid-19 and uncertainty is reduced, better estimates of both the probability distribution and magnitude of possible outcomes can be inferred, allowing for a more nuanced approach aligned with an expected value calculation of costs and benets. However, severe NPIs remain in place in many countries even during the second and third waves, despite the fact that we are now arguably more informed. This can be seen in the evolution of the stringency of government-imposed NPIs (the Stringency Index, SI, on a scale from 0 to 100) plotted in Figure 1 , which includes the median SI value across countries and the 10th and 90th percentiles. Median SI peaked during the rst wave in the month of April, reaching a maximum of 84. The median SI then trended slowly downwards and leveled o to a range of 5560, whilst slowly trending up again as subsequent waves led to a resurgence of the pandemic. As of April 2021, while NPIs had not reached the peak levels of the rst wave, the median SI across countries at the last datapoint (14/4/2021) was still relatively high, 64 (10th perc.=31, 90th=81). Another important observation is that after the peak in April 2020, governments have followed increasingly heterogeneous NPIs as the dierence in the 10th and 90th percentiles increased from roughly 30 SI points to oscillating around 50 points since late 2020.

The aim of this study is to exploit this divergence in government responses for improved identication of the eects of the strictness of implemented NPIs on the pandemic dynamics. I will extend earlier work based on limited data from the rst wave (Haug et al. 2020; Islam et al. 2020; Brauner et al. 2021; Bo et al. 2021) and also deal with certain econometric issues with prior analyses. Does the accumulated data from over a year support the continued imposition of stringent NPIs? I will test the whole spectrum of Covid policies in terms of varying degrees of severity to determine whether metron is indeed ariston. If so, it is crucial that we know where exactly the metron lies within the spectrum of policy responses because restrictions and lockdowns have an immense eect on economic and psychological well being, translating into negative health outcomes in the future (Miles, Stedman and Heald 2020; Susskind and Vines 2020; Altman 2020) . The latter are of course dicult to quantify, leading to a propensity to focus instead solely on the positive, immediate eects of NPIs. However, tradeos are everywhere and we ignore them at our perilopen debates allowing the free exchange of ideas are of paramount importance (e.g., Melnick and Ioannidis 2020). Tradeos in Covid-19 research Tradeos also exist in the choice of pandemic research methodologies, both in terms of the type of model and the data it is estimated on. For example, models may be estimated on data at the national or subnational level. Working at the level of the latter has the benet of better controlling for country infection dynamics and avoiding possible issues arising from dierences in pandemic accounting standards across countries (May 2020) . Since data at the national level is broader in terms of country coverage than sub-national data, analyses at the level of the former may carry more external validity and will be less likely to fall prey to overtting due to small sample noise. Finally, the tradeo between aggregated (assuming homogeneity or pooled) or disaggregated (individual) modeling depends strongly on The impact of policy interventions can be examined under the prism of three types of dierent models with their own set of tradeos: a) detailed models of epidemiological processes such as SIR-based models (Visscher 2020; Dehning et al. 2020; Davies et al. 2020) , b) agent based modeling of transmission in a population (e.g., Aleta et al. 2020 ) and c) reduced form models that abstract away from these epidemiological processes and agent interactions at the micro-level, by using simple econometric models based on empirical data, rather than inference of complex epidemiological parameters. Epidemiological models, while desirable as they directly model the underlying mechanisms, are prone to overtting in the presence of scant or low-quality, noisy data, leading to non-robustness (see Chin et al. 2020; Soltesz et al. 2020 ) since they require estimates of key parameters that are highly uncertain, and whose impact may reverberate signicantly in highly non-linear, exponential growth models. No single approach dominates the rest, especially at the current stage of our understanding of the pandemic. Dierent approaches must be simultaneously pursued whilst acknowledging the limitations and advantages of each in an attempt to consolidate the ndings.

As the literature on Covid-19 is burgeoning, I briey survey the most similar studies, primarily those employing reduced-form equations. Early discourse was heavily inuenced by the epidemiological modeling of the Imperial College COVID-19 Response Team (Flaxman et al. 2020) , which concluded that lockdowns were a crucial and necessary strategy for the subduement of the pandemic. Sub-national data from six countries revealed that while not all specic NPIs have a signicant impact, overall the implementation of NPIs signicantly reduced infections (Hsiang et al. 2020) , without however, explicitly accounting for how 3 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint restrictive individual NPIs were. Dichotomizing the range of NPIs into less-restrictive (e.g., social distancing, discouragement of international and domestic travel, and large gatherings bans) and more-restrictive (e.g., stay-at-home and business closure orders) reveals no evidence of additional gains associated with the latter in ten countries at the subnational level (Bendavid et al. 2021) . Similarly, stringent lockdowns in 24 European countries over the rst half of 2020 did not signicantly improve pandemic dynamics over less stringent lockdowns (Bjørnskov 2021) . This was conrmed using data from 108 countries until May, 2020 , (Bonardi et al. 2020 with the implication that NPIs may have a signalling eect leading to voluntary behavioral changes by citizens. Less stringent NPIs already provide a strong enough signal, so that there is nothing to gain from more stringent NPIs. The importance of voluntary behavioral changes is exemplied by the signicant changes in mobility found to occur earlier than mandatory restrictions on movement (Abouk and Heydari 2021; Farboodi, Jarosch and Shimer 2020) , and by the fact that mobility did not return to previous levels after the easing of lockdown restrictions (Gapen et al. 2020; Allcott et al. 2020) .

I seek to combine the advantages of reduced-form approaches, whilst alleviating some of the drawbacks in their implementations to date. Specically, earlier studies: a) eschewed behavioral components and agents' incentives, b) estimated a single regression disavowing variable endogeneity, and c) were estimated on datasets from the rst few months of the pandemic without the benet of the currently accumulated data covering other developments such as newer variants.

The importance of behavioral models Models can be classied according to whether they are behavioral or not, i.e., are the behavioral responses of citizens and governments allowed to vary endogenously or are they assumed to be xed? Agent-based models are by construct behavioral, however standard SIR models are not, and reduced-form work typically estimates single-equation regressions of eects of various NPI variables on either the conrmed case (and or death) growth rate or mobility data, thereby implicitly assuming exogeneity of these variables. However, it reasonable to believe that mobility is dependent on the severity of the NPIs in place, i.e., it may mediate the impact of NPIs on Covid-19 dynamics. Furthermore, NPI stringency may also be endogenous if governments base their policy decisions on recent epidemiological data and trends, e.g., infection growth. Consequently beyond modeling only SARS-CoV-2 dynamics, agents' adaptive reactions to the situation merit attention: those of citizens (Cowling et al. 2020 ) (through mobility, precautionary and voluntary measures and other behavioral responses) and those of governments through the choice and timing of policies.

Beyond the foundational use of behavior in agent-based models, behavioral components can be introduced both to epidemiological and reduced form models. The former is signicantly more common than the latter, which I will pursue in this study. The incorporation of behavioral components in SIR models leads to very dierent long-run predictions than models based on standard non-behavioral SIRs (Kermack and McKendrick 1927) . A common nding is that the system tends to an equilibrium reproduction rate of 1 (Gans 2020) , sometimes with oscillations if behavioral components operate with lags (Cochrane 2020; Atkeson 2021) . These studies highlight the importance of modeling the evolution of behavioral responses, as they can lead to important qualitative, not just quantitative, changes in pandemic dynamics. The canonical behavioral response depends on the perceived risk of infection and severity; e.g., consumers' behavioral changes in shopping habits were predominantly of their own volition rather than through the imposition of legal restrictions (Goolsbee and Syverson 2021) . Behavioral incentives can have a stabilizing eect on the infection growth rate, as the higher it is, the more likely citizens are to react by adjusting their behavior to decrease the chances of infection. However, it is important to also bear in mind that individual behavior 4 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint may become more reckless as risks are mitigatedsee the risk-compensation literature (Peltzman 1975) . For example, as the pandemic starts to wane, citizens may adopt more risky practices, thereby slowing down the decrease in the infection rate. Similarly, extensive testing and vaccinations may elicit adverse behavioral responses if citizens believe they are less likely to contract Covid-19 and infect others.

The importance of beliefs and expectations Importantly, there may be latent (unobservable) variables, such as culture (Laliotis and Minos 2020) , at the country-level that jointly inuence key variables that are typically assumed to be exogenously determined instead. For example, low social capital and trust in government may aect case growth rates directly and indirectly, through other explanatory variables.

Suppose that low-trust countries also tend to have inadequate public health systems. This will have a direct impact on case and mortality growth due to inadequate health care, but also an indirect eect. The government, knowing that the health system is weak (e.g., scarce intensive care units), may impose more stringent NPIs compared to countries with better health systems, in a attempt to prevent them from lling to capacity. Expectations and beliefs of citizens and governments can further introduce endogeneity. Low trust between citizens and government institutions may lead citizens to underestimate the severity of the situation as presented by governments, and the government, expecting this, may act to impose stricter restrictions in anticipation of the weaker behavioral response by its citizens.

Simultaneously modeling endogeneity and unobservable variables, behavioral incentives and beliefs Acknowledging that conrmed case and death growth, mobility and government policies are endogenous requires a system of four equations to fully model these interactions and avoid biases resulting from simpler regressions that implicitly impose exogeneity. I complement the literature by employing econometric procedures, namely structural multi-equation modeling including unobservable or latent variables and their eects, to improve upon the external validity of prior analyses. Such a system of equations can be viewed through the lens of game theory, that is, the modeling of dierent agents, who react to each others strategies and expectations thereof. I propose, an admittedly rudimentary, model of the adaptive interactions of these three agents. 1 My approach complements existing studies by investigating a large number of countries, with the associated benets of pooling estimates for robustness, but other disadvantages such as the assumption of homogeneity across countries and the need to focus on aggregated measures of key endogenous variablesrather than their individual components (Haug et al. 2020) to avoid a computationally-intractable and unidentiable system of equations. A central question is whether there are decreasing returns to the eectiveness of NPIs with severity, and if so, to estimate the approximately optimal severity of government policies. This remains a critical question not only for future pandemics, but also because NPIs may be valuable even during the vaccination phase (Rella et al. 2021) .

National-level data from 132 countries covering the time period from 15th February 2020 to 14th April 2021 was compiled, extending previous analyses to data including the appearance and spread of the B.1.1.7 and B.1.351 variants detected late 2020, and the more recently discovered variants such as P.1. Conrmed case and death counts, vaccinations, and tests, were download from the Covid-19 Data Hub (Guidotti and 1 Unfortunately, while very interesting, a more rigorous game-theoretic analysis would require signicantly more data to eectively infer or observe expectations of agents and the multitude of ways to react to said information. 5 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint Ardia 2020), which also included the implementation of NPIsa composite score or Stringency Indexsourced originally from the Oxford COVID-19 Government Response Tracker (Hale et al. 2021) . Mobility data from the Google Community Mobility Report (Google 2020) was merged with the datale from the aforementioned database.

The growth in conrmed cases (deaths) was calculated as the log dierence in the cumulative conrmed cases (deaths) for two consecutive days multiplied by 100 (i.e., they can be interpreted as approximate percentage growth rates). The summary statistics for growth in conrmed cases are: # of obs. 53,279, mean=2.74%, stdev.=8.87% and for deaths: # of obs. 49,366, mean=1.997%, stdev.=6.84%. To allow for a non-linear relationship between the Stringency Index and case/death growth, a semi-parametric approach was implemented by subdividing the SI (ranging continuously from 0 to 100) into a baseline of no restrictions (0) and the deciles (110, 1120, ..., 91100). The set of (non-restrictive) policies that we examine are: the testing policy (TP) variable, which takes on three levels (l) of 0 (no testing) through to 3 (open testing), contact tracing (CT) policy takes on levels of 0 through 2, the proportion of the population tested per day (TPop) and the cumulative percentage of the number of vaccinations compared to the country's population (V)this can be greater than 100% due to some vaccines requiring more than one dose. I address the issues identied above by simultaneously estimating four generalized structural equations to model the complex inter-relationships between variablessee Figure 2 for a graphical representation of the causal structure and equations 14. The dependent variables for each equation are: (eq. 1) the growth rate of conrmed cases (Ċ ), (eq. 2) the growth rate of conrmed deaths (Ḋ), (eq. 3) the seven-day moving-average of mobility M i,t and (eq. 4) the ordered categorical variable S l i,t derived from the stringency index.

Equations 14 below completely dene the econometric model. It incorporates lagged relationships between the case (and death) growth rates, and other variables arising from the approximate delay in symptom onset and case conrmation and the timing of exposure to the virus (Rothan and Byrareddy 2020). The lags for the growth rate in deaths are longer than those for cases (by 14 days) to reect the fact that on average deaths occur at a much later time after infection, e.g., due to deaths occurring after lengthy hospitalization in intensive care units. The exact determination of the lags is not critical, as there is a high degree of autocorrelation of the key lagged variables. To simplify the model, all lags and moving-averages were multiples of 7 days to remove the eects of systematic daily variations.

The 7-day moving average of the percentage change in mobility (compared to the pre-Covid baseline for each country) is denoted by M i,t this was constructed by an equal-weighting of three individual variables separately measuring mobility towards groceries and pharmacies, transit stations and workplaces. Since this variable is normalized at a dierent baseline per country, additional random eects at the country level were not implemented in equation 3, in contrast to all other equations. 6 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint 

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint

In eq. 4, SI l i,t is an ordered variable of the stringency index ranging in levels from 0 to 10, in equation 1 it enters as a set of dummy variables. An indicator function captures government hysteresis or pathdependence I Ċ i,t−7 −Ċ i,t−14 ,which is equal to 1 if the case growth rate has been increasing or zero if it has been decreasing. If the estimated coecient δ 2 > 0 then this means that for the same level of case growth, governments will on average be more likely to impose more stringent NPIs if the case growth trend is negative than if it were positive.

The initial date used in the estimation varied by country as it was to set to the rst day with a conrmed case. Any missing datapoints for the conrmed cases and deaths, number of tests and vaccinations at time t were set equal to the value at time t − 1; note, this was done on the raw data of these variables, which were coded as cumulative sums until time t. Country-level unobservable characteristics were modeled using unique random eects λ e i for equations e =1, 2 and 4, allowing for covariance between all the random eects to capture the joint impact of unobservable country characteristics. Fixed-eects for the day of week are denoted by W l t .

Possible hysteresis, or path-dependence, in governments' determination of the level of NPI stringency is also modeled, conditional on whether the case growth rate has recently been declining or increasing. The hypothesis is that governments are slower at scaling back stringent NPIs as the pandemic threat recedes than they are in implementing them during outbreaks. Fixed-eects accounted for the possible inuence of the day of the week on case and death rates. Cluster-robust standard errors were employed to account for heteroskedasticity and within-panel dependence of errors. Of course, the usual disclaimers hold regarding the use of observational data and the causal assumptions embedded in the chosen structural equations and variable relationships.

Detailed regression results can be found in Tables 2 and 3 , below I present the main estimates graphically.

Highly correlated latent (unobservable variables) inuence both government policy and conrmed case/death growth rates The estimated covariances (and associated correlation) between the country-level latent variables in equations 1, 2 and 4 are all positive and signicantly dierent from zero at the 0. All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint Higher NPI severity restricts non-residential mobility Non-residential mobility is clearly impacted by the SI (see the estimates in Table 2 and Figure 3 below), as the null hypothesis that all SI dummy variables are equal to zero is rejected (χ 2 (10) = 428.38, p < 0.0001). Furthermore, it is monotonically decreasing in the ten SI ranges.

Lower non-residential mobility increases case and death growth rates The theoretical motivation behind restricting mobility through lockdown measures is that its reduction will lead to a fall in the number of social interactions, with the hope that this will reduce transmission and infection. However, a reduction in non-residential mobility must, by denition, be mirrored by an increase in the time spent within residencesindeed, they are highly negatively correlated in the Google mobility data; the median correlation within panels is -0.89. Consequently, reductions in non-residential mobility may have a detrimental eect on case/death growth as although transmission outside the home may be reduced, within-household transmission may be enhanced as people spend increasingly more time within the connes of a closed space with others (Sun et al. 2021 ). Since the two eects work in opposite directions, whether restricting non-residential mobility reduces case/death growth or not must be resolved empirically. Our nding that the coecient of The direct link between SI and case/death growth rates Turning to the direct eect of the stringency index of restrictions on case growth rates, the hypothesis that the set of β c 2,l estimates are all equal 9 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint to zero is rejected (χ 2 (10) = 137.28, p < 0.0001). Furthermore, there is evidence of a nonlinear relationship, with strongly decreasing returns to the eectiveness of restrictions with increasing severity (see Table 2 , and note that the conclusions are similar for death growth rates).

The total eect of NPI stringency on case/death growths and the optimal level of SI The total eect of NPI stringency on case growth can be computed by adding the direct β 2,l and indirect paths γ 1,l × β 1 for each stringency level of l. This combined eect of SI onĊ i,t andḊ i,t is documented in Table 4 and presented graphically below in Figure 4 .

The maximum impact of SI on case growth is observed for the SI range of 6170. We test the dierence in eectiveness of all other levels against the most eective range 6170, correcting for multiple comparisons using the Sidak correction.

2 There is no dierence in eectiveness for the ranges 5160 and 7180 (see Table   5 for the test statistics); the ranges (8190, 91100) are signicantly less eective than the range of 6170.

Consequently, there are no further gains to be achieved beyond the SI range of 5160. The socially optimal SI range, however, must account not only for the positive eects of NPIs, but also for the signicant economic and other health-related costs that result from restrictions (for example, see Miles, Stedman and Heald 2020; Susskind and Vines 2020; Altman 2020) . While this would require a full cost-benet analysis (Rowthorn and Maciejowski 2020; Appleby 2020) that is beyond the scope of this paper, we can derive the approximate upper bound of the socially optimal level of the SI with a single assumption about the cost prole of dierent SI levels. Namely, I only assume that the costs are monotonically increasing in the SI level. Consequently, without the need to quantify costs, I conclude that the upper bound of the socially optimal SI level, SI* is 5160, i.e., the minimum SI range that is not signicantly dierent from the maximum eect at 6170.

While the quantication of the exact costs is an important endeavour, it is fraught with many diculties, such as converting non-economic outcomes into monetary terms and would require many assumptions about highly uncertain possible eects, many in the distant future. By contrast, this upper-bound on SI* is extremely robust, albeit less informative. The above derivation of SI* is based on statistical signicance, however it is possible that lower SIs may be statistically signicantly dierent from SI*, but that the dierence 2 Note, if anything this underestimates the possible range of non-signicantly dierent SI ranges, as it ignores the uncertainty associated with whether the range 6170 is truly the most eective range. 10 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi. org/10.1101 org/10. /2021 in the eect size is practically of little importance. Indeed, the ratio of the eect sizes of an SI range of 3140 relative to the average of the 5180 range, is 91% [85%, 97%]. The relative eectiveness of the even laxer 2130 range is 72% [57%, 86%]. If the costs of NPIs increase quickly with NPI stringency, it is conceivable that moderately severe policy responses in the 3040 SI range (and possibly, even 2130 SI) may in fact be close to the socially optimal SI* (arising from a full cost-benet analysis), as it already achieves 91% eectiveness without accounting for costs.

Similarly, the maximum eectiveness on death growth rates is also observed for an SI range of 6170.

We test the dierence in eectiveness of all other levels against this range. There is no signicant increase in eectiveness for levels beyond 4150 (see Table 7 ). In terms of the eect size relative to the average of the 4190 range, an SI range of 3140 achieves 93% [87%, 98%] of the eectiveness of the former and the laxer 2130 range achieves 73% [58%, 88%].

Finally, while the SI index aggregates various individual NPI policies, examining the median values of the individual policies in the dataset for each SI range can be informativesee Table 1 . Note, that for the 3140 SI range, which achieves at least 90% of the maximum impact, the median policies do not include any level of restriction on transport and internal movement, and no stay-home restrictions. Furthermore, only recommendations for closing schools and workplaces and cancellation of public events were issued, i.e., these were not mandatory. The only stringent individual policies typically arising in the 3140 SI range were quarantining high-risk cases from international travel and restrictions on gatherings of 100-1000 people; however, these two policies are not reective of citizens' everyday behavior. Consequently, voluntaryrather than mandatorybehavioral changes are more important drivers of the impact of NPIs on case (and death growth). 3 This is consistent with other studies that also conclude that the attening of NPI eectiveness with increasing stringency reects a relatively stronger voluntary rather than mandatory component to behavioral changes (e.g., Allcott et al. 2020; Bjørnskov 2021; Bonardi et al. 2020; Bendavid et al. 2021 ).

Whence arises this voluntary behavioral change? As we showed above, part of it is from citizens' expectations of the risk of infection and severity as captured byγ 2 in equation 3, whose eect size, however, was found to be relatively small. The rest, and majority of the voluntary behavioral change, is likely due to the signalling value of policy decisions, i.e., citizens can also use information about the stringency of the government measures to infer the severity of the pandemic. This implies that recommendations by governments, for SI ranges of up to 40, were heeded by citizens, who signicantly changed their behavior in ways beyond those captured by mobility in eq. 3, e.g., preventative measures such as diligent hand washing and mask-wearing, eective self-isolation when infected et cetera.

Extensive public testing signicantly reduces case and death growth, contact tracing does not Figure 5 presents the estimated coecients and associated 95% condence intervalssee Table 2 for detailed regression results. All three levels of the testing regime are jointly signicantly dierent from zero (χ 2 (3) = 66.03, p < 0.0001), leading to progressively greater declines in case growth as testing becomes more extensive (robust to multiple comparison Sidak corrections). 4 Note, that the most extensive testing policy has an impact of -7.39 [-9.926,-4.854] , which is 51% [27%,76%] of that of the most impactful SI range (6170). 3 I infer that the restriction on gatherings of 100-1000 people is not the main driver of mandatory behavioral change for the following reason. The median level of restrictions on gatherings for the immediately less stringent range, 2130, is zero, i.e., no restrictions whatsoever. Yet the mean estimate of NPI impact for the 3140 range is -12.945 compared to -10.173 for the 2130 rangesome of this dierence will also be due to other policies that are stricter on average in the former compared to the latter. 4 That is, comparing the baseline of no testing to the rst level (χ 2 (1) = 12.42, p = 0.0013), the rst to the second level (χ 2 (1) = 38.55, p < 0001) and the second to the third level (χ 2 (1) = 7.83, p = 0.0153).

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint Contact tracing of any level does not have a signicant impact on conrmed case growth (χ 2 (2) = 1.93, p = 0.38) or death growth (χ 2 (2) = 1.31, p = 0.52). One should keep in mind that contact tracing may still be eective if the number of daily new cases is small, when ecient tracing is more manageable.

However, there remain important challenges to scaling contact tracing (Chowdhury et al. 2020; Quilty et al. 2021; Kretzschmar et al. 2020 ) that could hinder its eectiveness during signicant outbreaks.

The proportion of the population tested daily does not signicantly aect cases and deaths While both estimates are negative, as expected, increasing the proportion of daily tests does not signicantly reduce the case growth,β 5 = −0.18 [−0.403, −0.035, p = 0.1], nor the growth rate in deaths, -0.126 [−0.315, 0.063, p = 0.191] . Note, that an earlier analysis with datapoints till the end of January 2021 only, revealed a statistically signicant impact. The inclusion of dierent levels of the ordinal testing policy variable may partially capture this eect, as they will be correlated to some degree with the proportion tested, i.e., extensive testing as coded in the ordinal variable would likely imply more testing. Finally, this may be the result of implicitly assuming exogeneity, however testing may also be endogenously determined as governments are likely to step up testing during phases with higher transmission.

Vaccination reduces case and death growth rates Despite few datapoints where vaccination was already well underway, each 1% point increase in the cumulative vaccination % results in a reduction of the case growth of -0.017% points [−0.032, −0.002, p = 0.022] and in the death growth rate -0.0187% points [−0.035, −0.002, p = 0.026]. Note that the median (non-zero) cumulative vaccination % across countries is only 2.6% and the 10th and 90th percentiles are 0.07% and 19.8%, respectively. Consequently, these relatively low estimates should not be assumed to extrapolate for higher vaccination levels, especially since this is likely to be a nonlinear relationship in reality, i.e., the eect of vaccinations will increase at an accelerating rate as it approaches the herd immunity level. 12 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; Government policy is endogenous and exhibits hysteresis Government policy is strongly endogenous, in contrast to the common implicit assumption of exogeneity. The seven-day lagged conrmed growth rate coecientδ 1 = 0.048 [0.035, 0.061, p < 0.0001] is positively related to NPI severity. Furthermore, I nd signicant hysteresis in the de-escalation of NPIs, that is for the same case growth, NPIs are signicantly more stringent if the case growth has been falling recently, than the opposite:δ 2 = 0.531 [0.453, 0.609, p < 0.0001].

A four-equation structural model of multiple agents (SARS-CoV-2 virus, citizens and government) capturing the basic dynamics of their endogenous evolution revealed the following. Recall that the Stringency Index of NPIs ranges from 0 (no measures taken) to a maximum of 100. For conrmed case growth rates, there were no signicant gains to be had beyond an SI range of 5160; moreover, 91% of this eect size can be achieved with an SI range of 3140. For death growth rates, no signicant gains were to be achieved beyond an SI range of 4150 and 93% of this eect size can be accomplished with an SI range of 3140. An open testing policy has approximately half the benets of the optimal NPIs without incurring the societal costs associated with long-term restrictions. Furthermore, the nding that decreases in non-residential mobility, and therefore increases in its complement, time spent at residences, increase the growth rate of conrmed cases and deaths is aligned with contact tracing analyses of heightened transmission risk within a household, compared to the wider community (Sun et al. 2021 ) and earlier work concluding that shelter-in-place orders did not reduce Covid-19 infection and mortality rates (Berry et al. 2021 ).

Interpretation and implications for policy What implications does this have for policy? I report in Table 1 the median values of the individual NPI constituents of the SI index for these upper bounds on the socially optimal level of NPI stringency for both conrmed cases (5160) and deaths (4150), and the range of 3140, which is near-optimal. Note, that there is signicant heterogeneity across individual NPIs in terms of their severity. The most severe implemented restrictions are those on gatherings and international movement. At the other extreme, regarding transport and internal movements, no restrictions or recommendations were made in these three SI ranges. Stay-home restrictions and recommendations are also absent with the exception of a recommendation to stay home associated with the 5160 range.

Moderately severe restrictions are typically implemented for schools and workplaces, that is, the optimal upper bound for conrmed cases enforces partial closing (targeted, not across the board) whereas that for death rates and the near-optimal SI range (3140) call only for recommendations to work from home.

These ndings are generally aligned with studies nding that more severe restrictions were not signicantly more eective than less restrictive policies (Bendavid et al. 2021; Haug et al. 2020; Bjørnskov 2021; Bonardi et al. 2020) . However, they stand in contrast to others that concluded that strict lockdowns were eective (Flaxman et al. 2020; Islam et al. 2020) , whilst diering in some policy interventions such as school and workplace closures, but agreeing on others such as mass gathering restrictions (Islam et al. 2020) . Dierences with these studies may arise due to their more limited time-series and/or the lack of explicit behavioral modeling of citizens' reactions to the pandemic. The majority of the eect of NPIs on case and death growth rates could be attributed to voluntary behavioral changes rather than mandatory, government-imposed imperatives. Note also, that the 3140 range typically includes a coordinated public informational campaign aimed at inuencing voluntary behavior. This highlights the importance of modeling the behavioral incentives of both governments and citizens in conjunction with the pandemic dynamics.

13 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint Strengths and limitations of this study The simultaneous modeling of pandemic dynamics with behavioral models of citizens' behavioral adaptation to the pandemic, along with a model of governments' decision-making processes regarding policy implementation is the primary strength of this study with respect to earlier work. Identication of this more sophisticated model with behavioral components was made possible by the larger amount of accumulated data including testing and vaccination rates. Nonetheless, certain simplications were still necessary to ensure identication and to rein in the computational complexity of the estimation processes. These simplications included examining the eects of nations' stringency index compiled from individual NPIs, rather than examining each NPI separately. Similarly, an average measure of the change in mobility was used instead of disaggregated sub-measures of the type of mobility.

Furthermore, while random-eects allowed for variation across countries in unobservable variables, estimates of the variables of interest (NPIs and other interventions) were pooled across countries.

Unanswered questions and further research As more data becomes available over time, future research should be directed towards relaxing some of the acknowledged limitations of the current modeling.

For example, allowing for heterogeneity in the variable estimates across countries would be desirable rather than pooling estimates across countries, as would the use of sub-national data. The analysis of vaccinations should extended once data is available for higher levels of vaccination to properly estimate the likely nonlinear eect beyond the currently low levels reported in this dataset.

Finally, the conclusions reached herein must be placed within the context of and validated by other methodological approaches, such as SIR and agent-based models. However, I have presented signicant evidence that strict NPIs provide no further benets over less stringent ones, and that the latter function primarily as signals for signicant voluntary changes in behavior rather than mandatory changes.

14 All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted July 7, 2021. ; https://doi.org/10.1101/2021.07.06.21260077 doi: medRxiv preprint 

The Immediate Eect of COVID-19 Policies on Social-Distancing Behavior in the United States

Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19

What Explains Temporal and Geographic Variation in the Early US Coronavirus Pandemic? NBER Working Paper Series 27965

Smart thinking, lockdown and Covid-19

Tackling covid-19: are the costs worth the benets?

A Parsimonious Behavioral SEIR Model of the 2020 COVID Epidemic in the United States and the United Kingdom

Assessing mandatory stay-at-home and business closure eects on the spread of COVID-19

Evaluating the eects of shelter-in-place policies during the COVID-19 pandemic

Did Lockdown Work? An Economist's Cross-Country Comparison

Eectiveness of non-pharmaceutical interventions on COVID-19 transmission in 190 countries from 23

Fast and local: How lockdown policies aect the spread and severity of covid-19

Inferring the eectiveness of government interventions against COVID-19

Eects of non-pharmaceutical interventions on COVID-19: A Tale of Three Models. medRxiv

COVID-19 Contact Tracing: Challenges and Future Directions

An SIR model with behavior

Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and inuenza in Hong Kong: an observational study

Eects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: a modelling study

Inferring change points in the spread of COVID-19 reveals the eectiveness of interventions

Internal and External Eects of Social Distancing in a Pandemic

Estimating the eects of non-pharmaceutical interventions on COVID-19 in Europe

The economic consequences of R = 1: Towards a workable behavioural epidemiological model of pandemics

Assessing the eectiveness of alternative measures to slow the spread of COVID-19 in the United States

Google COVID-19 Community Mobility Reports

Fear, lockdown, and diversion: Comparing drivers of pandemic economic decline 2020

COVID-19 Data Hub

A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker)

Ranking the eectiveness of worldwide COVID-19 government interventions

The eect of large-scale anti-contagion policies on the COVID-19 pandemic

Physical distancing interventions and incidence of coronavirus disease 2019: natural experiment in 149 countries

A contribution to the mathematical theory of epidemics

Impact of delays on eectiveness of contact tracing strategies for COVID-19: a modelling study

Spreading the disease: The role of culture

Lockdown-type measures look eective against covid-19

Should governments continue lockdown to slow the spread of covid-19?

Living with COVID 19: Balancing costs against benets in the face of the virus

The Eects of Automobile Safety Regulation

Quarantine and testing strategies in contact tracing for SARS-CoV-2: a modelling study

SARS-CoV-2 transmission, vaccination rate and the fate of resistant strains. medRxiv

The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak

A costbenet analysis of the COVID-19 disease

Sensitivity analysis of the eects of non-pharmaceutical interventions on COVID-19 in Europe. medRxiv

Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2

The economics of the COVID-19 pandemic: an assessment

45*** -14.96*** 3140 -12.21***

Tpop i,t−28

Table 7: Dierence between the maximum eectiveness level of restrictions for deaths (6170) Relative to 6170

Other tables