key: cord-0849130-fe456in0
authors: Brauner, Jan M.; Mindermann, Sören; Sharma, Mrinank; Johnston, David; Salvatier, John; Gavenčiak, Tomáš; Stephenson, Anna B.; Leech, Gavin; Altman, George; Mikulik, Vladimir; Norman, Alexander John; Monrad, Joshua Teperowski; Besiroglu, Tamay; Ge, Hong; Hartwick, Meghan A.; Teh, Yee Whye; Chindelevitch, Leonid; Gal, Yarin; Kulveit, Jan
title: Inferring the effectiveness of government interventions against COVID-19
date: 2020-12-15
journal: Science
DOI: 10.1126/science.abd9338
sha: 1544aebcab8459f785ece3aa8f59ca4ce495c441
doc_id: 849130
cord_uid: fe456in0

Governments are attempting to control the COVID-19 pandemic with nonpharmaceutical interventions (NPIs). However, the effectiveness of different NPIs at reducing transmission is poorly understood. We gathered chronological data on the implementation of NPIs for several European, and other, countries between January and the end of May 2020. We estimate the effectiveness of NPIs, ranging from limiting gathering sizes, business closures, and closure of educational institutions to stay-at-home orders. To do so, we used a Bayesian hierarchical model that links NPI implementation dates to national case and death counts and supported the results with extensive empirical validation. Closing all educational institutions, limiting gatherings to 10 people or less, and closing face-to-face businesses each reduced transmission considerably. The additional effect of stay-at-home orders was comparatively small.

Worldwide, governments have mobilized resources to fight the COVID-19 pandemic. A wide range of nonpharmaceutical interventions (NPIs) has been deployed, including stay-at-home orders and the closure of all nonessential businesses. Recent analyses show that these large-scale NPIs were jointly effective at reducing the virus's effective reproduction number (1), but it is still largely unknown how effective individual NPIs were. As more data become available, we can move beyond estimating the combined effect of a bundle of NPIs and begin to understand the effects of individual interventions. This can help governments control the epidemic efficiently, by focusing on the most effective NPIs to ease the burden put on the population.

A promising way to estimate NPI effectiveness is data-driven, cross-country modeling: inferring effectiveness by relating the NPIs implemented in different countries to the course of the epidemic in these countries. To disentangle the effects of individual NPIs, we need to leverage data from multiple countries with diverse sets of interventions in place. Previous data-driven studies (table S8) estimate effectiveness for individual countries (2-4) or NPIs, although some exceptions exist [(1, 5-8); summarized in table S7]. In contrast, we evaluated the impact of several NPIs on the epidemic's growth in 34 European and seven non-European countries. If all countries implemented the same set of NPIs on the same day, the individual effect of each NPI would be unidentifiable. However, the COVID-19 response was far less coordinated: countries implemented different sets of NPIs, at different times, in different orders (Fig. 1).

Even with diverse data from many countries, estimating NPI effects remains a challenging task. First, models are based on uncertain epidemiological parameters; our NPI effectiveness study incorporates some of this uncertainty directly in the model. Second, the data are retrospective and observational, meaning that unobserved factors could confound the results. Third, NPI effectiveness estimates can be highly sensitive to arbitrary modeling decisions, as shown by two recent replication studies (9, 10) . Fourth, large-scale public NPI datasets suffer from frequent inconsistencies (11) and missing data (12) . Hence, the data and the model must be carefully validated if they are to be used to guide policy decisions. We have collected a large public dataset on NPI implementation dates that has been validated by independent double entry, and extensively validated our effectiveness estimates. This is a crucial, but often absent or incomplete, element of COVID-19 NPI effectiveness studies (10) .

Our results provide insight on the amount of COVID-19 transmission associated with various areas and activities of public life, such as gatherings of different sizes. Therefore, they may inform the packages of interventions that countries implement to control transmission in current and future waves of infections. However, we need to be careful when interpreting this study's results. We only analyzed the effect NPIs had between January and the end of May 2020, and NPI effectiveness may change over time as circumstances change.

Lifting an NPI does not imply that transmission will return to its original level and our window of analysis does not include relaxation of NPIs. These and other limitations are detailed in the Discussion section.

We analyzed the effects of seven commonly used NPIs between the 22nd of January and the 30th of May 2020. All NPIs aimed to reduce the number of contacts within the population (Table 1 ). If a country lifted an NPI before the 30th of May, the window of analysis for that country terminates on the day of the lifting (see Methods). To ensure high data quality, all NPI data were independently entered by two of the authors (independent double entry) using primary sources, and then manually compared with several public datasets. Data on confirmed COVID-19 cases and deaths were taken from the Johns Hopkins CSSE COVID-19 Dataset (13) . The data used in this study, including sources, are available online on GitHub (14) .

We estimated the effectiveness of NPIs with a Bayesian hierarchical model. We used case and death data from each country to infer the number of new infections at each point in time, which is itself used to infer the (instantaneous) reproduction number Rt over time. NPI effects were then estimated by relating the daily reproduction numbers to the active NPIs, across all days and countries. This relatively simple, data-driven approach allowed us to sidestep assumptions about contact patterns and intensity, infectiousness of different age groups, and so forth, that are typically required in modeling studies. It also allowed us to directly model many sources of uncertainty, such as uncertain epidemiological parameters, differences in NPI effectiveness between countries, unknown changes in testing and infection fatality rates, and the effect of unobserved influences on Rt. The code is available online on GitHub (14) .

Our model enabled us to estimate the individual effectiveness of each NPI, expressed as a percentage reduction in Rt. We quantified uncertainty with Bayesian prediction intervals, which are wider than standard credible intervals. These reflect differences in NPI effectiveness across countries, along with several other sources of uncertainty. Bayesian prediction intervals are analogous to the standard deviation of the effectiveness across countries, rather than the standard error of the mean effectiveness. Under the default model settings, the percentage reduction in Rt (with 95% prediction interval; Fig. 2) associated with each NPI was: limiting gatherings to 1000 people or less: 23% (0 to 40%); to 100 people or less: 34% (12 to 52%); to 10 people or less: 42% (17 to 60%); closing some high-risk face-to-face businesses: 18% (−8 to 40%); closing most nonessential face-to-face businesses: 27% (−3 to 49%); closing both schools and universities in conjunction: 38% (16 to 54%); and issuing stay-at-home orders (additional effect on top of all other NPIs): 13% (−5 to 31%). Note that we were not able to robustly disentangle the individual effects of closing schools and closing universities, since these NPIs were implemented on the same day or in close succession in most countries [except Iceland and Sweden, where only universities were closed (see also fig. S21)]. We thus reported "schools and universities closed in conjunction" as one NPI.
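The analogy between prediction intervals and the across-country standard deviation can be illustrated with a small sketch. The country-level values below are hypothetical and are not taken from the study; the point is only that the spread across countries (standard deviation) is larger than the uncertainty about the mean (standard error), which shrinks as more countries are observed.

```python
import math
import statistics

# Hypothetical country-level reductions in Rt (%) for one NPI.
# These numbers are illustrative only, NOT values from the study.
country_effects = [30.0, 35.0, 38.0, 41.0, 46.0]

mean_effect = statistics.mean(country_effects)

# Spread of the effect across countries (sample standard deviation).
sd_across_countries = statistics.stdev(country_effects)

# Uncertainty about the mean effect (standard error of the mean).
se_of_mean = sd_across_countries / math.sqrt(len(country_effects))

print(f"mean effect         = {mean_effect:.1f}%")
print(f"SD across countries = {sd_across_countries:.2f}")
print(f"SE of the mean      = {se_of_mean:.2f}")
```

An interval built from the first quantity describes what a new country might experience; an interval built from the second describes only how well the average is known.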

Some NPIs frequently co-occurred, i.e., were partly collinear. However, we were able to isolate the effects of individual NPIs since the collinearity was imperfect and our dataset large. For every pair of NPIs, we observed one without the other for 504 country-days on average (table S5). The minimum number of country-days for any NPI pair is 148 (for limiting gatherings to 1000 or 100 attendees). Additionally, under excessive collinearity, and insufficient data to overcome it, individual effectiveness estimates would be highly sensitive to variations in the data and model parameters (15) . Indeed, high sensitivity prevented Flaxman et al. (1) , who had a smaller dataset, from disentangling NPI effects (9). In contrast, our effectiveness estimates are substantially less sensitive (see below). Finally, the posterior correlations between the effectiveness estimates are weak, further suggesting manageable collinearity ( fig. S22 ).
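The country-day count used above as a collinearity check can be sketched as follows. This is a minimal illustration on hypothetical toy data; the data layout, the NPI labels, and the function name `exclusive_country_days` are assumptions made for the example, not the format of the study's dataset.

```python
from itertools import combinations

def exclusive_country_days(active, npi_a, npi_b):
    """Count country-days on which exactly one of the two NPIs is active.

    `active` maps country -> {NPI name -> list of 0/1 flags, one per day}.
    """
    count = 0
    for npis in active.values():
        for a, b in zip(npis[npi_a], npis[npi_b]):
            if a != b:
                count += 1
    return count

# Hypothetical toy data: two countries, six days, three NPIs.
toy = {
    "Country A": {
        "gatherings<=1000": [0, 1, 1, 1, 1, 1],
        "gatherings<=100":  [0, 0, 0, 1, 1, 1],
        "schools":          [0, 0, 1, 1, 1, 1],
    },
    "Country B": {
        "gatherings<=1000": [0, 0, 1, 1, 1, 1],
        "gatherings<=100":  [0, 0, 1, 1, 1, 1],
        "schools":          [1, 1, 1, 1, 1, 1],
    },
}

# Exclusive country-days for every pair of NPIs.
pairs = {
    (a, b): exclusive_country_days(toy, a, b)
    for a, b in combinations(sorted(toy["Country A"]), 2)
}
average = sum(pairs.values()) / len(pairs)
minimum = min(pairs.values())
```

A pair with very few exclusive country-days carries little information for separating the two effects, which is why both the average and the minimum are worth reporting.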

Although the correlations between the individual estimates were weak, we took them into account when evaluating combined NPI effectiveness. For example, if two NPIs frequently co-occurred, there may be more certainty about the combined effectiveness than about the effectiveness of each NPI individually. Figure 3 shows the combined effectiveness of the sets of NPIs that are most common in our data. In combination, the NPIs in this study reduced Rt by 77% (67 to 85%). Across countries, the mean Rt without any NPIs (i.e., the R0) was 3.3 (table S4). Starting from this number, the estimated Rt likely could have been brought below 1 by closing schools and universities, closing high-risk businesses, and limiting gatherings (14).
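The arithmetic behind these statements can be checked with a short sketch. It assumes that NPI effects combine multiplicatively and, as a simplification, multiplies the reported posterior medians directly. The study itself evaluates combinations from the joint posterior, so the result for the subset is only a rough approximation.

```python
def combined_reduction(reductions):
    """Combine fractional reductions in Rt, assuming multiplicative effects."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining

r0 = 3.3  # mean R0 across countries, as reported in the text (table S4)

# All NPIs in the study combined: reported median reduction of 77%.
rt_all = r0 * (1.0 - 0.77)

# Rough approximation for a subset, using the reported medians:
# schools and universities (38%), gatherings limited to 10 (42%),
# and some high-risk businesses closed (18%).
subset = combined_reduction([0.38, 0.42, 0.18])
rt_subset = r0 * (1.0 - subset)

print(f"Rt with all NPIs:  {rt_all:.2f}")
print(f"Subset reduction:  {subset:.1%}")
print(f"Rt with subset:    {rt_subset:.2f}")
```

Under these simplifying assumptions, both the full set and the three-NPI subset bring an R0 of 3.3 below 1, although the subset does so only narrowly.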

We performed a range of validation and sensitivity experiments (figs. S2 to S19). First, we analyzed how the model extrapolated to countries that did not contribute data for fitting the model, and found that it could generate calibrated forecasts for up to 2 months, with uncertainty increasing over time. Multiple sensitivity analyses showed how the results changed when we modified the priors over epidemiological parameters, excluded countries from the dataset, used only deaths or confirmed cases as observations, varied the data preprocessing, and more. Finally, we tested our key assumptions by showing results for several alternative models [structural sensitivity (10) ] and examined possible confounding of our estimates by unobserved factors influencing Rt. In total, we considered NPI effectiveness under 206 alternative experimental conditions (Fig. 4A) . Compared with the results obtained under our default settings (Figs. 2 and 3), median NPI effectiveness varied under alternative plausible experimental conditions. However, the trends in the results are robust, and some NPIs outperformed others under all tested conditions. While we tested large ranges of plausible values, our experiments did not include every possible source of uncertainty. We categorized NPI effects into small, moderate, and large, which we define as a posterior median reduction in Rt of less than 17.5%, between 17.5 and 35%, and more than 35% (vertical lines in Fig. 4 ). Four of the NPIs fell into the same category across a large fraction of experimental conditions: closing both schools and universities was associated with a large effect in 96% of experimental conditions, and limiting gatherings to 10 people or less had a large effect in 99% of conditions. Closing most nonessential businesses had a moderate effect in 98% of conditions. Issuing stay-at-home orders (i.e., in addition to the other NPIs) fell into the "small effect" category in 96% of experimental conditions. 
Three NPIs fell less clearly into one category: limiting gatherings to 1000 people or less had a moderate-to-small effect (moderate in 81% of conditions), while limiting gatherings to 100 people or less had a moderate-to-large effect (moderate in 66% of conditions). Finally, closing some high-risk businesses, including bars, restaurants, and nightclubs, had a moderate-to-small effect (moderate in 58% of conditions). Limiting gatherings to 1000 people or less was the NPI with the highest variation in median effectiveness across the experimental conditions (Fig. 4A), which may reflect the NPI's partial collinearity with limiting gatherings to 100 people or less.
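The categorization rule can be written down in a few lines. The sketch below applies the thresholds stated above (17.5% and 35%) to the median reductions reported under the default settings; the function name and the short NPI labels are chosen for illustration.

```python
def categorize(median_reduction_pct):
    """Map a posterior median reduction in Rt (%) to an effect-size category."""
    if median_reduction_pct < 17.5:
        return "small"
    if median_reduction_pct <= 35.0:
        return "moderate"
    return "large"

# Median reductions in Rt (%) reported under the default model settings.
default_medians = {
    "gatherings <= 1000": 23,
    "gatherings <= 100": 34,
    "gatherings <= 10": 42,
    "some businesses closed": 18,
    "most businesses closed": 27,
    "schools and universities closed": 38,
    "stay-at-home order (additional)": 13,
}

categories = {npi: categorize(m) for npi, m in default_medians.items()}
for npi, cat in categories.items():
    print(f"{npi}: {cat}")
```

Several default medians sit close to a threshold (18% and 34%, for example), which is consistent with those NPIs changing category under some experimental conditions.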

Aggregating all sensitivity analyses can hide sensitivity to specific assumptions. We display the median NPI effects in four categories of sensitivity analyses (Fig. 4 , B to E), and each individual sensitivity analysis is shown in the supplementary materials. The trends in the results are also stable within these categories.

We used a data-driven approach to estimate the effects that seven nonpharmaceutical interventions had on COVID-19 transmission in 41 countries between January and the end of May 2020. We found that several NPIs were associated with a clear reduction in Rt, in line with mounting evidence that NPIs are effective at mitigating and suppressing outbreaks of COVID-19. Furthermore, our results indicate that some NPIs outperformed others. While the exact effectiveness estimates vary with modeling assumptions, the broad conclusions discussed below are largely robust across 206 experimental conditions in 11 sensitivity analyses.

Business closures and gathering bans both seem to have been effective at reducing COVID-19 transmission. Closing most nonessential face-to-face businesses was only somewhat more effective than targeted closures, which only affected businesses with high infection risk, such as bars, restaurants, and nightclubs (see also Table 1 ). Therefore, targeted business closures can be a promising policy option in some circumstances. Limiting gatherings to 10 people or less was more effective than limits of up to 100 or 1000 people and had a more robust effect estimate. Note that our estimates are derived from data between January and May 2020, a period when most gatherings were likely indoors due to weather.

Whenever countries in our dataset introduced stay-at-home orders, they essentially always also implemented, or already had in place, all other NPIs in this study. We accounted for these other NPIs separately and isolated the effect of ordering the population to stay at home, in addition to the effect of all other NPIs. In accordance with other studies that took this approach (2, 6), we found that issuing a stay-at-home order had a small effect when a country had already closed educational institutions, closed nonessential businesses, and banned gatherings. In contrast, Flaxman et al. (1) and Hsiang et al. (3) included the effect of several NPIs in the effectiveness of their stay-at-home order (or "lockdown") NPIs and accordingly found a large effect for this NPI. Our finding suggests that some countries may have been able to reduce Rt to below 1 without a stay-at-home order (Fig. 3) by issuing other NPIs.

We found a large effect for closing schools and universities in conjunction, which was remarkably robust across different model structures, variations in the data, and epidemiological assumptions (Fig. 4). It remained robust when controlling for NPIs excluded from our study (fig. S9). Our approach cannot distinguish direct effects on transmission in schools and universities from indirect effects, such as the general population behaving more cautiously after school closures signaled the gravity of the pandemic. Additionally, since school and university closures were implemented on the same day, or in close succession, in most of the countries we study, our approach cannot distinguish their individual effects (fig. S21). This limitation likely also holds for other observational studies that do not include data on university closures and estimate only the effect of school closures (1-3, 5-8). Furthermore, our study does not provide evidence on the effect of closing preschools and nurseries. Previous evidence on the role of pupils and students in transmission is mixed. Although infected young people (aged ca. 12 to 25) are often asymptomatic, they appear to shed similar amounts of virus as older people (17, 18), and might therefore infect higher-risk individuals. Early data suggested that children and young adults had a notably lower observed incidence rate than older adults; whether this was due to school and university closures remains unknown (19-22). In contrast, the recent resurgence of cases in European countries has been concentrated in the age group corresponding to secondary school and higher education (especially the latter), and is now spreading to older age groups as well as primary-school-aged children (23, 24). Primary schools may be generally less affected than secondary schools (20, 25-28), perhaps partly because children under the age of 12 are less susceptible to SARS-CoV-2 (29).

Our study has several limitations. First, NPI effectiveness may depend on the context of implementation, such as the presence of other NPIs, country demographics, and specific implementation details. Our results thus need to be interpreted as the effectiveness in the contexts in which the NPI was implemented in our data (10) . For example, in a country with a comparatively old population, the effectiveness of closing schools and universities would likely have been on the lower end of our prediction interval. Expert judgement should thus be used to adjust our estimates to local circumstances. Second, Rt may have been reduced by unobserved NPIs or voluntary behavior changes such as mask-wearing. To investigate whether the effect of these potential confounders could be falsely attributed to the observed NPIs, we performed several additional analyses and found that our results are stable to a range of unobserved factors ( fig. S9 ). However, this sensitivity check cannot provide certainty and investigating the role of unobserved factors is an important topic to explore further. Third, our results cannot be used without qualification to predict the effect of lifting NPIs. For example, closing schools and universities in conjunction seems to have greatly reduced transmission, but this does not mean that reopening them will necessarily cause infections to soar. Educational institutions can implement safety measures such as reduced class sizes as they reopen. However, the nearly 40,000 confirmed cases associated with universities in the UK since they reopened in September 2020 show that educational institutions may still play a large role in transmission, despite safety measures (30). Fourth, we do not have data on some promising interventions, such as testing, tracing, and case isolation. 
These interventions could become an important part of a cost-effective epidemic response (31), but we did not include them because it is difficult to obtain comprehensive data on their implementation. In addition, although the data are more readily available, it is difficult to estimate the effect of mask-wearing in public spaces because there was limited public life as a result of other NPIs. We discuss further limitations in the supplementary text, section E.

Although our work focused on estimating the impact of NPIs on the reproduction number Rt, the ultimate goal of governments may be to reduce the incidence, prevalence, and excess mortality of COVID-19. For this, controlling Rt is essential, but the contribution of NPIs toward these goals may also be mediated by other factors, such as their duration and timing (32), periodicity and adherence (33, 34), and successful containment (35). While each of these factors addresses transmission within individual countries, it can be crucial to additionally synchronize NPIs between countries, since cases can be imported (36).

Many governments around the world seek to keep Rt below 1 while minimizing the social and economic costs of their interventions. Our work offers insights into which areas of public life are most in need of virus containment measures so that activities can continue as the pandemic develops; however, our estimates should not be taken as the final word on NPI effectiveness.

We analyzed the effects of NPIs (Table 1) in 41 countries (37) (see Fig. 1). We recorded NPI implementations when the measures were implemented nationally or in most regions of a country (affecting at least three-fourths of the population). We only recorded mandatory restrictions, not recommendations. Supplementary text section G details how edge cases in the data collection were handled. For each country, the window of analysis starts on the 22nd of January and ends after the first lifting of an NPI, or on the 30th of May 2020, whichever was earlier. The reason to end the analysis after the first major reopening (38) was to avoid a distribution shift. For example, when schools reopened, it was often with safety measures, such as smaller class sizes and distancing rules. It is therefore expected that contact patterns in schools will have been different before school closure compared to after reopening. Modeling this difference explicitly is left for future work. Data on confirmed COVID-19 cases and deaths were taken from the Johns Hopkins CSSE COVID-19 Dataset (13). The data used in this study, including sources, are available online on GitHub (14).
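The window-of-analysis rule can be sketched as a small helper. The extension of 2 days (cases) or 10 days (deaths) after the first reopening follows note 38. Capping the extended window at the 30th of May is an assumption of this sketch, as are the function name and the example reopening date.

```python
from datetime import date, timedelta

WINDOW_START = date(2020, 1, 22)
WINDOW_END = date(2020, 5, 30)

def analysis_window(first_lifting=None, extension_days=0):
    """Return (start, end) of the window of analysis for one country.

    first_lifting: date of the first NPI lifting, or None if nothing was lifted.
    extension_days: days kept after the lifting (2 for cases, 10 for deaths
    according to note 38). Capping at WINDOW_END is an assumption here.
    """
    end = WINDOW_END
    if first_lifting is not None:
        end = min(WINDOW_END, first_lifting + timedelta(days=extension_days))
    return WINDOW_START, end

# Hypothetical country that first lifted an NPI on 11 May 2020.
cases_window = analysis_window(date(2020, 5, 11), extension_days=2)
deaths_window = analysis_window(date(2020, 5, 11), extension_days=10)
no_lifting_window = analysis_window()
```

Here the case window ends on 13 May and the death window on 21 May, while a country that lifted nothing is analyzed through 30 May.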

We collected data on the start and end date of NPI implementations, from the start of the pandemic until the 30th of May 2020. Before collecting the data, we experimented with several public NPI datasets, finding that they were not complete enough for our modeling and contained incorrect dates (39). By focusing on a smaller set of countries and NPIs than these datasets, we were able to enforce strong quality controls: We used independent double entry and manually compared our data to public datasets for cross-checking. First, two authors independently researched each country and entered the NPI data into separate spreadsheets. The researchers manually researched the dates using internet searches: there was no automatic component in the data gathering process. The average time spent researching each country per researcher was 1.5 hours.

Second, the researchers independently compared their entries to the following public datasets and, if there were conflicts, visited all primary sources to resolve the conflict: the EFGNPI database (40) and the Oxford COVID-19 Government Response Tracker (41).

Third, each country and NPI was again independently entered by one to three paid contractors, who were provided with a detailed description of the NPIs and asked to include primary sources with their data. A researcher then resolved any conflicts between this data and one (but not both) of the spreadsheets.

Finally, the two independent spreadsheets were combined and all conflicts resolved by a researcher. The final dataset contains primary sources (government websites and/or media articles) for each entry.
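The comparison step in independent double entry amounts to listing every entry on which the two spreadsheets disagree. The sketch below shows one way this could be done; the data layout, country names, dates, and function name are hypothetical, and in the study the conflicts were resolved manually by a researcher using primary sources.

```python
from datetime import date

def find_conflicts(entry_a, entry_b):
    """Return the (country, NPI) keys on which two independent entries differ.

    Each entry maps (country, NPI) -> start date. A key missing from one
    entry counts as a conflict.
    """
    keys = sorted(set(entry_a) | set(entry_b))
    return [k for k in keys if entry_a.get(k) != entry_b.get(k)]

# Hypothetical entries from two researchers (illustrative dates only).
researcher_1 = {
    ("Country A", "schools closed"): date(2020, 3, 16),
    ("Country A", "stay-at-home order"): date(2020, 3, 23),
    ("Country B", "schools closed"): date(2020, 3, 11),
}
researcher_2 = {
    ("Country A", "schools closed"): date(2020, 3, 16),
    ("Country A", "stay-at-home order"): date(2020, 3, 24),  # one day later
    # Country B school closure missing from this spreadsheet
}

conflicts = find_conflicts(researcher_1, researcher_2)
for country, npi in conflicts:
    print(f"Check primary sources: {country}, {npi}")
```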

When the case count is small, a large fraction of cases may be imported from other countries and the testing regime may change rapidly. To prevent this from biasing our model, we neglected case numbers before a country had reached 100 confirmed cases and death numbers before a country had reached 10 deaths. We included these thresholds in our sensitivity analysis (fig. S13).
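The thresholding step can be sketched as a masking function over daily counts. Two assumptions are made here: the thresholds apply to cumulative counts, and the day on which the threshold is first reached is kept. The example series are hypothetical.

```python
def mask_below_threshold(daily_new, threshold):
    """Replace daily counts with None until the cumulative count reaches `threshold`."""
    masked = []
    cumulative = 0
    for n in daily_new:
        cumulative += n
        masked.append(n if cumulative >= threshold else None)
    return masked

# Hypothetical daily series for one country.
daily_cases = [10, 30, 50, 40, 80]   # cumulative: 10, 40, 90, 130, 210
daily_deaths = [0, 2, 3, 6, 4]       # cumulative: 0, 2, 5, 11, 15

masked_cases = mask_below_threshold(daily_cases, threshold=100)
masked_deaths = mask_below_threshold(daily_deaths, threshold=10)
```

Masked entries (`None`) would simply be left out of the likelihood, so early, import-dominated observations do not influence the estimates.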

In this section, we give a short summary of the model (Fig. 5). The detailed model description is given in the supplementary text section A. In short, our model uses case and death data from each country to "backward" infer the number of new infections at each point in time, which is itself used to infer the reproduction numbers. NPI effects are then estimated by relating the daily reproduction numbers to the active NPIs, across all days and countries. This relatively simple, data-driven approach allowed us to sidestep assumptions about contact patterns and intensity, infectiousness of different age groups, and so forth that are typically required in modeling studies. Code is available online on GitHub (14).

Our model builds on the semi-mechanistic Bayesian hierarchical model of Flaxman et al. (1) , with several additions. First, we allow our model to observe both case and death data. This increases the amount of data from which we can extract NPI effects, reduces distinct biases in case and death reporting, and reduces the bias from including only countries with many deaths. Second, since epidemiological parameters are only known with uncertainty, we place priors over them, following recent recommended practice (42). Third, as we do not aim to infer the total number of COVID-19 infections, we can avoid assuming a specific infection fatality rate (IFR) or ascertainment rate (rate of testing). Fourth, we allow the effects of all NPIs to vary across countries, reflecting differences in NPI implementation and adherence.

We now describe the model by going through Fig. 5 from bottom to top. The growth of the epidemic is determined by the time- and country-specific reproduction number $R_{t,c}$, which depends on (i) the (unobserved) basic reproduction number in country c, $R_{0,c}$, and (ii) the active NPIs at time t. $R_{0,c}$ accounts for all time-invariant factors that affect transmission in country c, such as differences in demographics, population density, culture, and health systems (43).

Following Flaxman et al. and others (1, 6, 8), each NPI is assumed to independently affect $R_{t,c}$ as a multiplicative factor:

$$R_{t,c} = R_{0,c} \prod_{i=1}^{I} \exp\left(-\alpha_{i,c}\, x_{i,t,c}\right),$$

where $x_{i,t,c} = 1$ indicates that NPI i is active in country c on day t ($x_{i,t,c} = 0$ otherwise), $I$ is the number of NPIs, and $\alpha_{i,c}$ is the "effect parameter" for NPI i in country c. The multiplicative effect encodes the plausible assumption that NPIs have a smaller absolute effect when $R_{t,c}$ is already low. We assume that the effect of each NPI on $R_{t,c}$ is stable across time but can vary across countries to some degree. Concretely, the effect parameter of intervention i in country c is defined as $\alpha_{i,c} = \alpha_i + z_{i,c}$, where $\alpha_i$ represents the mean effect parameter and $z_{i,c}$ is a country-specific deviation whose scale corresponds to the degree of cross-country variation in the effectiveness of NPI i and is inferred from the data. This partial pooling of NPI effect parameters minimizes bias from country-specific sources while also reflecting that NPI effectiveness is likely different across countries. We define the "effectiveness" of NPI i as the percentage reduction in $R_t$ associated with NPI i across countries; this is the quantity displayed in the figures. We place an asymmetric Laplace prior on $\alpha_i$ that allows for both positive and negative effects but places 80% of its probability mass on positive effects, reflecting that NPIs are more likely to reduce $R_{t,c}$ than to increase it.

In the early phase of an epidemic, the number of new daily infections grows exponentially. During exponential growth, there is a one-to-one correspondence between the daily growth rate and $R_{t,c}$ (44). The correspondence depends on the generation interval (the time between successive infections in a chain of transmission), which we assume to have a gamma distribution. The prior on the mean generation interval has a mean of 5.06 days, derived from a meta-analysis (45).
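The two relationships described here can be sketched in a few lines of Python. The first function assumes that each active NPI multiplies the reproduction number by exp(−α), so that a positive effect parameter reduces transmission. The second uses the standard relation between the exponential growth rate and the reproduction number for a gamma-distributed generation interval, R = (1 + r/b)^k with shape k and rate b. The text above does not spell out this formula, so it should be read as an assumption of the sketch rather than as the authors' exact implementation. The generation-interval standard deviation of 2.0 days is a hypothetical placeholder.

```python
import math

def reproduction_number(r0, alphas, active):
    """R_{t,c} = R_{0,c} * prod_i exp(-alpha_{i,c} * x_{i,t,c})."""
    log_r = math.log(r0)
    for alpha, x in zip(alphas, active):
        log_r -= alpha * x
    return math.exp(log_r)

def effectiveness(alpha):
    """Fractional reduction in R implied by an effect parameter alpha."""
    return 1.0 - math.exp(-alpha)

def daily_growth_rate(r, gi_mean, gi_sd):
    """Daily growth rate g for reproduction number r, assuming a gamma
    generation interval (standard relation; an assumption of this sketch)."""
    shape = (gi_mean / gi_sd) ** 2   # k
    rate = gi_mean / gi_sd ** 2      # b
    exponential_rate = rate * (r ** (1.0 / shape) - 1.0)
    return math.exp(exponential_rate) - 1.0

# Effect parameter corresponding to a 38% reduction (schools and universities).
alpha_schools = -math.log(1.0 - 0.38)

r_before = reproduction_number(3.3, [alpha_schools], [0])  # NPI inactive
r_after = reproduction_number(3.3, [alpha_schools], [1])   # NPI active

# gi_sd = 2.0 is a hypothetical value; only the mean (5.06 days) is from the text.
g_before = daily_growth_rate(r_before, gi_mean=5.06, gi_sd=2.0)
g_after = daily_growth_rate(r_after, gi_mean=5.06, gi_sd=2.0)
```

Because the effects act on a log scale, the same NPI removes fewer absolute units of R when R is already low, which is the property the multiplicative form is meant to encode.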

We model the daily new infection count separately for confirmed cases and deaths, representing those infections which are subsequently reported and those which are subsequently fatal. However, both infection numbers are assumed to grow at the same daily rate in expectation, allowing the use of both data sources to estimate each αi. The infection numbers translate into reported confirmed cases and deaths after a delay. The delay is the sum of two independent distributions, assumed to be equal across countries: the incubation period and the delay from onset of symptoms to confirmation. We put priors over the means of both distributions, resulting in a prior over the mean infection-to-confirmation delay with a mean of 10.92 days (45), see supplementary text section A.3. Similarly, the infection-to-death delay is the sum of the incubation period and the delay from onset of symptoms to death, and the prior over its mean has a mean of 21.8 days (45). Finally, as in related models (1, 6) , both the reported cases and deaths follow a negative binomial output distribution with separate inferred dispersion parameters for cases and deaths.
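The step from infections to expected reported cases can be illustrated as a discrete convolution with a delay distribution. The sketch below simplifies the description above in two ways, both of which are assumptions: it uses a single discretized gamma distribution for the whole infection-to-confirmation delay instead of the sum of two distributions, and it uses a hypothetical standard deviation of 5 days. Only the mean of 10.92 days comes from the text. The negative binomial observation noise is omitted.

```python
import math

def discretized_gamma_pmf(mean, sd, max_days):
    """Approximate daily delay probabilities from a gamma density.

    The density is evaluated at the midpoint of each day and renormalized,
    which is a crude but simple discretization.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    weights = []
    for day in range(max_days):
        x = day + 0.5
        log_pdf = ((shape - 1.0) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
        weights.append(math.exp(log_pdf))
    total = sum(weights)
    return [w / total for w in weights]

def expected_reports(infections, delay_pmf):
    """Convolve daily infections with the delay distribution."""
    out = [0.0] * len(infections)
    for t, n in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < len(out):
                out[t + d] += n * p
    return out

# Mean delay from the text; sd is a hypothetical placeholder.
delay_pmf = discretized_gamma_pmf(mean=10.92, sd=5.0, max_days=40)

# A single pulse of 100 infections on day 0, followed over 60 days.
infections = [100.0] + [0.0] * 59
reports = expected_reports(infections, delay_pmf)
```

The pulse of infections shows up as a spread-out bump of expected confirmations roughly one and a half weeks later, which is why changes in Rt become visible in the case data only after a delay.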

Using a Markov chain Monte Carlo (MCMC) sampling algorithm (46), this model infers posterior distributions of each NPI's effectiveness while accounting for cross-country variations in effectiveness, reporting, and fatality rates as well as uncertainty in the generation interval and delay distributions. To analyze the extent to which modeling assumptions affect the results, our sensitivity analysis included all epidemiological parameters, prior distributions, and many of the structural assumptions introduced above. MCMC convergence statistics are shown in fig. S19.

37. The countries were selected for the availability of reliable NPI data at the time when we started data collection and modelling (April 2020); and for their presence in at least one of the public datasets that we used to cross-validate our collected data. We excluded countries with fewer than 100 cases (or 10 deaths) by March 31, as our model neglects new cases and deaths below these thresholds. We also excluded a small number of countries if there were credible media reports casting doubt on the trustworthiness of their reporting of cases and deaths. Finally, we excluded very large countries like China, the United States, and Canada, for ease of data collection, as these would require more locally fine-grained data. Of the 41 included countries, 33 are in Europe. As a result, the NPI effectiveness estimates may be biased toward effects in Europe, and NPI effectiveness may have been different in other parts of the world.

38. Concretely, the window of analysis extended until 2 days after the first reopening for confirmed cases, and 10 days after the first reopening for deaths. These durations correspond to the 5% quantiles of the infection-to-case-confirmation and infection-to-death distributions, ensuring that less than 5% of the new infections on the reopening day or later were observed in the window of analysis.

39. We evaluated the following datasets: the Oxford COVID-19 Government Response Tracker (OxCGRT), the Epidemic Forecasting Global NPI Database, and the ACAPS #COVID19 Government Measures Dataset. Note that these datasets are under continuous development. Many of the mistakes found will already have been corrected. We know from our own experience that data collection can be very challenging. We have the fullest respect for the people behind these datasets. In this paper, we focus on a more limited set of countries and NPIs than these datasets contain, allowing us to ensure higher data quality in this subset. Given our experience with public datasets and our data collection, we encourage fellow COVID-19 researchers to independently verify the quality of public data they use, if feasible.

Gatherings limited to 1000 people or less: A country has set a size limit on gatherings. The limit is at most 1000 people (often less), and gatherings above the maximum size are disallowed. For example, a ban on gatherings of 500 people or more would be classified as "gatherings limited to 1000 or less," but a ban on gatherings of 2000 people or more would not.
Gatherings limited to 100 people or less: A country has set a size limit on gatherings. The limit is at most 100 people (often less).
Gatherings limited to 10 people or less: A country has set a size limit on gatherings. The limit is at most 10 people (often less).
Some businesses closed: A country has specified a few kinds of face-to-face businesses that are considered "high risk" and need to suspend operations (blacklist). Common examples are restaurants, bars, nightclubs, cinemas, and gyms. By default, businesses are not suspended.
Most nonessential businesses closed: A country has suspended the operations of many face-to-face businesses. By default, face-to-face businesses are suspended unless they are designated as essential (whitelist).

A country has closed most or all schools. Universities closed A country has closed most or all universities and higher education facilities. Stay-at-home order An order for the general public to stay at home has been issued. This is mandatory, not just a recommendation. Exemptions are usually granted for certain purposes (such as shopping, exercise, or going to work) or, more rarely, for certain times of the day. Whenever countries in our dataset introduced stay-at-home orders, they essentially always also implemented, or already had in place, all other NPIs in this table. All these are encoded as distinct NPIs in the data. In our results, we thus estimate the additional effect of a stay-at-home order on top of all other NPIs. and 95% prediction intervals shown. Prediction intervals reflect many sources of uncertainty, including NPI effectiveness varying by country and uncertainty in epidemiological parameters. A negative 1% reduction refers to a 1% increase in R t . "Schools and universities closed" shows the joint effect of closing both schools and universities in conjunction; the individual effect of closing just one will be smaller (see text). Cumulative effects are shown for hierarchical NPIs (gathering bans and business closures) i.e., the result for "Most nonessential businesses closed" shows the cumulative effect of two NPIs with separate parameters and symbols-closing some (high-risk) businesses, and additionally closing most remaining (non-high-risk, but nonessential) businesses given that some businesses are already closed. The mean effect parameter of NPI i is α i , and the country-specific effect parameter is α i,c . On each day t, a country's daily reproduction number R t,c depends on the country's basic reproduction number R 0,c and the active NPIs. The active NPIs are encoded by x i,t,c , which is 1 if NPI i is active in country c at time t, and 0 otherwise. 
R t,c is transformed into the daily growth rate g t,c using the generation interval parameters, and subsequently is used to compute the new infections (C)
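The relationships in this caption can be sketched in a few lines. The sketch rests on two assumptions that the caption itself does not spell out: that each active NPI multiplies R by exp(-α_{i,c}), and that the generation interval is gamma distributed, so that the standard Euler-Lotka relation R = (1 + r/rate)^shape links R to the exponential growth rate r. All function names and numerical values are hypothetical.

```python
import math

def reproduction_number(r0, alphas, active):
    """R_{t,c} = R_{0,c} * prod_i exp(-alpha_{i,c} * x_{i,t,c}) (assumed form)."""
    return r0 * math.exp(-sum(a * x for a, x in zip(alphas, active)))

def percent_reduction(alphas):
    """Cumulative percentage reduction in R from NPIs with these effect parameters.

    A negative result corresponds to an increase in R.
    """
    return 100.0 * (1.0 - math.exp(-sum(alphas)))

def daily_growth_rate(r, gi_mean, gi_sd):
    """Daily growth rate implied by R, assuming a gamma generation interval."""
    shape = (gi_mean / gi_sd) ** 2
    rate = gi_mean / gi_sd ** 2
    # Invert R = (1 + r_exp / rate) ** shape, then convert to a daily rate.
    r_exp = rate * (r ** (1.0 / shape) - 1.0)
    return math.exp(r_exp) - 1.0

# Hypothetical country: three NPIs, the first two active on day t.
r0 = 3.0
alphas = [0.10, 0.25, 0.30]
active = [1, 1, 0]

r_t = reproduction_number(r0, alphas, active)
g_t = daily_growth_rate(r_t, gi_mean=5.0, gi_sd=2.0)

# Cumulative effect of two hierarchical NPIs with separate parameters,
# e.g. "some businesses closed" plus "most nonessential businesses closed".
cumulative_business_effect = percent_reduction([0.25, 0.30])
```

Under the assumed multiplicative form, effects of hierarchical NPIs add on the α scale, which is why a cumulative effect can be reported from two separate parameters.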

Imperial College COVID-19 Response Team, Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

The effect of large-scale anti-contagion policies on the COVID-19 pandemic

Effect of non-pharmaceutical interventions to contain COVID-19 in China

The impact of non-pharmaceutical interventions on SARS-CoV-2 transmission across 130 countries and territories

Impact of nonpharmaceutical interventions on documented cases of COVID-19

Physical distancing interventions and incidence of coronavirus disease 2019: Natural experiment in 149 countries

Scenario analysis of non-pharmaceutical interventions on global COVID-19 transmissions

On the sensitivity of non-pharmaceutical intervention models for SARS-CoV-2 spread estimation

How robust are the estimated effects of nonpharmaceutical interventions against COVID-19?

COVID-19 government response event dataset (CoronaNet v. 1.0)

Oxford COVID-19 Government Response Tracker (OxCGRT) (2020)

An interactive web-based dashboard to track COVID-19 in real time

epidemics/COVIDNPIs: Inferring the effectiveness of government interventions against COVID-19

Multicollinearity and model misspecification

An analysis of SARS-CoV-2 viral load by patient age

Culture-competent SARS-CoV-2 in nasopharynx of symptomatic neonates, children, and adolescents

COVID-19 National Emergency Response Center, Epidemiology and Case Management Team, Contact tracing during coronavirus disease outbreak

Coronavirus infections in children including COVID-19: An overview of the epidemiology, clinical features, diagnosis, treatment and prevention options in children

Age different analysis of COVID-19 second wave in Europe reveals highest incidence amongst young adults

School openings across globe suggest ways to keep coronavirus at bay, despite outbreaks

Cluster of COVID-19 in northern France: A retrospective closed cohort study

A large COVID-19 outbreak in a high school 10 days after schools' reopening, Israel

Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2

Modelling the health and economic impacts of population-wide testing, contact tracing and isolation (PTTI) strategies for COVID-19 in the UK

Centre for the Mathematical Modelling of Infectious Diseases

The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: A modelling study

The impact of COVID-19 and strategies for mitigation and suppression in low-and middle-income countries

Centre for the Mathematical Modelling of Infectious Diseases COVID-19 working group, Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: A modelling study

Reconstruction of the full transmission dynamics of COVID-19 in Wuhan

Assessing the impact of coordinated COVID-19 exit strategies across Europe

modeling purposes: a rapid review and meta-analysis. medRxiv 2020.06.17

The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo

The reproductive number of COVID-19 is higher compared to SARS coronavirus

Code for modelling estimated deaths and cases for COVID-19 from report 13 published by MRC Centre for Global Infectious Disease Analysis

Serial interval of SARS-CoV-2 was shortened over time by nonpharmaceutical interventions

Estimating individual and household reproduction numbers in an emerging epidemic

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: Estimation and application

The early phase of the COVID-19 outbreak in

Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data

Model checking and improvement

Understanding predictive information criteria for Bayesian models

Collinearity: A review of methods to deal with it and a simulation study evaluating their performance

Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections

A new framework and software to estimate time-varying reproduction numbers during epidemics

Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome

Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models

Causal inference using regression on the treatment variable

Gaussian processes in machine learning

Epidemiological data from the COVID-19 outbreak, realtime case information

Worldwide effectiveness of various non-pharmaceutical intervention control strategies on the global COVID-19 pandemic: A linearised control model. medRxiv 2020.04.30

Neural network aided quarantine control model estimation of global COVID-19 spread

Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures

CMMID COVID-19 working group, Quantifying the impact of physical distance measures on the transmission of COVID-19 in the UK

Flasche; Centre for Mathematical Modelling of Infectious Diseases COVID-19 working group, Early dynamics of transmission and control of COVID-19: A mathematical modelling study

A spatiotemporal epidemic model to quantify the effects of contact tracing, testing, and containment

Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China

How effective has been the Spanish lockdown to battle COVID-19? A spatial analysis of the coronavirus propagation across provinces

CMMID COVID-19 working group

The effect of inter-city travel restrictions on geographical spread of COVID-19: Evidence from Wuhan, China. medRxiv 2020.04

Villas-Boas, V. Villas-Boas, Are we #StayingHome to flatten the curve?

We thank T. Groemer, G. Krönke, and M. Herrmann for advice and mentorship.

Funding: J.M.B. was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (EP/S024050/1) and by Cancer Research UK. S.M.'s funding for graduate studies was from Oxford University and DeepMind. M.S. was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (EP/S024050/1). G.L. was supported by the UKRI Centre for Doctoral Training in Interactive Artificial Intelligence (EP/S022937/1). V.M. contributed in his personal time while employed at DeepMind. L.C. acknowledges funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/R015600/1), jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO), under the MRC/FCDO Concordat agreement and is also part of the EDCTP2 program supported by the European Union; and acknowledges funding by Community Jameel. Y.W.T. is also a principal research scientist at DeepMind. The paid contractor work helping with the data collection, the development of the interactive website, and the costs for cloud compute were funded by the Berkeley Existential Risk Initiative.

Competing interests: No conflicts of interests. L.C. has acted as a paid consultant to Pfizer and the Foundation for Innovative New Diagnostics, outside of the submitted work. Y.G. has received a research grant (studentship) from GlaxoSmithKline, outside of the submitted work. J.K. has advised several governmental and nongovernmental entities about interventions against COVID-19.

Data and materials availability: All data and code are available in the paper or publicly online at (14). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

… N_{t,c} with the relevant delay distributions. Our model uses both case and death data: it splits all nodes above the daily growth rate g_{t,c} into separate branches for deaths and confirmed cases. We account for uncertainty in the generation interval, infection to case confirmation delay, and the infection to death delay by placing priors over the parameters of these distributions.
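As a rough illustration of combining new infections with a delay distribution, the sketch below applies a discrete convolution to a series of daily new infections. It assumes deterministic growth N_t = N_{t-1}(1 + g_t) and omits noise, reporting rates, and fatality rates; the function names and toy numbers are hypothetical and do not reproduce the paper's implementation.

```python
def infections_from_growth(n0, growth_rates):
    """Daily new infections, assuming N_t = N_{t-1} * (1 + g_t) from a seed n0."""
    infections = []
    current = n0
    for g in growth_rates:
        current = current * (1.0 + g)
        infections.append(current)
    return infections

def expected_observations(new_infections, delay_pmf):
    """Discrete convolution of daily new infections with a delay distribution.

    delay_pmf[d] is the probability that an infection is observed d days later.
    Observations that would fall after the last modeled day are dropped.
    """
    expected = [0.0] * len(new_infections)
    for t in range(len(new_infections)):
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                expected[t] += new_infections[t - d] * p
    return expected

# Toy check: 100 infections on day 0, observed 1 or 2 days later with equal odds.
toy = expected_observations([100.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.5])

# Separate branches for cases and deaths would use the same growth rates
# but different (hypothetical) delay distributions.
infections = infections_from_growth(10.0, [0.5, 0.5])
```

In a full model, the same function would be called once with the infection-to-confirmation delay and once with the infection-to-death delay, which mirrors the split into separate case and death branches described above.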