key: cord-1029728-f96qs295
authors: Leech, G.; Rogers-Smith, C.; Sandbrink, J. B.; Snodin, B.; Zinkov, R.; Rader, B.; Brownstein, J. S.; Gal, Y.; Bhatt, S.; Sharma, M.; Mindermann, S.; Brauner, J. M.; Aitchison, L.
title: Mass mask-wearing notably reduces COVID-19 transmission
date: 2021-06-18
journal: nan
DOI: 10.1101/2021.06.16.21258817
sha: 86fbc94e042883e9abd782f3f96c3e736fc8c8bb
doc_id: 1029728
cord_uid: f96qs295

Mask-wearing has been a controversial measure to control the COVID-19 pandemic. While masks are known to substantially reduce disease transmission in healthcare settings (Howard et al 2021), studies in community settings report inconsistent results (Brainard et al 2020). Investigating the inconsistency within epidemiological studies, we find that a commonly used proxy, government mask mandates, does not correlate with large increases in mask-wearing in our window of analysis. We thus analyse the effect of mask-wearing on transmission instead, drawing on several datasets covering 92 regions on 6 continents, including the largest survey of individual-level wearing behaviour (n=20 million) (Kreuter et al 2020). Using a hierarchical Bayesian model, we estimate the effect of both mask-wearing and mask-mandates on transmission by linking wearing levels (or mandates) to reported cases in each region, adjusting for mobility and non-pharmaceutical interventions. We assess the robustness of our results in 123 experiments across 22 sensitivity analyses. Across these analyses, we find that an entire population wearing masks in public leads to a median reduction in the reproduction number R of 25.8%, with 95% of the medians between 22.2% and 30.9%. In our window of analysis, the median reduction in $R$ associated with the wearing level observed in each region was 20.4% [2.0%, 23.3%]. We do not find evidence that mandating mask-wearing reduces transmission. Our results suggest that mask-wearing is strongly affected by factors other than mandates. We establish the effectiveness of mass mask-wearing, and highlight that wearing data, not mandate data, are necessary to infer this effect.

Face masks are one of the most prominent interventions against COVID-19, with very high uptake in most countries [7]. However, as of June 2021, global mask-wearing has begun to decline, even in countries with low vaccination rates (Figure 1). Given that only a minority of the global population is projected to be vaccinated in 2021 [8], and given novel variants of concern that are highly transmissible and escape acquired immunity [9], establishing the effectiveness of mask-wearing in community settings is critical. We now review past work on the effectiveness of mask-wearing in different settings and at different scales.

In the context of healthcare, N95 masks work well when worn properly by trained users, reducing transmission of coronaviruses including SARS-CoV-2 by at least half [1, 2]. Cheng et al. (2021) [3] find that ideal surgical masking of a non-infected person corresponds to a 65-75% reduction in their risk of COVID-19.

Clinical studies in community settings are summarised in four meta-analyses covering SARS, COVID-19 and other respiratory infections [1, 4-6]. For fitted surgical masks, individual results from the meta-analyses range from a 7% increase in infection risk to a 61% decrease in infection risk. The meta-analytic mean decreases in infection risk vary from 4% to 15%, with large uncertainty.

One of the few RCTs on mask recommendations (not mandates) found a nonsignificant and low effect [11] .

Masks have at least two effects: preventing transmission to non-infected mask-wearers ('wearer protection'), and preventing infected wearers from infecting others ('source control'). With the exception of [3], the studies listed above estimate individual wearer protection, rather than the most policy-relevant quantity: the ecological effect of mass mask-wearing including all relevant factors. These factors include source control with average mask quality [3], the nonlinear scaling of group protection [3, 12], and risk compensation [2]. Additionally, clinical studies may not reflect the actual distribution of protection; for instance, none of the studies detailed above include cloth masks, one of the most common types [13, 14]. Finally, while mask-wearing is known to be strongly mediated by cultural factors [15-17], most studies are conducted in a specific social context and may have limited external validity.

Fig. 1. Median mask-wearing (with 50% CI), September 2020 to May 2021, across countries in the UMD / Facebook survey [7] in which the proportion of people vaccinated as of 5th June 2021 was less than 40% (from [10]). Percentage is the proportion of people who reported that, over the last week, they wore masks most or all of the time in public spaces.

In this study, we aim to infer the ecological effect of a large proportion of the population wearing average masks, with average fit, in the average non-residential venue, averaging across many cultures. We call this the mass mask-wearing effect. Our study is observational, and caution is required when making causal interpretations (see Robustness).

Many studies use the timing of mask mandates as a proxy for sharp changes in the level of mass mask-wearing. Studying 41 countries, [18] infer an inconclusive mandate effect on COVID-19 transmission centred around zero. A regional study of 7 European countries [19] finds an overall 7% to 17% (95% CI) reduction in transmission associated with mandates. In a mixed study of mask recommendations, mask mandates, cultural norms favouring masks, and self-reported wearing data, Leffler et al. (2020) [16] find a 26% decrease in COVID-related mortality associated with their mixed proxy for mass mask-wearing. Other studies analyse a single country: Lyu and Wehby (2020) [20] use natural experiments between US states and find a 2% absolute decrease in case growth rate after three weeks. Mitze et al. (2020) [21] study mandates in several regions of Germany and find a relative reduction in cases of 47%. Van Dyke et al. (2020) [22] exploit natural experiments between Kansas counties mandating mask-wearing and find a qualitative difference in mandating counties. In their study of US states, Chernozhukov et al. (2021) [23] attribute a relative ∼10% reduction in case growth rate to mandates for public-facing employees. Also studying US states, [24] find no statistically significant change in cases following mandate implementations.

Society-level studies of non-pharmaceutical interventions (NPIs) often use the timing of mask mandates as a proxy for wearing uptake. If mandates do not correlate with large changes in mask-wearing, for instance due to voluntary wearing, noncompliance, or the correlation of mandate timing with (prior) support for mask-wearing, then using mandate data in lieu of wearing data will lead to poor estimates of mask-wearing effectiveness.

Surprisingly, we find that national mandates may be a poor proxy for actual wearing. While Betsch et al. (2020) [15] find a ∼40% increase in wearing after local mandates in Germany, no other study finds a comparably large increase. In their study of US mandates, Rader et al. (2021) [25] did not find a statistically significant relationship between mandates and subsequent wearing. In their study of 4 US states, Adjodah et al. (2021) [26] find an average 23% post-mandate increase in wearing. Maloney (2020) [27] finds a 13% post-mandate increase, in US states (in the proportion 'frequently' or 'always' wearing masks). We confirm the weak correlation between mandates and subsequent wearing in 92 regions across 56 countries; see Results.

Instead of using mask mandates as a proxy for wearing, we use a large (n=19.97 million) global survey of mask-wearing [7] . (Our wearing covariate is also a proxy, because respondents self-report whether they have been wearing masks.) Two other studies take this approach: in their study of 24 countries, Aravindakshan et al. (2020) [28] use YouGov wearing data to infer an overall 3.9% to 10% relative decrease in case growth rate, for a 0-100% increase in wearing. Rader et al. (2021) [25] study US states using a novel SurveyMonkey wearing dataset to infer a ∼10% decrease in transmission between the lowest and highest empirical quartiles of wearing (a 50-75% increase in wearing).

Our analysis goes further than past work in the quality of wearing data (100 times the sample size, with random sampling and post-stratification), the geographical scope, the sophistication of our infection model, the incorporation of the uncertainty in epidemiological parameters, and the robustness of our results (123 sensitivity experiments).

Mask mandates are typically encoded as binary indicators that signal whether mask-wearing was required in at least some shared spaces [16, 19, 29-31]. We draw mandate data from the OxCGRT NPI database [32]. We estimate the effect of two mandate covariates and display their combined effect throughout this manuscript. The first covariate represents whether masks were 'required in some shared spaces, outside the home with other people present, or some situations when social distancing [was] not possible' (field H6 from OxCGRT, level 2 [32]). The second covariate has the same conditions, but masking is required in all shared spaces (field H6 from OxCGRT, level 3 or higher).

Wearing estimates are from the University of Maryland / Facebook COVID-19 World Symptoms Survey [7, 33] and (for the US) the COVIDNearYou / SurveyMonkey dataset [25]. Our covariate 'percentage of region wearing masks' is the weighted percentage of people who said that, over the past 7 days, they wore masks in public most or all of the time. The weights correct for non-response bias and for demographic imbalance [33].

Figure 2 shows the average wearing trend before and after the implementation of mandates. Most of the uptake in wearing occurs pre-mandate. In our window, the Spearman correlation coefficient between mask-wearing and mandates is 0.32, with p-value < 0.001. This is of medium strength for a correlation between social factors [34], but inadequate if mandate data are to serve as a reliable proxy for mask-wearing. This does not show that mandates do not cause mask-wearing, nor that there were high levels of noncompliance; it instead shows that voluntary uptake in wearing was more popular, and came earlier, than assumed in past work.
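As an illustration of the proxy check described above, the following minimal Python sketch computes a Spearman rank correlation between a binary mandate indicator and a wearing share. The data values and function names are hypothetical and are not taken from our dataset or codebase; tied values (unavoidable with a 0/1 indicator) receive average ranks.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical region-days: mandate active (0/1) and share wearing masks.
mandate = [0, 0, 0, 1, 1, 1, 0, 1]
wearing = [0.35, 0.80, 0.62, 0.71, 0.90, 0.55, 0.20, 0.85]
rho = spearman(mandate, wearing)
print(f"Spearman correlation: {rho:.2f}")
```

A coefficient well below 1 in such a check indicates that knowing whether a mandate is active says only a limited amount about the ordering of wearing levels.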

Our sources of wearing data begin after April 2020, that is, after the initial transition to mask-wearing in some countries. Since it is possible that earlier mandates had persistent effects on wearing, we investigate the correlation during the first wave using an earlier YouGov wearing survey (see Appendix A). In regions with available data, most of the increase in mask-wearing occurred before the earliest national government mandates, with 64% average wearing on the day the mandate was enacted and 75% three weeks following the mandate. However, assessing the true correlation with the available data is difficult; see Discussion for details.

Using data from May to September 2020, we separately estimate the effects of mask mandates and mask-wearing in 92 regions (Table 5) with a state-of-the-art Bayesian hierarchical model (Figure 5). The model links wearing levels (or mandates) to the number of reported cases in each region via the instantaneous reproduction number $R$. Our model is similar to [29], but in addition to adjusting for other NPIs, we also account for changes in mobility. We model many sources of uncertainty through prior distributions: epidemiological properties of the virus, differences in transmission between countries, the lag between an infection and the registration of a COVID-19 case, and the effect of unobserved influences on $R$. To obtain wearing and mandate effect estimates, we run this model twice, changing only the feature used to represent masks; the priors and functional form are kept the same. Our model shares information across all countries to produce a statistically robust estimate, and thus measures the international mass wearing and mandate effects.

Figure 3A shows the effects we infer for wearing and mandates in the form of percentage reductions in $R$. We find that the difference between zero mask-wearing and 100% self-reported mask-wearing (most or all of the time) corresponds to a 24.6% [6%, 43%] reduction in transmission. For mandates we see no reduction: 0.0% [-8.8%, 8.2%]. A more comparable measure is the probability of a positive reduction: for wearing this is 99%, while for mandates it is 46%. Together, these results suggest that mask-wearing is associated with a notable reduction in SARS-CoV-2 transmission, while analysis with mask mandate data yields no reduction. Figure 3B shows the distribution of mask-wearing effects across the regions we study, using the observed median wearing percentage in each region. In this window of analysis, we infer a median reduction in transmission of 20.4% [2.0%, 23.3%].

All code and data used are available via GitHub: https://github.com/g-leech/masks_v_mandates.
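The relationship between an effect parameter and the percentage reductions quoted above can be sketched as follows, assuming the exponential wearing term $\exp(-\alpha_w w)$ described in the model section. The function names are hypothetical, the wearing level of 0.8 is an arbitrary example, and the short list of samples stands in for real posterior draws; only the 24.6% figure comes from the text.

```python
import math
from typing import Sequence


def reduction_at(alpha: float, wearing: float) -> float:
    """Fractional reduction in R implied by the factor exp(-alpha * wearing)."""
    return 1.0 - math.exp(-alpha * wearing)


def alpha_from_full_reduction(reduction: float) -> float:
    """Effect parameter that yields `reduction` at 100% wearing (wearing = 1)."""
    return -math.log(1.0 - reduction)


def prob_positive(samples: Sequence[float]) -> float:
    """Share of posterior samples with a positive effect (i.e. a reduction in R)."""
    return sum(1 for a in samples if a > 0) / len(samples)


# 24.6% is the base-model median reduction at 100% wearing reported above.
alpha = alpha_from_full_reduction(0.246)

# Reduction at a hypothetical regional wearing level of 80%.
print(f"alpha = {alpha:.3f}, reduction at 80% wearing = {reduction_at(alpha, 0.8):.1%}")

# Toy stand-in for posterior draws of alpha; real draws come from the sampler.
toy_samples = [-0.1, 0.2, 0.3, 0.4]
print(f"P(reduction > 0) = {prob_positive(toy_samples):.2f}")
```

Because the factor is exponential in the wearing level, the reduction at 80% wearing is somewhat more than 80% of the reduction at full wearing.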

Results that are sensitive to alternative plausible modelling assumptions offer only weak evidence and pose a risk of misinforming policy decisions. As such, we verify the robustness of our results by performing 123 experiments across 22 sensitivity analyses (Table 1). Figure 3C shows how the median effect of wearing or mandates changes as we vary epidemiological priors, delay distributions, covariate effect priors, the model structure, and the data. Each point in Figure 3C is the median effect of a different experimental condition. Our results are robust to these changes: 95% of the median reductions fall between 22.7% and 31.3%.

However, as this study is observational rather than experimental, caution is necessary when making causal interpretations. Unobserved factors may influence $R$, and if their timing coincides with the timing of mask-wearing and mandates, reductions in $R$ from unobserved factors may be wrongly attributed to mask-wearing or mandates [35]; that is, the estimated effects of our observed factors will be confounded. For instance, other protective behaviours may potentially confound our estimates [1, 15]. We investigate the susceptibility of our results to such confounding in four sensitivity analyses. In the first three (Figures 12, 14, 16), we assess how much estimates change when we exclude previously observed factors: we exclude each NPI in turn, all NPIs at the same time, and the mobility covariate. The small difference between our adjusted and unadjusted estimates suggests that, unless the confounding from unobserved factors greatly exceeds the confounding from our previously-observed factors (that is, NPIs and mobility), our results are unlikely to be meaningfully affected by confounding [36]. Lastly, over our window of analysis, mask-wearing increases while transmission decreases (in many regions). Our final analysis aims to assess whether this correlation is a spurious contributor to the substantial apparent wearing effect. We test this hypothesis by creating a fake wearing variable for each region. Each variable has the same start and end wearing value as the true wearing percentage and linearly interpolates between these values to capture the trend in wearing in that region. We infer a small and uncertain effect for the fake wearing variable: 7.6% [-20.2%, 30.0%] (see Figure 15). This implies that the wearing effect we infer does not rely solely on the correlation between transmission and the overall wearing trend in this period.
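The construction of the fake wearing variable can be sketched in a few lines of Python. This is a minimal illustration of the interpolation described above, with a hypothetical function name and made-up wearing values; it is not the code used for the analysis.

```python
from typing import List, Sequence


def linear_trend_covariate(wearing: Sequence[float]) -> List[float]:
    """Replace a wearing series by a straight line between its first and last values.

    The result has the same length, start value and end value as the input,
    but discards all day-to-day variation in between.
    """
    n = len(wearing)
    if n < 2:
        return list(wearing)
    start, end = wearing[0], wearing[-1]
    return [start + (end - start) * t / (n - 1) for t in range(n)]


# Hypothetical regional series: wearing rises quickly, dips, then recovers.
observed = [0.20, 0.55, 0.70, 0.50, 0.60]
fake = linear_trend_covariate(observed)
print([round(v, 3) for v in fake])
```

If the inferred wearing effect were driven only by the overall upward trend, the model would be expected to assign a similar effect to this trend-only covariate.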

We find that mask-wearing is associated with a notable reduction in SARS-CoV-2 transmission. Moreover, using data on mandates fails to infer any reduction in transmission. Our results suggest that national (and US state-level) mandate data are insufficient to model the effect of mass mask-wearing. Figure 2 illustrates several ways mandates can fail to correlate with wearing: South Korea's mandate came after voluntary wearing had already plateaued at 94%; conversely, in the Netherlands and Switzerland, few people were wearing masks, even three weeks into the mandate period; finally, in the Czech Republic, wearing eventually increased, but only long after the mandate was implemented.

Against mandate data, not mandates

In our window, national mandates correspond to an average 8.3% increase in the number of people who say that they are likely to wear masks most or all of the time in public spaces; however, this may underestimate the effect of mandates on wearing. This could be the case if mandates encourage people to wear masks in public all the time instead of most of the time, or if there is large sub-national heterogeneity in mandate timing and wearing uptake.

Inferring mandate effects is also difficult with currently available data. We model the effect of mandates as an instantaneous change in the reproduction number. This does not capture changes in wearing behaviour following the announcement of a mandate but before its enforcement [21] . Nor does it account for gradual change in behaviour after the implementation of a mandate.

The variation in results discussed in the Introduction is in part due to not controlling for mask properties and wearing behaviour. These include mask quality [37] ; mask fit [37] ; the venue of wearing (e.g. in shops, schools, or public transport) [37] ; mask reuse [38] ; risk compensation [39] ; and cultural norms [16, 37, 39] . More research into these factors is required to further reduce our uncertainty about mask-wearing effects. We estimate the effect of mass mask-wearing, averaging over mask properties and behaviour. Given that, in this window, most masks in use were the least effective types (cloth or otherwise unrated masks) [1, 13, 14, 38, 40] , the effectiveness of mass wearing is likely stronger than we estimate. Finally, we report the average international effect of mandates and do not rule out their effectiveness in particular contexts; for example, strong correlations between mandates and wearing were observed in Ireland ( Figure 2 ) and in Germany (the April 2020 local mask mandates [15, 21] ). Our results should be adjusted to local circumstances by public health experts.

Table 1. Experiments in our sensitivity analysis.

Unobserved factors
• NPI leave-out (Figs. 12, 13): Each observed NPI is excluded in turn, and then all NPIs are.
• Mobility leave-out (Fig. 14): Exclude mobility from our model.
• Fake wearing covariate (Fig. 15): Use a synthetic covariate in place of wearing that captures the wearing trend in each region.
• Mobility and wearing only (Fig. 16): Exclude all NPIs from the model.

• Starting $R$, mean of hyperprior mean (Fig. 17): Mean of the prior over the mean of the distribution of country-specific basic reproduction numbers.
• Starting $R$, scale of hyperprior mean (Fig. 18): Scale of the prior over the mean of the distribution of country-specific basic reproduction numbers.
• Starting $R$, scale of prior scale (Fig. 19): Scale of the prior over the noise on country-specific basic reproduction numbers.
• Random walk noise scale (Fig. 20): Scale of the prior over the size of the random walk step.
• Generation interval prior mean (Fig. 21): Mean of the prior over the mean generation interval.
• Confirmation delay mean (Fig. 22): Mean of the distribution of case confirmation delays.
• Confirmation delay dispersion (Fig. 23): Dispersion of the distribution of case confirmation delays.

Covariate priors
• NPI prior (Fig. 24): Prior over the NPI effects (not including mask-wearing and mask mandates).
• Wearing effect prior scale (Fig. 25): Scale of the prior over the wearing effect.
• Mandate effect prior scale (Fig. 26): Scale of the prior over the mandate effect.
• Mobility effect prior mean (Fig. 27): Mean of the prior over the mobility effect parameter.
• Mobility effect prior scale (Fig. 28): Scale of the prior over the mobility effect parameter.

• Wearing parameterisation (Fig. 29): The functional form of the mask-wearing effect on $R$.
• Random walk period (Fig. 30): Number of days between random walk steps.
• Sample 92 regions with replacement from our set of 92.
• Mandate leave-on (Fig. 34): A persistent mandate effect that lasts beyond the point the mandate is lifted.
• Mandate thresholds (Fig. 35): Remove the less stringent mask mandate feature.
• Window of analysis (Fig. 36): Shorter periods of analysis.

This preprint (which was not certified by peer review) is made available under a CC-BY-ND 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Our results are based on the period from May to September 2020. While we find similar results for different (shorter) windows of analysis ( Figure 36 ), mass wearing effectiveness will likely differ with larger changes in circumstances. In particular, our period has features that may not characterise other settings: most regions began with NPIs already active (besides mandates); public behaviour had already changed following the formal and informal instructions of the first wave; summer months are thought to have lower transmission [41, 42] ; and a tiered regional approach to containment was not yet implemented in most regions. However, a short window implicitly holds many factors constant. This is useful for internal validity: when estimating a specific quantity such as the effects of mask-wearing, a short window reduces the scope for distribution shift and unobserved confounders.

Mask-wearing surveys are still a proxy for actual wearing behaviour, and social desirability bias in survey responses may inflate wearing estimates [43]. In a Kenyan study, the disparity between self-reported wearing and observed wearing was 77% [44], though this survey was not anonymous, which may have led to more over-reporting than in anonymous surveys such as COVIDNearYou / SurveyMonkey. If data sources over-estimate mask-wearing, then our estimate for the effect of 100% of people wearing masks (most or all of the time) will actually correspond to the effect of less than 100% of people wearing masks. Consequently, we would expect the true effect of 100% of people mask-wearing to be larger than we estimate, in proportion to the amount of over-reporting. Further, the operational definition of 'mask-wearing' used in the UMD survey is not stringent: it can be applied both to a person who wears a cloth mask, only on public transport, slightly more than half of the time; and to a person who always wears an N95 respirator when outside their home [7]. This implies that there is scope for more and better mask-wearing, even in regions reporting extremely high levels of wearing in our data.

One concern for observational NPI studies is endogeneity: when cases are rising, people are more likely to voluntarily mask and governments are more likely to mandate wearing [45]. However, in our window, the correlation between new cases and mask-wearing percentage is low (0.05), which limits the scope of this concern.

Conclusion

At a time when mask-wearing is decreasing and mask mandates are being lifted, we find that mask-wearing is associated with a notable reduction in transmission, and that factors other than mandates must have contributed to the worldwide uptake of mask-wearing in 2020. This presents a difficulty for policy-makers: if wearing works but mandates are not strongly associated with wearing, what other levers are available? Some options include free mask distribution, domestic supply guarantees, fit training, mask quality guidelines, targeted mandates by venue, and openness about the benefits of masks [2, 46].

All data and code used can be downloaded via: https://github.com/g-leech/masks_v_mandates. The preprocessing is derived from [29] .


Our analysis is on the national (or US state) level, since this is the finest resolution available for all countries in the OxCGRT NPI dataset. Table 2 summarises the modelling set, and Figure 4 shows its component datasets. The beginning of our window of analysis is determined by our datasets: the UMD project begins reporting in late April 2020 [7] . We end on the 1st September 2020, at the beginning of the second wave, a period in which national NPIs fragment into regional responses, making national analyses less informative [19] .

Daily national estimates of mask-wearing are derived from the University of Maryland (UMD) / Facebook COVID-19 World Symptoms Survey [7] , which randomly samples from all active Facebook users, and which post-stratifies to correct for nonresponse bias and demographic imbalance [33] . The mean number of individual responses per region-day is 1131. UMD does not cover the US, so we supplement this dataset with the US data of [25] , which in our window represents n=558,670 responses.

Daily confirmed COVID-19 cases are drawn from the Johns Hopkins CSSE COVID-19 Data Repository, which collates official statistics from around the world [47] .

Fig. 4. The components of our modelling set: OxCGRT NPIs [32]; Google mobility [48]; JHU cases [47]; UMD wearing [7]; Rader wearing [25].

We use the Google COVID-19 Community Mobility Reports to index mobility changes in each region [48] .

See Appendix A for full data details, including preprocessing steps and country selection.

We develop a hierarchical Bayesian model based on prior work [19, 29, 49] to infer the effectiveness of mask-wearing and mask mandates on COVID-19 transmission. We use the number of reported cases in each country to infer the number of later-ascertained infections on each day. Given the dynamics of daily, later-ascertained infections in each region over time, we infer the instantaneous reproduction number $R_{t,c}$. Finally, the covariate effects are estimated by relating $R_{t,c}$ to the observed level of each covariate. Figure 5 shows the model in schematic form. The Bayesian approach allows us to explicitly model sources of uncertainty, such as the values of epidemiological parameters, which are known only imprecisely. We proceed by outlining the inputs of our model.

We use $c$ to denote the country/region in question, and $t$ to index time. $t = 0$ corresponds to May 1st, 2020. NPIs are indexed by $i$.


• Non-pharmaceutical interventions (NPIs): $x_{i,t,c} \in \{0, 1\}$. $x_{i,t,c} = 1$ if NPI $i$ is active at time $t$ in region $c$; otherwise, $x_{i,t,c} = 0$.

• NPI reopenings: Across our regions, there are NPIs that were active at the start of our period. We treat these NPIs, in the relevant regions, as 'reopening' NPIs. If NPI $i$ is active in region $c$ at $t = 0$ (i.e. we have $x_{i,0,c} = 1$), we subtract 1 from the feature to form $\tilde{x}_{i,t,c}$. Therefore, at the start of the window, $\tilde{x}_{i,t,c} = 0$ and the effect of the NPI is absorbed into $R_{\emptyset,c}$. When the NPI lifts, we would have $\tilde{x}_{i,t,c} = -1$, reflecting that NPI lifting has the opposite effect to NPI closing, which is denoted as $x_{i,t,c} = 1$. As such, we can more easily set a prior over $R_{\emptyset,c}$ (see below).

• Mask mandate: We have two mandate covariates: $x_{\text{ma1},t,c}$ and $x_{\text{ma2},t,c}$. The first covariate, $x_{\text{ma1},t,c}$, represents whether masks were 'required in some or all shared spaces, outside the home with other people present, or some situations when social distancing not possible' (field H6 from OxCGRT, level 2 [32]). The second covariate, $x_{\text{ma2},t,c}$, has the same conditions, but masking is required in all shared spaces (field H6 from OxCGRT, level 3 or higher). For each mandate type, $x_{\text{ma},t,c} = 1$ if a mask mandate corresponding to the description above is active at time $t$ in region $c$; otherwise, $x_{\text{ma},t,c} = 0$. $x_{\text{ma1},t,c} = 1$ whenever $x_{\text{ma2},t,c} = 1$, so the correct interpretation of the effect associated with $x_{\text{ma2},t,c}$ is the additional effect of mandating masks in all shared spaces, given that masks were already required in some shared spaces.

• Mask-wearing: The proportion of people in each region who self-report wearing masks in public most or all of the time, $w_{t,c} \in [0, 1]$.

• Mobility: Reduction in mobility relative to 2019 levels, $m_{t,c} \in (-\infty, 1]$, represented as a multiplicative factor. $m_{t,c} = 1$ represents a 100% decrease in mobility while $m_{t,c} = 0$ represents no change from the 2019 level.

• Cases: New confirmed cases observed on day $t$ in region $c$: $C_{t,c}$.
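The covariate encodings listed above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up input series; it assumes that the OxCGRT H6 field is available as an integer level per day, and it is not the preprocessing code used for the analysis.

```python
from typing import List, Sequence, Tuple


def reopening_feature(active: Sequence[int]) -> List[int]:
    """Shift a 0/1 NPI series by -1 if the NPI was already active at t = 0.

    The shifted series is 0 while the NPI stays in place and -1 once it lifts.
    Series that start inactive are returned unchanged.
    """
    offset = 1 if active[0] == 1 else 0
    return [a - offset for a in active]


def mandate_covariates(h6_levels: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Turn daily H6 levels into the two nested mandate indicators.

    ma1 is 1 at level 2 or higher; ma2 is 1 at level 3 or higher,
    so ma1 is 1 whenever ma2 is 1.
    """
    ma1 = [1 if level >= 2 else 0 for level in h6_levels]
    ma2 = [1 if level >= 3 else 0 for level in h6_levels]
    return ma1, ma2


# Hypothetical NPI that is active at the start of the window and lifts on day 2.
print(reopening_feature([1, 1, 0, 0]))

# Hypothetical H6 levels rising over five days.
print(mandate_covariates([0, 1, 2, 3, 4]))
```

Keeping the two mandate indicators nested is what gives the second effect its interpretation as an additional effect on top of the first.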

In the following sections, we introduce several variables without explicitly defining them. They are defined in the section on Prior Distributions below.

The instantaneous reproduction number $R_{t,c}$ is the expected number of infections that would arise from each infection at time $t$ in region $c$, all else equal. We model $R_{t,c}$ as a product of several terms: (i) the regional starting reproduction number $R_{\emptyset,c}$; (ii) a product of our effect estimates for that region-day for each of the reopening NPIs $\tilde{x}_{i,t,c}$, mask-wearing $W_{t,c}$ or mask mandates $\text{Ma}_{t,c}$, and mobility $M(m_{t,c})$; (iii) a weekly latent random walk per region, $z_{t,c}$.

Latent reproduction number. The latent, unobserved reproduction number in region $c$ at $t = 0$, assuming no mask-wearing and no active mask mandates, is represented by $R_{\emptyset,c}$.

NPIs. We assume that the introduction or lifting of an NPI leads to an instantaneous, multiplicative change in transmission. Each NPI contributes $\exp(-\alpha_i \tilde{x}_{i,t,c})$ to $R_{t,c}$. Note that this also works for reopening NPIs: if the NPI effect ($\alpha_i$) is positive, a reopening ($\tilde{x}_{i,t,c} = -1$) increases $R_{t,c}$, since $\exp(-\alpha_i \cdot (-1)) = \exp(\alpha_i) > 1$.

Mask mandates. In the mandate model, $W_{t,c}$ is replaced with $\text{Ma}_{t,c} = \exp(-\alpha_{\text{ma1}} x_{\text{ma1},t,c}) \cdot \exp(-\alpha_{\text{ma2}} x_{\text{ma2},t,c})$.

Mask-wearing. $W_{t,c} = \exp(-\alpha_w w_{t,c})$. We use the exponential form in our base model because it is consistent with the form of the mandate effect on $R_{t,c}$. However, we test the sensitivity of our results to two alternative mask-wearing parameterisations and find similar results (see Appendix C).
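The multiplicative NPI and mask terms described above can be sketched in Python as follows. The function names and all parameter values are hypothetical, and the sketch deliberately leaves out the mobility and random-walk terms, so it computes only part of the reproduction number.

```python
import math
from typing import Sequence


def npi_factor(effects: Sequence[float], features: Sequence[int]) -> float:
    """Product over NPIs of exp(-effect * feature); features may be -1, 0 or 1."""
    return math.exp(-sum(a * x for a, x in zip(effects, features)))


def wearing_factor(effect: float, wearing: float) -> float:
    """Exponential wearing term exp(-effect * wearing), with wearing in [0, 1]."""
    return math.exp(-effect * wearing)


def mandate_factor(effect_ma1: float, effect_ma2: float, ma1: int, ma2: int) -> float:
    """Mandate term: the product of the two nested mandate effects."""
    return math.exp(-effect_ma1 * ma1) * math.exp(-effect_ma2 * ma2)


def partial_r(r_start: float, effects: Sequence[float],
              features: Sequence[int], mask_term: float) -> float:
    """Starting R times the NPI and mask terms (mobility and random walk omitted)."""
    return r_start * npi_factor(effects, features) * mask_term


# Hypothetical region-day: one NPI has reopened (-1), one has been introduced (+1).
r_wearing_model = partial_r(1.1, [0.10, 0.20], [-1, 1], wearing_factor(0.28, 0.75))
r_mandate_model = partial_r(1.1, [0.10, 0.20], [-1, 1], mandate_factor(0.02, 0.01, 1, 0))
print(round(r_wearing_model, 3), round(r_mandate_model, 3))
```

Swapping only the mask term while holding everything else fixed mirrors the way the wearing and mandate models differ in a single feature.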

Mobility. We parameterise the Google mobility data as in [50], through a multiplicative factor $M(m_{t,c})$.

At 2019 levels of mobility ($m_{t,c} = 0$), the multiplicative factor $M(m_{t,c}) = 1$, leading to no effect on $R_{t,c}$.

To set a principled prior for $R_{\emptyset,c}$, we zero-center the mobility by subtracting the initial level (see the section on Prior Distributions); that is, we use $m_{t,c} - m_{0,c}$ in place of $m_{t,c}$.

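The weekly random walk $z_{t,c}$ is defined piecewise: it is zero for $t < 14$, and otherwise it is the running sum of the weekly steps $\epsilon_{k,c}$ up to step $\kappa(t)$, where $\kappa(t) = \lfloor (t - 14)/7 \rfloor$ and $\epsilon_{k,c} \sim \text{Normal}(0, \sigma_{\text{RW}})$. The random walk starts after 2 weeks to avoid unidentifiability between $R_{\emptyset,c}$ and the random walk terms at the beginning of the period.

Following [51], the resulting $R_{t,c}$ estimate is then transformed to a daily growth rate $g_{t,c}$ using the generation interval distribution, which describes the time between successive infection events in a transmission chain. $N_{t,c}$ represents daily infections that are later ascertained, and we have $N_{t,c} = N_{t-1,c} \cdot g_{t-1,c}$, i.e., we multiply the infections on the previous day by the daily growth rate. Then, given an initial (latent) infection count $N_{0,c}$, we have:

$$N_{t,c} = N_{0,c} \prod_{\tau=0}^{t-1} g_{\tau,c}.$$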
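The weekly random walk and the infection recursion can be sketched as follows. This is a minimal illustration under stated assumptions: the walk is taken to be zero for the first 14 days and then to equal the running sum of weekly Normal steps up to and including step $\lfloor (t - 14)/7 \rfloor$, and the daily growth rates are supplied directly as hypothetical inputs (the conversion from the reproduction number via the generation interval is not shown). All names and values are hypothetical.

```python
import random
from typing import List, Sequence


def weekly_random_walk(n_days: int, sigma_rw: float, rng: random.Random,
                       start_day: int = 14, period: int = 7) -> List[float]:
    """Piecewise-constant random walk that takes one Normal step per week."""
    if n_days > start_day:
        n_steps = (n_days - 1 - start_day) // period + 1
    else:
        n_steps = 0
    steps = [rng.gauss(0.0, sigma_rw) for _ in range(n_steps)]
    walk = []
    for t in range(n_days):
        if t < start_day:
            walk.append(0.0)  # no random-walk contribution in the first two weeks
        else:
            k = (t - start_day) // period  # index of the current weekly step
            walk.append(sum(steps[:k + 1]))
    return walk


def infections_from_growth(n0: float, growth: Sequence[float]) -> List[float]:
    """Apply N_t = N_{t-1} * g_{t-1} starting from the initial count n0."""
    infections = [n0]
    for g in growth:
        infections.append(infections[-1] * g)
    return infections


walk = weekly_random_walk(28, 0.15, random.Random(0))
print([round(z, 3) for z in walk[12:23]])

# Hypothetical daily growth rates: doubling, then halving.
print(infections_from_growth(10.0, [2.0, 0.5]))
```

Holding the walk constant within each week limits how quickly the unexplained component of transmission can change from day to day.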


Infections at time are only observed as reported cases after a delay. Therefore, we convolve the later-ascertained cases with a delay vector to produce¯, , which is the expected number of reported cases on day in region .¯,

The forward-delay vector T (defined in Prior Distributions, below) defines the delay between the two quantities. Finally, the observed number of reported cases, , , follows a Negative Binomial distribution:

where Ψ is the case-reporting overdispersion parameter (see below).
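A minimal sketch of the delay step described above: build a truncated, normalised negative binomial delay vector and convolve it with daily infections to obtain expected reported cases. The mean and dispersion passed to `delay_vector` are hypothetical placeholders (the values used in the paper are not recoverable from this text), and reading "truncated at 31 days" as delays of 0 to 30 days is also an assumption.

```python
import math
from typing import List


def neg_binomial_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative Binomial pmf with mean mu and dispersion alpha (variance mu + mu^2 / alpha)."""
    log_p = (
        math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(alpha / (alpha + mu))
        + k * math.log(mu / (alpha + mu))
    )
    return math.exp(log_p)


def delay_vector(mu: float, alpha: float, length: int = 31) -> List[float]:
    """Truncate the pmf to `length` days and renormalise so that it sums to 1."""
    raw = [neg_binomial_pmf(d, mu, alpha) for d in range(length)]
    total = sum(raw)
    return [p / total for p in raw]


def expected_cases(infections: List[float], delay: List[float]) -> List[float]:
    """Expected reported cases on day t: sum over d of infections[t - d] * delay[d]."""
    out = []
    for t in range(len(infections)):
        max_d = min(t, len(delay) - 1)
        out.append(sum(infections[t - d] * delay[d] for d in range(max_d + 1)))
    return out


# Hypothetical delay parameters, for illustration only.
T = delay_vector(mu=10.0, alpha=5.0)
```

The observed counts would then be drawn from a Negative Binomial centred on these expected values, with the overdispersion parameter controlling the spread.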

We place prior and hyperprior distributions over several parameters. Our Bayesian approach not only captures uncertainty in unknown parameters, but allows our beliefs about certain parameters to be adjusted if warranted by the data. We now detail the priors we use in this work.

• Region-specific $\tilde{R}_0$: $\tilde{R}_{0,c} \sim \text{Normal}(\mu_R, \sigma_R)$.

• $\tilde{R}_0$ hyperpriors: The Epidemic Forecasting group [52] produces estimates for $R_{t,c}$ using methodology from [53]. The empirical mean and variability of these estimates across our regions at the start of our period are $\mu = 1.07$, $\sigma = 0.32$. We use these estimates to initialise our hyperpriors over the mean and variability of $\tilde{R}_{0,c}$:

$\mu_R \sim \text{TruncatedNormal}(\mu = 1.07, \sigma = 0.2, \text{lower} = 0.1)$, $\sigma_R \sim \text{HalfNormal}(\sigma = 0.4)$. The mean of $\sigma_R$ under this prior is 0.32.

• NPI effect: $\alpha_i \sim \text{AsymmetricLaplace}(m = 0, \kappa = 0.5, \lambda = 30)$, following [29]. $m$ is the location, $\kappa$ is the asymmetry, and $\lambda$ is the scale. This prior places 80% of its mass on positive NPI effects (i.e. on reductions of $R$).

• Wearing effect: $\alpha_w \sim \text{Normal}(\mu = 0, \sigma = 0.4)$. Unlike the NPIs above, the prior for wearing has equal mass on positive and negative effects. This uninformative choice reflects past uncertainty about the efficacy of mask-wearing.

• Mandate effect: $\alpha_{ma} \sim \text{Normal}(\mu = 0, \sigma = 0.08)$. The wearing prior reflects our prior beliefs about the effect of going from 0% to 100% of people likely to wear masks. But in our window, the range of $w_{t,c}$ averages only ∼20% across our regions. Accordingly, we choose a prior for the effect of mandates that has 1/5th of the prior predictive effect of the wearing prior. In our sensitivity analysis we modify the scale of this prior to match that of the wearing prior (among other values), and find very similar results.

• Mobility effect: $\alpha_{mob} \sim \text{Normal}(\mu = 1.704, \sigma = 0.44)$. Mobility prior values are derived from the 'overall average mobility' estimate in [50]. Note that each $\alpha$ above is not a direct reduction in $R$; they are transformed into a reduction via a specific functional form (see above).

We produce a forward-delay vector

i.e., a negative binomial distribution, truncated at 31 days and normalised. Note that the Negative Binomial parameter $\alpha$ denotes the dispersion, not the variance, $\sigma^2 = \mu + \mu^2/\alpha$.

• Observation noise dispersion, chosen as in [29]: $\Psi \sim \text{HalfNormal}(\mu = 0, \sigma = 5)$.
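Several of the prior summaries in the list above can be checked numerically. The sketch below computes the mean and median of a half-normal with scale 0.4, the share of an asymmetric Laplace prior lying above its location, and the 1/5 scaling of the wearing prior. The asymmetric Laplace density uses the common location, asymmetry and scale parameterisation, which is assumed here to match the one in [29]; all function names are illustrative.

```python
import math
from statistics import NormalDist


def half_normal_mean(sigma: float) -> float:
    """Mean of a half-normal with scale sigma: sigma * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 / math.pi)


def half_normal_median(sigma: float) -> float:
    """Median of a half-normal: sigma times the 75th percentile of a standard normal."""
    return sigma * NormalDist().inv_cdf(0.75)


def asymmetric_laplace_pdf(x: float, m: float, kappa: float, lam: float) -> float:
    """Density lam / (kappa + 1/kappa) * exp(-(x - m) * lam * s * kappa**s), s = sign(x - m)."""
    s = 1.0 if x >= m else -1.0
    return lam / (kappa + 1.0 / kappa) * math.exp(-(x - m) * lam * s * kappa ** s)


def mass_above(m: float, kappa: float, lam: float,
               upper: float = 5.0, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the prior mass on [m, m + upper]."""
    h = upper / steps
    return h * sum(asymmetric_laplace_pdf(m + (i + 0.5) * h, m, kappa, lam)
                   for i in range(steps))


sigma_r_mean = half_normal_mean(0.4)      # about 0.32
sigma_r_median = half_normal_median(0.4)  # about 0.27
npi_positive_mass = mass_above(0.0, 0.5, 30.0)  # about 0.80, i.e. 1 / (1 + kappa^2)
mandate_scale = 0.4 / 5                   # 1/5th of the wearing prior scale
```

The closed form $1/(1 + \kappa^2)$ gives the same 80% figure for $\kappa = 0.5$ without numerical integration.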

We thank Swapnil Mishra for cloud infrastructure and moral support; we thank Tomáš Gavenčiak for help debugging and plotting. We thank Jan Kulveit for strategising, and Joshua Monrad for preternatural style advice.


We take daily confirmed case counts from the Johns Hopkins University Center for Systems Science and Engineering COVID-19 Global Cases dataset, which collates official statistics from hundreds of world regions.

Many countries fail to report case numbers over the weekend (or report weekly), which leads to spurious periodicity. In addition, severe reporting errors (day-to-day spikes of 1000% or troughs of less than 10% in countries with hundreds or thousands of daily cases) occur in 23 regions. We manually mask these errors (Table 3 ), preventing the model from learning from those days.
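As an illustration of the masking step above, the following sketch flags candidate reporting errors and blanks them out so that a model would not learn from those days. It is not the authors' procedure (their masking was manual). The thresholds `spike_ratio` and `trough_ratio` are one possible reading of "spikes of 1000%" and "troughs of less than 10%", and `min_level` is a hypothetical cut-off for "hundreds or thousands of daily cases".

```python
from typing import List, Optional


def flag_reporting_errors(cases: List[float], spike_ratio: float = 10.0,
                          trough_ratio: float = 0.1,
                          min_level: float = 100.0) -> List[bool]:
    """Flag days whose count is an extreme multiple or fraction of the previous day's."""
    flags = [False] * len(cases)
    for t in range(1, len(cases)):
        prev, cur = cases[t - 1], cases[t]
        if prev < min_level:
            continue  # ratios are too noisy when the previous count is small
        ratio = cur / prev
        flags[t] = ratio >= spike_ratio or ratio < trough_ratio
    return flags


def mask_cases(cases: List[float], flags: List[bool]) -> List[Optional[float]]:
    """Replace flagged days with None so that they can be excluded from the likelihood."""
    return [None if f else c for c, f in zip(cases, flags)]
```

Flagged days would still need manual review, since a genuine surge can look like a reporting spike.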

We take NPI data from the Oxford COVID-19 Government Response Tracker, which collects data at the national-level and US state-level [32] . From these we select the 'containment' policies, i.e. direct attempts to reduce transmission.

Importantly, OxCGRT cannot be used for national modelling without imputation. OxCGRT reports only one value per country-day, even if policies differ between regions. The dataset reports the maximum stringency of each NPI, whether or not this is implemented in all regions. This leads to the national stringency value being "hidden" behind the highest regional value, where any region has stronger measures. As a result, when a policy is strengthened in only part of a country, we impute the previous national value.

We process the NPI data as follows:

• We filter to rows with national coverage (that is, 'Flag' columns = 1).

• We threshold the ordinal values as in (4), creating a feature for the first mandatory level of each policy and additional features for higher levels of school closing, workplace closing and restrictions on gatherings. This yields 10 NPI features.

• When a policy is strengthened in only part of a country, we impute the previous national value.

This version posted June 18, 2021.
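The NPI processing steps listed above can be sketched as follows. This is an illustrative reading, not the authors' code: the input is assumed to be one ordinal value and one flag per day (flag 1 for national coverage), a value of 0 is treated as a national "no policy" state regardless of flag, and the function names are hypothetical.

```python
from typing import List, Optional


def impute_national(values: List[int], flags: List[Optional[int]]) -> List[int]:
    """Carry the last national value forward on days when the policy is only regional."""
    out = []
    last_national = 0
    for v, f in zip(values, flags):
        if v == 0 or f == 1:
            last_national = v  # national coverage, or no policy at all
        out.append(last_national)
    return out


def threshold(values: List[int], level: int) -> List[int]:
    """Binary feature: 1 when the ordinal value is at or above the given level."""
    return [1 if v >= level else 0 for v in values]


# Day 4 strengthens the policy regionally only (flag 0), so the previous
# national value is kept until the stronger level applies nationally.
national = impute_national([0, 1, 2, 3, 3, 2], [None, 1, 1, 0, 1, 1])
feature = threshold(national, level=2)
```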

We use the University of Maryland Centre for Geospatial Information Science-Facebook Research survey as our main source of daily, self-reported wearing data [7, 33] . This is by far the largest-scale survey of COVID mask-wearing (with 19.97 million individual responses in our window, or 1,500 individual responses per region-day). The survey uses stratified random sampling of all active Facebook users to ensure demographic balance in each region, and also guarantees at most one response per month per Facebook user.

An alternative survey, the Imperial College London-YouGov COVID-19 Behaviour Tracker [58], is one hundred times smaller than UMD, uses nonrandom sampling, and has most days missing; it is therefore less suitable for modelling.

The UMD dataset does not include US wearing data, while the respective CMU / Facebook US survey [59] does not begin reporting until after our window of analysis. We supplement UMD with data from Rader et al. [25] .

The Rader data are individual survey responses on a reverse Likert scale, weighted to correct for demographic imbalance in the sample. To convert this to the UMD scale, we take the mean of the grocery shopping and workplace features, threshold at ≤ 2 (likely or very likely) and take the percentage of rows in each state passing this threshold, and smooth over a 7 day window. This results in a percentage-wearing feature which is within 1% of the Facebook US data [59] for the period where the two datasets overlap.
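The conversion described above can be sketched in a few lines. The sketch assumes each response is a pair of reverse-Likert scores (grocery shopping, workplace), ignores the survey weights for brevity, and uses a trailing 7-day window; the text does not specify whether the window is trailing or centred, so that choice and the function names are assumptions.

```python
from typing import List, Optional, Tuple


def percent_wearing(responses: List[Tuple[float, float]],
                    threshold: float = 2.0) -> Optional[float]:
    """Share of responses whose mean (grocery, workplace) score is <= threshold, in percent."""
    if not responses:
        return None
    passing = sum(1 for grocery, work in responses if (grocery + work) / 2.0 <= threshold)
    return 100.0 * passing / len(responses)


def rolling_mean(series: List[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average that skips missing days."""
    out = []
    for t in range(len(series)):
        chunk = [x for x in series[max(0, t - window + 1): t + 1] if x is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out
```

Applying `percent_wearing` to each state-day and then `rolling_mean` to each state's daily series would give a percentage-wearing feature on the same scale as the UMD data.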

We use the Google COVID-19 Community Mobility Reports to index mobility changes in each region [48] . We form a single feature by averaging the indoor public components (retail and recreation, grocery and pharmacy, transit, and workplaces). We parameterise mobility similarly to Unwin et al. [50] .

To validate our model estimates, and for the initialisation of $R_0$, we use country $R_t$ estimates from the Epidemic Forecasting group [52]. The estimates are calculated using a nonparametric approach from [53]. US state-level estimates are taken from https://rt.live/.

The OxCGRT dataset has 184 countries, or 235 counting US territories. 81 countries are missing from the UMD wearing data, and are thus dropped when joining to OxCGRT. We manually drop 32 countries with frequent extreme periodicity in case reporting, 16 countries that have fewer than 5000 cumulative cases in our window, 10 countries not contained in the Google Mobility dataset, and 4 countries that are missing more than 3 consecutive weeks of wearing data. Included countries are shown in Table 5 .
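The exclusion counts above can be tallied as a quick consistency check. Starting from the 235 units in OxCGRT, the listed exclusions leave 92, which matches the number of regions analysed. The dictionary keys are shorthand labels introduced for this sketch.

```python
# Exclusion counts as stated in the text; the labels are illustrative shorthand.
total_units = 235
dropped = {
    "missing from UMD wearing data": 81,
    "extreme periodicity in case reporting": 32,
    "fewer than 5000 cumulative cases": 16,
    "not in Google Mobility": 10,
    "more than 3 consecutive weeks of wearing data missing": 4,
}
remaining = total_units - sum(dropped.values())
```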

The YouGov survey [58] begins in Jan 2020 for some locations, which enables us to check the mandate-wearing relationship in the first wave, at the time of the earliest mandates. Figure 6 displays the estimates against mandate date (including some countries with multiple mandates). The average reported level of mask-wearing in Jan 2020 was 32.7%. This increased to an average of 64.2% before the first national mandate implementations in March and April. There was an average post-mandate increase in wearing of 11%, similar to that in our modelling set (an 8.3% post-mandate increase).

Table 5. Regions included in the analysis, by continent.

We follow past work in timing mandates with the beginning of the nominal legal enforcement of wearing. Our source of NPI data [32] also contains an indicator for whether a non-mandatory government recommendation to wear masks was in place. To see if this less stringent, but generally earlier, policy has stronger correlations with subsequent mask-wearing, we repeat the exploratory analysis from above. The correlation between wearing percentage and any form of recommendation or mandate is weaker than before: Spearman's $\rho = 0.235$, $p < 0.001$, compared to the mandate correlation of 0.32.

We use PyMC3's implementation of Hamiltonian Monte Carlo with the No-U-Turn sampler (NUTS) [60] . The following outputs result from running the default model with the wearing feature.

The Gelman-Rubin diagnostic $\hat{R}$ tests for convergence of the sampler. When $\hat{R}$ is close to 1 (i.e. < 1.01 [61]), the MCMC sampling algorithm is commonly considered to have converged [62]. Figure 7 (left) therefore suggests that our MCMC sampler has converged, and that our posterior may be used to draw valid inferences.
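For intuition about the convergence diagnostic discussed above, the following sketch computes the classic Gelman-Rubin statistic for a single parameter from several chains of equal length. This is the basic between-chain versus within-chain variance ratio, not the rank-normalised split version of [61] that modern samplers typically report; the function name is illustrative.

```python
from statistics import mean, variance
from typing import List


def gelman_rubin(chains: List[List[float]]) -> float:
    """Classic R-hat: sqrt of the pooled variance estimate over the within-chain variance."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: scaled variance of the chain means
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Chains exploring the same region give a value near or below 1;
# chains stuck in different regions give a value far above the 1.01 threshold.
mixed = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
stuck = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```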

We used 1000 tuning samples and 500 posterior samples for each of 4 chains, giving 6000 samples in total (2000 of which are posterior draws). As shown in Figure 7 (right), the relative effective sample size exceeds 30% for the majority of parameters, indicating low autocorrelation.

Posterior predictive distributions. Figure 9 displays predicted cases during and 3 weeks beyond our window of analysis. All 92 country panels can be found on GitHub.

Figure 10 shows the posterior correlations between the attributed $R$ reductions for each modelled effect.

We can use these correlations to diagnose excessively strong collinearity in our data; collinearity would manifest as strong posterior correlations [63]. However, almost all of the pairwise correlations lie in $-0.2 < \rho < 0.1$, which indicates that collinearity is manageable in our dataset. Notable negative effect correlations exist between different levels of the same NPIs:

• Restrictions on gatherings < 100 people and Restrictions on gatherings < 1000 (-0.57);

• Restrictions on gatherings <10 and Restrictions on gatherings <100 (-0.23);

• School reopening (some schools) and School reopening (all schools) (-0.51).

All other pairwise covariate correlations have an absolute value less than 0.2.

Figure 11 displays inferred $R_t$ against covariate values for selected countries. All 92 country panels can be found on GitHub.

Fig. 11. Summary plots of selected region covariates and $R_t$ estimates, summer 2020. Top-left: instantaneous $R_t$ from our model. Bottom-left: instantaneous $R_t$ estimates from EpidemicForecasting [52]. Bottom-right: overall NPI stringency from OxCGRT [32].

Sensitivity analysis reveals the extent to which results depend on uncertain parameters and modelling choices, and can diagnose model misspecification and excessive collinearity [63] . We vary many of the components of our model and recompute the NPI effectiveness estimates. Overall, we perform 22 sensitivity analyses with 123 experimental conditions. Table 1 summarises our sensitivity analyses and their categories.

The effect sizes inferred for the other NPIs are smaller than in other work [19, 29, 49] because they measure a different effect: in this window, most regions begin with interventions active, and changes in NPI status are most often reopenings or the lifting of bans. Such reopenings often result in an increase in transmission that is smaller in magnitude than the decrease in transmission from the initial policy implementation, for example due to improved safety procedures [19].

Our data do not capture all of the government NPIs that were implemented, and we only measure two forms of voluntary behaviour change: mask wearing and mobility. Unobserved factors may influence $R$, and if their timing correlates with the timing of mask wearing or mandates, reductions in $R$ from unobserved factors may be wrongly attributed to mask-wearing or mandates [35]; that is, our observed factors will be confounded. For instance, observational estimates like ours are potentially confounded by the correlation between mask-wearing and other protective behaviours [1, 15]. We investigate this phenomenon by assessing how much effectiveness estimates change when previously observed factors are excluded, following Sharma et al. [18]. Figures 12 and 13 show NPI effectiveness estimates when each observed NPI is excluded in turn. Figure 14 shows the sensitivity of our effect estimates to excluding mobility from our model. Reducing mobility has a large effect on $R$, so it is encouraging to see that our effects are robust to excluding mobility from our model.

One objection to our methodology is that mask-wearing increases over our window of analysis while transmission decreases in many regions. It is therefore possible that this correlation is a spurious contributor to the substantial apparent wearing effect. We test this hypothesis by creating a fake wearing variable for each region. Each variable has the same start and end wearing value as the true wearing percentage and linearly interpolates between these values to capture the trend in wearing in that region. We infer a small and uncertain effect for the fake wearing variable: 7.6% [-20.2%, 30.0%] (see Figure 15). This implies that the wearing effect we infer does not rely solely on the wearing trend in this period.

Figure 16 shows the sensitivity of our effect estimates to excluding all NPIs from our wearing model.

Figure 17 shows the sensitivity of our effect estimates to $\bar{\mu}$, the mean of the prior over $\mu_R$ in $\tilde{R}_{0,c} \sim \mathcal{N}(\mu_R, \sigma_R^2)$, where $\mu_R \sim \text{TruncatedNormal}(0.1, \bar{\mu}, \sigma_\mu)$. Recall that $\tilde{R}_{0,c}$ is the reproduction number at the start of the window of analysis, supposing mandates are not active and no one is wearing masks. Figure 18 shows the sensitivity of our effect estimates to $\sigma_\mu$, the scale of the prior over $\mu_R$ in $\tilde{R}_{0,c} \sim \mathcal{N}(\mu_R, \sigma_R^2)$, where $\mu_R \sim \text{TruncatedNormal}(0.1, \bar{\mu}, \sigma_\mu)$. Figure 19 shows the sensitivity of our effect estimates to $\sigma_\sigma$, the scale of the prior over $\sigma_R$ in $\tilde{R}_{0,c} \sim \mathcal{N}(\mu_R, \sigma_R^2)$, where $\sigma_R \sim \text{HalfNormal}(\sigma_\sigma)$. Figure 20 shows the sensitivity of our effect estimates to the prior over the random walk noise scale.
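The fake wearing variable described above (same start and end values as the true series, linear in between) can be constructed as in the sketch below. The function name and the example series are illustrative, not taken from the data.

```python
from typing import List


def fake_wearing(wearing: List[float]) -> List[float]:
    """Linear interpolation between the first and last observed wearing values."""
    n = len(wearing)
    if n == 1:
        return list(wearing)
    start, end = wearing[0], wearing[-1]
    return [start + (end - start) * t / (n - 1) for t in range(n)]


# Illustrative series: the day-to-day variation is removed, only the trend is kept.
trend_only = fake_wearing([0.2, 0.5, 0.3, 0.6, 0.4])
```

Because the fake variable keeps the regional trend but discards the timing of day-to-day changes, a large effect inferred for it would suggest that the trend alone is driving the wearing estimate.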

Figure 21 shows the sensitivity of the effect estimates to the mean of the distribution of the generation interval. Figures 22 and 23 show the sensitivity of the effect estimates to the mean and dispersion of the distribution that represents the delay between infection and case reporting. Figure 24 shows the sensitivity of our effect estimates to the prior over the NPI effects. Figure 25 shows the sensitivity of our effect estimates to the scale of the prior over the wearing effect. Figure 26 shows the sensitivity of our effect estimates to the scale of the prior over the mandate effect. Figure 27 shows the sensitivity of our effect estimates to the mean of the prior over the mobility effect. Figure 28 shows the sensitivity of our effect estimates to the scale of the prior over the mobility effect. Figure 29 shows the sensitivity of our effect estimates to the parameterisation of the wearing effect. The wearing parameterisations are defined as follows:

• Exponential (base model): $W^{\exp}_{t,c} = \exp(-\alpha_w\, w_{t,c})$. We use this form in our base model because it is consistent with the form of the mandate effect on $R$.

• Linear: $W^{L}_{t,c} = \mathrm{ReLU}(1 - \alpha_w\, w_{t,c})$, where ReLU is the Rectified Linear Unit. The ReLU function preserves positive inputs and maps negative inputs to zero. We include the linear form because it is the simplest way to approximate wearing's effect on transmission.

• Quadratic:

The quadratic form is based on a simple model: suppose two people interact, and there is a fixed, independent probability that each of them wears a mask. Then the reduction in the probability of transmission due to mask-wearing is quadratic in the probability that each wears a mask. The two parameters correspond to source control and wearer-protection.

Figure 30 shows the sensitivity of our effect estimates to the period of the random walk. For a period of $N$ days, the value of $R_{t,c}$ may change without a change of covariates every $N$ days. Figure 34 shows the sensitivity of our effect estimates to assuming a persistent mandate effect that lasts beyond the point the mandate is lifted. Figure 35 shows the sensitivity of our effect estimates when removing the less stringent mask mandate feature. Figure 36 shows the sensitivity of our effect estimates to shorter periods of analysis. We see little variation in our effect estimates, which implies that our results may generalise to other periods.
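The three wearing parameterisations described above can be compared with a small sketch. The exponential and linear forms follow the definitions in the list; the quadratic form shown here, a product of a source-control term and a wearer-protection term, is an assumption consistent with the two-person argument, since its equation is not reproduced in this text. Function and parameter names are illustrative.

```python
import math


def wearing_exponential(w: float, alpha: float) -> float:
    """Base model: exp(-alpha * w)."""
    return math.exp(-alpha * w)


def wearing_linear(w: float, alpha: float) -> float:
    """Linear form passed through a ReLU so that the factor never goes below zero."""
    return max(0.0, 1.0 - alpha * w)


def wearing_quadratic(w: float, alpha_source: float, alpha_wearer: float) -> float:
    """Assumed product form: (1 - alpha_source * w) * (1 - alpha_wearer * w)."""
    return (1.0 - alpha_source * w) * (1.0 - alpha_wearer * w)
```

All three forms equal 1 when no one wears a mask ($w = 0$), so they differ only in how the reduction scales as wearing increases.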


Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis

An evidence review of face masks against COVID-19

Face masks effectively limit the probability of SARS-CoV-2 transmission

Face masks prevent transmission of respiratory diseases: a meta-analysis of randomized controlled trials. medRxiv

Community use of face masks and similar barriers to prevent respiratory illness such as COVID-19: a rapid scoping review

Physical interventions to interrupt or reduce the spread of respiratory viruses

COVID-19 World Symptoms Survey

Challenges in ensuring global access to COVID-19 vaccines: production, affordability, allocation, and deployment. The Lancet

Recent trends in SARS-CoV-2 variants of concern in England

Coronavirus (COVID-19) Vaccinations

Effectiveness of adding a mask recommendation to other public health measures to prevent SARS-CoV-2 infection in Danish mask wearers: a randomized controlled trial

Universal masking is urgent in the COVID-19 pandemic: SEIR and agent based models, empirical validation, policy recommendations

Appropriate attitude promotes mask wearing in spite of a significant experience of varying discomfort

Face mask-induced itch: A self-questionnaire study of 2,315 responders during the COVID-19 pandemic

Social and behavioral consequences of mask policies during the COVID-19 pandemic

Association of country-wide coronavirus mortality with demographics, testing, lockdowns, and public wearing of masks. The American journal of tropical medicine and hygiene

Politicizing the mask: Political, economic and demographic factors affecting mask wearing behavior in the USA

How robust are the estimated effects of nonpharmaceutical interventions against COVID-19?

Understanding the effectiveness of government interventions in Europe's second wave of COVID-19

Community use of face masks and COVID-19: Evidence from a natural experiment of state mandates in the US

Face masks considerably reduce COVID-19 cases in Germany

Trends in county-level COVID-19 incidence in counties with and without a mask mandate-Kansas

Causal impact of masks, policies, behavior on early COVID-19 pandemic in the US

Mask mandates can limit COVID spread: Quantitative assessment of month-over-month effectiveness of governmental policies in reducing the number of new COVID-19 cases in 37 US states and the District of Columbia

Mask-wearing and control of SARS-CoV-2 transmission in the USA: a cross-sectional study

Association between COVID-19 outcomes and mask mandates, adherence, and attitudes. medRxiv

The effect of face mask mandates during the COVID-19 pandemic on the rate of mask use in the United States

Mask-wearing during the COVID-19 pandemic. medRxiv

Inferring the effectiveness of government interventions against COVID-19

Widespread use of face masks in public may slow the spread of SARS CoV-2: an ecological study. medRxiv

Scenario analysis of non-pharmaceutical interventions on global COVID-19 transmissions

A global panel database of pandemic policies (Oxford COVID-19 government response tracker)

Weights and methodology brief for the COVID-19 Symptom Survey by University of Maryland and Carnegie Mellon university

The meaningfulness of effect sizes in psychological research: Differences between sub-disciplines and the impact of potential biases

Causal inference using regression on the treatment variable in Data Analysis Using Regression and Multilevel/Hierarchical Models

Unobservable selection and coefficient stability: Theory and evidence

A modelling framework to assess the likely effectiveness of facemasks in combination with 'lock-down' in managing the COVID-19 pandemic

A cluster randomised trial of cloth masks compared with medical masks in healthcare workers

Risk compensation and face mask mandates during the COVID-19 pandemic

Community universal face mask use during the COVID 19 pandemic-from households to travellers and public spaces

Weather variability and COVID-19 transmission: A review of recent research

Seasonal variation in SARS-CoV-2 transmission in temperate climates. medRxiv

Social desirability bias. Wiley international encyclopedia of marketing

Self-reported mask wearing greatly exceeds directly observed use: Urgent need for policy intervention in Kenya

The COVID-19 pandemic: government vs. community action across the United States

Normalizing community mask-wearing: A cluster randomized trial in Bangladesh

An interactive web-based dashboard to track COVID-19 in real time. The Lancet infectious diseases

Google COVID-19 community mobility reports

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

The effectiveness and perceived burden of nonpharmaceutical interventions against COVID-19 transmission: a modelling study with 41 countries. medRxiv

Epidemic forecasting: Country Rt estimates

A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing

Essential epidemiological parameters of COVID-19 for clinical and mathematical modeling purposes: a rapid review and meta-analysis

The early phase of the COVID-19 outbreak in

The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application

Imperial College London Big Data Analytical Unit, and YouGov Plc

Probabilistic programming in Python using PyMC3

Rank-normalization, folding, and localization: An improved $\hat{R}$ for assessing convergence of MCMC

General methods for monitoring convergence of iterative simulations

Collinearity: a review of methods to deal with it and a simulation study evaluating their performance