key: cord-0472063-kzck6xss
authors: Sung, Chih-Li
title: Estimating functional parameters for understanding the impact of weather and government interventions on COVID-19 outbreak
date: 2021-01-13
journal: nan
DOI: nan
sha: baf530a935dabb69203e8601f2fd754677aee32f
doc_id: 472063
cord_uid: kzck6xss

As the coronavirus disease 2019 (COVID-19) has shown profound effects on public health and the economy worldwide, it becomes crucial to assess the impact on the virus transmission and develop effective strategies to address the challenge. A new statistical model derived from the SIR epidemic model with functional parameters is proposed to understand the impact of weather and government interventions on the virus spread in the presence of asymptomatic infections among eight metropolitan areas in the United States. The model uses Bayesian inference with Gaussian process priors to study the functional parameters nonparametrically, and sensitivity analysis is adopted to investigate the main and interaction effects of these factors. This analysis reveals several important results including the potential interaction effects between weather and government interventions, which shed new light on the effective strategies for policymakers to mitigate the COVID-19 outbreak.

1. Introduction. As the coronavirus disease 2019 (COVID-19) has already had profound effects on public health and the economy worldwide, how to use statistical approaches to model and understand the spread of COVID-19 to inform and educate the public about the virus transmission and develop effective strategies for addressing this challenge has become crucial. In particular, the understanding of how government interventions and environmental factors, such as temperature and humidity, affect the virus transmissibility is important yet unclear. Moreover, an effective strategy to mitigate the outbreak based on the weather conditions is in extreme need for policymakers yet little attention has been paid to the interaction effect between weather and government interventions. For instance, a natural question for policymakers is "Should the government implement more restrictions to mitigate the pandemic as the weather gets colder?"

Since the COVID-19 outbreak, many studies have investigated the impact of weather and government interventions, but some challenges remain. Yu (2020) ; Xu et al. (2020) ; Carson et al. (2020) find some evidence that the weather may be associated with the COVID-19 spread, while Jamil et al. (2020) ; Gupta, Pradhan and Maulud (2020) find no significant associations. Most of the studies on the impacts of government interventions on COVID-19 spread show that government interventions are associated with reduced COVID-19 transmission; e.g., Cowling et al. (2020) ; Haug et al. (2020) ; Haldar and Sethi (2020) ; Flaxman et al. (2020) . However, most of this work focuses on individual effects of weather and government interventions, which may lead to misleading results due to potential collinearity issues (Wilson, 2020) . For example, cold weather may increase the risk of disease overall, leading governments to impose travel restrictions. Further, interaction effects between weather and government interventions variables cannot be estimated if the effects in these two sets of variables are estimated separately. Moreover, most of the existing work study the impact without accounting for the presence of the persons who are asymptomatic but can nevertheless infect others, which is a distinguishing feature of COVID-19 in comparison with previous viral diseases.

In this paper, we employ a nonparametric regression method not only to model the impacts of weather and government interventions jointly in the presence of asymptomatic infections, which incorporates an epidemic model that allows us to evaluate the effects on the virus transmissibility, but also provide forecasts of future COVID-19 infections. Specifically, a Gaussian process prior (Williams and Rasmussen, 2006) is imposed on the functional parameters in the susceptible-infectious-removed (SIR) model (Kermack and McKendrick, 1927) , and based on this model, the posterior distribution of the basic reproduction number, which is used to measure the transmission potential of a disease, will be derived. Both main and interaction effects of these factors will be analyzed by sensitivity analysis (Sobol, 1993) .

Parameter estimation in epidemic models is often called calibration in the computer experiment literature (Kennedy and O'Hagan, 2001; Santner, Williams and Notz, 2018; Tuo and Wu, 2015) . Although there are numerous developments on calibration, most of the existing work are based on scalar parameters rather than functional parameters. Exceptions include the recent work by Plumlee, Joseph and Yang (2016) ; Brown and Atamturktur (2018) , but their work is based on continuous outputs with a Gaussian assumption, which does not hold for count data in the epidemic models.

In Section 2, the SIR model and a modified SIR model with functional parameters will be introduced. The statistical model incorporated with the SIR model will be explicitly described in Section 3. Numerical studies are conducted in Section 4 to examine the performance. In Section 5, the statistical model is applied to the COVID-19 outbreak to assess the impacts of weather and government interventions. Final remarks are given in Section 6. The details of sampling for the posterior distributions, the R (R Core Team, 2018) code, and the data for reproducing the results in this paper are provided in the Supplementary Material (Sung, 2022) .

2.1. SIR model. Compartmental models are widely used in epidemiology which simplify the mathematical modelling of infectious diseases. One of the prominent models is the SIR model (Kermack and McKendrick, 1927; Diekmann, Heesterbeek and Britton, 2012) , which assigns the population to three compartments: susceptible (S), infectious (I), and removed (R), where the three compartments respectively represent the number of the susceptible individuals, the infected individuals, and the removed individuals, which include the ones who are recovered, quarantined or deceased. The SIR model has been widely used for understanding how a disease spreads in outbreaks of measles, influenza, rubella, smallpox, Ebola, monkeypox, SARS, and the current COVID-19 pandemic. See, for example, Osthus et al. (2017) ; Chen et al. (2020) ; Cooper, Mondal and Antonopoulos (2020); Roda et al. (2020); D'Arienzo and Coniglio (2020) .

Transitions among the three compartments can be expressed mathematically by three ordinary differential equations as follows:

where S(t), I(t) and R(t) represent the numbers of cases in the corresponding compartments, N = S(t) + I(t) + R(t) is the total population, β is the contact rate that represents the average number of contacts per person per time in the susceptible compartment that is sufficient to spread the disease, and γ is the removed rate from the infectious compartment to the removed compartment. The ratio of β and γ is called the basic reproduction number in epidemiology, often denoted by R 0 := β/γ, which indicates the average number of infected cases generated by a typical infectious individual when introduced into a fully susceptible population. This number is of great importance in public health and epidemiology, which is often used to measure the transmission potential of a disease or a virus (Dietz, 1993; Zhao et al., 2020; Zhang et al., 2020) . Essentially, when R 0 is larger than 1, the infection will be able to start spreading in a population, and the larger R 0 is, the harder it is to control the epidemic.

2.2. Modified SIR model. Although the SIR model has been widely used in epidemiology, the model has been shown that it cannot reflect the reality due to its simplifications and assumptions. See, for example, Heesterbeek et al. (2015) ; Sung and Hung (2020) . In particular, the constant parameter assumption of β and γ, which implies that the contact rate and the removed rate are both fixed in the entire process, is too strong and unrealistic (Cowling, Ho and Leung, 2008; Cauchemez et al., 2016; Hong and Li, 2020; Yu, 2020; Ambrosio and Aziz-Alaoui, 2020) . Therefore, in this article, we consider a modified, more flexible SIR model by assuming that the parameters can vary based on potential factors.

First, similar to Hong and Li (2020), we consider a discrete version of SIR models by replacing the derivatives in (1) with finite differences, which results in

Then, by assuming the functional parameters β(x) and γ(x), where x ∈ Ω ⊆ R d is a ddimensional factor, and expressing the equations in a recursive fashion, a modified SIR model can be expressed as

and S(t + 1) = N − I(t + 1) − R(t + 1) for t ∈ N ∪ {0}. Thus, the number of the daily infectious cases at day t based on the modified SIR is the difference in susceptible from day t − 1 to day t, which we denoted as

(2) f (t, β(x), γ(x)) := S(t − 1) − S(t).

3. A statistical model incorporated with the SIR model.

3.1. Gaussian process priors for functional parameters. In this section, we introduce a statistical model incorporated with the modified SIR model in (2). First, denote y t as the daily reported number of infectious cases at day t, and assume y t follows an independent Poisson distribution with the mean function that is a fraction of f (t, β(x), γ(x)). That is,

where κ(t) ∈ (0, 1), indicating that only a fraction κ(t) of the total number of infected are reported. The fraction κ(t) plays a crucial role for taking into account the presence of asymptomatic, or however undetected, infectious cases, which is one of the distinguishing features of COVID-19. The idea of including the fraction was also mentioned in the literature (e.g., Piazzola, Tamellini and Tempone (2021) and Ansumali et al. (2020) ), but a constant fraction was often considered, which was criticized by Ansumali et al. (2020) as unrealistic. A dynamic fraction as in (3) varying with time is more realistic for the problem. It should be noted that in the model (3), it is intrinsically assumed that the rate of transmission by an infectedasymptomatic person is the same as the infected-symptomatic person. More sophisticated models with an additional "asymptomatic-but-infected" compartment, which allow for different transmission rates in the model, may be considered, such as Robinson and Stilianakis (2013) and Ansumali et al. (2020) . However, as pointed out by Ansumali et al. (2020) , several recent studies show that there is no discernible difference between the two rates in the COVID-19 outbreak. See, for example, He et al. (2020) ; Li et al. (2020) ; Liu et al. (2020); Wölfel et al. (2020) . As such, equal transmission rates are assumed in the proposed model, and the extension to the compartmental models with the asymptomatic-but-infected compartment, such as the SAIR model of Ansumali et al. (2020) , is left for the future work. Such work could also allow for studying whether the effects of potential factors differ among the asymptomatic versus symptomatic persons, such as government interventions.

Notably, as pointed out by Ansumali et al. (2020) , another popular epidemic model for the outbreak, SEIR model (Susceptible-Exposed-Infectious-Removed) (Kermack and McKendrick, 1927) , is not as realistic, because the exposed group (E) of the SEIR model does not infect the susceptible group (S) as the E group does not carry a sufficient viral load to infect others through contact. This is unlike the asymptomatic-but-infected individuals in the COVID-19 outbreak, which do lead to the S group getting infected. As such, a model like the proposed model accounting for these asymptomatic individuals is more realistic.

The functional parameters in the SIR model are assumed to follow a joint Gaussian process (GP) prior:

The logit transformation is used here because both β and γ are rates which are bounded from zero to one, but the GP prior has positive measures over the negative reals. Other transformation, such as the probit function, Φ −1 (x), the cumulative log-log function, log(− log(x)), or the identity function, x, could be also used here. µ j (·) is the mean function, where we assume a constant mean, i.e., µ j (x) = µ j . τ > 0 is the process variance, and K φ j is the correlation function, for which a Gaussian correlation function is commonly used in the form

is the unknown lengthscale parameter and denotes the element-wise product of two vectors. Note that the correlation function is usually reparameterized as

where φ j = (φ j1 , . . . , φ jd ) ∈ (0, 1) d , for the purpose of numerical stability, because the domain of φ jl ∈ (0, 1) is now bounded. See, for example, Brown and Atamturktur (2018) and Mak et al. (2018) . As a result, the form of the kernel function (6) is used throughout this article. In (4), we assume that logit(β(·)) and logit(γ(·)) are correlated with a positive crosscorrelation, 0 < ρ ≤ 1, which implies that for any given x ∈ Ω, the correlation between logit(β(x)) and logit(γ(x)) is ρ. This can be verified by (5) and the fact that K φ 1 (x, x) = K φ 2 (x, x) = 1 for any x ∈ Ω. The dependence assumption of the two parameters, β(·) and γ(·), is crucial and appealing from an epidemiological perspective. For the compartmental models like SIR, it is well known that the parameters are strongly coupled in the modeling literature. See, for example, the joint posterior distribution in Roda et al. (2020) which shows that the two parameters in an SIR model are highly positively correlated. Thus, the independent GP assumption as in Brown and Atamturktur (2018) and Plumlee, Joseph and Yang (2016) is not valid in this application. Note that unlike the covariance structures in Banerjee and Gelfand (2002) ; Qian, Wu and Wu (2008) where the parameters φ 1 and φ 2 are assumed to be identical, the covariance structure (4) adopts the one in Fricker, Oakley and Urban (2013) and Svenson and Santner (2016) , which is more flexible as the two lengthscale parameters are not necessarily identical.

Lastly, the fraction κ(t) is assumed to have a Gaussian process prior:

with ν > 0 and ϕ ∈ (0, 1), where K ϕ has the same form of (6). Suppose that we observe the reported infectious cases in n days, which are denoted by y n = (y 1 , . . . , y n ).

,k≤n ∈ R n×n , and denote a ij as the element A ij . Then, together with the model assumptions (3), (4), (6) and (7), we have the following hierarchical model,

and (8), (9), (10), (11), (12), (13) are the priors of the parameters, in which InvGamma(a, b) is an inverse gamma distribution with shape parameter a and rate parameter b, and Beta(1, b) is a beta distribution with parameters 1 and b.

3.2. Posterior Distributions. The goal of this study is to infer the functional parameters β(x) and γ(x) and subsequently investigate whether the d-dimensional factor x plays a role in varying the basic reproduction number, which is denoted by R 0 (x) := β(x)/γ(x). In addition, predicting the number of future infections based on forecast weather and government interventions, say x n+1 , is also of great interest. Therefore, the joint posterior distribution of β(x), γ(x), and the number of the future daily infected cases are developed as follows.

We first derive the joint posterior distribution of β(x) and γ(x). Denote the parameters ψ = (τ, ρ, ν, ϕ, φ 1 , φ 2 , µ 1 , µ 2 ) and data = {y t , x t } 1≤t≤n . Then, the posterior distribution given observations can be obtained by

where π(x|y) denotes the posterior distribution of x given y. Thus, the joint posterior distribution of β(x) and γ(x) can be approximated by Markov chain Monte Carlo (MCMC) by drawing the samples from π(β(x), γ(x)|β, γ, κ, ψ, data) and π(β, γ, κ, ψ|data) iteratively. The posterior π(β(x), γ(x)|β, γ, κ, ψ, data) can be drawn based on the property of conditional multivariate normal distributions, that is,

where

. The MCMC samples of β(x) and γ(x) can then be obtained by sampling from the multivariate normal distribution of (14) and taking the inverse of the logit function. For the posterior π(β, γ, κ, ψ|data), we have π(β, γ, κ, ψ|data) ∝ π(data|β, γ, κ, ψ)π(β, γ|κ, ψ)π(κ|ψ)π(ψ) (15)

The samples from this posterior distribution can be drawn by Gibbs sampling with Metropolis-Hastings algorithm, the details of which are given in Section 1 of the Supplementary Material (Sung, 2022) . Now we move to the posterior distribution of the number of future infections. Let x n+1 be the forecast weather and government interventions at time n + 1, and denote β n+1 = β(x n+1 ), γ n+1 = γ(x n+1 ), κ n+1 = κ(n + 1). Then the posterior distribution of the infected number, y n+1 , given the current observed data has π(y n+1 ,β n+1 , γ n+1 , κ n+1 , β, γ, κ, ψ|x n+1 , data) ∝π(y n+1 |β n+1 , γ n+1 , κ n+1 )π(β n+1 , γ n+1 |β, γ, κ, ψ, x n+1 ) ×π(κ n+1 |β, γ, κ, ψ)π(β, γ, κ, ψ|data).

Similarly, this posterior distribution can be drawn by MCMC sampling, where the samples for π(β, γ, κ, ψ|data) can be drawn as introduced before, and the samples from π(β n+1 , γ n+1 |β, γ, κ, ψ, x n+1 ) can be similarly drawn from the multivariate normal distribution (14). The samples of κ n+1 from π(κ n+1 |β, γ, κ, ψ) can be drawn from the posterior of κ(t),

, and set t = n+1, where k ϕ (t) = (K ϕ (t, 1), . . . , K ϕ (t, n)). The distribution, π(y n+1 |β n+1 , γ n+1 , κ n+1 ), follows a Poisson distribution with the mean κ n+1 f (n + 1, β n+1 , γ n+1 ). Thus, the MCMC samples can be drawn iteratively from these posteriors.

4. Simulation Study. In this section, simulation studies are conducted to examine the performance of the proposed method. In the simulations, the hyperparameters in the priors (8), (9), (10), (11), (12), (13), are set as follow. Similar to Brown and Atamturktur (2018) , the shape parameters b ρ , b φ and b ϕ are chosen to be 0.1, which place most probability mass near one to enforce the smoothness for the functional parameters; a τ = a ν = 0.01 and b τ = b ν = 0.01 are chosen so that the prior is centered at one with standard deviation 0.01/0.01 2 = 10; for (13) we set α j = 0 and σ 2 j = 1 for j = 1, 2, 3. For the MCMC sampling, 2,000 iterations are performed in a burn-in period, and after that additional 2,000 MCMC samples are drawn, which are thinned to reduce autocorrelation.

Suppose that the observation y t is simulated from a Poisson distribution with the mean function, κ(t)f (t, β(x), γ(x)), where f (t, β(x), γ(x)) = (t/10 + 5β(x) + γ(x)(t/10) 2 ) 2 and x is one-dimensional factor in the space [0, 1]. Let β(x) = sin(3x) exp(−x) + 0.2, γ(x) = sin(3x), and κ(t) = exp(−t/50), which are demonstrated in the left and middle panels of Figure 1 . It can be seen that the two curves, β(x) and γ(x), share some similarity overall, which suggests that the dependence assumption of these two functions is necessary. We generate x 1 , . . . , x 80 from a uniform distribution, and randomly generate n = 80 observations, y 1 , . . . , y 80 . The right panel of Figure 1 shows the random samples as dots, where the solid line is the true mean function, κ(t)f (t, β(x), γ(x)). We use the first 60 samples, y 1 , . . . , y 60 , as the training dataset and the other 20 samples as the test dataset. Figure 2 shows the posterior draws of β(x), γ(x) and κ(t). It can be seen that the posterior means can recover the true functions very well. The predictions on the test dataset are presented in Figure 3 , which shows that the posterior mean is reasonably close to the true function. These results demonstrate that the proposed method can perform well for the models with functional parameters in terms of estimation and prediction.

In this section, we use the proposed model to analyze the COVID-19 virus spread among the eight largest metropolitan areas in the United States (US). In particular, we use the model to estimate the impacts of weather and government interventions (and their interactions) on virus transmissibility, to forecast daily infected cases based on these factors, and to estimate the fraction of cases reported. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q The data source is briefly introduced here. The daily COVID-19 cases are obtained at the US county level from the data repository provided by New York Times (Almukhtar et al., 2020) , starting from January 25, 2020 to November 25, 2020. The population sizes are obtained from the census bureau website, which also can be found in Yu (2020) . The historical weather data and the weather forecast are collected from the Weather Underground (The Weather Company, 2020), which include the daily average temperature, humidity, wind speed, pressure, and precipitation. The information of government interventions is obtained from New York Times and local media, where we categorize the interventions into five levels: (0) no intervention; (1) all businesses are open with mask required and some capacity limitations ;

(2) all industries resume operations but some indoor services, such as bars and restaurants, remain closed; (3) Industries resume operations with severe restrictions and capacity limitations; (4) all non-essential businesses are closed. Scatterplots for every pair of factors are demonstrated in Figure 4 , which appears to have no obvious relationship between any pair of the factors. Now we are ready to apply the proposed model to the data, where the setting of the MCMC sampling is similar to the one in Section 4. Consider the confirmed cases before November 11 as the training data, and the cases from November 11 to 25 as the test data. Since the actual infectious period for COVID-19 is not available and it varies by individual and situation, as suggested by Centers for Disease Control and Prevention and Wilson (2020), we assume an infectious period of 11 days from the actual infection to the confirmation of the positive test result. In other words, we assume that the actual infection occurs 11 days prior to the confirmation date. The input factor is a 6-dimensional variable, i.e., x ∈ R 6 , including 5 variables representing weather data and one variable representing government intervention levels. The MCMC samples of the basic reproduction number can be obtained from the MCMC samples of β(x) and γ(x) by computing R 0 (x) = β(x)/γ(x). Since it's hard to visualize the function R 0 (x) via a six-dimensional x, similar to Welch et al. (1992) , we use a functional ANOVA decomposition (Hoeffding, 1948; Sobol, 1993; Santner, Williams and Notz, 2018) for R 0 (x) and plot its overall mean and main effects, which respectively measure the overall influence and the influence of a single factor on the basic reproduction number. That is, suppose that

, then the overall mean and the main effects of the function R 0 (x) can be obtained by

respectively, where Ω−j · · · dF −j (x −j ) indicates integration over the variables not in j and

Since the MCMC samples of R 0 (x) are available for any x ∈ Ω and the integration in (17) can be approximated by the Monte-Carlo integration (Caflisch, 1998) , the samples of the posterior distributions of m 0 and m j (x j ) can be naturally drawn via a Monte-Carlo sampling method. This is similar to Le Gratiet, Cannamela and Iooss (2014) for estimating the Sobol indices through a surrogate model that accounts for both the integration errors and the surrogate model uncertainty.

The boxplots of the overall means of R 0 (x) are shown in Figure 5 . It can be seen that among these eight cities, Chicago has the highest basic reproduction number than other cities, which implies that each existing infection in Chicago can cause more new infections than other cities, while the existing infection in Baltimore and Houston causes fewer new infections. Before illustrating the main effects, sensitivity analysis (Sobol, 1993 ) is adopted to determine which input factors are responsible for the most variation in the basic reproduction number. The result is shown in Figure 6 . Although no unique factor can dominate the others for all of the cities in terms of sensitivity index, it appears that government interventions have made stronger impacts than weather factors on the virus spread in most of the cities, especially in Baltimore and San Francisco. On the other hand, some cities, such as Los Angeles, Saint Louis, and Atlanta, have shown evidence that temperature has played a crucial role in explaining the variation of the basic reproduction number. The main effects of R 0 (x) are demonstrated in Figure 7 . As shown in the sensitivity analysis, the intervention and temperature factors both have larger variations in the main effects, ranging from -0.1 to 0.3, whereas the main effects of other weather factors mostly range from -0.1 to 0.1. Among these six factors, it shows that temperature and government intervention (2020); Wang et al. (2020) . We further investigate the interaction effects of the basic reproduction number. Particularly, we focus on the interaction effects between the intervention factor, which is controllable by governments, and the five weather factors, which are uncontrollable. The sensitivity indices of these five interaction effects are first computed to compare their relative importance. For the sake of saving the space, only the interaction effects for New York are demonstrated here. The sensitivity indices and the interaction plot with the highest index are respectively shown in the left and right panels of Figure 8 . It can be seen that the interaction effect between temperature and government interventions has the highest sensitivity index, and from the interaction plot of the two factors, it appears that when governments implement more restrictions, the effect of temperature on the virus spread tends to be milder. This result suggests that as the weather gets colder, policymakers may need to implement more restrictions to mitigate the pandemic.

We validate the proposed model by performing predictions on the test data from November 12 to 25. The prediction results of the eight cities are shown in Figure 9 . The predictions are reasonably accurate over the 14-day period. Particularly, in the cities New York, Los Angeles, Baltimore, and San Francisco, the infected cases tend to increase over the 14-day period and our predictions successfully capture the trend. This shows strong empirical justification for our model specification. Figure 10 presents the posteriors of the fraction of cases reported, κ(t). The posteriors show that the actual infections are greatly undetected in most of the cities. This finding coincides with the recent results by U.S. Centers for Disease Control and Prevention (2021); Pei et al. (2021) ; Noh and Danuser (2021) . In particular, San Francisco shows that the fraction less than 40% for the entire time of the ongoing pandemic, and New York shows low fractions during the peak of confirmed cases, which suggests that the number of actual infections during the peak is likely much higher (over 20,000 daily cases) than the confirmed daily cases (about 8,000). The estimation of the fractions provides crucial insights for pub- lic health, which determines the actual severity of COVID-19 and can be used to develop effective strategies against the outbreak.

To examine the robustness of the results, sensitivity analysis for the priors is conducted, which examines the impact of 30 different prior specifications on the resulting posteriors of R 0 (x) for the test data. The percentage deviations in the average posterior estimates between the original model and the models with the 30 alternative prior specifications are computed, which are presented in Figure 11 . It appears that the results are somewhat stable under different prior settings, with the average percentage deviation being about 3.6%. The posteriors based on the data of New York city are found to have higher shifts with some prior settings, indicating that it may require a bit more care in the future analysis.

6. Concluding Remarks. How the weather and government interventions affect the spread of a disease has been an important question but remains unclear in the literature. A new statistical model incorporated with the prominent SIR model is employed to study the impact on the COVID-19 virus transmissibility among eight US metropolitan areas. Gaussian process modeling and sensitivity analysis for the functional parameters enable to investigate the the main and interaction effects of the factors, which could lead to a new intervention strategy for policymakers. This study shows that, among six factors, temperature and government interventions have stronger impacts on the COVID-19 spread in most of the cities. The temperature has been found to have negative effects in all of the cities. Other weather factors, such wind speed and pressure, do not show common effects among the eight cities. New York city has shown a strong interaction effect between temperature and interventions, which suggests that more restrictions are necessary to mitigate the outbreak as the weather gets colder.

Although we found some potential associations between weather and virus transmissibility, it is worth emphasizing that these associations may not directly imply the causation of the virus transmissibility, meaning that there might be some lurking/causal variables which are correlated with these factors that make the associations appear stronger. For instance, as recent studies have shown (e.g., Wilson (2020) ; Soucy et al. (2020) ), the individual mobility may have the direct impact on the COVID-19 spread, which could be strongly correlated with weather factors. Therefore, incorporating the information of individual mobility and estimating the causal effects of mobility and weather is worthwhile to investigate in the future work. In addition, it is conceivable to consider other potential factors for the virus transmission, such as private sector (i.e., non-governmental) interventions, travel restrictions, and public 500 1,000 compliance with government recommendations such as vaccinations, quarantines and face coverings, if the data are available. However, including too many factors may lead to overparameterization which in turn causes unstable results. A potential solution is to perform sensitivity analysis to screen out non-influential factors that have the least effect, and then remove them from the analysis. We explore these issues in future work. 

Sampling details for the posterior distributions The details of sampling for the posterior distributions in Section 3.2 are described in this file.

A zip file containing the data and R code for reproducing the results in Sections 4 and 5.

Coronavirus in the US: Latest map and case count. The New York Times

On a coupled time-dependent SIR models fitting with New York and New-Jersey states COVID-19 data

The sensitivity analysis for the priors across 30 different prior specifications, which shows the percentage deviations between the original posteriors and the posteriors with the alternative priors

Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity, with applications to SARS-CoV-2

Prediction, interpolation and regression for spatially misaligned data

Nonparametric functional calibration of computer models

Monte Carlo and quasi-Monte Carlo methods

COVID-19's US Temperature Response Profile

Unraveling the drivers of MERS-CoV transmission

A Time-dependent SIR model for COVID-19 with undetectable infected persons

A SIR model assumption for the spread of COVID-19 in different communities

Effectiveness of control measures during the SARS epidemic in Beijing: a comparison of the Rt curve and the epidemic curve

Impact assessment of nonpharmaceutical interventions against COVID-19 and influenza in Hong Kong: an observational study

Assessment of the SARS-CoV-2 basic reproduction number, R0, based on the early phase of COVID-19 outbreak in Italy

Mathematical Tools for Understanding Infectious Disease Dynamics

The estimation of the basic reproduction number for infectious diseases

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Estimated disease burden of COVID-19

Multivariate Gaussian process emulators with nonseparable covariance structures

Estimating the impact of daily weather on the temporal pattern of COVID-19 outbreak in India

The effect of country-level factors and government intervention on the incidence of COVID-19

Ranking the effectiveness of worldwide COVID-19 government interventions

Temporal dynamics in viral shedding and transmissibility of COVID-19

Modeling infectious disease dynamics in the complex landscape of global health

A Class of Statistics with Asymptotically Normal Distribution

Estimation of time-varying reproduction numbers underlying epidemiological processes: A new statistical tool for the COVID-19 pandemic

No evidence for temperature-dependence of the COVID-19 epidemic

Bayesian calibration of computer models

A contribution to the mathematical theory of epidemics

A Bayesian approach for global sensitivity analysis of (multifidelity) computer codes

See coronavirus restrictions and mask mandates for all 50 states. The New York Times

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Viral dynamics in mild and severe cases of COVID-19

An efficient surrogate model for emulation and physics extraction of large eddy simulations

Estimation of the fraction of COVID-19 infected people in US states and countries worldwide

Forecasting seasonal influenza with a state-space SIR model

Burden and characteristics of COVID-19 in the United States during 2020

A note on tools for prediction under uncertainty and identifiability of SIR-like dynamical systems for epidemiology

Calibrating functional parameters in the ion channel models of cardiac cells

Gaussian process models for computer experiments with qualitative and quantitative factors

A model for the emergence of drug resistance in the presence of asymptomatic infections

Why is it difficult to accurately predict the COVID-19 epidemic?

The Design and Analysis of Computer Experiments

Sensitivity estimates for nonlinear mathematical models

Estimating the effect of physical distancing on the COVID-19 pandemic using an urban mobility index

Estimating functional parameters for understanding the impact of weather and government iinterventions on COVID-19 outbreak

Efficient calibration for imperfect epidemic models with applications to the analysis of COVID-19

Multiobjective optimization of expensive-to-evaluate deterministic computer simulator models

R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing

Efficient calibration for imperfect computer models

Mitigate the effects of home confinement on children during the COVID-19 outbreak

Screening, predicting, and computer experiments

Gaussian Processes for Machine Learning

Virological assessment of hospitalized patients with COVID-2019

The Modest Impact of Weather and Air Pollution on COVID-19 Transmission

Impact of mitigating interventions and temperature on the instantaneous reproduction number in the COVID-19 epidemic among 30 US metropolitan areas

Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside Hubei province, China: a descriptive and modelling study

Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak

Acknowledgments. The author gratefully acknowledges the conscientious efforts of the editor and two anonymous reviewers whose comments greatly strengthen this paper. The author is grateful to Dr. Ying Hung for the initial discussion which inspires the author to work on this problem.Funding. The author was supported by NSF Grant DMS-2113407.