key: cord-271425-ysdw31nq
authors: Carson, R. T.; Carson, S. L.; Dye, T. K.; Mayfield, S. L.; Moyer, D. C.; Yu, C. A.
title: COVID-19's U.S. Temperature Response Profile
date: 2020-11-05
journal: nan
DOI: 10.1101/2020.11.03.20225581
sha: 
doc_id: 271425
cord_uid: ysdw31nq

We estimate the U.S. temperature response curve for COVID-19 and show transmission is quite sensitive to temperature variation. This is despite summer outbreaks widely assumed to show otherwise. By largely replacing the death counts states report daily with counts based on death certificate date, we build a week-ahead statistical forecasting model that explains most of the daily variation (R-square = 0.97) and isolates the COVID-19 temperature response profile (p < 0.001). These counts, normalized at 31°C (U.S. mid-summer average), scale up nearly 160% at 5°C. Positive cases are more temperature sensitive, scaling up by almost 400% between 31°C and 5°C. Dynamic feedback amplifies these effects. There is a short window to get COVID-19 under control before cooler weather makes the task substantially more challenging.

The question of whether COVID-19 exhibits a pronounced temperature response profile has garnered attention since the early days of the pandemic (1, 2, 3). Kissler et al.'s examination of medium- and long-term management of the pandemic assigns a prominent role to understanding how the temperature response profile (hereafter "TRP") of COVID-19 might influence the pandemic's progression in the United States (3, 4). This follows conventional wisdom regarding the strong seasonal weather-driven pattern of influenza (5, 6), which helped mask COVID-19's early U.S. ascent (7). The recent rise in positive COVID-19 cases and related deaths across much of the United States has called into question whether COVID-19 transmission is adversely sensitive to summer weather. Nevertheless, some modeling groups such as the University of Washington's Institute for Health Metrics and Evaluation (IHME) have assumed COVID-19 activity will increase as temperatures decline and are now using information from the U.S. influenza monitoring network (8), which was previously shown to be predictive of the pandemic's path last spring (7), to help incorporate that effect.

Daily state reports often bear little resemblance to the actual number of COVID-19-related deaths that occurred on that day. The resulting temporal data misalignment creates a substantial impediment to recovering any relationships with a crucial dependence on event timing. By reconstructing the set of death counts states report daily, largely by substituting in retroactive corrections based on death certificate dates, we are able to reliably estimate the TRP for COVID-19 deaths.

We assemble over 2,500 state-level daily observations from April 16-July 15, 2020, after COVID-19 became well-established across the United States. We largely follow the literature in environmental economics on estimating pollution and temperature related impacts on a range of health and other outcomes (9). Because SARS-CoV-2 (the virus which causes COVID-19) is a novel virus, full implementation of the standard approach is infeasible: multiple years of both spatially and temporally delineated data are not available. The specific issue that we cannot currently resolve is whether the TRP over a given temperature range is the same in both directions: the cooler to warmer direction reflected in our data set, and the warmer to cooler direction, occurring as the U.S. enters its fall and winter seasons. In what follows, it is also important to recognize that we estimate the joint, not separate, effect of any biological response by the virus and any behavioral response by the public.

Pinning down COVID-19's temperature response profile (TRP) has so far proven elusive. The now well-accepted approach for estimating influenza's TRP is epitomized by Barreca and Shimshack (6), which draws heavily on the modelling of climate impacts on human populations (9, 10, 11). Under this approach, a panel data set of political entities, such as countries or their political subregions (states, counties, etc.), is assembled and the outcome of interest observed across a long time horizon (e.g., a 20-year period). The ability to employ fixed-effect indicator variables to correct for time-invariant differences between political jurisdictions, coupled with the ability to use short-run weather variability, provides statistical identification of a variety of impact response functions.

Prior research has three important limitations. First, cross-sectional data cannot statistically identify the desired function without making the implausible assumption that all possible confounding variables are adequately controlled for. Routinely updated time-series models slowly incorporate environmental conditions into their forecasts without ever isolating them. Short panel datasets, where the stimulus of interest has limited range (temperature, humidity, UV light, air pollution in each location), often lack the statistical power to pin down such response functions. Consequently, these estimated response functions are often fragile in the sense that statistical significance is lost when time trends or demographic variables are added to models (12, 13, 14, 15).

Second, early work, often using Chinese or cross-country data, focused on the speed at which the pandemic ramped up in different locations, using variants of derivative statistics like growth rates or R0 as the dependent variable (16, 17). These works point to temperatures in the 0-10°C range as being most conducive to spreading COVID-19, with a possible humidity effect (17, 18, 19). Current interest is now focused on situations where COVID-19 is spatially well-seeded and its effective R0 can move up or down with actions like state reopening plans.

Third, the quality of reported COVID-19 statistics is often suspect, particularly from the early phase of the pandemic. We move past this period to an observational window where reporting has stabilized. However, this reveals a different problem: temporal mismatches between when an event (e.g., a COVID-19-related death) was reported and the weather variables potentially influencing that event (11). When reported event dates significantly differ from actual event dates, the resulting measurement error can overwhelm standard sources of biological variation such as individual differences in incubation periods.

Our ability to isolate the TRP for COVID-19-related deaths stems largely from our reconstruction of state-level COVID-19 data, which involved three steps. First, and most important, we replace the daily death counts initially reported by states with the actual daily counts based on death certificate date where possible. Second, if death certificate data are not available, we use the retroactively corrected data series that a number of states have produced. When available, this type of data generally rectifies many initial reporting errors. Third, using a consistent protocol, we correct other implausible data reports, such as implicit negative daily death counts and zero counts on one day followed by a clear double-count the following day. This effort is described in detail in the Supplementary Information (SI) section on Data Preparation.

Figure 1, Panel (A) displays the originally reported (CTPDailyDeadit) daily death counts for Florida in blue with the "Actual" death counts by death certificate date overlaid in red. Actual curves follow the general shape predicted by epidemic models; curves based on the originally reported counts have clear day-of-the-week patterns and large spikes, neither of which is predicted by biological models. Panel (B) shows the two implications. First, the confidence interval from fitting a simple quadratic trend model is dramatically larger for CTPDailyDeadit (i.e., the Reported data) than for the same model fitted to the Actuals. Second, the model based on CTPDailyDeadit is slow to pick up the sharp rise in deaths in Florida because the reported data are temporally misaligned: the originally reported counts lag the actual counts. Correct temporal alignment of the death count data is also critical to being able to isolate COVID-19's temperature response profile. Figure 1, Panels (C) and (D) display results for Georgia, which show how use of CTPDailyDeadit can lead to missing both a downturn and an upturn, while supporting an incorrect story of slow, steady progress.
Figure S1 provides similar graphs for Arizona and Texas. The difference between Figure 2 (A) and (B) is striking. Notably, it is not driven by our superior modelling ability, since our model is intentionally designed to be simple and to transparently isolate the TRP. Rather, the difference is driven by our use of dramatically higher-quality, almost textbook-like temporally aligned death counts. It suggests this pandemic's short-run daily trajectory at the state level would (now) be predictable with high accuracy if resources were devoted to ensuring that actual daily death counts were reported in a timely enough manner for use by the modeling community, rather than trickling in over a period of months as they do now.

medRxiv preprint doi: https://doi.org/10.1101/2020.11.03.20225581; this version posted November 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Our objective is to isolate COVID-19's TRPs with respect to its two visible indicators, daily death counts (DailyDeadit) and new positive test counts (NewPositivesit) at the state level. Any TRP specification should allow for the possibility of non-responsiveness (i.e., zero temperature dependence). TRPs should not incorporate fixed factors like state demographics, nor other factors associated with calendar date or a clear temporal profile. Rather, daily exogenous variation in temperature on any specific day should be the source of statistical information for identifying the TRP of interest. Intuitively, the TRP is being statistically identified by having days where the lagged count of the COVID-statistic of interest is approximately the same magnitude but a range of different temperatures is observed. The number of such comparable days is substantially increased by the introduction of controls for conditions that remain fixed across states and a flexible time trend. Statistical modeling issues revolve around functional forms and specification of relevant lag structure. The slow-moving systematic changes in temperature over time imply that, over short time horizons, temperature will not be the driving force behind DailyDeadit and NewPositivesit. However, over longer time horizons, a virus's TRP can be a major factor in its long-run trajectory.

We focus on two directly observable COVID-19 statistics: daily deaths (DailyDeadit) and new (test-diagnosed) positive cases (NewPositivesit), indexed by state (i) and day (t). For each we build a simple week-ahead forecasting model, controlling for state-level fixed effects and including a quadratic time trend. We then examine how lagged (t-k) maximum daily temperature (MaxTempit-k) scales the model's predicted DailyDeadit or NewPositivesit. Attention is restricted to conditions where MaxTempit ≥ 5°C, since data below this level are sparse during our sample period and concentrated in a few sparsely populated states like Alaska and Montana. To aid interpretability, our empirical estimates of the TRPs are normalized to 100 at 31°C (~88°F), the U.S. population-weighted average for the last week of our sample.

The pandemic modelling community has largely concentrated on DailyDeadit, believing it to be a more reliable indicator of the COVID-19 infection pool than NewPositivesit due to the large differences in testing regimes across states and time. We proceed in a similar manner, but also produce the TRP for NewPositivesit, conditioning on available testing information, since it is positive cases that are potentially directly influenced by temperature.

Our base DailyDeadit statistical model (Eq. 1) is comprised of two multiplicative components. The first produces expected current period deaths as a function of past observed deaths at a fixed temperature. The second allows expected DailyDeadit to (potentially) vary with past values of MaxTempit-k.

The terms inside the first component are the infection pool proxy, DailyDeadit-7, a set of state-level fixed effect indicators, StateIndicatori and a quadratic time trend in Dayst (t=1, …, 137; initialized to March 1 to aid interpretability). The StateIndicatori captures the influence of a wide range of variables which remain constant over the period examined, such as demographic composition, geographic links between locales that influence infections, and public health infrastructure. The time trend variables pick up the decline in the case fatality rate and the average effect over time of initial lockdowns, social distancing, and state reopenings. The first component is exponentiated to incorporate the restriction that expected DailyDeadit should be positive if COVID-19 transmission is active anywhere in the set of connected units being examined. Commensurately, we use LogDailyDeadit-k and LogDayst as regressors so this component can be interpreted as a log-log regression model with state-level fixed effects.

The second component is a logistic function scaling predicted DailyDeadit up or down with MaxTempit-k. Deaths on any specific day are the result of infections propagated over an earlier period. We use LogMaxTempit-7 and LogMaxTempit-14 to roughly encompass the relevant period for temperature influencing current period deaths. Our panel data model specification uses an additive error term, which necessitates using nonlinear least squares to solve the model (20,21), but decouples conditional mean estimates from the estimated error component.

Formally, our base model for U.S. state i on day t is given by:

where the estimated coefficients for the set of StateIndicatori and the Greek-letter parameters are chosen to minimize the sum of squared values of the estimated error term εit. The NewPositivesit model is similarly structured.
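The two-component structure described above can be written out as follows. This is a sketch assembled from the verbal description, not a transcription of the authors' Eq. 1: only β (the coefficient on LogDailyDeadit-7) is named in the text, so the symbols α, γ and δ, the presence of an intercept δ0 inside the logistic function, and the reading of the quadratic trend as quadratic in LogDayst are all assumptions. The estimated form and coefficients are in Table S1.

```latex
\begin{align*}
\text{DailyDead}_{it} &=
  \exp\!\Big(\beta\,\text{LogDailyDead}_{i,t-7}
      + \alpha_i
      + \gamma_1\,\text{LogDays}_{t}
      + \gamma_2\,(\text{LogDays}_{t})^{2}\Big) \\
 &\quad\times\;
  \Lambda\!\big(\delta_0
      + \delta_1\,\text{LogMaxTemp}_{i,t-7}
      + \delta_2\,\text{LogMaxTemp}_{i,t-14}\big)
  \;+\; \varepsilon_{it},
\qquad
\Lambda(z) = \frac{1}{1+e^{-z}}
\end{align*}
```

Here α_i stands for the coefficient on StateIndicatori. Under this form, the derivative with respect to DailyDeadit-7 is Λ·β·exp(α_i + trend)/(DailyDeadit-7)^(1-β), which has the same shape as the marginal-effect expression given with the results.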

We investigate (Table S6) the sensitivity of our results to a range of alternative specifications (e.g., different infection pool indicators, different temperature scaling functions, adding absolute humidity, relative humidity, ultraviolet radiation, the inclusion of shelter-in-place/reopening orders, lagged cumulative death counts, and use of the CTP death counts) and provide further discussion of modelling issues in the Supplementary Information (SI) section on Modeling Approach.

Our analysis uses three main types of data:

1. COVID-19 statistics for state-level death counts, positive cases, and tests, using The COVID Tracking Project (CTP; covidtracking.com) as our base information source.
2. Temperature data from the U.S. National Weather Service Integrated Surface Database.

We undertook extensive repair of the COVID-19 data reported daily by states, particularly the death counts. A dominant feature of these data is the substantial lag between when many of these events occurred and when they were reported. It is not uncommon to see states include deaths that occurred several weeks prior in any given day's count.

Over half the U.S. states have made the number of COVID-related deaths publicly available by their death certificate dates in some manner (SI Data Preparation). A simple OLS regression of death counts by death certificate date on originally reported death counts for many of these states yields an R² of less than 0.5. Other states have made corrections to originally reported COVID-19 statistics. These updated counts tend to retroactively correct a myriad of reporting errors by states contained in the COVID Tracking Project's daily data snapshot. These issues range from failing to report any information on specific days to decisions on "probable" COVID-related deaths and the resolution of duplicated death certificates. We use states' self-corrected counts when available.

In instances where neither of these two sources of information were available, we undertook a consistent set of data repair operations. These include: averaging across days with no reporting, prorating backwards in time large batches of deaths (or other statistics) reported on an arbitrary date when it is known those deaths occurred over a much longer period (typically from nursing homes), and making the minimum set of changes to effectively correct logical violations such as negative counts of new deaths, positive cases, and tests. Details on our data repairs are contained in the SI Data Preparation section; we make this dataset available for use by other researchers, along with a line-by-line account of the corrections made.
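Two of the repair operations listed above (spreading a count across days with no reporting, and prorating a known batch backwards in time) can be illustrated with a small sketch. The function names, the even-split rule, and the toy numbers in the usage note are assumptions made for illustration; the authors' actual protocol is the one documented in the SI Data Preparation section.

```python
def spread_missing(daily, missing):
    """Average a catch-up report over the preceding no-report days.

    `daily` is a list of daily counts; `missing` is a set of indices for
    days on which nothing was reported. ASSUMPTION: the first reported
    value after a gap is split evenly across the gap and the report day.
    A gap at the end of the series (no later report) is left unchanged.
    """
    out = [float(x) for x in daily]
    n = len(out)
    i = 0
    while i < n:
        if i in missing:
            j = i
            while j < n and j in missing:
                j += 1
            if j < n:
                share = out[j] / (j - i + 1)
                for k in range(i, j + 1):
                    out[k] = share
            i = j + 1
        else:
            i += 1
    return out


def prorate_batch(daily, day, batch, window):
    """Move `batch` deaths reported on `day` evenly over the last `window` days.

    ASSUMPTION: uniform proration over a window that ends on (and includes)
    the report day. The series total is preserved.
    """
    out = [float(x) for x in daily]
    start = max(0, day - window + 1)
    out[day] -= batch
    share = batch / (day - start + 1)
    for k in range(start, day + 1):
        out[k] += share
    return out
```

For example, with made-up inputs, `spread_missing([4, 0, 10, 3], {1})` splits the 10 deaths reported on the third day across the second and third days, and `prorate_batch([2, 2, 2, 32], 3, 30, 3)` spreads a batch of 30 over the final three days. Both keep the series total unchanged, which is the property any such repair would presumably need to satisfy.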

For each state, weather variables are taken from the airport with the highest volume of commercial traffic. We focus on maximum daily temperature. Results with mean temperature are quite similar. State-level aggregation requires taking weather data from a single station. The measurement error induced by this compromise is likely to be less than one might think. Many states are small spatially, or have a single concentrated metropolitan area (e.g., Illinois). In spatially large states with large populations, most of the population often lives within reasonable proximity to the largest airport. Even in Texas, most people live along the corridor between Dallas (DFW is our representative airport) and Houston. As a result, over 60% of the American population lives within 300km of the representative airport for their state. Similarly, positive cases are also concentrated in major metropolitan areas near those airports. More generally, a single source of classical measurement error attenuates parameter estimates toward finding no effect.
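The attenuation claim in the last sentence is the standard errors-in-variables result. As a reminder, in the textbook linear case (a simplification; the model here is nonlinear, so this is an analogy rather than a derivation for Eq. 1), if the true regressor x* is observed with independent noise u, the OLS slope shrinks toward zero:

```latex
\[
y = \beta x^{*} + e, \qquad x = x^{*} + u
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_{\text{OLS}}
  = \beta\,\frac{\sigma_{x^{*}}^{2}}{\sigma_{x^{*}}^{2} + \sigma_{u}^{2}}
\]
```

The symbols x*, u, and the variance terms are notation introduced here. Because the ratio is below one whenever the noise variance is positive, using a single airport's temperature as a proxy for statewide exposure would, under these assumptions, bias the estimated temperature effect toward zero rather than away from it.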

Eq. 1 is estimated using non-linear least squares and has an R² of 0.97. All parameter estimates (provided in Table S1), using robust standard errors clustered at the state level, are significant at p < 0.001. Figure 2 (B) displays the actual (corrected) DailyDeadit versus the model's in-sample predicted values. States where death certificate dates were available have considerably smaller prediction errors (p < 0.001, Model 2 in Table S6).

The model's quadratic time trend suggests DailyDeadit has fallen over time, at a declining rate from mid-April through the end of May, and is then almost flat (rising at a slow rate). LogDailyDeadit-7 is the dominant predictor and its coefficient estimate of 0.8686 (t=35.32) has a standard elasticity interpretation. Figure 3 shows the estimated TRP implied by the parameter estimates for the two MaxTempit lags in the logistic scaling function. The vertical axis represents expected DailyDeadit at each temperature value when both lags are set to the same temperature and the TRP is normalized to 100 at 31°C. (Figure S9 shows the corresponding TRP using the original CTP death counts.) Figure S2 provides the contour plot which allows the values of the two MaxTempit-k lags to vary independently. Differentiating Eq. 1 with respect to DailyDeadit-7 produces a measure of how the expected DailyDeadit increases from a one-death increase in DailyDeadit-7. Setting t and MaxTempit-k to chosen values yields [TRP*β*EXP(StateBaseit)]/(DailyDeadit-7)^(1-β), where the TRP directly scales the elasticity parameter, β, on LogDailyDeadit-7 and StateBaseit is the temporally varying sum of the fixed StateIndicatori and the quadratic time trend. As an example, for Georgia on July 15 (StateBaseit = 3.9184, MaxTempit-7 = 29.4, MaxTempit-14 = 27.8 and the corresponding non-normalized TRP = 0.0368), changing DailyDeadit-7 from 27 to 28 is predicted to increase the expected DailyDeadit in Georgia on July 15 by 1.0457. Figure S3 (B) displays a variant of this calculation for individual states as DailyDeadit-7 increases from 0 to 1.
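The marginal-effect expression can be evaluated directly from the Georgia values quoted above. The snippet below is a minimal sketch: the function name is hypothetical, and the inputs are the rounded figures printed in the text, so the result should only be expected to land near the reported 1.0457 (it comes out at roughly 1.04; the gap may reflect rounding of the published inputs or the difference between a derivative and a discrete one-death change).

```python
import math


def marginal_effect(trp, beta, state_base, lagged_deaths):
    """Evaluate TRP * beta * exp(StateBase) / DailyDead(t-7)^(1 - beta)."""
    return trp * beta * math.exp(state_base) / lagged_deaths ** (1.0 - beta)


# Georgia, July 15, using the rounded values quoted in the text.
effect = marginal_effect(trp=0.0368, beta=0.8686,
                         state_base=3.9184, lagged_deaths=27)
print(round(effect, 3))
```

Because the exponent 1-β is small (about 0.13), the marginal effect declines only slowly as the lagged death count grows, which is why an extra death a week earlier translates into roughly one extra expected death at these values.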

We simulate series of DailyDeadit with both static and dynamic variants of our estimated base model (Eq 1). To do so, we take the DailyDeadit-7 from the last week of our sample period and MaxTempit = 31°C as the initial values to propagate the simulation. We fix MaxTempit-7 and MaxTempit-14 at 31°C from day 1 to day 45 to mimic the rest of the summer period, then progressively decrease them by 0.2°C each day until 5°C is hit, which occurs on day 175, after which temperature is held constant.

The brown dashed line in Fig. 4 provides a stylized static [left vertical axis] representation of information contained in our base model (Eq. 1) using Georgia as an example since it starts out at just below our 31°C normalization point and in a cold year hits our 5°C endpoint. The two MaxTempit-k change in tandem, with all other variables fixed at their initial values. In this static response mode, expected DailyDeadit increases through only one channel: the direct impact of lowering temperature.

The blue dashed curve in Figure 4 shows the dynamic counterpart [right vertical axis] that allows MaxTempit-k to influence DailyDeadit, which is then used as the lagged model input for subsequent projections. In this dynamic mode, temperature affects expected DailyDeadit through two channels: first, the direct effect of lagged temperature on daily death count, as shown in the static model; second, the indirect compounding effect of these temperature-driven daily death counts when the initial death count of each cycle of the simulation is set to a lag of these outputs. The indirect effect of maximum temperature is the dominant mechanism, accounting for more than 90% of the increase in the expected death count as maximum temperature falls from 31°C to 5°C. The inset in Fig. 4 plots the two curves under the same scale and conveys the magnitude of the differences between the pure static and dynamic responses, both of which are hard to achieve in practice. The static response requires continual reductions in effective contact rates that exactly offset the increase in transmission potential, in the precise sense that the current death count is always held to be equal to last week's death count. Observing the full dynamic effect would require both the absence of offsetting government actions and endogenous actions by the public such as increased social distancing and face mask adherence as COVID-19 activity quickly increased. A possible example of the dynamic feedback our model allows when this is not done is the early path of the pandemic through Northern Italy in late February and March 2020.
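The difference between the static and dynamic modes can be illustrated with a stripped-down simulation. This is a sketch under strong assumptions, not the authors' simulation: `scale` is a hypothetical power-law stand-in for the estimated logistic TRP, picked only so that the normalized curve is about 2.6 at 5°C (in line with the nearly 160% scale-up reported for deaths); the state fixed effect and time trend are collapsed into a constant `base`; and the seed count of 20 is arbitrary. Only β = 0.8686 and the temperature schedule (31°C through day 45, then 0.2°C cooler per day down to 5°C) come from the text.

```python
BETA = 0.8686             # elasticity on LogDailyDead(t-7), as reported in the text
T_REF, T_MIN = 31.0, 5.0  # normalization point and lower bound, in degrees C


def temperature(day):
    """Stylized path: 31C through day 45, then 0.2C cooler each day, floored at 5C."""
    return max(T_MIN, T_REF - 0.2 * max(0, day - 45))


def scale(temp_c, power=-0.52):
    """HYPOTHETICAL normalized TRP: 1.0 at 31C, about 2.6 at 5C."""
    return (temp_c / T_REF) ** power


def simulate(d0, days=250, dynamic=True):
    """Project daily deaths from a constant seed-week count d0."""
    base = d0 ** (1.0 - BETA)   # chosen so d0 reproduces itself at 31C
    path = [float(d0)] * 7      # seed week preceding day 1
    for day in range(1, days + 1):
        # Dynamic mode feeds back the simulated value from 7 days earlier;
        # static mode always uses the initial count.
        lagged = path[-7] if dynamic else d0
        path.append(scale(temperature(day)) * base * lagged ** BETA)
    return path[7:]


static = simulate(20, dynamic=False)
dynamic = simulate(20, dynamic=True)
print(temperature(175), round(static[-1], 1), round(dynamic[-1], 1))
```

In static mode the projected count is simply the seed count times the temperature scaling, so it levels off once 5°C is reached on day 175. In dynamic mode each week's elevated count becomes the next week's input, so the same temperature path produces a much larger cumulative increase, which is the compounding channel described above.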

StateIndicatori values for some geographically isolated states with small populations like Hawaii are small enough to suggest their COVID-19 death counts would be unsustainable in warm enough weather. This is not true, though, of most states, which is consistent with Baker et al.'s finding from examining earlier emergent viruses that warming weather is not enough by itself to stop their spread (22).

The SI section on Alternative Specifications for Base Death Count Model describes additional analyses that (a) look at alternative temperature scaling functions, (b) substitute DailyDeadit-14 or NewPositivesit-7 for DailyDeadit-7, and (c) add the dates of state actions such as shelter-in-place orders as control variables. This work shows that our finding of a strong TRP for DailyDeadit is quite robust. Some specifications suggest the TRP is flatter in the 10-20°C range but steeper in the 5-10°C range (Fig. S8).

NewPositivesit are modeled similarly to Eq. 1, substituting LogNewPositivesit-7 as the infection pool regressor. Testing information is required to interpret reported state-level positive cases. We use LogNewTestit to control for current testing intensity, LogNewTestit-7 (which allows for a past positivity rate interpretation), total tests administered per thousand lagged by one week (PerCapitaTestsit-7) to help control for prior testing intensity, and an indicator variable for systematically lower reporting on Monday. We chose LogMaxTempit-7 for consistency with the death count model. For the other temperature lag, the 2nd lag fits best.

The resulting model's R² (Table S2) is 0.95. All regressors are significant at p < 0.001, except for some of the test-related variables: LogNewTestit-7 (p=0.005), PerCapitaTestsit-7 (p=0.002) and Monday (p=0.015). Estimated parameters for the two lagged temperature variables are considerably larger than their DailyDeadit counterparts. The implied TRP for NewPositivesit is displayed in Fig. 5, and the contour plot that allows the two temperature lags to vary independently is provided in Fig. S4.

We show that DailyDeadit predictably varies with changes in maximum daily temperature (Fig. 3). This relationship is considerably more pronounced for new positive cases (Fig. 5). Our TRPs are normalized to 100 at 31°C, which is near the maximum of U.S. summer temperatures. These two TRPs suggest current COVID-19 infection pools in many states need to be brought under control while high temperatures are helping to reduce the virus's transmission. Cooler temperatures with the progression of fall and winter will dramatically ramp up the number of new positive cases and the deaths that follow unless current infection pools are dramatically reduced. Dynamic feedback between rising infection pool indicators and cooling temperatures (Fig. 4) suggests that delay in responding to signs of increased virus activity will result in rapid escalation. This is already being seen in the current spatial pattern of outbreaks and the ever-increasing medium-run death count forecasts (e.g., UW-IHME (8)). Investment in providing the pandemic modeling community with timely counts based on death certificate dates would allow them to deliver substantially more accurate and timely warnings of impending upturns.

Warming temperatures during the spring and summer actively aided efforts to reduce the spread of COVID-19 and may have contributed to a false sense of how effective those efforts were. Cooling temperatures going into the fall and winter will present a very different challenge. Figure 6 shows the average date over the past 30 years when each U.S. county enters the particularly dangerous 10°C to 5°C range for COVID-19. Effects of temperature amplification show up with a lag.

There has long been fear of facing COVID-19 during influenza's October to March season (23). Our results suggest that, in addition to this concern, COVID-19 will transmit more efficiently than it did during the summer. Further, if the TRPs for deaths and positive cases behave like influenza's (6), both will continue to increase from 5°C until a few degrees below freezing.

6. A.I. Barreca, J.P. Shimshack, Absolute humidity, temperature, and influenza mortality: 30 years of county-level evidence from the United States. Am. J. Epidemiol. 176, S114-S122 (2012).

Table S6; Data and code availability; Figs. S1-S9; Tables S1-S6; Additional references

The data set used in this paper starts with the COVID-19 statistics reported by individual states as aggregated daily by The COVID Tracking Project (covidtracking.com). In preliminary work, we (and other modelling groups) found there was no plausible epidemiological model that would produce the wild up and down jumps in the state-level COVID-19 statistics reported daily. As a result, there is a major fork in the path of any modeling effort. Do you build a predictive model for the reported COVID-19 daily death count that policymakers see, where success is judged by minimizing the forecast error around those observable quantities? Or do you attempt to model the actual underlying process with an eye toward understanding key aspects of it? When the reported dependent variable systematically diverges from that being generated by the underlying process, these two approaches fundamentally diverge. For the first path, success, to some degree, comes in uncovering the administrative procedures influencing the divergence. The second path led us to undertake a major effort to rectify and repair reported COVID-19 statistics focused on recovering the temporal structure needed to provide TRP estimates.

The most important correction was replacing originally reported death counts with death counts by date of death as reflected on death certificates. Figures 1 and S1 illustrate the nature of the differences between the original CTPDailyDeadit and DailyDeadit for four important states: Arizona, Florida, Georgia and Texas. The originally reported data result in substantially larger confidence intervals in predictive models and, because they lag the actual death counts, substantially reduce the ability to detect and respond to changes in COVID-19 activity. We were able to make this correction for 26 states where deaths by death certificate information could be located. Our cutoff date of July 15 allowed two and a half months for states to make these death counts available, which should subsume most of the COVID-19 deaths in these states over our sample period. The states where we have been unable to obtain deaths by death certificate dates tend to be smaller and to have had relatively fewer COVID deaths. Typically, they did not face large peaks, which appear to be associated with increased testing delays. However, we have been unable to obtain these data from three large states (California, Illinois, and New York), and of the 20 observations (out of 4,567) with residuals from Eq. 1 whose absolute value is 50 or more, over half are from these three states.

The methods for collecting data backed by death certificates varied based on how the individual states chose to present them. Some states publish their counts based on death certificates in a downloadable format on their official COVID-19 website. Others publish them inside longer reports where they can be hard to find, or present tables or graphs that had to be extracted by hand into spreadsheets. Because there are no official rules for how to publish this information, even the charts themselves vary in structure and presentation. Commonality across states is largely dependent on the specific software vendor they are using for their public-facing COVID-19 dashboards. Data collection sometimes required hunting through source code to find the full datasets within, or hovering a cursor over each bar in a bar chart, one at a time across the collection window, to display the precise value represented. In each case we were able to verify via labeling or official statements that the data was compiled using verified death certificate dates. It is worth noting here that, while the CDC collects deaths by death certificate date (https://www.cdc.gov/nchs/nvss/vsrr/COVID19/index.htm), it does so only at the weekly level, which makes its data unusable for the sort of modeling effort we have undertaken; relative to the data we have obtained from individual states, the CDC data also suffer from even longer reporting lags.

For states where deaths by death certificate date were not available, we first sought to determine if an individual state, either on their COVID-19 reporting "dashboard" or in a downloadable file, had deaths by reported date. These datasets often contain substantial corrections to what a state originally reported on a specific day. These include reports that missed covidtracking.com's daily reporting deadline, more accurate end-of-day tallies (e.g., late reporting counties/hospitals), correction of testing dumps that resulted in the appearance of testing spikes on certain days, the removal of duplicate death certificates, and the resolution of probable cases. When these differed from the COVID-19 death counts originally reported (COVIDTracking.com has a set of "snapshots" of the originally reported information) we used the revised state data. In some smaller states, we believe, but have been unable to fully verify, that these corrected datasets are "close" to deaths by death certificate date.

Two of these types of corrections turn out to be particularly important. First, when a state misses a reporting deadline, this is typically recorded as there having been no COVID-19 events (new deaths, positives, or tests) on that date. Because covidtracking.com (and similar aggregator sites) operate off cumulative counts, the next day's count then also contains the events that happened during the previous day. Second, "probable" cases have often been treated differently across states and time. State-level resolution of this issue removes an extraneous source of variation. As with the data on death certificate dates, these corrections by states of their originally reported COVID-19 statistics were obtained through a variety of sources ranging from reading counts off interactive bar charts to downloadable *.csv files.

Next, we corrected two obvious problems with the data. The first was occasional, exceptionally large spikes accompanied by auxiliary information (e.g., a news/Twitter release by the state's department of health) noting that the spike was due to an accumulation of deaths over an extended time period, typically from one or more congregate living facilities, e.g., nursing homes and prisons. For these, the initial correction involved proportionately increasing death counts over the relevant period, with days where this would result in negative death counts not included in the reallocation. Second, we corrected reports of zero deaths on days surrounded by death counts that were sufficiently large that a zero-death count was highly unlikely. Such days are generally also characterized by a failure to report some other COVID-19 statistics (like new positive cases) and by an abnormally high death count on the following day (or two if it is a weekend). Our approach for these was to assume no reporting followed by double reporting, so that the indicated correction is to average counts across the two and, in some instances, three days.
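The zero-day correction described above can be sketched in a few lines of Python. This is an illustrative reading of the averaging rule, not the authors' code; the function name `average_missed_report` and its arguments are hypothetical.

```python
from typing import List, Sequence


def average_missed_report(counts: Sequence[float], zero_day: int, span: int = 2) -> List[float]:
    """Average a missed (zero) report with the catch-up report(s) that follow.

    counts   : daily death counts for one state, in date order
    zero_day : index of the day implausibly reported as zero
    span     : number of days to average over (2, or 3 around a weekend)
    """
    if counts[zero_day] != 0:
        raise ValueError("zero_day must point at a day reported as zero")
    window = counts[zero_day:zero_day + span]
    if len(window) < span:
        raise ValueError("averaging window runs past the end of the series")
    mean = sum(window) / span
    fixed = list(counts)
    # Replace the zero and the inflated day(s) with their common average,
    # which leaves the total over the window unchanged.
    fixed[zero_day:zero_day + span] = [mean] * span
    return fixed
```

For instance, a series of 10, 0, 24, 11 with the zero on the second day becomes 10, 12, 12, 11, and the total over the window is preserved.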

We also corrected data in situations where a state had "corrected" an earlier report, but where an entire rectified data series from the state was not available. A typical example is a state that initially reported all deaths except for those in the state's largest county, with the corrected version containing the death count for the whole state contained in a press/Twitter release. A similar situation was when a state failed a logical consistency check by having the difference between the reported cumulative death counts on two consecutive days generate a negative daily death count. This typically occurs in a state with a small population and few COVID-19 deaths, which, without comment, reduces its cumulative death count by one or two. Here we "roll back" that correction to the closest date that no longer produces a negative daily death count.
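One way to read the roll-back rule is as a backward pass over the cumulative series that lowers earlier values until no daily difference is negative. The Python sketch below follows that interpretation; it is an assumption about the procedure, and the function names are hypothetical.

```python
from typing import List, Sequence


def roll_back_cumulative(cumulative: Sequence[int]) -> List[int]:
    """Make a cumulative series non-decreasing by pushing reductions backward."""
    fixed = list(cumulative)
    # Walk from the end: an earlier value may not exceed the value after it.
    for i in range(len(fixed) - 2, -1, -1):
        if fixed[i] > fixed[i + 1]:
            fixed[i] = fixed[i + 1]
    return fixed


def daily_from_cumulative(cumulative: Sequence[int]) -> List[int]:
    """First differences of a cumulative series (the first day keeps its level)."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
```

With cumulative counts 5, 7, 8, 7, 9, the unexplained drop from 8 to 7 is moved back one day, giving 5, 7, 7, 7, 9 and daily counts that are all non-negative.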

Similar corrections have been made to daily positive case counts and new daily test counts. The major difference is that very few states have made available positive case counts by day of test administration (rather than the day the test result is reported). This means that few states publish datasets that can readily subsume the positive case count data from the COVID Tracking Project. This is likely due to these types of information having different reporting standards. Specifically, a state eventually knows the actual date of COVID-related death, but this information is subject to reporting delays due to waiting for confirming test results or autopsies. Because the date of death is on the death certificate, obtaining actual death counts for all states is eventually feasible. However, while the lab doing the test knows the date of test administration, this information is often not shared with the state. States could require more complete reporting, but it is unclear whether the past is capturable. Total test counts are even messier due to the common practice (particularly in the earlier part of our sample period) of reporting all newly returned positive test results daily but reporting negative results inconsistently or in batches. Some negative test results (often by the state's lab) were reported along with the positives, while other negative tests (often those by private labs) were reported once a week. We performed rollbacks of antibody tests that for a while were mixed with diagnostic tests, but these are messy because information on the period over which antibody tests were included has rarely been disclosed. We have endeavored to average and prorate testing data where the nature of this practice could be reasonably inferred.

The influence of our data correction effort can quickly be gleaned from Table S5, which describes four simple autoregressive models with a constant term and the 7th death count lag. The first uses the original "Reported" dataset (covidtracking.com) as the source of both the dependent variable, CTPDailyDeadit, and the lagged death count. The second uses our corrected version (the "Actuals") for the dependent variable and the originally Reported counts for the lagged regressor. The third uses CTPDailyDeadit as the dependent variable and the Actuals for the lagged regressor. The fourth uses Actuals for both. The parameter estimates for lagged deaths are similar in versions using the same lagged variable and substantively larger in the two versions using lagged actual counts. The R² starts at 0.69 for the Reported/Reported model, stays roughly the same (0.68) if Actuals are predicted from CTPDailyDeadit-7, and increases somewhat to 0.74 for the Reported/Actual combination. The Actual/Actual combination, though, has an R² of 0.91, which clearly illustrates that the large gain in explanatory power in the base Eq. (1) model in Table S1 comes mainly from our data repair and rectification effort. Note that the models in Table S5 set the massive NJ (June 25) reported death count outlier of 1877 (Actual death count is 16) to missing, since it is so large that many modeling groups have either dropped it or prorated it over earlier time periods. The R² of the Reported/Reported model using this observation falls to 0.42.
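A comparison of this kind can be sketched as a pooled least-squares fit of one death series on a constant and the 7th lag of another. The Python below is a minimal illustration that assumes a linear model in levels and arrays shaped (states, days); the function name `fit_lag7` is hypothetical and nothing here reproduces the Table S5 estimates.

```python
import numpy as np


def fit_lag7(y: np.ndarray, x: np.ndarray, lag: int = 7):
    """Pooled OLS of y[i, t] on a constant and x[i, t - lag].

    y, x : arrays of shape (n_states, n_days). Pass the same array twice for a
           Reported/Reported or Actual/Actual fit, or mix the two sources.
    Returns (intercept, slope, r_squared).
    """
    target = y[:, lag:].ravel()    # y at day t
    lagged = x[:, :-lag].ravel()   # x at day t - lag, aligned with target
    design = np.column_stack([np.ones_like(lagged, dtype=float), lagged])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r_squared = 1.0 - resid.var() / target.var()
    return coef[0], coef[1], r_squared
```

Calling `fit_lag7(actual, reported)` would correspond to predicting the corrected counts from the originally reported lag, and `fit_lag7(actual, actual)` to the Actual/Actual combination.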

Temporal misalignment of NewPositivesit has received more attention than that of DailyDeadit because of the often-large gap in time between when a diagnostic test is administered and when the result is returned (24). Indeed, this gap, coupled with different state testing regimes, has led most modeling groups to concentrate on predicting DailyDeadit. We have made substantial repairs to NewPositivesit and NewTestsit data by reference to corrected state reports and prorated rollbacks of initially reported antibody tests. We have been much less successful in locating information on NewPositivesit by date of test administration. However, temporal misalignment of NewPositivesit-k is somewhat less important than it might first appear, because there is a shorter window for positive-to-positive transmission than for the positive-to-death transition and because test results for hospital patients, the persistent high positivity pool, are typically returned quickly. Further, even very noisy testing information can be helpful in serving as controls for variation in state-level testing behavior over time.

Weather data for our main analysis are drawn from the National Centers for Environmental Information (NCEI) Integrated Surface Database (ISD), which reports hourly temperature and humidity data for most airports in the world. For each state, weather variables are taken from the airport with the highest volume of commercial traffic, where the volume information is found in the Federal Aviation Administration's 2018 Commercial Service Enplanements report. Our key variable of interest is daily maximum temperature (MaxTemp). We also look at measures of humidity and ultraviolet radiation. Hourly relative humidity is calculated as a function of hourly observed temperature and dewpoint temperature. Under the assumption of ideal gas behavior, we calculate hourly absolute humidity as well (details can be found at https://www.hatchability.com/Vaisala.pdf). We then pick the highest readings within each 24-hour period as daily MaxTemp, MaxRelativeHumidity and MaxAbsoluteHumidity. Minimum daily temperature is obtained by picking the lowest reading and the mean by averaging the hourly readings. Our measure of ultraviolet radiation is the UV index, which provides a forecast of the expected risk of overexposure to UV radiation from the sun. UV index data at our representative airports are obtained from OpenWeather Ltd., which publishes daily UV index forecasts calculated by the National Weather Service.
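The humidity calculations can be illustrated with a short Python sketch. It assumes the Magnus approximation for saturation vapour pressure and the ideal gas law for water vapour; these are standard choices but may differ in constants from the formula in the reference the authors cite, and all function names are hypothetical.

```python
import math

RV = 461.5  # specific gas constant of water vapour, J/(kg K)


def saturation_vapour_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation over water, in hPa (an assumed formula)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Relative humidity in percent from air and dewpoint temperature."""
    return 100.0 * saturation_vapour_pressure_hpa(dewpoint_c) / saturation_vapour_pressure_hpa(temp_c)


def absolute_humidity(temp_c: float, dewpoint_c: float) -> float:
    """Absolute humidity in g/m^3, treating water vapour as an ideal gas."""
    vapour_pressure_pa = 100.0 * saturation_vapour_pressure_hpa(dewpoint_c)
    return 1000.0 * vapour_pressure_pa / (RV * (temp_c + 273.15))


def daily_max(hourly_values):
    """Highest hourly reading in a 24-hour period."""
    return max(hourly_values)
```

Daily maxima would then be taken over the 24 hourly values of temperature, relative humidity, and absolute humidity for each airport.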

To calculate the expected date by U.S. county for entering the 5-10°C range (Fig. 6), we use reanalysis data provided by the PRISM Climate Group (https://prism.oregonstate.edu/), which provides reliable weather data at a high spatial resolution of 4km by 4km for the contiguous U.S. We extract daily maximum temperature from 1990 to 2019 and count the average number of days it takes since Oct 1 for the daily maximum temperature to fall below 10°C.
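The day count described above can be sketched as follows for a single location. The Python below assumes a mapping from calendar date to daily maximum temperature and treats each season as running from Oct 1 to the following Sep 30; both the data layout and the function names are illustrative assumptions.

```python
from datetime import date
from statistics import mean


def days_until_below(tmax_by_date, year, threshold_c=10.0):
    """Days from Oct 1 of `year` to the first day with max temperature below the threshold."""
    start = date(year, 10, 1)
    end = date(year + 1, 10, 1)  # assumed season boundary
    for d in sorted(tmax_by_date):
        if start <= d < end and tmax_by_date[d] < threshold_c:
            return (d - start).days
    return None  # the threshold was never crossed in the data supplied


def average_days_until_below(tmax_by_date, years, threshold_c=10.0):
    """Average the per-year day counts, skipping years with no crossing."""
    counts = [days_until_below(tmax_by_date, y, threshold_c) for y in years]
    counts = [c for c in counts if c is not None]
    return mean(counts) if counts else None
```

Applied to each county's series for 1990 to 2019, the average offset added to Oct 1 gives an expected calendar date.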

There are two competing modelling approaches to predicting future COVID-19 deaths and positive cases. The first builds on a standard SEIR model; the other, a production function approach. The first has a strong epidemiological conceptualization and is clearly better in the early phase of a pandemic when data is scarce. The second is largely agnostic as to the underlying structure of transmission but requires dramatically more data to offset this flexibility. We follow the second, with Eq. (1) using a simple production function approach. Conceptually, there is an infection pool (the seeds planted) and various inputs, ranging from a state's health care system to temperature, that influence the output, the number of deaths observed today. We primarily use past death counts as the infection pool proxy and take into account state-level fixed effects (amalgamating effects due to constant factors such as demographic characteristics, fixed resources like health care and transportation networks, and average differences in factors including climate variables and mobility) and a simple quadratic time trend. We exponentiate the right-hand-side variables to effectively impose the restriction that, under the conditions we observe, expected death counts must be positive.

The use of a multiplicative scaling function incorporates the logic that temperature by itself cannot generate changes in DailyDeadit. Because those dying at t became infected not on one day but rather over an extended period, some means of representing temperature in this setting is required. The two options, due to the strong correlation between closely adjacent MaxTempit-k, are a distributed lag structure that imposes structure on a set of MaxTempit-k, or parameterizing the scaling function as the product of individual scaling functions, each with a different MaxTempit-k. We find two lags (the 7th and 14th) are sufficient and reasonably consistent with what is known about the biology of the virus and its methods of attack.

Model results are reasonably robust to small shifts in temperature lag positions, with the following two caveats. First, the 7th lag is a natural one to use; many people's lives follow a typical weekly pattern of contacts and activities, and administrative reporting procedures often have a day-of-the-week pattern. Second, lags too far back are insignificant. Notably, we use the 18th lag of MaxTempit in the weekly model rather than its naturally shifted counterpart LogMaxTempit-21.

The most commonly used scaling function for our purposes is the logistic function 1/(1 + EXP(X)^ϓ), where X is the variable of interest and ϓ is the single estimated parameter. This function converges to a constant as ϓ goes to zero. Use of MaxTempit-k rather than LogMaxTempit-k as the stimulus variable provides a function with a different curvature. Statistically, it provides a similar fit, largely because the corresponding shifts in the estimate of ϓ make the two scaling functions reasonably similar after normalization.
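A short worked derivation may help make the limiting behavior concrete. It reads the exponent as applied to EXP(X), takes X to be the log of temperature T (in °C, with T > 1 over the range considered), and writes γ for the estimated parameter ϓ; these readings are assumptions based on the surrounding text.

```latex
\[
s(T) \;=\; \frac{1}{1 + \exp(X)^{\gamma}}
      \;=\; \frac{1}{1 + \exp(\gamma \log T)}
      \;=\; \frac{1}{1 + T^{\gamma}}, \qquad X = \log T .
\]
\[
\lim_{\gamma \to 0} s(T) = \tfrac{1}{2} \quad \text{for every } T > 0,
\qquad
\frac{\partial s}{\partial T}
   = -\,\frac{\gamma\, T^{\gamma - 1}}{\left(1 + T^{\gamma}\right)^{2}} < 0
   \quad \text{when } \gamma > 0 .
\]
```

Under this reading, a positive estimate of the parameter gives a scaling function that declines as temperature rises, and an estimate of zero gives a constant of one half, that is, no temperature responsiveness.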

We also consider another commonly used scaling function, X/(X + ψ), where an estimate of ψ > 0 results in smaller values of X being scaled up more than large values of X and an estimate of ψ < 0 results in larger values of X being scaled up more than smaller values of X, and an estimate of ψ = 0 results in a function exhibiting no temperature responsiveness. This function can be used with either LogMaxTempit-k or MaxTempit-k, and both will approximate a reasonable range of weakly monotonic scaling functions. It is well behaved as long as the estimate of -ψ is bounded away from MIN(X), which appears not to be an issue in the situations we examine. Table S4 provides estimates for a set of competing specifications and their TRPs are displayed in Fig. S8.

Our first major empirical decision was to use the three-month period April 16-July 15. This time window allows for the seeding of the virus (to different degrees) across the i = 1, …, 51 U.S. states (including DC) over a three-month period. We effectively start tracking the observable outcomes (deaths, positive cases, and tests) on April 9th, because we generally use a one-week lag of the COVID-19 statistics of interest. Our time variable, t = 1, …, 137, denoted in days, starts with 1 on March 1st, the approximate date individual states first started reporting COVID-19 statistics.
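The day index implied by this definition can be checked with a few lines of Python. The sketch assumes calendar year 2020 and uses a hypothetical helper name, `day_index`.

```python
from datetime import date

DAY_ONE = date(2020, 3, 1)  # t = 1 per the text; the year 2020 is assumed


def day_index(d: date) -> int:
    """Time variable t, counted in days with t = 1 on March 1."""
    return (d - DAY_ONE).days + 1
```

Under this convention the estimation window of April 16 to July 15 runs from t = 47 to t = 137, and the first one-week lag reaches back to April 9 at t = 40.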

Our second major decision involved how to implement the core epidemiological concept of a pool of infected individuals who can potentially infect other individuals. This pool is dynamic; existing positives transition to being no longer infectious, while newly infected individuals enter the pool. The totality of the currently infected individuals is unobservable without universal administration at each point in time of a 100% accurate diagnostic test. The question, thus, is whether to use a lagged variant of the test diagnosed positives, NewPositivesit, which logically are part of the infection pool but not necessarily representative of it due to differential testing, or a lagged variant of DailyDeadit.

The lagged DailyDeadit measure is one step removed but was previously assumed to always be observed (25, 26) . This assumption is demonstrably false. In the early run of the pandemic many deaths failed to be classified as COVID-related, in part because some instances were thought to be influenza-related, while COVID's role in inducing cardiac and kidney failure was not widely recognized. We avoid many of these problems by only using DailyDeadit data generated after the pandemic was well established.

Other problems with using lagged DailyDeadit (or NewPositivesit) as the infection pool indicator (such as a state having a proportionately larger elderly or black population) are readily addressed using the standard statistical approach of including time-invariant state-level fixed effects.

One way the use of a lagged version of DailyDeadit vs. NewPositivesit will vary is with respect to timing. Current positives are temporally closer to future positives or deaths than current deaths are and can potentially pick up the virus spreading among healthy young adults. The shorter link from a lagged NewPositivesit is reinforced by the short period over which a current positive can influence future COVID outcomes, which makes TRP estimation less sensitive to the choice of temperature lags.

Subject to the same measurement error, using NewPositivesit-7 should be preferred to DailyDeadit-7 as the infection pool indicator. These results are provided in Table S6 (Model 16). The main lesson is that the model using LogNewPositivesit-7 is a good predictor of DailyDeadit (R² = 0.94), but not as good as LogDailyDeadit-7. This suggests that either the positives have more measurement error than our corrected version of death counts or that DailyDeadit-7 is a better reflection of the current risk-adjusted infection pool than NewPositivesit-7. The other result worth noting is that with the infection pool indicator temporally advanced, the coefficient on LogMaxTempit-14, as expected, is no longer significant.

The parameter estimates for the base model (Table S1) [...] these minimum infection pools is 9853 cases. Fig. S3(A) provides a visual display of this information for the continental U.S. states.

We calculate a variant of the state-level fixed effects from our main regression model (Table S1) by setting the time trend equal to the last day of our sample period, the lagged number of deaths equal to zero, and calculating the expected number of deaths for each state. Small isolated states like Hawaii generate estimates close to zero while the most populous states tend to generate estimates above 2. These estimates suggest the minimum infection pool in each state varies by a factor of approximately 20 (Fig. S3(A)). A regression (Table S3) of the minimum expected death count on a small set of state-level demographics taken from the U.S. Census Bureau (LogPopulation, %Black, %Hispanic, and %Age80+) explains 79% of the variation in these counts.

We do not want our TRP estimates to be confounded by other factors changing over time (ranging from state and locally mandated shelter-in-place orders, reopening plans, and endogenous social distancing to the propensity to wear face masks and changing fatality rates). Since we are agnostic as to the mechanism, the straightforward specification is a polynomial time trend, where we find a quadratic term justified. Higher order terms add little insight or predictive power. The quadratic trend for predicting DailyDeadit falls sharply from mid-April to the end of May, after which it very slowly starts to turn up.

A model which adds (a) the number of days since a state issued a mandatory shelter-in-place order and (b) the number of days since a state started to formally reopen its economy can be found in Table S6 (Model 8). The coefficients on these two variables are small and insignificant. The two LogMaxTempit-k parameters fell on average by less than 1%.

This might seem puzzling until it is recognized that the two sets of time-related variables are highly correlated. Dropping the quadratic trend (Model 9 in Table S6) provides a different picture because, while still statistically significant at conventional levels, it was substantially diminished. The earlier a state shelter-in-place order was issued, the lower the predicted death count (p = 0.028), and the earlier a state started to reopen, the higher the predicted death count (p < 0.001). We are reluctant to provide any substantive interpretation to this result. Papers that have tried to determine the role of state actions and the behavior they are intended to influence show the need to (a) extensively model both state and local orders, and (b) incorporate spatially disaggregated mobility data in order to identify the effects of these government mandates from endogenous social distancing by the public that often occurs before these actions (28, 29, 30). The two temperature coefficients in this model fall on average by less than 20% and remain highly significant.

Table S4 compares our base model to alternative specifications. These alternative specifications were chosen to look at the sensitivity of the implied TRP because they all have reasonably similar fits relative to our base model. This allows us to observe how robust the TRP is to a substantial shift in various modelling decisions that we made. The first specification replaces the LogMaxTempit-k with their linear counterparts. On the surface, this seems like a decision about which of two different scales fits better, but because the models have estimated parameters in the scale function, it may be possible for the two different specifications to provide reasonably similar TRPs. The second uses a popular ratio scaling function LogMaxTempit-k/(LogMaxTempit-k + α), where α is an estimated parameter. The third replaces LogMaxTempit-k with MaxTempit-k in this ratio scaling function. A potential issue with Eq. 1 is the possibility that LogMaxTempit-14 directly influences our main infection pool indicator LogDailyDeadit-7. Our next specification replaces our infection pool indicator with an alternative, LogDailyDeadit-14, so both temperature variables are now clearly exogenous from the temporal perspective of the infection pool indicator.

The last is a weekly variant of the model, Eq. 2, where the dependent variable is the sum of the daily death counts over the next seven days, with corresponding pushbacks of the lagged variables. This model averages out much of the daily variation and many types of administrative reporting practices. In forecasting the pandemic's progression, it has become common to use data aggregated to a weekly level in an effort to average out many of the administratively induced reporting issues that our extensive reconstruction and repair of the daily death count data sought to alleviate. It is possible to estimate a variant of Eq. (1) that uses death count data aggregated into seven-day periods, WeeklyDeadit = Σt DailyDeadit, where the summation is over t=1 to t=7. This makes WeeklyDeadit-7 the sum of the 7th through 13th lags of DailyDeadit. Importantly, this aggregation does not reduce the number of observations because on each day, the weekly aggregation at t=1 adds a new observation DailyDeadi1 and drops DailyDeadi8. Lags of the WeeklyDeadit variable can then be used in the standard way. One implication of this specification is that the temperature variables also need to shift backwards. For conceptual consistency, we use LogMaxTempit-14 in place of LogMaxTempit-7. Empirically, the model fits best with the second temperature lag being LogMaxTempit-18. The model fit is reasonably similar using the 13th and 19th lags, but beyond the 19th lag, the temperature variable becomes insignificant, suggesting temperature information farther back in time than this is not useful. The model we report is thus:

The overall impression from Table S4 is the general stability of most of the common parameters for time and the infection pool. Some of the specifications offer insights into the latter variable. It is little influenced by whether the logistic or ratio scaling function was used, or by whether LogMaxTempit or MaxTempit was the stimulus variable. Using LogDailyDeadit-14 in place of LogDailyDeadit-7 results in an estimated TRP similar to the base model (Fig. S6), although it does rise more sharply between 10°C and 5°C, suggesting that, while there may be an endogeneity effect, it is not large and that our base TRP is conservative. In this model, the coefficient on the infection pool indicator shifts from the .86 of the base model to .80; in the weekly version of the model it jumps to .92. The R² for the weekly model increases to 0.99 (from the base model's 0.97) and falls to 0.94 in the model using LogDailyDeadit-14, which is less informative than LogDailyDeadit-7 in terms of the infection pool influencing current death counts.

To compare the TRPs implied by the models in Table S4, we plot the functions using two independent random uniform variables RTemp and RTemp2, defined over the range 5°C to 40°C. The logistic scaling function with two temperature variables, LogMaxTempit-7 and LogMaxTempit-14, and corresponding Eq. 1 estimated parameters ϓ1 and ϓ2 (Table S4), results in the following scaling function in the base model:

(1 / (1 + exp(LogRTemp * ϓ1))) * (1 / (1 + exp(LogRTemp2 * ϓ2))).    (3)

There is a fundamental indeterminacy in such a scaling function, in that multiplying the production function part of Eq. 1 by a constant will result in an offsetting change in the scaling function which maintains the same expected value for the dependent variable. Note that each of the two multiplicative components of Eq. 3 converges to .5 irrespective of temperature values as ϓ1 and ϓ2 approach zero. Often logistic functions are normalized to lie between 0 and 1 by changing the "1" in the numerator to "2", but this is not needed with our normalization to 31°C, which solves the indeterminacy from the perspective of comparing curves. This is done by calculating the value of the estimated scale function at 31°C:

(1 / (1 + exp(log(31) * ϓ1))) * (1 / (1 + exp(log(31) * ϓ2))).    (4)

Dividing Eq. 3 by Eq. 4 produces a function which equals 1 at 31°C. Multiplying this quantity by 100 produces a function which has a natural percentage interpretation and equals 100 at 31°C. Note that for forecasting purposes, the original scaling function parameters need to be used.
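The normalization can be sketched in Python as follows. The functions take the two parameters as arguments rather than using the Table S4 estimates, and the names `logistic_scale` and `trp` are hypothetical; any numeric parameter values passed in are placeholders for illustration only.

```python
import math


def logistic_scale(temp_c: float, gamma: float) -> float:
    """One logistic component: 1 / (1 + exp(log(temp) * gamma))."""
    return 1.0 / (1.0 + math.exp(math.log(temp_c) * gamma))


def trp(temp_lag7_c: float, temp_lag14_c: float,
        gamma1: float, gamma2: float, base_c: float = 31.0) -> float:
    """Scaling function divided by its value at the base temperature, times 100."""
    raw = logistic_scale(temp_lag7_c, gamma1) * logistic_scale(temp_lag14_c, gamma2)
    base = logistic_scale(base_c, gamma1) * logistic_scale(base_c, gamma2)
    return 100.0 * raw / base
```

By construction `trp(31, 31, g1, g2)` equals 100 for any parameter values, and with positive parameters the profile lies above 100 at temperatures below 31°C. Forecasts would still use the unnormalized product.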

The ratio scaling function with estimated parameters α1 and α2 using RTemp and RTemp2 is:

(LogRTemp / (LogRTemp + α1)) * (LogRTemp2 / (LogRTemp2 + α2)).    (5)

As α1 and α2 converge to zero, both multiplicative components of Eq. 5 converge to 1 irrespective of temperature values. The value of this function at 31°C can be calculated in a manner similar to that described for the logistic. Dividing Eq. 5 by the value of that function at 31°C and multiplying by 100 produces the desired TRP. Figure S8 graphs the daily death TRPs from the model specifications in Table S4. The alternative specifications produce TRPs remarkably similar to that of our base model and tend to bracket it. The one systematic difference is that TRPs based on the ratio scaling functions tend to be flatter over 10°C to 20°C but rise more sharply in the 5°C to 10°C range.

The other weather variables which have received considerable attention are absolute humidity, relative humidity, and ultraviolet (UV) radiation. Details of construction are provided in the Data Preparation section. The limitation with all these variables is that they are correlated with MaxTempit (i.e., absolute humidity: 0.58, relative humidity: -0.21, UV: 0.79). Maximum absolute humidity has a reasonably sized effect (p < 0.001) in the model where it replaces the corresponding MaxTempit variables. However, the LogAbsoluteHumidityit-k parameter estimates are close to zero and are no longer significant when the two parallel logistic functions comprising the temperature scaling function are added (Model 10 in Table S6). Relative humidity has a marginally significant relationship with DailyDeadit when MaxTempit is not in the model, and its effect is close to zero in a model with MaxTempit (Model 11 in Table S6).

The situation with UVit, for which (15) found support, is more complex (Table S4 and Table S6, Models 12 and 13). The MaxTempit lags are marginally better predictors in a head-to-head comparison. These two variables are strongly correlated. Adding UVit marginally decreases prediction errors, but the signs of high multicollinearity are obvious; the first MaxTempit lag is still highly significant while the second is insignificant and one of the UVit lags is only marginally significant (Table S4). It is effectively impossible to disentangle the influence of LogMaxTempit and UVit. Their relationship is displayed in Fig. S5, which plots LogMaxTempit and LogUVit for two states over our sample period: Georgia (Atlanta) and New York (New York City). We cast our results in terms of MaxTempit rather than UVit because it is more widely reported and understood, without making any claim that our work supports a joint versus singular causal mechanism.

The specification in Eq. (1) is cast in terms of DailyDeadit. If MaxTempit-k influences COVID transmission via the link between positive cases, then we should be able to replace DailyDeadit-7 with NewPositivesit-7. Estimates for this model (Table S4) show LogNewPositivesit-7 being significant at p < 0.001 and the R² measure falling a bit. The coefficient on MaxTempit-14 is insignificant, which would be expected if NewPositivesit-7 incorporates that information, while the coefficient on LogMaxTempit-7 is larger than in the base specification.

A different aspect of the COVID-19 death statistics that has not been incorporated into the model is the lagged cumulative death count, TotalDeadit-k. If DailyDeadit-k can be seen as a proxy for the infection pool influencing DailyDeadit, then TotalDeadit-k (normalized by population) is a proxy for the fraction of the population that is no longer at risk, in the sense of being removed by death or having recovered. Adding LogPCTotalDeadit-7 effectively makes the StateBaseit dynamic in a potentially different way than the quadratic time trend by letting each state evolve according to its own pattern of deaths. Table S6 (Models 14 and 15) displays the results of this model with LogPCTotalDeadit-7 entered as (a) a second order polynomial and (b) a fourth order polynomial.

Three results are worth noting. First, the increase in explanatory power is small, with the most noticeable changes being, as expected, in the StateIndicatori and a substantial reduction in the importance of the overall quadratic time trend. Second, the LogMaxTempit-k parameter estimates are similar to those of Eq. (1), suggesting that our TRP is robust to a substantial dynamic reparameterization of the model. Third, in the quadratic specification, DailyDeadit declines with LogPCTotalDeadit-7 at a declining rate. In the fourth order specification, all the LogPCTotalDeadit-7 terms are individually insignificant (although jointly significant). Figure S6 displays, starting at .05, the two response functions for LogPCTotalDeadit-7. Like the quadratic time trend, they suggest a sharp drop in how DailyDeadit is influenced by DailyDeadit-7 as LogPCTotalDeadit-7 increases from low levels, with the fourth order polynomial being flatter at high levels of LogPCTotalDeadit-7 than the quadratic. The influence of this factor is reasonably small by the time a state hits 10 deaths per 100,000, a condition that characterizes 80% of the states at the end of our sample period. Earlier, we noted that one interpretation of the quadratic time trend was that medical care (and hence death rates) had improved sharply at first and then at a declining rate. This specification has a similar interpretation but suggests that some of that learning is state specific and related to a state's prior COVID-19 caseload. Looking at the fourth order polynomial, which should allow this feature to emerge if the data support it, there is no indication that this rate of decline is accelerating even in the hard-hit Northeastern states, where deaths per 100,000 can be as high as 175 (NJ). This suggests the fraction of the population previously infected is still too small to be a substantial factor in slowing transmission of the virus.

If we have succeeded in isolating the TRP through use of the set of StateIndicatori and a quadratic time trend, the simple regression of DailyDeadit on lagged MaxTempit-7 without the state fixed effects and quadratic time trend should reveal a substantially different curve than our estimated TRP displayed in Fig. 4. Figure S7 displays this relationship using a robust LOWESS smoother (bandwidth 0.2) on MaxTempit-7. This curve is dramatically more sensitive to temperature between 10°C and 30°C. Below 10°C it drops, which was expected since states with temperatures near 5°C tend to be more isolated and smaller population-wise. The curve bends up near 35°C and beyond, where most of the observations come from a few states with (relatively) high June and July death counts.
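For readers unfamiliar with the smoother used for Figure S7, the following is a minimal pure-Python sketch of a robust LOWESS fit (local linear regression with tricube weights and bisquare reweighting of residuals). It is an illustrative implementation under our own assumptions, not the Stata routine the authors used; the function names, the 6 × median-residual cutoff, and the number of robustness iterations are choices made here for illustration. The `frac` argument plays the role of the 0.2 bandwidth quoted above.

```python
import math
from statistics import median


def tricube(u):
    """Tricube kernel: weight falls smoothly to 0 at |u| = 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Bisquare robustness weight: 0 for residuals beyond the cutoff."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def local_linear(x, y, w, x0):
    """Weighted least-squares line through (x, y), evaluated at x0."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)


def lowess(x, y, frac=0.2, robust_iters=3):
    """Return smoothed values of y at each x (one plain pass plus
    `robust_iters` reweighted passes)."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # points per local window
    # Bandwidth per target point: distance to its k-th nearest neighbour.
    h = [sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12 for xi in x]
    kernel = [[tricube((xj - xi) / hi) for xj in x] for xi, hi in zip(x, h)]
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(robust_iters + 1):
        for i in range(n):
            w = [kw * rw for kw, rw in zip(kernel[i], robust)]
            if sum(w) == 0.0:  # every neighbour was down-weighted to zero
                w = kernel[i]
            fitted[i] = local_linear(x, y, w, x[i])
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        s = median(abs(r) for r in resid)
        if s == 0.0:
            break
        robust = [bisquare(r / (6.0 * s)) for r in resid]
    return fitted
```

Calling `lowess(temps, deaths, frac=0.2)` on two equal-length lists (hypothetical names) returns one smoothed value per observation. The robustness passes limit the pull of isolated extreme counts on the curve, which matters when a handful of state-days dominate one end of the temperature range.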

Pinning down the TRP further will require: (a) obtaining more data over a longer time horizon with more temperature variation and, in particular, the important -5°C to 5°C range, where there has been little U.S. experience since the virus became widely dispersed, (b) obtaining death certificate date information from the few remaining large states (California, Illinois and New York) where it is not yet available, since states with death certificate date reporting have prediction errors that are substantially smaller (p < 0.001) than those that don't (Table S6, Model 2), or (c) having high quality temporally aligned COVID-19 statistics at the county level, which would provide a better temperature match and dramatically increase sample size.

What does a TRP based on The COVID Tracking Project's (CTP) originally reported death counts look like compared to our base (Eq. 1) Model 1, which uses DailyDead? To examine this issue, we estimate Model 26 (Table S6), a direct analogue of Model 1 that substitutes CTPDailyDeadit and CTPDailyDeadit-7 for their DailyDead counterparts. Consistent with Model 18 (Table S6) and other models employing CTPDailyDead, we exclude the two observations that contain the large New Jersey outlier as either the dependent variable or regressor and further exclude twenty observations where LogCTPDailyDeadit-7 is undefined because CTPDailyDeadit-7 is negative.

There are some clear differences between Model 1 and Model 26. First, in Model 26 there is a large drop in the R² from 0.97 to 0.81 and the RMSE measure more than doubles. Second, the elasticity parameter on lagged LogCTPDailyDead is just over 60% as large as it is in the DailyDead version. This is the expected result from introducing substantial measurement error, but one that also has large implications for drawing inferences about the size of the infection pool or in undertaking any dynamic forecasting exercise. Third, the quadratic time trend, while still sizeable, is substantially diminished in both magnitude and statistical significance. Fourth, there are differences between the TRPs implied by their temperature response parameters in Eq. (1).

The TRPs for Model 1 and Model 26 are plotted in Fig. S9. The TRP for the CTPDailyDead variant effectively lies above its DailyDead counterpart. At 5°C, it predicts almost 80% more deaths than our base Model 1. This result may seem counterintuitive given the usual belief that measurement error attenuates a parameter estimate toward zero. That intuition, generally correct, helps explain the fall in the coefficient when CTPDailyDeadit-7 rather than DailyDeadit-7 is used as the infection pool indicator. However, it does not say anything about the implications of that measurement error for another covariate in the model. It is easy to show that as the scale of the measurement error in the lagged death count increases, the magnitude of the temperature responsiveness parameters adjusts to incorporate covariance with DailyDeadit-k. All else held constant, this would increase the magnitude of the estimated temperature effect under classical measurement error, since the parameters on the infection pool indicator and temperature variables should have the same sign. The situation here is more complicated because measurement error from using CTPDailyDead also substantially reduces the ability to pin down the quadratic time trend. Thus, neither the sign nor the magnitude of the difference between the temperature response parameters across comparable specifications is known a priori. Our parameter estimates (Table S6) suggest an upward bias is likely in temperature effect estimates obtained from models similar to ours that (a) use reported death counts and (b) have specifications where the infection pool and temperature response variables are expected to share the same sign. Note though that in a simple OLS regression model, the infection pool and temperature variables should have opposite signs, resulting in an estimate of the temperature parameter that is biased in the direction of finding no effect.
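The mechanism described above can be illustrated with a small simulation. The sketch below is a stylized linear analogue, not the paper's nonlinear specification or data: `x1` stands in for the lagged infection pool indicator, `x2` for a temperature term, and all parameter values (coefficients 1.0 and 0.5, correlation 0.5, unit-variance measurement error) are assumptions chosen for illustration. Both true coefficients are positive and the two regressors are positively correlated, so classical measurement error in `x1` should shrink its coefficient while pushing the coefficient on `x2` upward.

```python
import random


def ols_two_regressors(y, x1, x2):
    """OLS slopes for y on x1 and x2 (intercept removed by demeaning)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=20000, rho=0.5, noise_sd=1.0, seed=0):
    """Return ((b1, b2) with x1 measured exactly,
               (b1, b2) with x1 measured with classical error)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # corr(a, b) = rho
        x1.append(a)
        x2.append(b)
        y.append(1.0 * a + 0.5 * b + rng.gauss(0, 1))  # assumed true model
    # Classical measurement error: independent noise added to x1 only.
    x1_noisy = [a + rng.gauss(0, noise_sd) for a in x1]
    return ols_two_regressors(y, x1, x2), ols_two_regressors(y, x1_noisy, x2)


if __name__ == "__main__":
    clean, noisy = simulate()
    print("exact x1:  b1 = %.3f, b2 = %.3f" % clean)
    print("noisy x1:  b1 = %.3f, b2 = %.3f" % noisy)
```

Under these assumed values, the large-sample coefficient on the mismeasured regressor is roughly 0.43 rather than 1.0, while the coefficient on the correctly measured regressor is roughly 0.79 rather than 0.5. Flipping the sign of either the correlation or one true coefficient reverses the direction of the second bias, which is the sense in which the sign is not known a priori in richer specifications.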

The parameter estimates and standard summary statistics for the models discussed in this paper are provided in Table S6. It contains three columns for each model. The first column contains the variables included in the model, the second the parameter estimates, and the third the standard errors. After the parameter estimates, the model's R², root mean square error (RMSE), and the number of observations on which the model was fit are provided.

The models appear in the following order:

1. The base model represented by Eq. 1 using LogDayst, LogDayst², and LogDailyDeadit-7 as the predictors along with a set of state-level indicator variables and using LogMaxTempit-7 and LogMaxTempit-14 in the temperature scaling function.
2. A model which regresses the squared residuals from (1) on the log of state population and an indicator variable, GOOD_DATE, for DailyDeadit representing death counts by death certificate date.
3. Base model using linear versions of the two temperature variables.
4. Base model using the alternative ratio scaling function Temp/(Temp + α), where Temp is the temperature variable and α is the estimated parameter.
5. Base model using the alternative ratio scaling function and linear versions of the MaxTempit-k.
6. Base model with LogDailyDeadit-14 substituted for LogDailyDeadit-7.
7. A weekly version of the base model (see Eq. 1) substituting LogDailyDeadit-14 for LogDailyDeadit-7 and LogDailyDeadit-18 for LogDailyDeadit-14.
8. A version of the base model that adds two state-level government policy variables, the log of the number of days since a shelter-in-place order was first issued (LogDaysShelterInPlaceit) and the log of the number of days since a state began to formally reopen its economy (LogDaysReopenit).
9. A version of (8) that drops the quadratic time trend of the base model.
10. The base model adding parallel scaling functions using LogMaxAbsoluteHumidityit-7 and LogMaxAbsoluteHumidityit-14.
11. The base model adding parallel scaling functions LogMaxRelativeHumidityit-7 and LogMaxRelativeHumidityit-7.
12. Base model adding additional parallel scaling functions for LogUVit-7 and LogUVit-14.
13. Base model substituting LogUVit-k for LogMaxTempit-k.
14. Base model adding a quadratic in terms of LogTotalDeadit-7.
15. Base model adding a 4th order polynomial in LogTotalDeadit-7.
16. Base model substituting LogNewPositivesit-7 and testing variables in place of LogDailyDeadit-7.
17. The model used in the paper to predict NewPositivesit.
18. The AR(7) regression of the originally reported death counts from the COVID Tracking Project (CTPDailyDeadit) on the 7th lag of itself (CTPDailyDeadit-7) for Table S5.
19. The same as (18) but with DailyDeadit regressed on the 7th lag of CTPDailyDeadit.
20. The same as (18) but CTPDailyDeadit regressed on the 7th lag of DailyDeadit.
21. The same as (18) but DailyDeadit regressed on its 7th lag.
22. The StateBaseit calculation for each state.
23. The model reported in Table S3 which 

The data used in this study are archived at https://github.com/xxx in the form of a Stata ".dta" file. An Excel version of this file was created using StatTransfer. The Stata "do" file creating the data set contains a line-by-line set of the changes made to the original CovidTracking.com data set, and the provenance of those changes. Three additional Stata "do" files are available in this archive. The first contains the code (Stata 16.1) for the regression models reported in this paper. The second provides an example of how to estimate the static and dynamic temperature response profiles for an individual state, using Georgia as an example. The third contains Stata code for creating the basic versions of the figures in this paper (fine labeling was done using Stata's graph editor). We also provide a further Stata dataset and corresponding Excel files at the state level (using our representative airport) with maximum daily temperature readings from the last thirty years.

Rapid expert consultation on SARS-CoV-2 survival in relation to temperature and humidity and potential for seasonality for the COVID-19 pandemic

Will coronavirus disease 2019 become seasonal?

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period

Misconceptions about weather and seasonality must not misguide COVID-19 response

Absolute humidity and pandemic versus epidemic influenza

Failing the test-the tragic data gap undermining the US pandemic response

No autopsies on COVID-19 deaths: a missed opportunity and the lockdown of science

Estimation of excess deaths associated with the COVID-19 pandemic in the United States

Center for Disease Control. Planning scenarios

Strong social distancing measures in the United States reduced the COVID-19 growth rate: study evaluates the impact of social distancing measures on the growth rate of confirmed COVID-19 cases across the United States

Lockdown Policies at the State and Local Level

How did covid-19 and stabilization policies affect spending and employment? a new real-time economic tracker based on private sector data