key: cord-0812869-o72ls2my
authors: Pedrosa, Renato H.L.
title: The dynamics of Covid-19: weather, demographics and infection timeline
date: 2020-04-27
journal: nan
DOI: 10.1101/2020.04.21.20074450
sha: 58f667195a4d1c1c1aee1dec3f0cd9bb59e6e92b
doc_id: 812869
cord_uid: o72ls2my

We study the effects of three types of variables on the early pace of spread of Covid-19: weather variables (temperature and absolute humidity); population density; and the timeline of Covid-19 infection, since outbreaks occurred on different dates in different regions. The regions considered were all 50 U.S. states and 110 countries (those which had enough data available by April 10th). We looked for associations between the above variables and an estimate of the growth rate of cases, the exponential coefficient, computed using data for the 10 days starting when each state/country reached 100 confirmed cases. The results for U.S. states indicate that one cannot expect higher temperatures and higher levels of absolute humidity to translate into a slower pace of Covid-19 infection, at least in the ranges of those variables during February and March of 2020 (-2.4 to 24C and 2.3 to 15 g/m³). In fact, the opposite is true: the higher the temperature and the absolute humidity, the faster Covid-19 expanded in the U.S. states in the early stages of the outbreak. Secondly, using the highest county population density for each state, there is a strong positive association between population density and faster (early) spread of Covid-19. Finally, there is a strong negative association between the date when a state reached 100 accumulated cases and the speed of the Covid-19 outbreak (the later the date, the lower the estimated growth rate). When these variables are considered together, only population density and the timeline variable show statistical significance. We also develop the basic models for the collection of countries, without the demographic variable. Despite the evidence, in that case, that warmer and more humid countries have shown lower rates of Covid-19 expansion, the weather variables lose statistical significance when the timeline variable is added.

The first reported cases of people showing serious respiratory symptoms, later identified as caused by a new virus of the Coronaviridae family, occurred in China during the last month of 2019. Cases soared during the second half of January, going from 80 on January 18th to over 9,000 by the end of the month, with 213 deaths, mostly from acute respiratory insufficiency. At that point, there were 11 cases reported in Japan and only a few others across Asia. By mid-February, there were cases in many other countries, but still in small numbers, and none in Africa or South America, which have most of their territories in the Southern Hemisphere, then in summer. Australia was one of the few countries in the Southern Hemisphere with cases but, at that time, all could be traced to travelers arriving from China or other Asian countries. By the end of that month, some countries were already facing an epidemic situation. The first positive case reported by a South American country occurred on Feb. 26th, in Brazil: a man who had just returned from a trip to Northern Italy. From then on the virus started to spread, slowly at first, as in other countries, with all cases being people arriving from other countries, especially Italy, or people who had direct contact with them. By March 6th, it was recognized that local untraceable transmission was occurring in Brazil, and the number of cases increased rapidly. As we will see, the pattern for the 10 days starting on the day of the 100th case, in Brazil as in many other countries, is exponential to a high degree, with varying growth rate values. Throughout the period from the start of the pandemic until the end of March, weather in Brazil was warm and humid, with temperatures frequently soaring above 32C and absolute humidity never below 10 g/m³. This raises the possibility that warm and humid weather will not help contain the spread of the virus, as it typically does for many viral diseases 0, 1, 3, 4, 5.
There are various studies with early, somewhat conflicting, results on those relationships for Covid-19 6, 7, 8, 9, 10, 11, 12, 13, as discussed in the National Academies of Sciences, Engineering, and Medicine report on the issue 14.

As an indicator of the pace of expansion of Covid-19, we compute the coefficient of the best exponential fit to the evolution curve of cases over the 10 days starting when the region reached its 100th case, denoted by k, for all 50 U.S. states and 110 countries with enough data (up to April 10th). The control variables are: the average temperature and absolute humidity during the 25 days starting 15 days before the region reached the 100th case; two timeline variables, the date when the 100th case occurred and the number of days from the 1st to the 100th case (for countries and U.S. states); and the (log of) population density of the densest county in the state (for U.S. states). We find that the weather variables, albeit significant in single-variable models (but with opposite effects for the two groups of regions, U.S. states and countries), lose significance when the timeline and/or demographic variables are introduced. For the timeline variable, in both groups of regions, the later the date of the 100th case, the slower the initial pace of expansion of Covid-19. For the demographic variable, the higher the population density indicator for the U.S. states, the faster the disease spread in its initial phase. The model with both variables, only available for U.S. states, furnishes the best estimates, explaining more than 50% of the variability of the growth rate k. Finally, for U.S. states, the population density also affects the start of the local transmission phase and how long it took to go from the 1st to the 100th case.

The author acknowledges conversations with Aluísio Pinheiro, from the Department of Statistics, Unicamp, about statistical aspects of the models employed.

Data on Covid-19 cases: databases of reported cases from Johns Hopkins University's Center for Systems Science and Engineering (CSSE/JH) 15 for U.S. states and from the European Centre for Disease Prevention and Control (ECDC/EU) 16 for countries.

Weather data: NOAA Integrated Surface Database (ISD) 17 of meteorological observations, through the R package "worldmet" 18. For countries, we used the station nearest to the capital with 100% coverage, when available. In the case of the United States, since most initial cases were reported in Seattle, in the counties around San Francisco and in New York, we used the averages from those cities. For U.S. states, we used the data from the station at the main airport of the largest city in the state, the most likely place to have had the earliest outbreak. For Brazil, we took the averages for the locations of the largest airports of São Paulo and Rio de Janeiro, which were the largest cities in states with at least 20 cases on the day of the 100th case.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted April 27, 2020.

From NOAA weather data, the absolute humidity Ah (g/m³) is computed from temperature T (C) and relative humidity Rh (%) using an approximation formula derived from the Clausius-Clapeyron equation 8 (Eq. 1).

The averages for both variables were computed, for each country/U.S. state, over the period of 25 days starting 15 days before the day the region reached 100 accumulated cases of Covid-19. This covers the two-week incubation period between transmission and evidence of the disease and the 10 days used to estimate the exponential coefficient for the evolution of cases (see Eq. 2 below). Eq. 1, for constant relative humidity, closely approximates an exponential of T. We ran the regression models with both Ah and log(Ah) as the control variable for absolute humidity (see Figs. 1.d/e and 2.d/e for the distributions). The log-transformed version usually provided the stronger regression parameters (p-value and R²) for Ah.
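The conversion from temperature and relative humidity to absolute humidity can be sketched as follows. The study itself used R; Python is used here only for illustration. The constants are those of a widely used Clausius-Clapeyron-based approximation and are an assumption, since the coefficients are not stated in the text; the function names are hypothetical.

```python
import math

def absolute_humidity(temp_c, rel_humidity_pct):
    """Approximate absolute humidity (g/m^3) from T (Celsius) and Rh (%).

    ASSUMPTION: the constants come from a commonly used approximation,
    not necessarily the exact form used in the paper.
    """
    # Saturation vapour pressure in hPa (Clausius-Clapeyron-type approximation).
    sat_pressure = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    # Convert the partial vapour pressure to grams of water per cubic metre.
    return sat_pressure * rel_humidity_pct * 2.1674 / (273.15 + temp_c)

def window_average(daily_values, day100_index, before=15, length=25):
    """Mean over the 25-day window starting 15 days before the 100th-case day."""
    start = day100_index - before
    if start < 0 or start + length > len(daily_values):
        raise ValueError("the series does not cover the whole window")
    window = daily_values[start:start + length]
    return sum(window) / len(window)

# Hypothetical daily observations: constant 50% relative humidity,
# temperature rising by 0.5 C per day from 5 C.
temps = [5.0 + 0.5 * d for d in range(40)]
humidities = [absolute_humidity(t, 50.0) for t in temps]
av_temp = window_average(temps, day100_index=20)
av_ah = window_average(humidities, day100_index=20)
```

At constant relative humidity the function grows roughly exponentially with temperature, which is consistent with the remark about Eq. 1 and with the choice of log(Ah) as a regressor.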

We have considered the 110 countries that had at least 10 days of data starting when they reached 100 accumulated cases, with the end date being at most April 10th. All 50 U.S. states had already reached that stage by that date.

The population density of the densest county in each U.S. state was used as the indicator of population density for the state (U.S. Census 2010 21, 22). Covid-19 has typically started its spread in the city/county with the highest population or population density, which almost always coincide (in most cases the city is part of a county; in some, the county is part of a larger city, as in New York City). We tested the models with two variables: the population of the largest county 23 and the population density of the densest one. Both gave essentially the same general results for the regression models employed. We chose the latter, as it is easier to interpret with regard to the transmission of infectious diseases 24. For the regression models, we employed the log of population density as the control variable, since mathematical models of the dependence of the basic reproductive rate (R0) of infectious diseases on population density 24 show that one must use non-linear scaling when the range of densities is very wide, which is the case here (from 13 to over 25,000 pop/km²). The original distribution of population density is very skewed, concentrated on values below 2,000 pop/km² (Figs. 1.h/i, 3.a/b). We analyzed various transformations, both in terms of producing a more evenly spread distribution and in terms of our application, the study of the early growth rate of Covid-19 cases; the log-transformation of population density not only showed good distribution properties (Fig. 3.c) but also proved well adapted as a control variable for the study of the growth rate coefficient. We have not found a discussion of this in the literature, but one may argue that social interaction is increasingly dampened as population density reaches very large values, and that the dampening follows a logarithmic behavior.

As an example, consider New York City, where the population density exceeds 25,000 people/km² in some areas. That is certainly caused by people living in tall buildings: since the footprint of a building is very small, those living on its various floors are collapsed into a very small area, making the density artificially high. That would increase interactions, as people would likely meet in halls and elevators, but not at the rate indicated by the nominal density.

For each country and U.S. state included in the study, we considered the period of 10 days starting on the day the number of cases reached 100 and computed a simple linear regression for log(Ni) as a function of the day t, where Ni is the accumulated number of observed cases for region i on a given day of the period, given by the model

$$\log N_i(t) = a_i + k_i\,t + \varepsilon_i(t), \qquad (2)$$

where $a_i$ is the intercept and $\varepsilon_i(t)$ the error term.


The growth rate indicator will be the estimated ki, the exponential coefficient, for region i. We also explored 12-day and 15-day periods starting on the 100th-case day, but there was no relevant impact on the results of the models. We preferred the 10-day model as it provides the best fitting parameters for the model in Eq. 2. We could also have estimated the growth rate using endpoints, but we wanted to be able to check the fit of the evolution curves to an exponential. Also, when there are jumps in the reported number of cases, which happened often in the early phase of the disease, the endpoint estimate tends to jump as well, while the regressed estimate follows a smoother path (see Fig. 6.b for the case of New York State). Thus, even if one of the endpoints of our 10-day window showed a sudden jump in the number of cases, it would be smoothed out by our methodology.

The choice of 100 accumulated cases as the start of the analysis is related to when local transmission is expected to be under way, as that is when an exponential increase is expected to start. The first 100 cases for each country or state typically happen within one or a few communities, so one expects local transmission to be already at work. It has been estimated that when a U.S. county has reached 20 cases, the chances that there is ongoing local transmission are 99% 25, 26. For example, for the United States, the date of the 100th case was March 3rd. On that date, Washington and California led the country, with 27 and 25 cases, respectively. Within the 10-day period used in the estimation, New York also became a hotspot and, on the last day of the period, it was second; other states had also surpassed 20 cases. Similarly, for Brazil, the cities of São Paulo and Rio de Janeiro already had at least 20 confirmed cases when Brazil reported its 100th case.
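The 10-day log-linear fit and its comparison with an endpoint estimate, as described above, can be sketched in a few lines. This is an illustrative Python sketch (the study used R) on synthetic counts; the function names, the synthetic growth rate of 0.25 and the size of the reporting jump are all assumptions.

```python
import math

def _window(cases, threshold, window):
    """Counts for `window` days, starting on the first day >= threshold."""
    start = next(i for i, n in enumerate(cases) if n >= threshold)
    chunk = cases[start:start + window]
    if len(chunk) < window:
        raise ValueError("not enough days of data after the threshold day")
    return chunk

def fit_growth_rate(cases, threshold=100, window=10):
    """Slope k and R^2 of the least-squares line log(N) = a + k*t."""
    ys = [math.log(n) for n in _window(cases, threshold, window)]
    ts = range(window)
    t_mean = sum(ts) / window
    y_mean = sum(ys) / window
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

def endpoint_growth_rate(cases, threshold=100, window=10):
    """Growth rate computed from the first and last day of the window only."""
    chunk = _window(cases, threshold, window)
    return math.log(chunk[-1] / chunk[0]) / (window - 1)

# Synthetic cumulative counts growing at a hypothetical rate of 0.25 per day.
clean = [round(20 * math.exp(0.25 * t)) for t in range(17)]
# Same series with a hypothetical 50% reporting jump on the last day.
jumped = list(clean)
jumped[-1] = round(1.5 * jumped[-1])

k_clean, r2_clean = fit_growth_rate(clean)
k_reg_jump, _ = fit_growth_rate(jumped)
k_end_jump = endpoint_growth_rate(jumped)
```

On the series with the jump, the regressed estimate moves away from 0.25 by roughly half as much as the endpoint estimate, which illustrates the smoothing argument; the R² returned by the fit is the quantity used to check how exponential each curve is.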

Two assumptions are needed for the estimate ki to be representative of the speed of transmission of the novel coronavirus in a community. The first is that the way cases were counted did not change significantly during the 10-day period used for the models. The evidence from the estimates corroborates that assumption, as the values of R² for the regressions (Eq. 2) are mostly above 0.95 (Figs. 1.b, 2.b). We discuss the sensitivity of our models with respect to the R² values of the regressions given by Eq. 2 in the section on models. The second is that the number of positively tested cases of people with various levels of symptoms, or of hospitalized cases, is a constant fraction of the total number of people infected (many of whom are asymptomatic). Some countries have tested all people with any symptom, others only those hospitalized (the case for Brazil), and others those with in-between levels of symptoms. Under those two assumptions, the model (Eq. 2) employed to estimate the coefficients ki does not depend on how different countries (or states) measured confirmed cases, since a constant reporting fraction only shifts log(Ni) by a constant and leaves the slope ki unchanged.

Using the data described above, we consider the following linear regression model for the U.S. states:

$$k = \beta_0 + \beta_1\,\mathrm{AvTemp} + \beta_2\,\log(\mathrm{AvAh}) + \beta_3\,\mathrm{Day100} + \beta_4\,\mathrm{Days1to100} + \beta_5\,\log(\mathrm{PopDens}) + \varepsilon, \qquad (3)$$

where k is the exponential growth rate coefficient estimate (Eq. 2), AvTemp is the average temperature, AvAh is the average absolute humidity, Day100 is the date when the region reached 100 cases, Days1to100 is the number of days from the 1st to the 100th case and PopDens is the population density of the county with the highest such value in each state. For the collection of 110 countries we used the same model without the last regressor. We also tested the model with Ah, but log(Ah) showed consistently better regression properties than the linear case.
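A multiple regression of this form can be sketched with ordinary least squares. The sketch below is in Python rather than the R used in the study, solves the normal equations directly, and runs on synthetic data: the 50 rows, the intercept, the noise level and the helper name `ols` are all hypothetical, and only the Day100 and log(PopDens) coefficients loosely echo magnitudes quoted in the text.

```python
import math
import random

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

rng = random.Random(0)
# Hypothetical "true" coefficients: intercept, AvTemp, log(AvAh),
# Day100 (per day), Days1to100, log(PopDens).
true_beta = [0.55, 0.0, 0.0, -0.0053, 0.0, 0.0162]

rows, ks = [], []
for _ in range(50):  # one synthetic row per "state"
    av_temp = rng.uniform(-2.4, 24.0)
    av_ah = rng.uniform(2.3, 15.0)
    day100 = rng.uniform(65, 95)        # day of year (assumed encoding)
    days1to100 = rng.uniform(5, 45)
    pop_dens = math.exp(rng.uniform(math.log(13), math.log(25000)))
    x = [av_temp, math.log(av_ah), day100, days1to100, math.log(pop_dens)]
    noise = rng.gauss(0, 0.005)
    rows.append(x)
    ks.append(true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], x)) + noise)

beta = ols(rows, ks)  # estimated [b0, b1, ..., b5]
```

In practice a statistics package would also report confidence intervals, p-values and adjusted R², which this sketch does not compute.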

Software: R, ggplot2 and worldmet packages.



To visualize how well the exponential model for the growth rate k fits the actual trajectories, Fig. 4 displays the exponential prediction and the curves for the actual accumulated number of positive Covid-19 cases, for selected U.S. states. For countries the behavior is similar, so we omit the graphs. Figs. 5.a-d show the daily behavior of the two weather variables along the 25-day periods for a selection of U.S. states and countries. Both variables show a wider range of values for the U.S. states than for the countries displayed. In some cases, the range of temperatures was more than 15C, and that of absolute humidity over 10 g/m³. For individual countries the ranges are typically smaller, with a few exceptions. The ranges tend to be narrower for regions with higher values, for both variables. It is important to observe that there is no trend relating the period over which the averages were computed to the actual values of the weather variables. Fig. 7 shows the evolution curves of cases for U.S. states for the 10-day period used to estimate the growth rate k, by temperature. It shows that there is a wide range of values of k, and no obvious trend in the temperature averages with respect to k. The graph for countries, not included, is similar.

It is possible to relate the values of k to the basic reproduction number, R0, the estimate of the average number of new infections generated by an infectious person, which has been the subject of much investigation since the start of the pandemic 27. It has been estimated that R0 for the early outbreak in China was between 4.7 and 6.6, derived from a growth rate of 0.29 28. Our estimate of the growth rate k for China is 0.334, but if we start on Jan. 13th (instead of Jan. 19th) and compute k using the next 12 days of evolution, we get k=0.291. If the value of k is reduced from 0.29 to 0.14, the growth estimate in 28 for early February (our estimated k for Feb. 1-10 is about the same, 0.137), R0 would drop by 50-59%, to a range between 2.3 and 3.0, certainly due to the containment measures put in place in China in late January. The estimate of k for the U.S. (10-day window Mar. 3-12) is k=0.292 and, for New York State (10-day window Mar. 8-17), k=0.291; both thus coincide with the estimate in 28 for China, and all the above results for R0 apply equally. In the case of New York, Fig. 6.b indicates that it reached k=0.14 in the 10-day window starting on Mar. 24th, so that, in the 10 days from that date until Apr. 2nd, R0 had an average value between 2.3 and 3.0, as for China in early February. New York reached the peak for k in the 10-day window Mar. 14-23, at 0.441 (Fig. 6.b), implying that the evolution of k during the second half of March was already showing the impact of social distancing measures (schools were closed on Mar. 16th, but other social distancing measures had been adopted earlier), which reduced the number of new infections caused by each infectious person from around 5.6 to about half of that in a 16-day period (from the 10-day window Mar. 8-17 to that of Mar. 24-Apr. 2). Fig. 6.a shows the evolution of k for a few U.S. states around the date when they reached the 100th case. Most states were already showing a decreasing value of k in the period around the date they reached the 100th case, especially those for which Day100 came later in March, which reinforces the idea that social distancing practices were already in place in the second half of March in most states. This will be discussed further in the next section, as we study the dependence of k on timeline variables.

Table 1 presents the results of regressions for the exponential growth constant (k in Eq. 2) using the model given by Eq. 3 and its sub-models: AvTemp is the average temperature, AvAh is the average absolute humidity, Day100 is the date when the region reached 100 cases, Days1to100 is the number of days from the 1st to the 100th case and PopDens is the population density of the densest county in each state. We also include regressions of the timeline variables on population density. All parameters are computed for 95% CI. We omit the values of intercepts, as they are not relevant for the analysis. We include F-statistic p-values for the multivariable cases and Shapiro-Wilk test results for the residuals of all regressions with significant coefficients. In the case of multivariable regressions, R² is the adjusted value. There are two sets of models, the first with all states and the second with the states that are outliers for at least one of the variables removed (the list is below Table 1). The figures in Table 1 indicate that the qualitative results are basically the same, even though the second set of models provides stronger regression parameters in all cases. The last model (9#), which gives the association between the lag Days1to100 and population density, has only the version for the restricted set of states, as the one for all states showed very poor parameters.

From the complete models (1/1#), we observe that the date when the state reached its 100th case and the log of population density are the only significant variables. In those two models, the coefficients of the weather variables reverse signs, which is not unexpected since their confidence intervals include zero. Removing one or two of the other three variables did not change that: only Day100 and log(PopDens) stayed significant. Using those two variables, model 2 explains 53% (68% in the restricted model) of the variability of k, and the residuals satisfy normality to a good level. The coefficient for Day100 indicates that if a state reached 100 cases 10 days after another one, k would be reduced by about 0.053 (0.074 in the restricted model). For example, if k=0.25 for the first, one would expect about k=0.197 (k=0.176) for the second. That would increase the time to double the number of cases from 2.8 days to 3.5 days (3.9 days). As an example, New York, which reached 100 cases on March 8th (actually, 106 cases), had 140,000 confirmed cases 30 days later (April 7th), so that the average k during the period was about 0.25. As the estimated k for New York for the first 10 days starting on March 8th was 0.291, the pace of spread of Covid-19 had been attenuating since at least March 17th, which is confirmed by its evolution curve and varying k (Fig. 6.b).
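The arithmetic in the paragraph above follows from two standard relations for exponential growth: the doubling time is ln(2)/k, and the average rate between two counts is ln(N_end/N_start)/days. A minimal Python sketch, with hypothetical function names, reproduces the quoted figures:

```python
import math

def doubling_time(k):
    """Days needed for cases to double under growth rate k."""
    return math.log(2) / k

def average_growth_rate(n_start, n_end, days):
    """Constant rate k that takes n_start to n_end in `days` days."""
    return math.log(n_end / n_start) / days

# Growth rates quoted in the text for states reaching 100 cases 10 days apart.
for k in (0.25, 0.197, 0.176):
    print(f"k={k}: doubling time {doubling_time(k):.1f} days")

# New York: 106 cases on March 8th, about 140,000 cases 30 days later.
k_ny = average_growth_rate(106, 140_000, 30)
print(f"average k for New York: {k_ny:.3f}")
```

The computed average for New York comes out slightly below 0.25, consistent with the "about 0.25" quoted in the text and clearly below the initial 10-day estimate of 0.291.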

The population density coefficient of model 2 implies that doubling the population density increases the value of k by 0.0162*log(2) = 0.0112. For example, the states of Iowa and Missouri, which both reached the 100th case on Mar. 23rd, had k estimated as 0.195 and 0.235, respectively, a difference of 0.040. Their population densities are 290 and 1,991 people/km², respectively, which imply an estimated increase of 0.031 in the value of k, or about 75% of the actual difference. This is within the expected confidence intervals of the models, as they explain at most 68% of the variability of k (adjusted R², model 2#).
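Because the regressor is the log of population density, the predicted change in k between two densities depends only on their ratio. The following Python sketch checks the numbers quoted above; it assumes the natural logarithm, which is what makes 0.0162*log(2) equal 0.0112, and the function name is hypothetical.

```python
import math

DENSITY_COEF = 0.0162  # coefficient of log(PopDens) in model 2, from the text

def predicted_k_change(density_from, density_to, coef=DENSITY_COEF):
    """Predicted change in k when density goes from one value to another."""
    return coef * math.log(density_to / density_from)

doubling_effect = predicted_k_change(1.0, 2.0)          # any doubling
iowa_to_missouri = predicted_k_change(290.0, 1991.0)    # pop/km^2 from the text
observed_difference = 0.235 - 0.195                     # Missouri minus Iowa
share_explained = iowa_to_missouri / observed_difference
```

With these inputs the predicted Iowa-to-Missouri difference is about 0.031, a bit more than three quarters of the observed 0.040.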


No other 3- or 2-variable models with k as the dependent variable provided new relevant results. The 1-variable models (3/3# to 7/7#) show significant results, except for model 6 (a case with relevant outliers, as discussed). The possible surprise is that both average temperature and average absolute humidity have positive coefficients, individually, which is confirmed by the scatter plots in Fig. 8. Those behaviors are actually a consequence of the fact that many colder states, especially the less densely populated ones, started their outbreaks later, which affects these results according to models 2/2#. Figs. 8.a-e include scatter plots and trend lines for k and the control variables, to illustrate the above results, with trend lines and parameters from the models for all states.

Models 8/8# show that (the log of) population density has a significant impact on when a state reached 100 cases. This is expected: not only does lower population density slow the pace of Covid-19 infection, as we have seen, but the introduction of the virus would also have been delayed in such states (which, in fact, is the case, checking the date of the first reported case for each state), so that social distancing was already occurring when that happened. The estimate given by the coefficient in model 8 is that doubling the population density would make the day of the 100th case happen about 1.9 days earlier. Fig. 9.a shows how strong the association is, with the exception of the state of Washington, which was the first to have an outbreak and thus does not follow the general trend, as its densest county, which had the very first cases in the U.S., has relatively low population density (King County, 352 pop/km²). Relaxing statistical requirements, model 9# shows that the time lag between the first and 100th cases is reduced by about 0.6 day when the population density doubles. Fig. 9.b shows that the association in this case is in fact weak, and not just because of outliers. Still, there is a clear trend of shortening of the period from case 1 to case 100 as population density increases (New York is not an outlier in this case, as its value was 5 days, but we kept the same set of states in the restricted set for consistency).

Table 3 shows regression results for the group of 110 countries in our database, starting with the complete model 1, which is the same as given by Eq. 3 without the demographic variable. Analogously to the case of the U.S. states, the variable Day100 shows statistical significance in the complete model. We have two versions, model 1 with China and model 2 without. The reason is that China is a clear outlier for that variable (see Fig. 2.f), having reached the 100th case on Jan. 19th, more than a month before the second country to do so (South Korea, on Feb. 20th). In any case, the results are essentially the same. In the complete models (1, 2), if we relax the statistical requirements for significance, temperature would contribute positively, and absolute humidity negatively, to the estimate of k. The same is true for model 3, with these two variables only. This behavior is similar to that of the weather variables in the case of U.S. states (Table 1, models 1/1#, and the model with AvTemp and log(AvAh) as control variables, not included in Table 1).

The 1-variable models show what some studies have reported, that temperature and absolute humidity are negatively associated with the pace of spread of Covid-19, but one must note that the residuals, in both cases, show low p-values in the Shapiro-Wilk test (see Fig. 10). Until more countries satisfy the criterion for inclusion in the study, we cannot say much about the impact of weather variables on the early pace of Covid-19 infection for the group of countries with data available for our models. One can say, though, that, similarly to the case of the U.S. states, the later a country reaches the 100th case, the lower the expected pace of spread of the disease. So far, that variable explains about 30% of the variability of the growth coefficient k (models 6/6#). The value of the coefficient is similar to that for the U.S. states: k is reduced by about 0.05 when the date on which a country reaches its 100th case is delayed by 10 days. The variable Days1to100 did not show a relevant association with the value of k for countries, even relaxing statistical requirements. Fig. 11 presents the scatter plots for k and the relevant control variables.

All the above models were run for subgroups of countries/U.S. states with values of R² (model of Eq. 2) above the levels of 0.90, 0.95 and 0.97. There were no relevant qualitative differences between those models and the ones employing all countries/U.S. states, just small variations in coefficients and parameters.

• Results for the 50 U.S. states indicate that weather variables (average temperature and absolute humidity for the period of 25 days starting 15 days before the date of the 100th case), once one takes into account the timeline of the disease's evolution and demographic information, have little effect on the rate of expansion of Covid-19, at least in the early phase of the outbreak, considered in this study as the 10 days starting when the state reached 100 cases. Table 2 indicates how relevant it is to try to reduce the value of k to keep the number of cases from exploding. Fig. 6.b for New York State illustrates that in a real case.
• Eq. 5 shows the effect of the population density of the densest county of a state: doubling the population density variable would imply an expected increase of 0.011 in the value of k.
• Regressions 8/8# in Table 1 indicate that the population density also significantly affects when the 100th case occurred, making it occur earlier if population density is higher (1.9 days earlier when the population density doubles). And, relaxing statistical significance requirements, model 9# indicates that doubling the population density reduces the time between the 1st and 100th cases by 0.6 day.

• For the 110 countries in our database, the date when the 100th case occurred is the only significant variable for the complete model (Table 3, models 1/1#).
• Using the individual regression for that variable, each 10 days of delay in that occurrence reduces the value of k by 0.045 (0.051 for model 1#), which is about the same as in the case of the U.S. states. About 33% of the variability of k is explained by the date the country reached the 100th case.
• Individually, the weather variables show a significant negative association with the pace of Covid-19 expansion, but the models suffer from a statistical limitation, as the normality of residuals is not guaranteed (Table 3, models 3-5).


The results summarized above indicate that the weather variables considered in this study do not seem to be relevant determinants of the pace of early spread of Covid-19. Even when temperature and absolute humidity are considered in isolation, the results for U.S. states and for the group of countries in our database show opposite behavior. For U.S. states, a higher population density of the densest county implies a faster pace of expansion of the disease. This result is expected, as higher population density implies various characteristics of communities that favor a high level of contact among people. Another aspect is that the denser counties include or are part of cities with higher levels of circulation of outside travelers. But even taking those aspects into account, the best model(s) for U.S. states includes the effect of the date when the 100th case was reached, which helps explain the variability of k. For countries, that is also the most important factor in explaining the growth rate of the disease. This is likely a result of measures that people and governments started to take as the seriousness of the disease became more evident. An alternative explanation would be that the virus is losing strength, but there is no evidence of that, at least so far.

Some final comments on the question of whether warmer and more humid weather would help reduce the dissemination of Covid-19. Besides the results of the models in this study, it is relevant to look at countries/U.S. states that have both higher temperatures and absolute humidity values and higher values of the coefficient k. Table 4 presents data for countries/U.S. states with average temperatures above 15C and k above 0.18, which is the average of k for all countries and U.S. states (removing the U.S. from the countries' database). It also includes the R² estimates for the determination of k and the dates when the 1st and 100th cases were reported. All regions in the table reached their 100th case on Mar. 10th or later, so they are not a group with a very early Day100 variable, which, according to our models, would tend to raise the value of k. Regions with the highest values of k, above 0.25, include, in descending order of k: Florida, Brazil, Louisiana, Ecuador, Texas, Nigeria, South Africa, Arizona and Georgia (state). Of those, Florida, Brazil, Louisiana and Nigeria had average temperatures above 20C and average humidity values above 11 g/m³, with Nigeria's and Brazil's above 15 g/m³. Except for South Carolina and Mexico, all regions

There are also countries in the database with high average temperatures and levels of absolute humidity and low values of k, but we think there are likely other reasons for that: one possibility is the lack of reporting of the development of Covid-19 by local authorities; another is that some countries cannot carry out enough testing and therefore cannot maintain a reliable sequence of reports, possibly biasing the estimated rate of expansion downward. In any case, the existence of many countries and U.S. states (and Brazilian states) with warm and humid weather and fast expansion rates of Covid-19 indicates that the way for countries and regions to keep the expansion of the disease under control, within the capabilities of their health systems (at least until vaccines or effective therapeutics are successfully developed), is to keep employing social distancing policies, an approach that is supported by modelling 29 and seems to be working effectively in all countries and states that have adopted it.

Figure 1.a-i. Boxplots for regression variables and for R² for the exponential regression for k (Eq. 2). U.S. states, by region. The boxplot for population density (h) does not include the value for New York (26,822 persons/km²), to allow for better visualization of the distribution. Data: CESS/JHU, NOAA/USDC, U.S. Census (2010).
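The coefficient k and its R² are described as coming from an exponential regression over the 10 days starting when a region reached 100 confirmed cases. The sketch below shows one common way to obtain such an estimate, assuming the model has the form C(t) = C0 · exp(k·t) and is fitted by ordinary least squares on the logarithm of cumulative cases. This is an assumption about the procedure, not a restatement of Eq. 2; the function name `estimate_k` and the synthetic series are hypothetical and serve only as illustration.

```python
import math


def estimate_k(cumulative_cases, threshold=100, window=10):
    """Fit log(cases) = a + k * day by least squares over `window` days,
    starting on the first day with at least `threshold` cumulative cases.
    Returns (k, r2), where r2 is the R^2 of the log-linear fit."""
    start = next(
        (i for i, c in enumerate(cumulative_cases) if c >= threshold), None
    )
    if start is None or start + window > len(cumulative_cases):
        raise ValueError("not enough data after the threshold day")

    days = list(range(window))
    logs = [math.log(c) for c in cumulative_cases[start:start + window]]

    mean_x = sum(days) / window
    mean_y = sum(logs) / window
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    k = sxy / sxx  # slope of the log-linear fit = exponential coefficient

    intercept = mean_y - k * mean_x
    ss_res = sum((y - (intercept + k * x)) ** 2 for x, y in zip(days, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return k, r2


# Synthetic series (NOT real data): 40 cases growing at exactly k = 0.2 per day.
synthetic = [40 * math.exp(0.2 * t) for t in range(20)]
k, r2 = estimate_k(synthetic)
print(f"k = {k:.3f}, R^2 = {r2:.3f}")
```

On real case counts the fit will not be exact, and the R² of the log-linear fit gives a rough indication of how well a single exponential rate describes the 10-day window.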


Influence of meteorological factors and air pollution on the outbreak of severe acute respiratory syndrome

Association between viral seasonality and meteorological factors

Climate factors and incidence of Middle East respiratory syndrome coronavirus (2019)

Potential Factors Influencing Repeated SARS Outbreaks in China

Dozens of diseases wax and wane with the seasons. Will Covid-19?

The role of absolute humidity on the transmission rates of the Covid-19 outbreak

Temperature, humidity and latitude analysis to predict potential spread and seasonality for Covid-19

Will Coronavirus Pandemic Diminish by Summer?

Climate affects global patterns of Covid-19 early outbreak dynamics

High temperature and high humidity reduce the transmission of Covid-19

Temperature, humidity, and wind speed are associated with lower Covid-19 incidence

Temperature dependence of Covid-19 transmission

Covid-19 transmission in Mainland China is associated with temperature and humidity: a time-series analysis

Rapid Expert Consultation on SARS-CoV-2 Survival in relation to Temperature and Humidity and Potential for Seasonality for Covid-19 Pandemic

Atmospheric Thermodynamics

The Computation of Equivalent Potential Temperature

The scaling of contact rates with population density for the infectious disease models

Probability of current Covid-19 outbreaks in all US counties

The reproductive number of Covid-19 is higher compared to SARS coronavirus

High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg Infect Dis

Covid-19 Modelling: the Effects of Social Distancing

Limitations: the above results are preliminary in scope and depend on the quality of the available data. As more countries report data and the change of seasons brings new information for analysis, the models may be updated and further developed.