key: cord-0788248-rhh227xf
authors: Vilar, J. M. G.; Saiz, L.
title: The evolving worldwide dynamic state of the COVID-19 outbreak
date: 2020-11-30
journal: nan
DOI: 10.1101/2020.11.26.20239434
sha: 87d32d1dddcf8ac80026dca923dcfcb22db75b0f
doc_id: 788248
cord_uid: rhh227xf

The dynamic characterization of the COVID-19 outbreak is critical to implement effective actions for its control and eradication but the information available at a global scale is not sufficiently reliable to be used directly. Here, we integrate multiple data sources through dynamical constraints to quantify its temporal evolution and controllability around the world and within the United States. Overall, the numbers of actively infectious individuals have remained high beyond targeted controllability, with worldwide estimates of 10.24 million on November 24, 2020, totaling in 266.1 million cumulative infections growing at a rate of 11.12 million new infections per week. The actively infectious population reached a local maximum of 7.33 million on July 16, 2020 and remained virtually stagnant at a global scale, with growth rates for most countries around zero that compensated each other, until reverting to net growth on September 22, 2020. We validated the approach, contrasting with prevalence data and the effects of nonpharmaceutical interventions, and we identified general patterns of recession, stabilization, and resurgence. The diversity of dynamic behaviors of the outbreak across countries is paralleled by those of states and territories in the United States, converging to remarkably similar global states in both cases. Our results offer precise insights into the dynamics of the outbreak and an efficient avenue for the estimation of the prevalence rates over time.

The global spread of the COVID-19 outbreak has had a major global impact. As of November 24, 2020, there have been 1.67 million reported deaths and 72.4 million confirmed cases [1] . Massive travel restrictions, imposed quarantines, lockdowns, and other nonpharmaceutical interventions (NPIs) around the world have been able to slow down the progression of the outbreak, but their gradual lifting has already led to its deadly resurgence in multiple areas [2] . Currently, it is still unclear how best to proceed to balance the economic, societal, ethical, and health trade-offs present. The most widespread quantification, based on the basic reproduction number, is insufficient to address the different trade-offs [3] . It indicates whether the outbreak is growing up or dying out, but it misses crucial information, including the timescales of the processes, the latent potential for a resurgence of the outbreak, and the possibility of actively controlling and tracing the infectious population.

Obtaining a detailed dynamic characterization of the outbreak, however, has proved to be particularly challenging and has been achieved only over small controlled populations. This characterization involved testing for the causative virus SARS-CoV-2, identifying contacts and the infection time, and following up the clinical evolution [4] [5] [6] [7] . The available epidemiological information has not been sufficiently reliable to be used directly and the detailed characterization is not feasible at a global scale. Methods based on compartmental models [8] or Bayesian approaches [9] have relied on location-specific information that is not readily available at a global scale.

Here, we uncovered optimal dynamical constraints to integrate the available detailed clinical information with demographic data and epidemiological curves, and we applied them to the quantification of the key determinants of the temporal evolution and controllability of the COVID-19 outbreak, including the number of infectious individuals, the growth rate of the infectious population, and the size of the infected population. We considered explicitly countries in the world and states and territories within the United States (US) with at least 30 COVID-19 reported deaths, which covers 97.4% of the world and 99.9% of the US population.

The approach considers the dynamics of the infectious population, ! ( ), at time described through the expression

which establishes the definition of the (per capita) growth rate " ( ). This expression is completely general and independent of the underlying dynamics of the infection.

The underlying infection dynamics dictates the relationship of " ( ) and ! ( ) with the different epidemiological quantities. Based on an infection-age structured mathematical description [10, 11] , we have developed an approach to uncover the optimal dynamic constraints for these relationships in terms of delays and scaling factors (Supplementary Information).

We find that ! ( ) is optimally related to the rate of increase of the expected cumulative deaths, # , at a later time + # according to

where is the infection fatality rate and " is the average generation time. This equation explicitly takes into account that, on average, an infectious individual within ! ( ) has been infected at time

($ ! and potentially dies with probability at a time ! + )# after infection. Here, " ( is the variance of the generation time, and ! and )# are the incubation and symptom onset-to-death average times, respectively. The values of these characteristic times have been estimated in days as A general assumption of the approach is that the for each age group remains the same for all locations [5] and that the overall for each location is obtained as the average over its specific population's age distribution.

The growth rate is obtained directly from Eqs. (1) and (2) as

which is related to the time-varying reproduction number, * , through the Euler-Lotka equation (Supplementary Information).

The expected cumulative number of infected individuals at a time , + ( ), follows from

which is obtained also from the dynamical constraints (Supplementary Information).

To compare with prevalence studies, we used the dynamical constraints (Supplementary Information) to obtain the relationship of the expected number of seropositive individuals, ,-( ), with the infected population,

where ,-is the average seroconversion time after infection. We also obtained the relationship of the expected number of positive reverse transcription polymerase chain reaction (RT-PCR) testing individuals, +-( ), with the infectious population,

where +-is the average time for testing positive after infection and Δ +-is the average number of days of positive testing. The values of these additional characteristic times have been estimated in days as ,-= 13.4, +-= 14.4, and Δ +-= 20 from clinical studies [13] [14] [15] [16] (Supplementary information).

Eqs. (1) -(4) completely characterize the dynamics of the outbreak from the expected cumulative deaths, the age structure of the population, and the general clinical parameters of the infection. We used a workflow (Fig. 1A ) that incorporates explicitly the death counts compiled by the Johns Hopkins University Center for Systems Science and Engineering [1] , the age structure reported by the United Nations for countries [17] and the US Census for states and territories [18] , previously estimated aged-stratified [5] , and other previously estimated clinical parameters [5, [12] [13] [14] [15] [16] . The expected number of daily deaths was inferred using density estimation from the death curves after preprocessing to minimize reporting artifacts (Supplementary Information). Eqs. (5) and (6) were used to set the appropriate scale and delay to compare with seroprevalence studies.

To validate the approach, we contrasted the estimated infectious and infected populations with the results from available antibody seroprevalence and RT-PCR testing studies for countries in the world and locations in the US with at least 30 reported deaths (Fig. 1B) . The observed and estimated values agree with each other within the 95% credibility intervals (CrI), with a 1.54-fold accuracy over almost a 1,000-fold variation and a correlation coefficient ( ) on a logarithmic scale of = 0.94.

We combined into a separate analysis (Fig. 1C ) the data of two additional comprehensive antibody seroprevalence studies within the US, which included 49 [19] and 38 [20] states with nonzero prevalence values. For the states present in the two studies, we considered their average values. The estimated values agree with the observed prevalence with 1.62-fold accuracy and = 0.81. The agreement for the combined data is better than for the data of each of the studies independently (1.65-fold accuracy and = 0.74 for one study [19] and 1.74-fold accuracy and = 0.77 for the other study [20] ) and better than the agreement of both studies with each other ( = 0.55 for the 38 states common to both studies), indicating that the estimations of the approach fall within the observed variability of the prevalence studies. Indeed, the approach is effectively unbiased collectively, with the (geometric) average of the estimations being just a factor 1.12 (globally) and 1.14 (US) larger than that of the observations.

Overall, the validation of the approach shows that it can reproduce the available prevalence studies over a factor 615.0 between minimum and maximum values for countries and states with 72.0% (globally) and 79.6% (US) of the estimates within a factor 2 of the observed values, and with 100.0% (globally) and 93.9% (US) within a factor 3. There are no countries and only three states with values outside the factor 3 boundary. In the cases of Rhode Island and New Hampshire, the estimated infectious population is larger than the ones reported by one study [20] , which is consistent with the observed age-specific seroprevalence biased to older populations. In the case of Oregon, the estimated infectious population is smaller than the ones reported [19, 20] , which is consistent with the observed age-specific seroprevalence biased to younger populations [20] . In the case of Oregon, in addition, only 338 of the 1123 excess deaths during the outbreak before the collection of the specimens for analysis were attributed to COVID-19 [21] .

We also validated the ability of the approach to capture the effects of NPIs (Fig. 1D) . The inferred timings of the peak infectiousness (maximum infectious population) are concordant with the dates of the major early country-wide lockdowns in Europe [9] , with an overall average deviation of 0.0 days between lockdown and peak dates and a mean absolute deviation of 2.4 days. This concordance also indicates that there is only a small variability in the average timing between infection and death among those countries.

We analyzed explicitly the time evolution of the growth rate, the infectious population, the cumulative number of infections, and the relationship between growth rate and infectious population for all countries and states with at least 30 deaths (Supplementary Fig. S1 and Table 1 ).

The characterization of the dynamics in terms of the growth rate and infectious population ( Supplementary Fig. S1 ) shows that there are prototypical types of behavior, or motifs (highlighted with representative examples in Fig. 2 ). Locations either contained (Figs. 2A-2C) or amplified (Figs. 2E-2F) the outbreak in its initial stages. In the case of initial containment, the behavior branched subsequently into contained sporadic resurgences ( Fig. 2A) , uncontrolled resurgence of the infectious population over the initial maximum infectiousness (Fig. 2B) , and sustained decrease of the outbreak (Fig. 2C) . The specific behavior depended on the success of the measures implemented, e.g. targeted control and moderate lockdowns, and their subsequent relaxation [22] . In the case of initial amplification, the dynamics proceeded in diverse ways, including an increasing infectious population converging to zero growth (Fig. 2D) , a fast-evolving infectious population switching from positive to negative growth (Fig. 2E) , and fast convergence to subsequent sustained residual growth (Fig. 2F ). In general, the locations that reached a substantial negative growth rate are those that implemented long-term strict lockdowns, whereas zero or small positive growth rates correspond to intermediate measures with partial restrictions [22] . In many cases, lifting restrictions

The dynamical constraints we have obtained through a detailed infection-age mathematical description of the outbreak allowed us to find the optimal time delays and scaling factors to connect the evolution of the reported death counts over time with those of the infectious and infected populations. Overall, integrating these constraints through a workflow with essential preprocessing steps showed that the approach can consistently infer the precise timing of NPIs and estimate prevalence data across countries in the world and locations in the US. A prominent feature of the approach is its ability to provide reliable results even for low death counts, which overcomes the major limitations of choosing between unreliable infection case data (highly dependent on testing rates) or noisy death counts as input to the inference problem [9] .

The approach assumes a general age-stratified . In general, these quantities are expected to depend potentially on specific features of the population and the medical care facilities available. The available studies show a minimal variability among different countries and other locations that reported on prevalence [23] . It also assumes an age-uniform exposure (attack rate), which is consistent with data for other respiratory diseases [5] and holds to a large extent when there is information available for COVID-19 [19, 24, 25] . We have also assumed a constant generation interval typical of non-confinement locations, which has been observed to shorten in some cases by NPIs [26] . Prevalence studies can also depend on the diminishing antibody levels after infection [14, 27] , collecting and processing specimens for analysis [28] , and potential biases towards specific population groups [19] . In addition, there might be a degree of under-reporting of COVID-19 deaths, as suggested by excess mortality not attributable to other causes than COVID-19 [21] . Our results show that all these potential deviations on the assumptions, on the data, and on prevalence studies collectively have only a restricted impact on the approach, with 72.0% (globally) and 79.6% (US) of the estimates within a factor 2 of the observed values and 100.0% (globally) and 93.9% (US), within a factor 3. This accuracy of the estimations is highly remarkable in the context of the observed prevalence spread over a factor 615.0 between minimum and maximum values, ranging from 0.02% to 12.30%.

The analysis in terms of the growth rate-infectious population trajectories has revealed universal types of behavior of the outbreak for countries around the world and locations within the US. This information can be used to anticipate the response to enacting, modifying, or lifting NPIs. The most marked example is the response to strict lockdowns across countries in Europe (e.g. United Kingdom, Italy, Belgium, Spain, France, Germany, Austria, Netherlands, and Switzerland) and Northeast states in the US (e.g. New York, New Jersey, Massachusetts, Pennsylvania, and District of Columbia). They followed the same type of behavior (fast decrease of the growth rate from high to sustained negative values) until major restrictions were lifted in the European countries [22] , turning the growth rate of their infectious populations into highly sustained positive values. Our results show that those European countries had small actively infectious populations but not as small as required for targeted controllability. They also show that most Northeast states in the US are in a similar resurgence path but at much earlier stages, with many of their NPIs still in place and with markedly smaller growth rates, which makes their reaching as deadly a resurgence as in Europe still avoidable.

At a global scale, the outbreak has reached a net growth rate fluctuating near zero values but with a high infectious population. A similar state has also been present in the US. This type of fluctuating states, with long stagnant overall infectious population periods and median growth rate close to zero, is expected of bounded unsynchronized fluctuating populations [29] , such as those from uncoordinated locations aiming at just preventing an unrestricted expansion of the outbreak rather than at its eradication. This widespread feature is present for both countries in the world and locations within the US. Considering the NPIs implemented [22] , our results show that there have been locations with interventions to move the growth rate towards zero values and that there have been locations switching on and off severe measures to decrease temporarily the active infectious population. Despite not growing substantially since reaching its highest local maximum of 7.33 million active infections on July 16, 2020, the high value of the global infectious population attained is currently leading to 11.12 million new infections per week that replace the same ballpark number of individuals that stop being infectious. This high turnover makes the control of any potential resurgence extremely costly.

At a local scale, our results show a highly variable temporal evolution of the infectious populations, both over time for each location and across locations. Having an up-to-date estimate of the infectiousness of populations would allow policymakers to better implement travel planning among locations. The approach has proven to accurately track the effects of local NPIs. We also expect it to play a fundamental role in evaluating the progress of vaccination efforts, especially considering the challenges present, such as waning immunity levels and pathogen evolution [ Fig. 1 . The approach consistently estimates prevalence and the timing of NPIs. The approach (A) has been validated with prevalence data of the infectious (PCR-RT testing) and infected (antibody testing) populations at a global scale (B) and for states within the US (C). The continuous black lines (B, C) represent the perfect prediction (identity function denoted by id(⋅)) with the parallel dotted/dashed lines indicating the fold accuracy. Global data were obtained from sources described in Supplementary Table S1 . Several locations have estimates for multiple date ranges. State data correspond to two studies with specimen samples taken primarily on the first two weeks of July [19] and of August, 2020 [20] . The inferred timings of the peak infectiousness are plotted against the dates of the major country-wide lockdowns in Europe [9] (D). The continuous orange line represents perfect concordance, and the parallel dotted/dashed lines indicate the mean absolute error. 

We consider the infection-age structured dynamics of the number of individuals ! ( , ) at time who were infected at time − given by

with boundary condition ! ( , 0) = ( ).

(2)

Here, is the time elapsed after infection, referred to as infection age, and ( ) = ∫ ! ( , ) ! ( , )

. / is the incidence, with ! ( , ) being the rate of secondary transmissions per single primary case [10, 11] .

The solution is obtained through the method of characteristics as

for ≥ and ! ( , ) = 0 for < . The resulting renewal equation, ( ) = ∫ ! ( , ) ( − )

. / , is used as the basis for the definitions of the reproduction number * = ∫ ! ( , )

. / and the probability density of the generation time "

The infectious population is given by

which considers that an individual remains potentially infectious after a time from infection with probability

Therefore, in terms of the incidence, we have

Additionally, we consider the expected cumulative number of infections, + ( ), expressed in terms of the overall accumulated incidence as

and the dynamics of the expected cumulative deaths, # ( ),

which takes into account that deaths occur with probability given by the infection fatality rate, , at times after infection given by the convolution of the probability density functions of the incubation, ! , and symptom onset-to-death, )# , times.

Similarly, the variation of the expected number of seropositive individuals at a time , ,-( ), is expressed as

where ,-is the probability density function of the seroconversion time after infection, and the expected number of individuals with positive RT-PCR testing -+ ( ), as

where +-( ) is the probability that an infected individual would test positive at a time after infection.

To obtain a closed set of equations from the expressions involving an integral ∫ ( ) ( − )

. / of a function with the incidence , we perform a series expansion of the incidence around the infection-age time 6 ,

with the value of 6 chosen as

The specific value of 6 leads directly to a first-order approximation,

because ∫ ( )( − 6 )

. / = 0.

Using this approach, we obtain

where " and " ( are the average and variance of the generation time, respectively, and

where ! and )# are the incubation and symptom onset-to-death average times, respectively, which leads to

up to ( 77 ). These expressions are used to estimate the infectious population ! ( ) from the daily deaths,

($ ! and the cumulative infected population + ( ) from the cumulative deaths, # , at time + ! + )# , leading to

Similarly, we obtain ,-( ) = ( − ,-) + ( 77 ) (20)

where ,-is the average seroconversion time after infection and Δ +-is the average number of days an individual tests positive, which up to ( 77 ) leads to 

which is used to validate the values of the estimated infectious population ! ( ) from RT-PCR testing results, +-, at time

($ ! and the cumulative infected population + ( ) from seropositivity testing, ,-, at time + ,-.

The raw cumulative death counts over time, 9 ( ), are obtained from the Johns Hopkins University Center for Systems Science and Engineering [1] for countries and for US locations.

The daily death counts Δ 9 ( ) = 9 ( ) − 9 ( − 1) are considered to contain reporting artifacts if they are negative or if they are unrealistically large. This last condition is defined explicitly as larger than 4 times its previous 14-day average value plus 10 deaths, Δ 9 ( ) > 10 + 4 × : :; W 9 ( ) − 9 ( − 14)X, from a non-sparse reporting schedule with at least 2 consecutive non-zero values before and after the time , Δ 9 ( ) ≠ : < W 9 ( + 2) − 9 ( − 3)X.

Reporting artifacts identified at time are considered to be the result of previous miscounting. The excess or lack of deaths are imputed proportionally to previous death counts. Explicitly, death counts are updated as

with 9 ( − 1) =>*?@A*=8 = 9 ( ) − : B W 9 ( − 1) − 9 ( − 8)X for all ≥ 0. In this way, Δ 9 ( ) is assigned its previous seven-day average value.

The expected daily deaths, Δ # ( ), are obtained through a density estimation multiscale functional, 8= , as Δ # ( ) = 8= WΔ 9 ( )X, which leads to the estimation of the expected cumulative deaths at time as # ( ) = 9 ( / ) + ∑ Δ # ( ) * >C* % %:

. Specifically,

where :; (⋅) is a centered moving average with window size of 14 days and & (⋅) is a centered rolling average through a Gaussian window with standard deviation .

We consider an average delay of two days between reporting a death and its occurrence. This value is obtained by comparing the daily death counts reported for Spain [1] and their actual values [31] from February 15 to March 31, 2020. The values of the root-mean-squared deviation between reported and actual deaths shifted by 0, 1, 2, 3, and 4 days are 77.9, 58.4, 38.5, 58.7, and 88.6 deaths respectively.

The infection fatality rate is computed assuming homogeneous attack rate as

where A is the previously estimated for the age group [5] and A is the population in the age group as reported by the United Nations for countries [17] and the US Census for states [18] .

We obtained the values of the average " and standard deviation " of the generation time from Ref. [12] , of the averages of the incubation ! and symptom onset-to-death )# times from Refs. [5, 13] , and of the average number of days Δ +-of positive testing by an infected individual from Refs. [14, 16] . The average time at which an individual tested positive after infection +-was computed as +-= ! − 2 + Δ +-/2, where we have assumed that on average an individual started to test positive 2 days before symptom onset. The average seroconversion time after infection ,-was estimated as ! plus the 7 days of 50% seroconversion after symptom onset reported in Ref. [15] .

We implemented the dynamical constraints to compute the infectious and infected population as outlined in the main text and as detailed in the previous section of this document, using days as time units. Time delays were rounded to days to assign daily values.

The first derivative of the cumulative number of deaths is computed as

The growth rate was computed explicitly from the discrete time series as the centered 7day difference

The 7-day value was chosen to mitigate reporting artifacts.

Confidence intervals associated with death counts were computed using bootstrapping with 10,000 realizations [32]. These confidence intervals were combined with the credibility intervals of the in infectious and infected populations assuming independence and additivity on a logarithmic scale.

The fold accuracy, 6 , is explicitly computed as

where |⋅| is the absolute value function, ? HI> is the *K observation, ? =>* is its corresponding estimation, and is the total number of observations.

Because of the delay between infections and deaths, inference for the values of the growth rate and infectious populations ends on November 2, 2020 and for the values of the infected populations ends on October 29, 2020. Extrapolation to the current time (November 24, 2020) is carried out assuming the last growth rate computed.

The quantities * and " ( ) are related to each other through the Euler-Lotka equation, * D: =

. Generation times can generally be described through a gamma distribution

ND: DL$ with = " ( / " ( and = " / " ( , which leads to * = (1 + " ( )/ ) N for " ( ) > − and * = 0 for " ( ) ≤ − . In the case of the exponentially distributed limit ( ≃ 1) or small values of " ( )/ , it simplifies to * = 1 + " ( ) " for " ( ) > −1/ " and * = 0 for " ( ) ≤ −1/ " . Table S1 . Sources for the prevalence data at a global scale. The sources include press releases, online preprints, and publications. Seroprevalence data for Belgium and Spain does not include late estimates that show clear effects of diminishing antibody levels after infection. Prevalence data for England was used as a proxy for the United Kingdom. One of the data points for France [33] is a theoretical estimate based on detailed data.

Data source 

(The 69-page long figure is appended after the caption) The 1st panel from the top shows the trajectory in the growth rate-infectious population space. Each day is indicated by a symbol increasing in size with time. The largest symbol corresponds to the last day of the estimation (November 2, 2020). The blue line at the end indicates the extrapolation to the current time (November 24, 2020) assuming for the growth rate its last estimated value.

The 2nd panel from the top shows the temporal evolution of the growth rate (orange circles) and its extrapolation (blue line without circles). The shaded blue region indicates the 95% confidence intervals (CI).

The 3rd panel from the top shows the temporal evolution of the infectious population (orange circles) and its extrapolation (blue line without circles). The shaded blue region indicates the 95% CI assuming a certain . The dashed lines indicate the overall 95% credibility intervals (CrI) taking into account the uncertainty in the estimates of the .

The 4th panel from the top shows the temporal evolution of the infected population (orange circles) and its extrapolation (blue line without circles). The shaded blue region indicates the 95% CI assuming a certain . The dashed lines indicate the overall 95% CrI taking into account the uncertainty in the estimates of the .

The 5th panel from the top shows the raw reported daily deaths (blue circles), the processed daily deaths to mitigate reporting artifacts (orange circles), and the expected deaths (black curve).

Countries are arranged in alphabetical order. Locations in the US are arranged in alphabetical order after the data for the "United States" as a country. The prefix "US " has been added to the name of the locations in the US. The analyses for the locations "World" and "United States" have been computed with their overall deaths and demographics in contrast to Fig. 3 in the main text, which considered the cumulative contributions of countries and states. Death counts for Spain between April 1 and November 4, 2020 were obtained from "https://www.mscbs.gob.es" [31] because of missing updates in JHU CSSE COVID-19 data [1] . Death counts for China before January 22, not present in JHU CSSE COVID-19 data [1] , were obtained from the European Centre for Disease Prevention and Control The following locations were not included in the analysis because they did not have at least 30 reported COVID-19 deaths: Antigua and Barbuda, Barbados, Bhutan, Brunei, Burundi, Cambodia, Comoros, Eritrea, Fiji, Grenada, Iceland, Laos, Mauritius, Mongolia, New Zealand, Papua New Guinea, Saint Lucia, Saint Vincent and the Grenadines, Sao Tome and Principe, Seychelles, Singapore, Solomon Islands, Taiwan, Tanzania, Timor-Leste, US American Samoa, Vanuatu, and Western Sahara. Daily deaths (individuals) Daily deaths (individuals) Daily deaths (individuals)

Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China

Estimates of the severity of coronavirus disease 2019: a model-based analysis

Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany

Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: a case series. The Lancet Infectious Diseases

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)

Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe

Mathematical models of infectious disease transmission

The mathematics of infectious diseases

Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases

Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China

Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections

Virological assessment of hospitalized patients with COVID-2019

Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study