key: cord-0903815-i6q3ftu9 authors: Vilar, Jose M. G.; Saiz, Leonor title: Reliably quantifying the evolving worldwide dynamic state of the COVID-19 outbreak from death records, clinical parametrization, and demographic data date: 2021-10-07 journal: Sci Rep DOI: 10.1038/s41598-021-99273-1 sha: 11764f1b19c76ebff1f71647239b31eb9d558dfc doc_id: 903815 cord_uid: i6q3ftu9 The dynamic characterization of the COVID-19 outbreak is critical to implement effective actions for its control and eradication but the information available at a global scale is not sufficiently reliable to be used directly. Here, we develop a quantitative approach to reliably quantify its temporal evolution and controllability through the integration of multiple data sources, including death records, clinical parametrization of the disease, and demographic data, and we explicitly apply it to countries worldwide, covering 97.4% of the human population, and to states within the United States (US). The validation of the approach shows that it can accurately reproduce the available prevalence data and that it can precisely infer the timing of nonpharmaceutical interventions. The results of the analysis identified general patterns of recession, stabilization, and resurgence. The diversity of dynamic behaviors of the outbreak across countries is paralleled by those of states and territories in the US, converging to remarkably similar global states in both cases. Our results offer precise insights into the dynamics of the outbreak and an efficient avenue for the estimation of the prevalence rates over time. www.nature.com/scientificreports/ These methods have typically pivoted toward using either the number of reported COVID-19 cases over time or the reported death counts over time. Focusing on the number of cases is challenging because they represent only an undetermined fraction of the actual infections and this fraction depends on the testing capacity, which was remarkably low in the early stages of the pandemic, and on how the testing capacity evolves over time. In addition, there are a large fraction of asymptomatic infections, some of which are uncovered through contact tracing approaches while others remain undetected 8 . Moreover, it is not straightforward to determine when a positive testing person was infected, which depends on the location-specific testing procedures as well. Therefore, the connection between reported cases and the actual number of infections depends on the testing policies of each location, how they are implemented, and how they change over time. An alternative approach is to use daily death counts. In this case, there is detailed statistical information on the disease progression over time through its different stages from infection to potential death, as well as on the age-stratified death ratios. The main drawbacks of this approach are on the mathematical side because it requires the solution of an inverse problem (finding the infectious population that would lead to the observed death curves) and on the fact that death counts are typically much lower than the number of infections, which make them prone to the inherent random fluctuations of the death process. These challenges are usually overcome by restricting the analyses to locations with large numbers of deaths and by imposing constraints on the reproduction number based on the information available about NPIs 9 . Here, we provide a general quantitative approach for reliably quantifying the temporal evolution of the COVID-19 outbreak infectious and infected population utilizing multiple data sources, including daily death records, clinical parametrization of the disease, and location-specific demographic data. Explicitly, we use the customary infection-age structured mathematical description 4,10-12 , which relies on the detailed statistical characterization of the disease progression over time, to develop an approach to obtain explicit estimates of the number of infectious and infected individuals in terms of the epidemiological death curves. We find that there is a general time delay between the infectious population and the daily deaths and a different time delay between the infected population and the cumulative number of deaths, which depend on the clinical parameters of the disease. The major location-specific contribution, besides the death counts, is the age structure of the population, which determines the infection fatality rate (IFR) and therefore the proportionality factors between infectious population and daily deaths and between infected population and cumulative deaths. We validated the approach with prevalence data of the infectious (PCR-RT testing) and infected (antibody testing) populations at a global scale and for states within the US, as well as with the timings of the peak infectiousness against the dates of the major country-wide lockdowns in Europe. To further quantify the temporal evolution and controllability of the COVID-19 outbreak we also obtained the temporal evolution of the growth rate of the infectious population. We consider explicitly countries in the world and states and territories within the United States (US) with at least 30 COVID-19 reported deaths as of January 21, 2021, which covers 97.4% of the world and 99.9% of the US population. Optimal dynamical constraints. The approach considers the dynamics of the infectious population, n I (t) , at time t described through the expression which establishes the definition of the (per capita) growth rate k G (t) . As this expression results from the definition of the per capita growth rate, k G (t) ≡ 1 n I (t) dn I (t) dt , it is completely general and independent of the underlying dynamics of the infection. The underlying infection dynamics dictates the relationship of k G (t) and n I (t) with the different epidemiological quantities. Based on an infection-age structured mathematical description 10, 11 (See "Methods: Infection-age structured dynamics"), we have developed an approach to uncover the optimal dynamic constraints for these relationships in terms of delays and scaling factors (See "Methods: Dynamical constraints"). We find that n I (t) is optimally related to the rate of increase of the expected cumulative deaths, n D , at a later time t + τ D according to where IFR is the infection fatality rate and τ G is the average generation time. Note that the first derivative of the expected cumulative number of deaths is obtained from the expected number of daily deaths as dn D (t) dt = n D (t) − n D (t − 1) (See "Methods: Dynamical constraints implementation with discrete time") and that expected deaths are obtained from raw death counts reported by the Johns Hopkins University Center for Systems Science and Engineering 1 (See "Methods: Expected deaths"). The preceding equation for n I (t) explicitly takes into account that, on average, an infectious individual within n I (t) has been infected at time t − τ 2 G +σ 2 G 2τ G and potentially dies with probability IFR at a time τ I + τ OD after infection. Here, σ 2 G is the variance of the generation time, and τ I and τ OD are the incubation and symptom onset-todeath average times, respectively. The values of these characteristic times have been estimated in days as τ G = 6.5 , σ G = 4.2 , τ I = 6.4 , and τ OD = 17.8 from precise follow up of specific groups of patients 5, 13, 14 , which leads to 6 (See "Methods: Dynamical constraints" and "Methods: Clinical parameters "). (1) www.nature.com/scientificreports/ Therefore, the value of τ D can be interpreted as the average number of days from infection to death ( τ I + τ OD ) minus the average number of days that an individual remains infectious after infection ( . A general assumption of the approach is that the IFR for each age group remains the same for all locations 5 and that the overall IFR for each location is obtained as the average over its specific population's age distribution (See "Methods: Infection fatality rate (IFR)"). The growth rate is obtained directly from Eqs. (1) and (2) as which is related to the time-varying reproduction number, R t , through the Euler-Lotka equation (See "Methods: Reproduction number"). The expected cumulative number of infected individuals at a time t , n T (t) , follows from which is obtained also from the dynamical constraints (See "Methods: Dynamical constraints"). To compare with prevalence studies, we used the dynamical constraints (See "Methods: Dynamical constraints") to obtain the relationship of the expected number of seropositive individuals, n SP (t) , with the infected population, Implementation. Equations (1-4) completely characterize the dynamics of the outbreak from the expected cumulative deaths, the age structure of the population, and the general clinical parameters of the infection. We used a workflow (Fig. 1A ) that incorporates explicitly the death counts compiled by the Johns Hopkins University Center for Systems Science and Engineering 1 , the age structure reported by the United Nations for countries 18 and the US Census for states and territories 19 , previously estimated aged-stratified IFR 5 , and other previously estimated clinical parameters 5, [13] [14] [15] [16] [17] . The expected number of daily deaths was inferred using density estimation from the death curves after preprocessing to minimize reporting artifacts (See "Methods: Inference and extrapolation"). Equations (5) and (6) were used to set the appropriate scale and delay to compare with seroprevalence studies. Validation against prevalence data and NPIs. To validate the approach, we contrasted the estimated infectious and infected populations with the results from available antibody seroprevalence and RT-PCR testing studies for countries in the world and locations in the US with at least 30 reported deaths (Fig. 1B) . The observed and estimated values agree with each other within the 95% credibility intervals (CrI), with a 1.56-fold accuracy over almost a 1,000-fold variation and a correlation coefficient ( ρ ) on a logarithmic scale of ρ = 0.94. We combined into a separate analysis (Fig. 1C ) the data of two additional comprehensive antibody seroprevalence studies within the US, which included 49 20 and 38 21 states with non-zero prevalence values. For the states present in the two studies, we considered their average values. The estimated values agree with the observed prevalence with 1.61-fold accuracy and ρ = 0.80. The agreement for the combined data is better than for the data of each of the studies independently (1.65-fold accuracy and ρ = 0.73 for one study 20 and 1.74-fold accuracy and ρ = 0.77 for the other study 21 ) and better than the agreement of both studies with each other ( ρ = 0.55 for the 38 states common to both studies), indicating that the estimations of the approach fall within the observed variability of the prevalence studies. Indeed, the approach is effectively unbiased collectively, with the (geometric) average of the estimations being just a factor 1.12 (globally) and 1.13 (US) larger than that of the observations. Overall, the validation of the approach shows that it can reproduce the available prevalence studies over a factor 615.0 between minimum and maximum values for countries and states with 72.0% (globally) and 77.6% (US) of the estimates within a factor 2 of the observed values, and with 100.0% (globally) and 93.9% (US) within a factor 3. There are no countries and only three states with values outside the factor 3 boundary. In the cases of Rhode Island and New Hampshire, the estimated infectious population is larger than the ones reported by one study 21 , which is consistent with the observed age-specific seroprevalence biased to older populations. In the case of Oregon, the estimated infectious population is smaller than the ones reported 20, 21 , which is consistent with the observed age-specific seroprevalence biased to younger populations 21 . In the case of Oregon, in addition, only 338 of the 1123 excess deaths during the outbreak before the collection of the specimens for analysis were attributed to COVID-19 22 . We also validated the ability of the approach to capture the effects of NPIs (Fig. 1D ). The inferred timings of the peak infectiousness (maximum infectious population) are concordant with the dates of the major early country-wide lockdowns in Europe 9 , with an overall average deviation of 0.0 days between lockdown and peak Outbreak progression motifs. We analyzed explicitly the time evolution of the growth rate, the infectious population, the cumulative number of infections, and the relationship between growth rate and infectious population for all countries and states with at least 30 deaths (Supplementary Fig. S1 and Table 1 ). The characterization of the dynamics in terms of the growth rate and infectious population ( Supplementary Fig. S1 ) shows that there are prototypical types of behavior, or motifs (highlighted with representative examples in Fig. 2 ). Locations either contained ( Fig. 2A -C) or amplified (Fig. 2E ,F) the outbreak in its initial stages. In the case of initial containment, the growth rate switched rapidly to negative values. Subsequently, the behavior branched into contained sporadic resurgences below the initial maximum infectiousness ( Fig. 2A) , uncontrolled resurgence of the infectious population over the initial maximum infectiousness (Fig. 2B) , and sustained decrease Figure 1 . The approach consistently estimates prevalence and the timing of NPIs. The approach (A) has been validated with prevalence data of the infectious (PCR-RT testing) and infected (antibody testing) populations at a global scale (B) and for states within the US (C). The continuous black lines (B,C) represent the perfect prediction (identity function denoted by id(·) ) with the parallel dotted/dashed lines indicating the fold accuracy. Global data were obtained from sources described in Supplementary Table S1 . Several locations have estimates for multiple date ranges. State data correspond to two studies with specimen samples taken primarily on the first two weeks of July 20 and of August, 2020 21 . The inferred timings of the peak infectiousness are plotted against the dates of the major country-wide lockdowns in Europe 9 (D). The continuous orange line represents perfect concordance and the parallel dotted/dashed lines indicate the mean absolute error. www.nature.com/scientificreports/ of the outbreak (Fig. 2C ). The specific behavior depended on the success of the measures implemented, e.g. targeted control and moderate lockdowns, and their subsequent relaxation 23 . In the case of initial amplification, the dynamics proceeded in diverse ways, including an increasing infectious population converging to zero growth (Fig. 2D) , a fast-evolving infectious population switching from positive to negative growth (Fig. 2E) , and fast convergence to subsequent sustained residual growth (Fig. 2F ). In general, the locations that reached a substantial negative growth rate are those that implemented long-term strict lockdowns, whereas zero or small positive growth rates correspond to intermediate measures with partial restrictions 23 . In many cases, lifting restrictions has led to fast switching from negative to positive sustained growth, as illustrated by United Kingdom, and Italy ( Fig. 2E) , indicating the inability of these locations to contain the outbreak even at low values of the infectious population. Global dynamics. At a global scale, the outbreak is characterized by two early local maxima of the overall worldwide infectious population on January 25, 2020 and March 26, 2020, coincidental with the regression of the outbreak in China, initially, and in Europe, afterward, reaching the highest local maximum of 7.35 million active infections (Fig. 3A ,B) on July 17, 2020 with a subsequent increasing trend since September 22, 2020. In the US, there has been a local maximum of the infectious population on March 28, 2020 with 1.42 million active infections, a smaller local maximum on July 14, 2020, and a subsequent increasing trend since September 15, 2020 (Fig. 3D,E) . The estimated infected population is 375.0 million, growing at a rate of 15.70 million new infections per week, for the World (Fig. 3C ) and 52.0 million, growing at a rate of 2.62 million new infections per week, for the US (Fig. 3F ), which is a factor 3.1 for countries in the World and 2.1 for locations in the US higher than the corresponding reported cases 1 . At a local scale, the per capita infectious population of countries and states has only exceptionally crossed the 1% value (27 out of 154 countries and 14 out of 53 states) (Fig. 4) . The infected populations have surpassed the 10% of the total population only for 44 countries and 43 states (Fig. 4) , with 8 countries and 11 states reaching the 20% value. These results confirm that relying on herd immunity is not a realistic option. Controlling the outbreak by actively tracing and quarantining newly and potentially infected individuals has been successfully implemented, until early October, 2020, in South Korea, with the use of large resources and with occasional outbreaks that required short-term extended human-interaction restrictions, and almost successfully in Japan, with voluntary business closures and other restrictions 23 www.nature.com/scientificreports/ to control the outbreak as soon as restrictions are lifted, even at low values of the infectious population but above the South Korea average value, is illustrated by United Kingdom and Italy (Fig. 2E) . From local to global dynamics. The collective properties of the individual local dynamics, as quantified by the distribution of growth rates across countries and states over time (Fig. 5) , shows a progressive double stabilization of the outbreak. The double stabilization means that growth rates have mainly moved from large initial values towards zero values both at local scales and at a global scale. At local scales, growth rates for most The dynamical constraints we have obtained through a detailed infection-age mathematical description of the outbreak allowed us to find the optimal time delays and scaling factors to connect the evolution of the reported death counts over time with those of the infectious and infected populations. Overall, integrating these constraints through a workflow with essential preprocessing steps showed that the approach can consistently infer the precise timing of NPIs and estimate prevalence data across countries in the world and locations in the US. A prominent feature of the approach is its ability to provide reliable results even for low death counts, which overcomes the major limitations of choosing between unreliable infection case data (highly dependent on testing rates) or noisy death counts as input to the inference problem 9 . The approach assumes a general age-stratified IFR . In general, these quantities are expected to depend potentially on specific features of the population and the medical care facilities available. The available studies show a minimal variability among different countries and other locations that reported on prevalence 24 . It also assumes an age-uniform exposure (attack rate), which is consistent with data for other respiratory diseases 5 and holds to a large extent when there is information available for COVID-19 20, 25, 26 . We have also assumed a constant generation interval typical of non-confinement locations, which has been observed to shorten in some cases by NPIs 27 . Prevalence studies can also depend on the diminishing antibody levels after infection 15, 28 , collecting and processing specimens for analysis 29 , and potential biases towards specific population groups 20 . In addition, there might be a degree of under-reporting of COVID-19 deaths, as suggested by excess mortality not attributable to other causes than COVID-19 22 . Our results show that all of these potential deviations on the assumptions, on the data, and on prevalence studies collectively have only a restricted impact on the approach, with 72.0% (globally) and 77.6% (US) of the estimates within a factor 2 of the observed values and 100.0% (globally) and 93.9% (US), within a factor 3. This accuracy of the estimations is highly remarkable in the context of the observed prevalence spread over a factor 615.0 between minimum and maximum values, from 0.02 to 12.30%. The analysis in terms of the growth rate-infectious population trajectories has revealed universal types of behavior of the outbreak for countries around the world and locations within the US. This information can be used to anticipate the response to enacting, modifying, or lifting NPIs. The most marked example is the response to strict lockdowns across countries in Europe (e.g. United Kingdom, Italy, Belgium, Spain, France, Germany, Austria, Netherlands, and Switzerland) and Northeast states in the US (e.g. New York, New Jersey, Massachusetts, Pennsylvania, and District of Columbia). They followed the same type of behavior (fast decrease of the growth rate from high to sustained negative values) until major restrictions were lifted in the European countries 23 , turning the growth rate of their infectious populations into highly sustained positive values. Our results show that those European countries had small actively infectious populations but not as small as required for targeted controllability. They also show that most Northeast states in the US are in a similar resurgence path but at much www.nature.com/scientificreports/ earlier stages, with many of their NPIs still in place and with markedly smaller growth rates, which makes their reaching as deadly a resurgence as in Europe still avoidable. At a global scale, the outbreak has reached a net growth rate fluctuating near zero values but with a high infectious population. A similar state has also been reached in the US. This type of fluctuating states, with long stagnant overall infectious population periods and median growth rate close to zero, is expected of bounded unsynchronized fluctuating populations 30 , such as those from uncoordinated locations aiming at just preventing an unrestricted expansion of the outbreak rather than at its eradication. This widespread feature is present for both countries in the world and locations within the US. Considering the NPIs implemented 23 , our results show that there have been locations with interventions to move the growth rate towards zero values and that there have been locations switching on and off severe measures to decrease temporarily the active infectious population. Despite not growing substantially since reaching its highest local maximum of 7.35 million active infections on July 17, 2020, the high value of the global infectious population attained is currently leading to 15.70 million new infections per week that replace the same ballpark number of individuals that stop being infectious. This high turnover makes the control of any potential resurgence extremely costly. At a local scale, our results show a highly variable temporal evolution of the infectious populations, both over time for each location and across locations. Having an up-to-date estimate of the infectiousness of populations would allow policymakers to better implement travel planning among locations. The approach has proven to accurately track the effects of local NPIs. We also expect it to play a fundamental role in evaluating the progress of vaccination efforts, especially considering the challenges present, such as waning immunity levels and pathogen evolution 31 . Infection-age structured dynamics. For the description of the dynamics, we follow the customary infection-age structured approach (for details see for instance Refs. 4,10-12 ). Explicitly, we consider the infection-age structured dynamics of the number of individuals u I (t, τ ) at time t who were infected at time t − τ given by with boundary condition Here, τ is the time elapsed after infection, referred to as infection age, and j(t) = ∞ 0 k I (t, τ )u I (t, τ )dτ is the incidence, with k I (t, τ ) being the rate of secondary transmissions per single primary case. The solution is obtained through the method of characteristics 32 as for t ≥ τ and u I (t, τ ) = 0 for t < τ . The resulting renewal equation, j(t) = ∞ 0 k I (t, τ )j(t − τ )dτ , is used as the basis for the definitions of the reproduction number R t = ∞ 0 k I (t, τ )dτ and the probability density of the generation time f GT (τ ) = k I (t,τ ) R t . The infectious population is given by which considers that an individual remains potentially infectious after a time τ from infection with probability Therefore, in terms of the incidence [substituting Eq. (9) in Eq. (10)], we have Additionally, we consider the expected cumulative number of infections, n T (t) , expressed in terms of the overall accumulated incidence as and the dynamics of the expected cumulative deaths, n D (t), which takes into account that deaths occur with probability given by the infection fatality rate, IFR , at times after infection given by the convolution of the probability density functions of the incubation, f I , and symptom onset-to-death, f OD , times. Similarly, the variation of the expected number of seropositive individuals at a time t , n SP (t) , is expressed as where f SP is the probability density function of the seroconversion time after infection, and the expected number of individuals with positive RT-PCR testing n PT (t) , as ∂ ∂t u I (t, τ ) + ∂ ∂τ u I (t, τ ) = 0 To obtain a closed set of equations for the different epidemiological quantities, we developed an approach to optimally simplify the convolutions. Explicitly, for the expressions involving an integral ∞ 0 A(τ )j(t − τ )dτ of a function A with the incidence j , we perform a series expansion of the incidence around the infection-age time τ A , with the value of τ A chosen as The specific value of τ A leads directly to a first-order approximation, Combining Eqs. ( 28 ) and ( 29 ) with Eqs. ( 24 ) and ( 25 ) leads to (16) n TP (t) = ∞ 0 P TP (τ )u I (t, τ )dτ , 2τ G and the cumulative infected population n T (t) from seropositivity testing, n SP , at time t + τ SP . Expected deaths. The raw cumulative death counts over time, n W (t) , are obtained from the Johns Hopkins University Center for Systems Science and Engineering 1 for countries and for US locations. The daily death counts �n W (t) = n W (t) − n W (t − 1) are considered to contain reporting artifacts if they are negative or if they are unrealistically large. This last condition is defined explicitly as larger than 4 times its previous 14-day average value plus 10 deaths, �n W (t) > 10 + 4 × 1 14 (n W (t) − n W (t − 14)) , from a non-sparse reporting schedule with at least 2 consecutive non-zero values before and after the time t , �n W (t) = 1 5 (n W (t + 2) − n W (t − 3)). Reporting artifacts identified at time t are considered to be the result of previous miscounting. The excess or lack of deaths are imputed proportionally to previous death counts. Explicitly, death counts are updated as In this way, �n W (t) is assigned its previous seven-day average value. The expected daily deaths, �n D (t) , are obtained through a density estimation multiscale functional, f de , as �n D (t) = f de (�n W (t)) , which leads to the estimation of the expected cumulative deaths at time t as n D (t) = n W (t 0 ) + t s=t 0 +1 �n D (s) . Specifically, with where ma 14 (·) is a centered moving average with window size of 14 days and rg σ (·) is a centered rolling average through a Gaussian window with standard deviation σ . The specific value of the window size has been chosen to mitigate weekly reporting effects. The values of the standard deviations of the Gaussian windows have been selected to achieve a smooth representation of the expected death estimation for each country as shown in the bottom panels of Supplementary Fig. S1 . We consider an average delay of two days between reporting a death and its occurrence. where IFR a is the previously estimated IFR for the age group a 5 and g a is the population in the age group a as reported by the United Nations for countries 18 and the US Census for states 19 . Clinical parameters. We obtained the values of the average τ G and standard deviation σ G of the generation time from Ref. 13 , of the averages of the incubation τ I and symptom onset-to-death τ OD times from Refs. 5, 14 , and of the average number of days t TP of positive testing by an infected individual from Refs. 15, 17 . The average time at which an individual tested positive after infection τ TP was computed as τ TP = τ I − 2 + �t TP /2 , where we have assumed that on average an individual started to test positive 2 days before symptom onset. The average seroconversion time after infection τ SP was estimated as τ I plus the 7 days of 50% seroconversion after symptom onset reported in Ref. 16 . Dynamical constraints implementation with discrete time. We implemented the dynamical constraints to compute the infectious and infected population as outlined in the main text and as detailed in the previous section of this document, using days as time units. Time delays were rounded to days to assign daily values. (31) n T (t) = n SP (t + τ SP ), dd 0 = ma 14 (ma 14 (�n W (t))), (37) dd 1 = rg 12 (ma 14 (�n W (t))), (38) dd 2 = rg 48 (ma 14 (�n W (t))), www.nature.com/scientificreports/ The first derivative of the cumulative number of deaths is computed as with �n D (t) = n D (t) − n D (t − 1). The growth rate was computed explicitly from the discrete time series as the centered 7-day difference The 7-day value was chosen to mitigate reporting artifacts. Confidence and credibility intervals. Confidence intervals associated with death counts were computed using bootstrapping with 10,000 realizations 34 . These confidence intervals were combined with the credibility intervals of the IFR in infectious and infected populations assuming independence and additivity on a logarithmic scale. Fold accuracy. The fold accuracy, F A , is explicitly computed as where |·| is the absolute value function, x obs i is the i th observation, x est i is its corresponding estimation, and N is the total number of observations. Inference and extrapolation. Because of the delay between infections and deaths, inference for the values of the growth rate and infectious populations ends on December 30, 2020 and for the values of the infected populations ends on December 26, 2020. Extrapolation to the current time (January 21, 2021) is carried out assuming the last growth rate computed. Reproduction number. The quantities R t and k G (t) are related to each other through the Euler-Lotka equation, R −1 t = ∞ 0 f GT (τ )e −k G (t)τ dτ , which considers j(t − τ ) ≃ e −k G (t)τ j(t) in the renewal equation j(t) = ∞ 0 k I (t, τ )j(t − τ )dτ . Generation times can generally be described through a gamma distribution f GT (τ ) = β α Ŵ(α) τ α−1 e −βτ with α = τ 2 G /σ 2 G and β = τ G /σ 2 G , which leads to R t = (1 + k G (t)/β) α for k G (t) > −β and R t = 0 for k G (t) ≤ −β . In the case of the exponentially distributed limit ( α ≃ 1 ) or small values of k G (t)/β , it simplifies to R t = 1 + k G (t)τ G for k G (t) > −1/τ G and R t = 0 for k G (t) ≤ −1/τ G . Global prevalence data were obtained from multiple data sources [35] [36] [37] [38] [39] [40] [41] [42] , as described in Supplementary Table S1 . www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. JHU CSSE COVID-19 Data The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: A modelling study across 131 countries Estimating epidemic exponential growth rate and basic reproduction number Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China Estimates of the severity of coronavirus disease 2019: A model-based analysis Transmission of 2019-nCoV infection from an asymptomatic contact in Germany Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: A case series Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe Mathematical models of infectious disease transmission The mathematics of infectious diseases Factors that make an infectious disease outbreak controllable Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: A retrospective cohort study Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections Virological assessment of hospitalized patients with COVID-2019 Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: A retrospective cohort study UN World Population Prospects: Total Population -Both Sexes US Census: ACS Demographic and Housing Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: A crosssectional study Excess deaths from COVID-19 and other causes Oxford COVID-19 Government Response Tracker, Blavatnik School of Government Assessing the age specificity of infection fatality rates for COVID-19: Systematic review, meta-analysis, and public policy implications Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A nationwide, population-based seroepidemiological study Estimation of SARS-CoV-2 infection fatality rate by real-time antibody screening of blood donors Serial interval of SARS-CoV-2 was shortened over time by nonpharmaceutical interventions Seroprevalence of IgG antibodies against SARS coronavirus 2 in Belgium: A serial prospective cross-sectional nationwide study of residual samples Viral load in community SARS-CoV-2 cases varies widely and temporally Determinants of population responses to environmental fluctuations Immune life history, vaccination, and the dynamics of SARS-CoV-2 over the next 5 years Mathematical Biology 2nd edn Fallecidos COVID-19 The Elements of Statistical Learning: Data Mining, Inference, and Prediction Estimating the burden of SARS-CoV-2 in France Remarkable variability in SARS-CoV-2 antibodies across Brazilian regions: nationwide serological household survey in 27 states Repeated leftover serosurvey of SARS-CoV-2 IgG antibodies Lower prevalence of antibodies neutralizing SARS-CoV-2 in group O French blood donors Population point prevalence of SARS-CoV-2 infection based on a statewide random sample-Indiana Seroprevalence of SARS-CoV-2 in the Norwegian population measured in residual sera Notes from the field: Seroprevalence estimates of SARS-CoV-2 infection in convenience sample-Oregon European Centre for Disease Prevention and Control. Daily update of new reported cases of COVID-19 by country worldwide J.M.G.V. and L.S. designed research, performed research, analyzed data, and wrote the paper. The authors declare no competing interests. The online version contains supplementary material available at https:// doi. org/ 10. 1038/ s41598-021-99273-1.Correspondence and requests for materials should be addressed to J.M.G.V. or L.S.Reprints and permissions information is available at www.nature.com/reprints.Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.