key: cord-1043847-cmmkfokm authors: Lee, Se Yoon; Mallick, Bani; Lei, Bowen title: Estimation of COVID-19 spread curves integrating global data and borrowing information date: 2020-04-29 journal: nan DOI: 10.1101/2020.04.23.20077065 sha: 784d98476956327129d18b352d29597d17f3417e doc_id: 1043847 cord_uid: cmmkfokm Currently, novel coronavirus disease 2019 (COVID-19) is a big threat to global health. Rapid spread of the virus has created pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. Scientists are working on estimating or predicting infection trajectory for the COVID-19 confirmed cases, which will be useful for future planning and policymaking to effectively cope with the disease. There are no drugs or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19 (on April 13, 2020): information on the disease is very limited and scattered even if it exists. This motivates the use of data integration, combining data from diverse sources and eliciting useful information with a unified view of them. In this paper, we propose a Bayesian hierarchical model that integrates global data to estimate COVID-19 infection trajectories. Due to information borrowing across multiple countries, the proposed growth curve models provide a powerful predictive tool endowed with uncertainty quantification. They outperform the existing individual country-based models. Additionally, we use countrywide covariates to adjust infection trajectories. A joint variable selection technique has been integrated into the proposed modeling scheme, which aimed to identify the possible country-level risk factors for severe disease due to COVID-19. Since the COVID-19 outbreak, there have been numerous research works to better understand the pandemic in different aspects Jia et al., 2020; Liu et al., 2020; Peng et al., 2020; Qiang Li, 2020; Remuzzi and Remuzzi, 2020; Sheng Zhang, 2020; Yang et al., 2020) . Some of the recent works from statistics community are as follows. Sheng Zhang (2020) focused on a serial interval (the time between successive cases in a chain of transmissions) and used the gamma distribution to study the transmission on Diamond Princess cruise ship. Peng et al. (2020) proposed the generalized susceptible exposed infectious removed model to predict the inflection point for the growth curve, while Yang et al. (2020) modified the proposed model and considered the public health interventions in predicting the trend of COVID-19 in China. Liu et al. (2020) proposed a differential equation prediction model to identify the influence of public policies on the number of patients. Qiang Li (2020) used a symmetrical function and a long tail asymmetric function to analyze the daily infections and deaths in Hubei and other places in China. Remuzzi and Remuzzi (2020) used an exponential model to study the number of infected patients and patients who need intensive care in Italy. One of the major limitations of these works is that the researches are confined by analyzing data from a single country, thereby neglecting the global nature of the pandemic. 2 One of the major challenges in estimating or predicting an infection trajectory is the heterogeneity of the country populations. It is known that there are four stages of a pandemic: visit economictimes.indiatimes.com/-. The first stage of the pandemic contains data from people with travel history to an already affected country. In stage two, we start to see data from local transmission, people who have brought the virus into the country transmit it to other people. In the third stage, the source of the infection is untraceable. In stage four the spread is practically uncontrollable. In most of the current literature, estimation or prediction of the infection trajectory is based on a single country data where the status of the country falls into one of these four stages. Hence, such estimation or prediction may fail to capture some crucial changes in the shape of the infection trajectory due to a lack of knowledge about the other stages. This motivates the use of data integration (Huttenhower and Troyanskaya, 2006; Lenzerini, 2002) which combines data from different countries and elicits a solution with a unified view of them. This will be particularly useful in the current context of the COVID-19 outbreak. Recently, there are serious discussions all over the world to answer the crucial question: "even though the current pandemic takes place globally due to the same virus, why infection trajectories of different countries are so diverse?" For example, as seen from Figure 1 , the US, Italy, and Spain have accumulated infected cases within a short period of time, while China took a much longer time since the onset of the COVID-19 pandemic, leading to different shapes of infection trajectories. It will be interesting to find a common structure in these infection trajectories for multiple countries, and to see how these trajectories are changing around this common structure. Finally, it is significant to identify the major countrywide covariates which make infection trajectories of the countries behave differently in terms of the spread of the disease. The rapid spread of coronavirus has created pandemic, and countries all over the world are struggling with a surge in COVID-19 infected cases. Scientists are working on estimating the infection trajectory for future prediction of cases, which will be useful for future planning and policymaking. We propose a hierarchical model that integrates worldwide data to estimate COVID-19 infection trajectories. Due to information borrowing across multiple countries, the proposed growth curve 3 model will be a powerful predictive tool endowed with uncertainty quantification. Additionally, we use countrywide covariates to adjust curve fitting for the infection trajectory. A joint variable selection technique has been integrated into the modeling scheme, which will identify the possible reasons for diversity among the country-specific infection curves. There are three major classes of infectious disease prediction models: (i) differential equation models, (ii) time series models, and (iii) the statistical models. The differential equation models describe the dynamic behavior of the disease through differential equations allowing the laws of transmission within the population. The popular models include the SI, SIS, SIR, and SEIR models (Hethcote, 2000; Korobeinikov, 2004; Tiberiu Harko, 2014) . These models are based on assumptions related to S (susceptible), E (exposed), I (infected), and R (remove) categories of the population. Time series based prediction models such as ARIMA, Grey Model, Markov Chain models have been used to describe dependence structure over of the disease spread over time (Hu et al., 2006; Reza Yaesoubi, 2011; Rushton et al., 2006; Shen X, 2013; Zhirui He, 2018) . On the other hand, statistical models which follow the laws of epidemiology (Clayton and Hills, 2013; Thompson et al., 2006) are also popular, and can be easily extended in the framework of hierarchical models (multilevel model) to analyze data within a nested hierarchy, eventually harnessing the data integration (Browne et al., 2006; Hill, 1965; Stone and Springer, 1965; Tiao and Tan, 1965) . In this paper, we use Bayesian hierarchical models so that data integration and uncertainty analysis (Malinverno and Briggs, 2004) are possible in a unified way. Specifically, we use the Gompertz growth curve model (Gompertz, 1825) . The novelties of our method are as follows: we (i) use a flexible hierarchical growth curve model to global COVID-19 data, (ii) integrate information from multiple countries for estimation and prediction purposes, (iii) adjust for country-specific covariates, and (iv) perform covariate selection to identify the important reasons to explain the differences among the country-wise infection trajectories. We demonstrate that our proposed models perform better than the individual country-based modes. 4 The Gompertz growth curve model (Gompertz, 1825) is widely used to describe a growth curve for population studies in situations where growth is not symmetrical about the point of inflection (Anton and Herr, 1988; Seber and Wild, 2003) . Examples include trend of mobile phone uptake, bacterial growth in a confined space, and growth of cancer stem cell tumor (Caravelli et al., 2015; Islam et al., 2002; Sottoriva et al., 2010; Zwietering et al., 1990) . There are variant versions of the curve in the literature (Tjørve and Tjørve, 2017) , and we use the following form in this research where θ 1 , θ 2 , and θ 3 are real numbers. It is easy to derive that the Gompertz curve (1) has its unique inflation point at θ 3 (Goshu and Koya, 2013) . Figure 3 shows different shapes of the Gompertz growth curve obtained by varying each of the three parameters, θ 1 , θ 2 , and θ 3 , while fixing others. The followings are summary of the role of the the parameters: first, θ 1 represents an asymptote for the curve (1); second, θ 2 is related to a growth rate (slope) at the inflection point θ 3 ; third, θ 3 sets the displacement along the x-axis. Shape of Gompertz curve with fixed θ 1 =1 and θ 2 =1 Figure 3 : Change of shape of Gompertz curve with varying each of the curve parameters while fixing other two parameters: varying θ 1 (left); θ 2 (middle); θ 3 (right). We use the Gompertz growth curve (1) to model the infection trajectory. In this context, each of the curve parameters can be interpreted as follows: θ 1 is the maximum cumulative number of infected cases across the times; θ 2 is the growth rate of the trajectory at the inflection time point; and θ 3 is the inflection time point of the trajectory. More detailed interpretations will be revisited in Subsection 4.5. 5 4 Results We investigate the predictive performance of three Bayesian models based on the Gompertz growth curve. We start with the individual country-based model (here we use only the single country data) which has been widely used in the literature (M 1 ). Next, we extend the previous model For evaluation criteria, we calculate the mean squared error (MSE) (Fomby, 2006 ) associated with the extrapolated infection trajectory for each of the 50 countries. Training and the test data are selected as follows: given that y k = (y k,1 , · · · , y k,T ) is an infection trajectory of the k-th country spanning for T days since January 22nd, and d is the chosen test-day, then (i) the trajectory spanning for T − d days since January 22nd, that is, (y k,1 , · · · , y k,T −d ), is selected as the training data, and (ii) the d recent observations, (y k,T −d+1 , · · · , y k,T ), is selected as the test data. For the two models M 2 and M 3 , the MSE is averaged over the 50 countries, given by where y k,r is the actual value for the cumulative confirmed cases of the k-th country at the r-th time point, and y * k,r is the forecast value. More concretely, y * k,r is the posterior predictive mean given the information from 50 countries. For the model M 1 , the MSE d is acquired by using the predicted values based on a single country. We evaluate the MSE d from 20 replicates, for each of the short-term test-days (d = 5, 6, 7, 8, 9, 10) and long-term test-days (d = 22, 24, 26, 28, 30) , and then report the median of the MSE d 's. The 6 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. results are shown in Figure 4 . From the panel, we see that (1) the predictive performances of two hierarchical models, M 2 and M 3 , are universally better than that of M 1 across the number of test-days; and (2) the gap of MSE d between M 1 and the other two models increases as the number of test days d increases. Based on the outcomes, we conclude that information borrowing has improved the accuracy of the forecasting in terms of MSE. Hence, we present all the results in the consequent subsections based on the model M 3 . A similar result is found in the Clemente problem from (Efron, 2010) where the James-Stein estimator (James and Stein, 1992) better predicts then an individual hitter-based estimator in terms of the total squared prediction error. We categorize the 50 countries into the three levels by estimating the the total number of infected cases (that is, θ 1 of the Gompertz growth curve (1)), for the 50 countries. Grouping criteria are as follows: (1) Level 1 (estimated total number is no more than 10,000 cases); (2) Level 2 (estimated 7 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . https://doi.org/10. 1101 total number is between 10,000 and 100,000 cases); and (3) Level 3 (estimated total number is more than 100,000 cases). (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . https://doi.org/10. 1101 A crucial question is then when this trajectory gets flattened. To that end, we approximate a time point where an infection trajectory levels off its value, showing a flattening pattern after that time point. The following is the definition of the flat time point which we use in this paper: Definition 4.1. Given the Gompertz growth curve g(t; θ 1 , θ 2 , θ 3 ) (1), the flat time point t flat, is defined as the solution of the equation θ 1 − = g(t; θ 1 , θ 2 , θ 3 ) for some small > 0, given by All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . August 6th when the corresponding 's are chosen by 100,000, 10,000, 1,000, and 100, respectively. It is important to emphasize that these estimates are based on 'observations tracked until April 9th'. Certainly, incorporation of new information such as compliance with social distancing or advances in medical and biological sciences for this disease may change the inference. (1) for the Spain, t flat, =10,000 =May 2nd, t flat, =1,000 =May 27th, and t flat, =100 =June 20th; (2) for the UK, t flat, =10,000 =June 4th, t flat, =1,000 =July 12th, and t flat, =100 =August 19th; and (3) for the Brazil, t flat, =10,000 =June 6th, t flat, =1,000 =July 22nd, and t flat, =100 =September 6th. Results for other countries are included in the SI Appendix. 10 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (1) for the Spain, t flat, =10,000 =May 2nd, t flat, =1,000 =May 27th, and t flat, =100 =June 20th; (2) for the UK, t flat, =10,000 =June 4th, t flat, =1,000 =July 12nd, and t flat, =100 =August 19th; and (3) for the Brazil, t flat, =10,000 =June 6th, t flat, =1,000 =July 22nd, and t flat, =100 =September 6th. 11 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . https://doi.org/10. 1101 4.4 Global trend for the COVID-19 outbreak COVID-19 is a new disease and there is very limited information regarding risk factors for this severe disease. There is no vaccine aimed to prevent the transmission of the disease because there is no specific antiviral agent is available (For more detail, visit www.cdc.gov/-). It is very important to find risk factors relevant to the disease. CDC described High-Risk Conditions based on currently available information and clinical expertise: those at high-risk for severe illness from COVID-19 12 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . https://doi.org/10.1101/2020.04.23.20077065 doi: medRxiv preprint include • People 65 years and older; • People who live in a nursing home or long-term care facility; • People with chronic lung disease or moderate to severe asthma; • People who are immunocompromised, possibly caused by cancer treatment, smoking, bone marrow or organ transplantation, immune deficiencies, poorly controlled HIV or AIDS, and prolonged use of corticosteroids and other immune weakening medications; • People with severe obesity (body mass index of 40 or higher); • People with diabetes; • People with chronic kidney disease undergoing dialysis; • People with liver disease. The model M 3 involves three separated linear regressions whose response, and coefficient vector are given by θ l and its corresponding regression parameters β l , respectively (l = 1, 2, 3). (See the equation (3)) The sparse horseshoe prior (Carvalho et al., 2009 (Carvalho et al., , 2010 is imposed for each of the coefficient vectors which makes the model equipped with covariates analysis. That way, we can identify key predictors explaining the heterogeneity of shapes among country-wise infection trajectories, which can be further used in finding risk factors for severe disease due to COVID-19. The results are in table 1 1. 13 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . • In the second column of the Table 1 , the parameter θ 1 represents the total number of infected cases across the times. A larger number of θ 1 implies that a country has (can have) more COVID-19 infected patients. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with an increase (or decrease) of the total infected cases. • In the third column of the Table 1 , the parameter θ 2 represents the a growth rate of the infection trajectory at the time point t = θ 3 . A larger number of θ 2 implies a faster spread of the virus around the country. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with a rapid (or slow) spread of the virus. • On the fourth column of the Table 1 , the parameter θ 3 is related to the a time-delaying factor of the infection trajectory. The larger the value of θ 3 the later the trajectory begins to accumulate infected cases, leading to a later onset of the accumulation. A covariate with plus sign (+) (or minus sign (−)) is a factor associated with accelerating (or decelerating) the onset of the accumulation. Now, based on the aforementioned guideline, we shall interpret the Finally, moving to the parameter θ 3 , having larger numbers of doctors and COVID-19 testings conducted are helpful in earlier detection of the infected patients, which leads to an earlier onset 14 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. It is important to emphasize that, while medical and biological sciences are on the front lines of beating back COVID-19, the true victory relies on advance and coalition of almost every academic field. However, information about COVID-19 is limited: there are currently no vaccines or other therapeutics approved by the US Food and Drug Administration to prevent or treat COVID-19 (on April 13, 2020). Although numerous research works are progressed by different academic field, the information about COVID-19 is scattered around different disciplines, which truly requires interdisciplinary research to hold off the spread of the disease. Proper integration of data from multiple sources is a key to understand the COVID-19 disease, and this can be accomplished by borrowing information. The motivation of using the borrowing information is to make use of the indirect evidence (Efron, 2010) to enhance the predictive performance: for example, to extrapolate the infection trajectory for the US, the information is not only from the US (direct evidence) but also from other countries (indirect evidence) which has been utilized to improve the predictive accuracy of the trajectory for the US. To harness the borrowing information endowed with uncertainty quantification, Bayesian argument is useful, which induces sensible inferences and decisions for the users (Lindley, 1972) . The results demonstrated the superiority of our approach compared to the existing individual country-based models. Our research outcomes can be thought even more insightful given that we have not employed information about disease-specific covariates. That being said, using more detailed information such as social mixing data, precise hospital records, or patient-specific information will further improve the performance of our model. Moreover, integration of epidemiological models with these statistical models will be our future topic of research. 15 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020 . . https://doi.org/10.1101 6 Materials and Methods In this research, we analyze global COVID-19 data {y i , x i } N i=1 , obtained from N = 50 countries. (Meanings for the vector notations, y i and x i , will be explained shortly later.) These countries are most severely affected by the COVID-19 in terms of the confirmed cases on April 9th, and listed on Table 2 : each country is contained in the table with format "country name (identifier)", and this identifier also indicates a severity rank, where a lower value indicates a severer status. The order of the ranks thus coincides with the order of the countries named on the y-axis of the Figure 2 . (3), France (4), Germany (5), China (6), Iran (7), United Kingdom (8) (50) NOTE: Countries are listed with the form "country name (identifier)". This identifier also represents a severity rank. The rank is measured based on the accumulated number of the confirmed cases on April 9th. For each country i (i = 1, · · · , N ), let y it denotes the number of accumulated confirmed cases for COVID-19 at the t-th time point (t = 1, · · · , T ). Here, the time indices t = 1 and t = T correspond to the initial and end time points, January 22nd and April 9th, respectively, spanning for T = 79 (days). The time series data y i = (y i1 , · · · , y it , · · · , y iT ) is referred to as an infection trajectory For each country i, we collected 74 covariates, denoted by x i = (x i1 , · · · , x ij , · · · , x ip ) (p = 74). The 74 predictors can be further grouped by 6 categories: the 1st category: general country and 16 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . https://doi.org/10.1101/2020.04.23.20077065 doi: medRxiv preprint population distribution and statistics; the 2nd category: general health care resources; the 3rd category: tobacco and alcohol use; the 4th category: disease and unhealthy prevalence; the 5th category: testing and immunization statistics; and the 6th category: international health regulations monitoring. The data sources are the World Bank Data (https://data.worldbank.org/-), World Health Organization Data (https://apps.who.int/-), and National Oceanic and Atmospheric Administration (https://www.noaa.gov/-). Detailed explanations for the covariates are described in SI Appendix. We propose a Bayesian hierarchical model based on the Gompertz curve (1), which is referred to as Bayesian hierarchical Gompertz model (BHGM), to accommodate the COVID-19 data (Although the model is based on the Gompertz curve, the idea can be generalized to any choice for growth curves.) Ultimately, a principal goal of the BHGM is to establish two functionalities: (a) [Extrapolation] uncover a hidden pattern from the infection trajectory for each country i, that is, y i = (y i1 , · · · , y iT ) , through the Gompertz growth curve g(t; θ 1 , θ 2 , θ 3 ) (1), and then extrapolate the curve. (b) [Covariates analysis] identify important predictors among the p predictors x = (x 1 , · · · , x p ) that largely affect on the shape the curve g(t; θ 1 , θ 2 , θ 3 ) in terms of the three curve parameters. A hierarchical formulation of the BHGM is given as follows. First, we introduce an additive independently identical Gaussian error to each observation {y it } N,T i=1,t=1 , leading to a likelihood part: where g(t; θ 1i , θ 2i , θ 3i ) is the Gompertz growth curve (1) which describes a growth pattern of infection trajectory for the i-th country. Because each of the curve parameters has its own interpretations in characterizing the infection trajectory, we construct three separate linear regressions: θ li = α l + x i β l + ε li , ε li ∼ N (0, σ 2 l ), (i = 1, · · · , N, l = 1, 2, 3), 17 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . where β l = (β l1 , · · · , β lj , · · · , β lp ) is a p-dimensional coefficient vector corresponding to the l-th linear regression. To impose a continuous shrinkage effect (Bhadra et al., 2019) on each of the coefficient vectors, we adopt to use the horseshoe prior (Carvalho et al., 2009 (Carvalho et al., , 2010 : β lj |λ lj , τ lj , σ 2 l ∼ N (0, σ 2 l τ 2 l λ 2 lj ), λ lj , τ lj ∼ C + (0, 1), (l = 1, 2, 3, j = 1, · · · , p). Finally, improper priors (Gelman et al., 2004) are used for the intercept terms and error variances terms in the model: See SI Appendix for a posterior computation for the BHGM (2) -(5). 6.3 Technical expressions for three models M 1 , M 2 , and M 3 Technical expressions for the three models, M 1 , M 2 , and M 3 , compared in Subsection 4.1 are given as follows: M 1 is an individual country-based model (nonhierarchical model) that uses infection trajectory for a single country y = (y 1 , · · · , y T ) . The model is given by y t = g(t; θ 1 , θ 2 , θ 3 ) + t , t ∼ N (0, σ 2 ), θ l ∼ N (α l , σ 2 l ), (t = 1, · · · , T, l = 1, 2, 3), where g(t; θ 1 , θ 2 , θ 3 ) is the Gompertz growth curve (1), and improper priors (Gelman et al., 2004) are used for error variances and intercept terms as (5). M 2 is a Bayesian hierarchical model without using covariates, which uses infection trajectories . This model is equivalent to BHGM (2) -(5) with removed covariates terms in (3). M 3 is the BHGM (2) -(5). 18 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020 . . https://doi.org/10.1101 (1) Population ages 65 and above (% of total population) in 2018. Total popu (2) The total number of population in 2018. Female per (3) The percentage of female in the population in 2018. Death disease (4) Death by communicable diseases and maternal, prenatal and nutrition conditions (% of total) in 2016. GDP PPP (5) GDP, PPP (current international $) in 2017. GDP PPP per (6) GDP per capita, PPP (current international $) in 2017. Median age (13) Population median age in 2013. Birth rate (14) Crude birth rate (per 1000 population) in 2013. Death rate (15) Crude death rate (per 1000 population) in 2013. Life expect total birth (26) Life expectancy at birth (years) in 2016. Life expect total 60 (27) Life expectancy at age 60 (years) in 2016. Hea life expect total birth (28) Healthy life expectancy at birth (years) in 2016. Hea life expect total 60 (29) Healthy life expectancy at age 60 (years) in 2016. Dis to China (69) Calculated by the R function distm based on the average longitude and latitude. Popu density (73) Population density (people per sq.km of land area) in 2018. Tempe avg (74) The average temperature in February and March in the captain of each country (we choose New York for US and Wuhan for China, due to the severe outbreak in the two cities). 23 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . (17) Unrecorded alcohol consumption per capita (15+) (in litres of pure alcohol) in 2016. Abstainers total (18) Alcohol lifetime abstainers (those adults who have never consumed alcohol) (% of total) in 2016. Alcohol consumers total (19) Alcohol consumers past 12 months (those adults who consumed alcohol in the past 12 months) (% of total) in 2016. Heavy drinking total (20) Age-standardized estimates of the proportion of adults (15+ years) (who have had at least 60 grams or more of pure alcohol on at least one occasion in the past 30 days) in 2016. Alcohol death total (21) Alcohol-attributable death (% of all-cause deaths in total) in 2016. Alcohol disorder total (22) Number of adults (15+ years) with a diagnosis of F10.1, F10.2 (alcohol disorder) during a calendar year (% of total 15+) in 2016. Tobacco smoke (58) Age-standardized rates of prevalence estimates for daily smoking of any tobacco in adults (15+ years) in 2013. Cigarette smoke (59) Age-standardized rates of prevalence estimates for daily smoking of any cigarette in adults (15+ years) in 2013. 24 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . Mortality rate attributed to exposure to unsafe wash services (per 100000 population) (SDG 3.9.2) in 2016. 25 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . (43) Scores that show the availability of human resources to implement IHR Core Capacity. Health Emergency (44) Scores that show the ability of effective response at health emergencies in 2018. Health Service Provision (45) Scores that show an immediate output of the inputs into the health system, such as the health workforce, procurement and supplies, and financing in 2018. NOTE 1: Table 9 and Table 10 are both for the explanation of (the 6th category) international health regulations monitoring framework. 27 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . (47) Scores that show whether general obligations at point of entry are fulfilled (including for coordination and communication) to prevent the spread of diseases through international traffic in 2018. Chemical Events (48) Scores that show whether mechanisms are established and functioning for detection, alert and response to chemical emergencies that may constitute a public health event of international concern in 2018. Radiation Emergencies (49) Scores that show whether mechanisms are established and functioning for detecting and responding to radiological and nuclear emergencies that may constitute a public health event of international concern in 2018. NOTE 1: The International health regulations, or IHR (2005), represent an agreement between 196 countries including all WHO Member States to work together for global health security. Through IHR, countries have agreed to build their capacities to detect, assess and report public health events. WHO plays the coordinating role in IHR and, together with its partners, helps countries to build capacities. (https://www.who.int/ihr/about/-) NOTE 2: IHR monitoring framework was developed, which represents a consensus among technical experts from WHO Member States, technical institutions, partners and WHO. (https://www.who.int/ihr/procedures/-) 28 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . tions. Proportional parts of the distributions are given by respectively, where T -dimensional vector g(θ i1 , θ i2 , θ i3 ) (i = 1, · · · , N ) is obtained by g(θ i1 , θ i2 , θ i3 ) = (g(1; θ 1i , θ 2i , θ 3i ), . . . , g(T ; θ 1i , θ 2i , θ 3i )) . Here, · 2 indicates the l 2 -norm. Note that the two conditional densities are not known in closed forms because two parameters, θ 2i and θ 3i , participate to the function g(θ 1i , θ 2i , θ 3i ) in nonlinear way. We use the Metropolis algorithm (Andrieu et al., 2003) with Gaussian proposal densities within this Gibbs sampler algorithm. Step 3. Sample σ 2 from its full conditional distribution Step 4. Sample α l , l = 1, 2, 3, independently from their full conditional distributions π(α l |−) ∼ N 1 (1 (θ l − Xβ l )/N, σ 2 l /N ). Step 5. Sample β l , l = 1, 2, 3, independently from conditionally independent posteriors π(β l |−) ∼ N p (Σ β l X (θ l − 1α l ), σ 2 l Σ β l ), where Σ β l = [X X + Λ −1 * l ] −1 ∈ R p×p , Λ l = diag(λ 2 l1 , · · · , λ 2 lp ) ∈ R p×p , and Λ * l = τ 2 Λ l . Step 6. Sample λ lj , l = 1, 2, 3, j = 1, · · · , p, independently from conditionally independent posteriors π(λ lj |−) ∼ N (β lj |0, σ 2 l τ 2 l λ 2 lj ) · {1/(1 + λ 2 lj )}. Note that the densities π(λ lj |−) (l = 1, 2, 3, j = 1, · · · , p) are not expressed in closed forms: we use the slice sampler (Neal, 2003) . 30 All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 29, 2020. . An introduction to mcmc for machine learning Calculus with analytic geometry Lasso meets horseshoe: A survey A comparison of bayesian and likelihood-based methods for fitting multilevel models Optimal leverage trajectories in presence of market impact Handling sparsity via the horseshoe The horseshoe estimator for sparse signals Explaining the gibbs sampler Statistical models in epidemiology The future of indirect evidence. Statistical science: a review journal of the Institute of Scoring measures for prediction problems Breakthrough: Chloroquine phosphate has shown apparent efficacy in treatment of covid-19 associated pneumonia in clinical studies Bayesian data analysis Xxiv. on the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. in a letter to francis baily, esq. frs &c Derivation of inflection points of nonlinear regression curvesimplications to statistics The mathematics of infectious diseases Inference about variance components in the one-way model Rainfall, mosquito density and the transmission of ross river virus: A time-series forecasting model Bayesian data integration: a functional perspective Modelling multinational telecommunications demand with limited data Estimation with quadratic loss Demographic science aids in understanding the spread and fatality rates of covid-19 Prediction and analysis of coronavirus disease Lyapunov functions and global properties for seir and seis epidemic models Data integration: A theoretical perspective Bayesian statistics, a review Predicting the cumulative number of cases for the covid-19 epidemic in china from early data Expanded uncertainty quantification in inverse problems: Hierarchical bayes and empirical bayes Slice sampling Health security capacities in the context of covid-19 outbreak: an analysis of international health regulations annual report data from 182 countries Epidemic analysis of covid-19 in china by dynamical modeling Trend and forecasting of the covid-19 outbreak in china Covid-19 and italy: what next? The Lancet Generalized markov models of infectious disease spread: A novel framework for developing dynamic health policies Monte Carlo statistical methods Nonlinear regression Projecting hospital utilization during the covid-19 outbreaks in the united states The application of the grey disaster model to forecast epidemic peaks of typhoid and paratyphoid fever in china Estimation of the reproductive number of novel coronavirus (covid-19) and the probable outbreak size on the diamond princess cruise ship: A data-driven analysis Cancer stem cell tumor model reveals invasive morphology and increased phenotypical heterogeneity A paradox involving quasi prior distributions Epidemiology of seasonal influenza: use of surveillance data and statistical models to estimate the burden of disease Bayesian analysis of random-effect models in the analysis of variance. i. posterior distribution of variance-components Exact analytical solutions of the susceptibleinfected-recovered (sir) epidemic model and of the sir model with equal death and birth rates The use of gompertz models in growth analyses, and new gompertz-model approach: An addition to the unified-richards family Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions Epidemiology and arima model of positive-rate of influenza viruses among children in wuhan, china: A nine-year retrospective study (32) immunization coverage (% of total 1-year-olds) in 2018. Measles-containing-vaccine first-dose (MCV1) first-dose immunization (33) immunization coverage (% of total 1-year-olds) in 2018. Measles-containing-vaccine second-dose (MCV2) second-dose immunization (34) immunization coverage (% of total nationally recommended age) in 2018. Pneumococcal conjugate vaccines third-dose (PCV3) third-dose immunization (35) immunization coverage (% of total 1-year-olds) in 2018. Polio third-dose immunization (36) Polio (Pol3) third-dose immunization coverage (% of total 1-year-olds) in 2018. Testing num COVID19 (70) The number of COVID-19 testing cases (ourworldindata.org/-collect the data and the data dates are between Febrary and March on several media). Testing confirm COVID19 (71) The covariate Testing num COVID19 (70) divided by the total number of confirmed cases on the same day with testing num. Testing popu COVID19 (72) The covariate Testing num COVID19 (70) divided by covariate Total popu (2). We illustrate a full description of a posterior computation for the BHGM (2) -(5) by using a Markov chain Monte Carlo (MCMC) simulation (Robert and Casella, 2013) . To start with, we re-express the linear regression (3) in a vector form representationwhere θ l = (θ l1 , · · · , θ lN ) (l = 1, 2, 3) is N -dimensional vector for the latent responses, β l = (β l1 , · · · , β lp ) (l = 1, 2, 3) is p-dimensional vector for the coefficients, and X is N -by-p design matrix whose i-th row vector is given by the p predictors x i = (x i1 , · · · , x ip ) ∈ R p , (i = 1, · · · , N ).The notation I stands for an identity matrix. Each of column vectors of the design matrix X should be standardized: that is, each column vector has been centered, and then columnwisely scaled to have the unit l 2 Euclidean norm.Under the formulation of BHGM (2) -(5), our goal is to sample from the full joint posterior distribution π(θ 1 , θ 2 , θ 3 , σ 2 , Ω 1 , Ω 2 , Ω 3 |y 1:N ) where Ω l = {α l , β l , λ l , τ l , σ 2 l } (l = 1, 2, 3), and a proportional part of this joint density iswhere the matrix Λ l is p-by-p diagonal matrix Λ l = diag(λ 2 l1 , · · · , λ 2 lp ) (l = 1, 2, 3). To sample from the full joint density, we use a Gibbs sampler (Casella and George, 1992) to exploit conditional independences among the latent variables induced by the hierarchy. The following algorithm describes a straightforward Gibbs samplerStep 1. Sample θ 1 from its full conditional distributionHere, the vector r is a N -dimensional vector which is given by r = (y 1 h(θ 21 , θ 31 ), . . . , y N h(θ 2N , θ 3N )) such that the T -dimensional vector h(θ 2i , θ 3i ) (i = 1, · · · , N ) is obtained byStep 2. Sample θ 2i and θ 3i , i = 1, · · · , N , independently from their full conditional distribu- Step 7. Sample τ l , l = 1, 2, 3, independently from conditionally independent posteriors π(τ l |−) ∼ N p (β l |0, σ 2 l τ 2 l Λ l ) · {1/(1 + τ 2 l )}.Note that the densities π(τ l |−) (l = 1, 2, 3) are not expressed in closed forms: we use the slice sampler (Neal, 2003) .Step 8. Sample σ 2 l , l = 1, 2, 3, independently from their full conditionally distributions π(σ 2 l |−) ∼ IG N + p 2 , θ l − 1α l − Xβ l 2 2 + β l Λ −1 * l β l 2 . The file includes extrapolated infection trajectories for top 20 countries that are most severely affected by the COVID-19. The panels in the file display extrapolated posterior mean (red curve)for the Gompertz curve along with pointwise 95% credible intervals (pink region).