key: cord-0907299-jswt3pun
authors: Griette, Q.; Magal, P.; Seydi, O.
title: Unreported cases for Age Dependent COVID-19 Outbreak in Japan
date: 2020-05-12
journal: nan
DOI: 10.1101/2020.05.07.20093807
sha: 257abbf0d0b84815d4a1d0b820d6fc04b4bece7e
doc_id: 907299
cord_uid: jswt3pun

We investigate the age structured data for the COVID-19 outbreak in Japan. We consider an epidemic mathematical model with unreported infectious patients, with and without age structure. In particular, we build a new mathematical model which takes into account differences in the response of patients to the disease according to their age. This model also allows for a heterogeneous response of the population to the social distancing measures taken by the local government. We fit this model to the observed data and obtain a snapshot of the effective transmissions occurring inside the population at different times, which indicates where and among whom the disease propagates after the start of the public measures.

The COVID-19 disease, caused by the coronavirus SARS-CoV-2, was first reported in Wuhan, China on December 31, 2019. Beginning in Wuhan as an epidemic, it then spread very quickly around the world to become a global pandemic within a month. Symptoms of this disease include fever, shortness of breath and cough, and a non-negligible proportion of infected individuals may develop severe forms of the symptoms leading to their transfer to intensive care units and, in some cases, death. It is also worth noting that symptomatic and asymptomatic individuals are both infectious [19, 23, 26], which makes the control of the disease challenging. The virus is characterized by its rapid progression among individuals, most often exponential in the first phase, but also by a marked heterogeneity across populations and geographic areas. The number of reported cases worldwide exceeded 3 million as of May 3, 2020 [28].
The heterogeneity of the number of cases and of the severity according to the age groups, especially for children and elderly people, aroused the interest of several researchers [3, 17, 14, 20, 21]. Indeed, several studies have shown that the severity of the disease increases with the age and co-morbidity of hospitalized patients [21, 25]. Wu et al. [24] have shown that the risk of developing symptoms increases by 4% per year of age in adults aged between 30 and 60 years, while Davies et al. [4] found a strong correlation between chronological age and the likelihood of developing symptoms. However, a higher probability of developing symptoms does not necessarily imply greater infectiousness, as completely asymptomatic individuals can also be contagious. In fact, in [26] it has been found that the viral load in the asymptomatic patient was similar to that in the symptomatic patients. Moreover, while adults are more likely to develop symptoms, it has been shown in [7] that the viral loads in infected children do not differ significantly from those of adults.

These findings suggest that a study of the dynamics of inter-generational spread is fundamental to better understand the spread of the coronavirus and, most importantly, to efficiently fight the COVID-19 pandemic. To this end the distribution of contacts between age groups in society (home, workplace, school, ...) is an important factor to take into account when modeling the spread of the epidemic. To account for these facts, some mathematical models have been developed in [1, 2, 4, 17, 20]. In [1] the authors studied the dependence of the COVID-19 epidemic on the demographic structures in several countries but did not focus on the contact distribution of the populations. In [2, 4, 17, 20] the focus is on the social contact patterns with respect to the chronological age, using the contact matrices provided in [16].
While [1, 4] used the example of Japan in their study, their approach is significantly different from ours. In this article we focus on an epidemic model with unreported infectious symptomatic patients (i.e. with mild or no symptoms). Our goal is to investigate the age structured data of the COVID-19 outbreak in Japan. In section 2 we present the age structured data and the mathematical models (with and without age structure). One of the difficulties in fitting the model to the data is that the growth rate of the epidemic is different in each age class, which led us to adapt our earlier method presented in [9]. The new method is presented in Appendix A. In section 3 we present the comparison of the model with the data. In the last section we discuss our results.

Patient data in Japan have been made public since the early stages of the epidemic, with the quarantine of the Diamond Princess in the port of Yokohama. We used data from [29], which is based on reports from national and regional authorities. Patients are labeled "confirmed" when tested positive for COVID-19 by PCR. Interestingly, the age class of the patient is provided for 13,660 out of 13,970 confirmed patients (97.8% of the confirmed population) as of April 29. The age distribution of the infected population is represented in Figure 1 and compared to the total population per age class (data from the Statistics Bureau of Japan estimate for October 1, 2019). Both datasets are given in Table 1 and a statistical summary is provided by Table 2. Note that the high proportion of confirmed patients aged 20-60 years may indicate that the severity of the disease is lower for those age classes than for older patients, and therefore that the disease transmits more easily in those age classes because of a higher number of asymptomatic individuals. Elderly infected individuals might transmit less because they are identified more easily.
The cumulative number of deaths (Figure 4) is another argument in favor of this explanation. We also reconstructed the time evolution of the reported cases in Figure 2 and Figure 3. Note that the steepest curves are precisely those of the 20-60 year olds, probably because they are economically active and therefore have a high contact rate with the population.

Figure 1: In this figure we plot in blue the age distribution of the Japanese population for 10,000 people and we plot in orange the age distribution of the number of reported cases of SARS-CoV-2 for 13,660 patients on April 29. We observe that 77% of the confirmed patients belong to the 20-60 years age class.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted May 12, 2020.

Figure 4: Cumulative number of SARS-CoV-2-induced deaths per age class. We observe that 83% of deaths occur between 70 and 100 years old.

The model consists of the following system of ordinary differential equations:

\[
\begin{cases}
S'(t) = -\tau(t)\, S(t)\, [I(t) + U(t)], \\
I'(t) = \tau(t)\, S(t)\, [I(t) + U(t)] - \nu I(t), \\
R'(t) = \nu_1 I(t) - \eta R(t), \\
U'(t) = \nu_2 I(t) - \eta U(t).
\end{cases}
\tag{2.1}
\]

This system is supplemented by initial data

\[
S(t_0) = S_0 > 0, \quad I(t_0) = I_0 > 0, \quad R(t_0) = 0, \quad U(t_0) = U_0 \ge 0.
\]

Here $t \ge t_0$ is time in days, $t_0$ is the starting date of the epidemic in the model, $S(t)$ is the number of individuals susceptible to infection at time $t$, $I(t)$ is the number of asymptomatic infectious individuals at time $t$, $R(t)$ is the number of reported symptomatic infectious individuals at time $t$, and $U(t)$ is the number of unreported symptomatic infectious individuals at time $t$. Asymptomatic infectious individuals $I(t)$ are infectious for an average period of $1/\nu$ days.
Reported symptomatic individuals $R(t)$ are infectious for an average period of $1/\eta$ days, as are unreported symptomatic individuals $U(t)$. We assume that reported symptomatic infectious individuals $R(t)$ are reported and isolated immediately, and cause no further infections. The asymptomatic individuals $I(t)$ can also be viewed as having a low-level symptomatic state. All infections are acquired from either $I(t)$ or $U(t)$ individuals.

Our study begins in the second phase of the epidemic, i.e. after the pathogen has succeeded in surviving in the population. During this second phase $\tau(t) \equiv \tau_0$ is constant. When strong government measures such as isolation, quarantine, and public closings are implemented, the third phase begins. The actual effects of these measures are complex, and we use a time-dependent decreasing transmission rate $\tau(t)$ to incorporate these effects. The formula for $\tau(t)$ during the third phase is

\[
\tau(t) =
\begin{cases}
\tau_0, & t_0 \le t \le D, \\
\tau_0 \exp\left(-\mu\,(t - D)\right), & t > D.
\end{cases}
\]

The date $D$ is the first day of public intervention and $\mu$ characterises the intensity of the public intervention. A similar model has been used in [6, 9, 10, 11, 12, 13] to describe the epidemics in mainland China, South Korea, Italy, and other countries, and to predict the future evolution of the epidemic based on actual data.
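The model with a time-dependent transmission rate can be integrated numerically. The following is a minimal Python sketch, not the authors' code. It assumes that new infections occur at rate $\tau(t) S (I + U)$, that $I$ individuals leave at rate $\nu$ (a fraction $f$ to $R$, the rest to $U$), and that $\tau(t)$ decays exponentially after the intervention day. The values of `tau0`, `I0`, `U0` and the intervention day `D` (counted in days after $t_0$) are hypothetical illustrations, not fitted values. The integrator is a hand-written fourth-order Runge-Kutta scheme chosen only to keep the example free of dependencies.

```python
import math

def tau(t, tau0, mu, D):
    """Transmission rate: constant up to day D, exponentially decreasing afterwards."""
    return tau0 if t <= D else tau0 * math.exp(-mu * (t - D))

def siur_rhs(t, y, p):
    """Right-hand side of the S, I, R, U system (assumed form of the model)."""
    S, I, R, U = y
    nu1 = p["f"] * p["nu"]            # rate at which I becomes reported (R)
    nu2 = (1.0 - p["f"]) * p["nu"]    # rate at which I becomes unreported (U)
    new_infections = tau(t, p["tau0"], p["mu"], p["D"]) * S * (I + U)
    return (-new_infections,
            new_infections - p["nu"] * I,
            nu1 * I - p["eta"] * R,
            nu2 * I - p["eta"] * U)

def rk4(rhs, y0, t_start, t_end, dt, p):
    """Classical fourth-order Runge-Kutta; returns a list of (t, state) pairs."""
    steps = int(round((t_end - t_start) / dt))
    t, y = t_start, tuple(y0)
    out = [(t, y)]
    for k in range(steps):
        k1 = rhs(t, y, p)
        k2 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k1)), p)
        k3 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k2)), p)
        k4 = rhs(t + dt, tuple(a + dt * b for a, b in zip(y, k3)), p)
        y = tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
        t = t_start + (k + 1) * dt
        out.append((t, y))
    return out

# Illustrative parameters: f, nu, eta, mu and S0 follow values quoted in the
# article; tau0, D, I0 and U0 are hypothetical and chosen only for the demo.
params = {"f": 0.8, "nu": 1 / 7, "eta": 1 / 7,
          "tau0": 1.6e-9, "mu": 0.6, "D": 105.0}
S0, I0, U0 = 126.8e6, 1.0, 0.1

# Time is measured in days after t0; R(t0) = 0.
trajectory = rk4(siur_rhs, (S0, I0, 0.0, U0), 0.0, 150.0, 0.1, params)
```

Each element of `trajectory` is a pair `(t, (S, I, R, U))`, so the simulated number of unreported cases on a given day can be read directly from the list and compared with the reported compartment.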
| Symbol | Interpretation | Method |
|---|---|---|
| $t_0$ | Time at which the epidemic started | fitted |
| $S_0$ | Number of susceptible at time $t_0$ | fixed |
| $I_0$ | Number of asymptomatic infectious at time $t_0$ | fitted |
| $U_0$ | Number of unreported symptomatic infectious at time $t_0$ | fitted |
| $\tau(t)$ | Transmission rate at time $t$ | fitted |
| $N$ (denoted $D$ in the text) | First day of public intervention | fitted |
| $\mu$ | Intensity of the public intervention | fitted |
| $1/\nu$ | Average time during which asymptomatic infectious are asymptomatic | fixed |
| $f$ | Fraction of asymptomatic infectious that become reported symptomatic infectious | fixed |
| $\nu_1 = f\nu$ | Rate at which asymptomatic infectious become reported symptomatic | fixed |
| $\nu_2 = (1-f)\nu$ | Rate at which asymptomatic infectious become unreported symptomatic | fixed |
| $1/\eta$ | Average time symptomatic infectious have symptoms | fixed |

Table 3: Parameters of the model.

At the early stages of the epidemic, the infectious components of the model $I(t)$, $U(t)$ and $R(t)$ must be exponentially growing. Therefore, we can assume that

\[
I(t) = I_0 \exp\left(\chi_2 (t - t_0)\right), \qquad U(t) = U_0 \exp\left(\chi_2 (t - t_0)\right).
\]

The cumulative number of reported symptomatic infectious cases at time $t$, denoted by $CR(t)$, is

\[
CR(t) = \nu_1 \int_{t_0}^{t} I(s)\, ds.
\]

Since $I(t)$ is an exponential function and $CR(t_0) = 0$, it is natural to assume that $CR(t)$ has the following special form:

\[
CR(t) = \chi_1 \exp(\chi_2 t) - \chi_3.
\]

As in our earlier articles [9, 10, 11, 12, 13], we fix $\chi_3 = 1$ and we evaluate the parameters $\chi_1$ and $\chi_2$ by using an exponential fit to the cumulative reported case data. We use only early data for this part, from day $t = d_1$ until day $t = d_2$, because we want to catch the exponential growth of the early epidemic and avoid the influence of saturation arising at later stages. The estimated parameters $\chi_1$ and $\chi_2$ will vary if we change the interval $[d_1, d_2]$.
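The fitting step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: with $\chi_3$ fixed, it fits a straight line to $\log(CR_{\text{data}}(t) + \chi_3)$ by ordinary least squares, which is one common way to perform an exponential fit. The starting time uses $CR(t_0) = 0$. The expressions for $I_0$, $U_0$ and $\tau_0$ are obtained by substituting exponentially growing $I$ and $U$ into the model with $S \approx S_0$. The function names, the synthetic data and the time origin of `days` are hypothetical.

```python
import math

def fit_exponential(days, cumulative, chi3=1.0):
    """Least-squares line through (t, log(CR + chi3)); returns (chi1, chi2)."""
    ys = [math.log(c + chi3) for c in cumulative]
    n = len(days)
    t_mean = sum(days) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in days)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(days, ys))
    chi2 = sxy / sxx
    chi1 = math.exp(y_mean - chi2 * t_mean)
    return chi1, chi2

def start_time(chi1, chi2, chi3=1.0):
    """t0 solves CR(t0) = chi1 * exp(chi2 * t0) - chi3 = 0."""
    return (math.log(chi3) - math.log(chi1)) / chi2

def initial_values(chi2, chi3, f, nu, eta, S0):
    """I0, U0, tau0 consistent with exponential growth at rate chi2 (S ~ S0)."""
    nu1, nu2 = f * nu, (1.0 - f) * nu
    I0 = chi3 * chi2 / nu1                      # from CR'(t0) = nu1 * I0
    U0 = nu2 * I0 / (eta + chi2)                # from U' = nu2 * I - eta * U
    tau0 = (chi2 + nu) / S0 * (eta + chi2) / (nu2 + eta + chi2)
    return I0, U0, tau0

# Synthetic data generated from chi1 = 179, chi2 = 0.085 (the values quoted in
# the figure captions); the day indices are hypothetical.
days = [19.0, 20.0, 21.0, 22.0]
cumulative = [179.0 * math.exp(0.085 * t) - 1.0 for t in days]

chi1, chi2 = fit_exponential(days, cumulative)
t0 = start_time(chi1, chi2)
I0, U0, tau0 = initial_values(chi2, 1.0, 0.8, 1 / 7, 1 / 7, 126.8e6)
```

On real data the recovered `chi1` and `chi2` depend on the chosen window $[d_1, d_2]$, as noted in the text, so it is worth repeating the fit over several windows before trusting the derived `t0`.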
Once $\chi_1$, $\chi_2$, $\chi_3$ are known, we can compute the starting time of the epidemic $t_0$ from (2.5) as:

\[
t_0 = \frac{\ln(\chi_3) - \ln(\chi_1)}{\chi_2}.
\]

We fix $S_0 = 126.8 \times 10^6$, which corresponds to the total population of Japan. We fix the fraction $f$ of symptomatic infectious cases that are reported. We assume that between 80% and 100% of infectious cases are reported. Thus, $f$ varies between 0.8 and 1. We assume that the average time during which the patients are asymptomatic infectious, $1/\nu$, varies between 1 day and 7 days. We assume that the average time during which a patient is symptomatic infectious, $1/\eta$, varies between 1 day and 7 days. In other words we fix the parameters $f$, $\nu$, $\eta$. Since $f$ and $\nu$ are known, we can compute

\[
\nu_1 = f \nu \quad \text{and} \quad \nu_2 = (1 - f)\, \nu.
\]

Computing further (see below for more details), we should have

\[
I_0 = \frac{\chi_3 \chi_2}{f \nu}, \qquad
\tau_0 = \frac{\chi_2 + \nu}{S_0}\, \frac{\eta + \chi_2}{\nu_2 + \eta + \chi_2}, \qquad
U_0 = \frac{\nu_2}{\eta + \chi_2}\, I_0.
\tag{2.10}
\]

By using the approach described in [5, 22] the basic reproductive number for model (2.1) is given by

\[
R_0 = \frac{\tau_0 S_0}{\nu} \left( 1 + \frac{\nu_2}{\eta} \right).
\]

By using (2.8) we obtain

\[
R_0 = \frac{\chi_2 + \nu}{\nu}\, \frac{\eta + \chi_2}{\nu_2 + \eta + \chi_2} \left( 1 + \frac{\nu_2}{\eta} \right).
\tag{2.11}
\]

In what follows we will denote by $N_1, \dots, N_{10}$ the number of individuals respectively for the age classes $[0, 10[, \dots, [90, 100[$.

[Equation (2.12) not recovered.]

The model for the number of asymptomatic infectious individuals $I_1(t), \dots, I_{10}(t)$, respectively for the age classes $[0, 10[, \dots, [90, 100[$, is the following:

[Equation (2.13) not recovered.]

The model for the number of reported symptomatic infectious individuals $R_1(t), \dots, R_{10}(t)$, respectively for the age classes $[0, 10[, \dots, [90, 100[$, is:

[Equation (2.14) not recovered.]

In their survey [16], Prem and co-authors present a way to reconstruct contact matrices from existing data and provide such contact matrices for a number of countries including Japan. Based on the data provided by Prem et al. [16] for Japan we construct the contact probability matrix $\varphi$.
More precisely, we inferred contact data for the missing age classes $[80, 90[$ and $[90, 100[$. The precise method used to construct the contact matrix $\gamma$ is detailed in Appendix B. The contact matrix $\gamma$ we used is the following:

[Matrix (2.17) not recovered.]

where the entry $\gamma_{ij}$ in the $i$-th line of the matrix is the average number of contacts made by an individual in the age class $i$ with individuals in the age class $j$ during one day. Notice that the highest numbers of contacts are achieved within the same age class. The matrix $\varphi$ of conditional probabilities of contact between age classes is the following:

[Matrix (2.18) not recovered.]

Figure 6: Cumulative number of cases. We plot the cumulative data (red dots) and the best fits of the model $CR(t)$ (black curve) and $CU(t)$ (green curve). We fix $f = 0.8$, $1/\eta = 7$ days and $1/\nu = 7$ days and we apply the method described in [13]. The best fit is $d_1$ = April 2, $d_2$ = April 5, $N$ = April 27, $\mu = 0.6$, $\chi_1 = 179$, $\chi_2 = 0.085$, $\chi_3 = 1$ and $t_0$ = January 13.

Figure 7: Daily number of cases. We plot the daily data (black dots) with $DR(t)$ (blue curve). We fix $f = 0.8$, $1/\eta = 7$ days and $1/\nu = 7$ days and we apply the method described in [13]. The best fit is $d_1$ = April 2, $d_2$ = April 5, $N$ = April 27, $\mu = 0.6$, $\chi_1 = 179$, $\chi_2 = 0.085$, $\chi_3 = 1$ and $t_0$ = January 13.

The daily number of reported cases from the model can be obtained by computing the solution of the following equation:

[Equation not recovered.]

The model to compute the cumulative number of deaths from the reported individuals is the following:

[Equation not recovered.]

where $\eta_D$ is the death rate of reported infectious symptomatic individuals and $p$ is the case fatality rate (namely the fraction of deaths per reported infectious individual).
In the simulation we chose $1/\eta_D = 6$ days, and the case fatality rate $p$ is computed by using the cumulative number of confirmed cases and the cumulative number of deaths (as of April 29) as follows:

\[
p = \frac{\text{cumulative number of deaths}}{\text{cumulative number of reported cases}} = \frac{393}{13744} \approx 0.0286.
\]

In order to describe the confinement for the age structured model (2.12)-(2.15) we will use for each age class $i = 1, \dots, 10$ a different transmission rate having the following form:

\[
\tau_i(t) =
\begin{cases}
\tau_i, & t_0 \le t \le D_i, \\
\tau_i \exp\left(-\mu_i\,(t - D_i)\right), & t > D_i.
\end{cases}
\]

The date $D_i$ is the first day of public intervention for the age class $i$ and $\mu_i$ is the intensity of the public intervention for each age class. In Figure 9 we use the method described in Appendix A to estimate the parameters $\tau_i$ from the data.

In order to understand the role of the transmission network between age groups in this epidemic, we plot in Figure 10 the transmission matrices computed at different times. The transmission matrix is the following:

[Equation not recovered.]

where the matrix $\varphi$ describes contacts and is given in (2.18), and the transmission rates $\tau_i(t)$ are the ones fitted to the data as in Figure 9.

During the early stages of the epidemic, the transmission seems to be evenly distributed among age classes, with a small bias towards younger age classes (Figure 10 (a)).
Younger age classes seem to react more quickly to social distancing policies than older classes, therefore their transmission rate drops rapidly (Figure 10 (b) and (c)); one month after the start of social distancing measures, the transmission mostly occurs within elderly classes (60-100 years, Figure 10 (d)).

The recent COVID-19 pandemic has led many local governments to enforce drastic control measures in an effort to stop its progression. Those control measures were often taken in a state of emergency and without any real visibility concerning the later development of the epidemic, to prevent the collapse of the health systems under the pressure of severe cases. Mathematical models can help to see more clearly what the future of the pandemic could be, provided that the particularities of the pathogen under consideration are correctly identified. In the case of COVID-19, one of the features of the pathogen which makes it particularly dangerous is the existence of a large contingent of unidentified infectious individuals who spread the disease without notice. This makes non-intensive containment strategies such as quarantine and contact-tracing relatively inefficient, and also renders predictions by mathematical models particularly challenging.

Early attempts to reconstruct the epidemics by using SIUR models were performed in [6, 9, 10, 11, 12], where they were used to fit the behavior of the epidemics in many countries by including undetected cases in the mathematical model. Here we extend our modeling effort by adding the time series of deaths into the equation. In section 3 we present an additional fit of the number of disease-induced deaths coming from symptomatic (reported) individuals (see Figure 8).
In order to fit the data properly, we were forced to reduce the length of stay in the R-compartment to 6 days (on average), meaning that death induced by the disease should occur on average faster than recovery.

The major improvement in this article is to combine our earlier SIUR model with chronological age. Early results using age structured SIR models were obtained by Kucharski et al. [8], but no unreported individuals were considered and no comparison with chronological data was performed. In this article we provide a new method to fit the data and the model. The method extends our previous method for the SIUR model without age (see Appendix A).

The data presented in section 2 suggest that the chronological age plays a very important role in the expression of the symptoms. The largest part of the reported patients are between 20 and 60 years old (see Figure 1), while the largest part of the deceased are between 60 and 90 years old (see Figure 4). This suggests that the symptoms associated with COVID-19 infection are more severe in elderly patients, which has been reported in the literature several times [14, 25]. In particular, the probability of developing reported symptoms (our parameter $f$) should in fact depend on the age class. Indeed, the best match for our model (see Figure 9) was obtained under the assumption that the proportion of symptomatic individuals among the infected increases with the age of the patient.

Moreover, our model reveals that the policies used by the government to reduce contacts between individuals have strongly heterogeneous effects depending on the age classes. Plotting the transmission matrix at different times (see Figure 10) shows that younger age classes react more quickly and more efficiently than older classes. This may be due to the fact that the number of contacts in a typical day is higher among younger individuals.
As a consequence, we predict that one month after the effective start of public measures, the new transmissions will almost exclusively occur in elderly classes.

A Appendix: Method to fit the age structured model to the data

We first choose two days $d_1$ and $d_2$ between which each cumulative age group grows like an exponential. By fitting the cumulative age classes $[0, 10[$, $[10, 20[$, ... and $[90, 100[$ between $d_1$ and $d_2$, for each age class $j = 1, \dots, 10$ we can find $\chi_1^j$ and $\chi_2^j$. We choose a starting time $t_0 \le d_1$ and we fix

[Equation not recovered.]

and we obtain

[Equation not recovered.]

where $\chi_j^i \ge 0$, $\forall i = 1, \dots, n$, $\forall j = 1, 2, 3$.

Figure 11: We plot an exponential fit for each age class using the data from Japan.

We assume that

\[
CR_1'(t) = \nu_1^1 I_1(t), \quad \dots, \quad CR_n'(t) = \nu_1^n I_n(t),
\]

where [definition not recovered]. By assuming that the number of susceptible individuals remains constant we have

[Equations for $I_1, \dots, I_n$ not recovered.]

and

\[
U_j'(t) = \nu_2^j I_j(t) - \eta U_j(t), \quad j = 1, \dots, n.
\tag{A.5}
\]

If we assume that the $U_j(t)$ have the following form

\[
U_j(t) = U_j\, e^{\chi_2^j t},
\tag{A.6}
\]

and that $I_j(t) = I_j\, e^{\chi_2^j t}$, then by substituting in (A.5) we obtain

\[
U_j = \frac{\nu_2^j I_j}{\eta + \chi_2^j}.
\tag{A.7}
\]

We define the error between the data and the model as follows:

[Equation not recovered.]

Let the matrix $\varphi$ be fixed. We look for the vector $\tau = (\tau_1, \dots, \tau_n)$ which minimizes this error. Define for each $j = 1, \dots, n$

\[
K_j(t) := \left(\chi_2^j + \nu\right) I_j\, e^{\chi_2^j t}
\]

and $H_j(t) := S_j\, \varphi_{j1} \cdots$ [remainder of the definition not recovered]. Hence for each $j = 1, \dots, n$ [integral expression over $[d_1, d_2]$ not recovered], and by setting [equation not recovered] we deduce that [equation not recovered] whenever $\varphi$ is diagonal. Therefore the optimum is reached for any diagonal matrix. Moreover, by using similar considerations, if several $\chi_2^j$ are equal, we can find a multiplicity of optima (possibly with $\varphi$ not diagonal). This means that trying to optimize by using the matrix $\varphi$ does not yield significant and reliable information.

In Figure 12 below, we present an example of application of our method to fit the Japanese data. We use the period going from March 20 to April 15.

The survey [16] presents reconstructed contact matrices for a number of countries including Japan for the 5-year age classes $[0, 5)$, $[5, 10)$, ..., $[75, 80)$ at various locations (work, school, home, and other locations), and a compilation of those contact matrices to account for all locations. The precise description of the compilation is presented in the paper. Note that this paper is a follow-up of Mossong et al. [15], where the survey procedure is described (including the data collection protocol) for several European countries participating in the POLYMOD study. The data is publicly available online [32] and is presented in the form of a zipped collection of spreadsheets, containing the data for several countries in columns X1, X2, ..., X16. The columns stand for the average number of contacts of one individual of the corresponding age class (0-5 years for X1, 5-10 years for X2, etc.) with an individual of the age class indicated by the row (first row is 0-5 years, second is 5-10 years, etc.).

Since the age span covered by the study stops at 80, we had to infer the number of contacts for people over the age of 80. We postulated that most people aged 80 or more are retired and that their behaviour does not significantly differ (statistically speaking) from the behaviour of people in the age class $[75, 80)$. Therefore we completed the missing columns by copying the last available information and shifting it to the bottom. We repeated the procedure for lines. We believe that the introduced bias is kept to a minimum since the numerical values are relatively low compared to the diagonal.

Because we use 10-year age classes and the data is given in 5-year age classes, we had to combine adjacent columns to recover the average number of contacts. To combine columns together, we used the following formula

\[
\widehat{C}_i = \frac{N_{2(i-1)+1}\, C_{2(i-1)+1} + N_{2(i-1)+2}\, C_{2(i-1)+2}}{N_{2(i-1)+1} + N_{2(i-1)+2}},
\]

where the column $\widehat{C}_i$ corresponds to the average number of contacts of an individual taken at random in the age class $[10(i-1), 10i)$, $C_k$ is the average number of contacts of an individual taken at random in the age class $[5(k-1), 5k)$, and $N_k$ is the number of individuals in the age class $[5(k-1), 5k)$. To combine two lines, we simply use the sum of the data

\[
\widehat{L}_i = L_{2(i-1)+1} + L_{2(i-1)+2}.
\]

The matrix $\gamma$ in (2.17) is the transpose of the array obtained by the former procedure applied to the "all locations" dataset. Then $\varphi$ is obtained by scaling the lines of $\gamma$ to 1, i.e.

\[
\varphi_{ij} = \frac{\gamma_{ij}}{\sum_{k} \gamma_{ik}}.
\]
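The aggregation from 5-year to 10-year age classes and the scaling of the lines described above can be sketched as follows. This is a minimal Python illustration on a made-up 4 × 4 array with made-up population sizes (so it produces a 2 × 2 matrix). It is not the authors' code and does not use the data of [16]. The function names are hypothetical. Following the description of the spreadsheets, `contacts[r][c]` is read as the average number of contacts of one individual of the column class `c` with individuals of the row class `r`.

```python
def aggregate(contacts, pop):
    """Merge 5-year classes pairwise into 10-year classes.

    contacts[r][c]: mean daily contacts of one individual in 5-year class c
                    with individuals in 5-year class r.
    pop[c]:         number of individuals in 5-year class c.
    Returns gamma with gamma[i][j] = mean daily contacts of one individual in
    10-year class i with individuals in 10-year class j.
    """
    n = len(pop) // 2
    # Columns: population-weighted average of the two merged classes.
    cols = [[(pop[2 * i] * row[2 * i] + pop[2 * i + 1] * row[2 * i + 1])
             / (pop[2 * i] + pop[2 * i + 1]) for i in range(n)]
            for row in contacts]
    # Lines: plain sum of the two merged classes.
    merged = [[cols[2 * i][j] + cols[2 * i + 1][j] for j in range(n)]
              for i in range(n)]
    # Transpose so that the first index is the individual's own age class.
    return [[merged[j][i] for j in range(n)] for i in range(n)]

def row_normalize(gamma):
    """Scale every line of gamma so that it sums to 1."""
    return [[g / sum(row) for g in row] for row in gamma]

# Made-up example with four 5-year classes.
contacts = [[4.0, 2.0, 1.0, 1.0],
            [2.0, 4.0, 1.0, 1.0],
            [1.0, 1.0, 3.0, 1.0],
            [1.0, 1.0, 1.0, 3.0]]
pop = [100, 300, 200, 200]

gamma = aggregate(contacts, pop)
phi = row_normalize(gamma)
```

Columns are averaged with population weights because they describe a typical individual of the merged class, while lines are summed because contacts with either of the two merged classes both count as contacts with the new class.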
[1] Age could be driving variable SARS-CoV-2 epidemic trajectories worldwide, medRxiv
[2] Modeling strict age-targeted mitigation strategies for COVID-19, arXiv
[3] SARS-CoV-2 infection in children: Transmission dynamics and clinical characteristics
[4] Age-dependent effects in the transmission and control of COVID-19 epidemics
[5] On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations
[6] Estimating the last day for COVID-19 outbreak in mainland China
[7] An analysis of SARS-CoV-2 viral load by patient age, Online preprint
[8] Early dynamics of transmission and control of COVID-19: a mathematical modelling study
[9] Understanding unreported cases in the 2019-nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions
[10] Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data
[11] A COVID-19 epidemic model with latency period
[12] A model to predict COVID-19 epidemics with applications to South Korea
[13] Predicting the number of reported and unreported cases for the COVID-19 epidemic in China
[14] SARS-CoV-2 Infection in Children
[15] Social contacts and mixing patterns relevant to the spread of infectious diseases
[16] Projecting social contact matrices in 152 countries using contact surveys and demographic data
[17] The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study
[18] Effect of a one-month lockdown on the epidemic dynamics of COVID-19 in France, medRxiv preprint
[19] Transmission of 2019-nCoV infection from an asymptomatic contact in Germany
[20] Age-structured impact of social distancing on the COVID-19 epidemic in India
[21] Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study
[22] Reproduction numbers and subthreshold endemic equilibria for compartmental models of disease transmission
[23] Presymptomatic Transmission of SARS-CoV-2-Singapore
[24] Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
[25] Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
[27] Report of the WHO-China Joint Mission on Coronavirus Disease
[28] WHO Coronavirus disease (COVID-19) situation report number 104

Acknowledgements: Data from covid19japan.com.

Funding: Q.G. and P.M. acknowledge the support of ANR flash COVID-19 MPCUII.

Keywords: corona virus, age-structured data, reported and unreported cases, isolation, quarantine, public closings; epidemic mathematical model