key: cord-0843606-p2gyj79j
authors: Omori, Ryosuke; Mizumoto, Kenji; Nishiura, Hiroshi
title: Ascertainment rate of novel coronavirus disease (COVID-19) in Japan
date: 2020-05-08
journal: International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases
DOI: 10.1016/j.ijid.2020.04.080
sha: 2576f0a42d87fc3b3ce65caae7d70c2f5fdd7536
doc_id: 843606
cord_uid: p2gyj79j

Abstract

Objective: To estimate the ascertainment rate of novel coronavirus disease (COVID-19).

Methods: We analyzed the epidemiological dataset of confirmed cases of COVID-19 in Japan as of 28 February 2020. A statistical model was constructed to describe the heterogeneity of the reporting rate by age and severity. We estimated the number of severe and non-severe cases, accounting for under-ascertainment.

Results: The ascertainment rate of non-severe cases was estimated at 0.44 (95% confidence interval: 0.37, 0.50), indicating that the unbiased number of non-severe cases would be more than twice the reported count.

Conclusions: Severe cases are about twice as likely to be diagnosed and reported as other cases. Considering that reported cases are usually dominated by non-severe cases, the adjusted total number of cases is also about double the observed count. Our finding is critical in interpreting the reported data, and it is advised to interpret mild case data of COVID-19 as always under-ascertained.

Keywords: coronavirus; outbreak; diagnosis; reporting; statistical model

The majority of COVID-19 cases exhibit limited severity; 81% of reported cases in China have been mild and only 16% severe (Guan et al., 2020). It is natural that the ascertainment rate would differ between severe and non-severe cases. The present study aims to estimate the ascertainment rate of non-severe cases, employing a statistical model.

We analyzed the epidemiological dataset of confirmed cases of COVID-19 in Japan as of 28 February 2020. The confirmatory diagnosis was made by means of reverse transcriptase polymerase chain reaction (RT-PCR). The present study specifically analyzed cases by (i) prefecture, (ii) age, and (iii) severity. A severe case was defined as (i) severe dyspnea that required oxygen support plus pneumonia or intubation, or (ii) a case that required management in an intensive care unit.

We estimated the number of severe and non-severe cases using the ratio of non-severe to severe reported cases (Guan et al., 2020; Novel, 2020). We estimated the ascertainment rate among non-severe cases, $k$, by modeling the reported counts of both severe and non-severe cases as generated by Poisson processes with probabilities $p_{x,a}$ for severe cases and $k f_a p_{x,a}$ for non-severe cases in age group $a$ and prefecture $x$.
Here $f_a$ denotes the ratio of non-severe to severe reported cases in age group $a$, as estimated from the age-specific severity and incidence rate ratio in China (Guan et al., 2020; Novel, 2020). We estimated $k$ and $p_{x,a}$ using the log-likelihood function

$$\ell(k, p) = \sum_{x}\sum_{a}\Big[ D_{s,x,a}\ln\big(p_{x,a}N_{x,a}\big) - p_{x,a}N_{x,a} + D_{ns,x,a}\ln\big(k f_a p_{x,a}N_{x,a}\big) - k f_a p_{x,a}N_{x,a} \Big] + \text{const}, \qquad (1)$$

where $N_{x,a}$, $D_{ns,x,a}$ and $D_{s,x,a}$ represent the population size and the observed counts of non-severe and severe cases of age group $a$ in prefecture $x$, respectively, and the constant collects terms that do not depend on the parameters. Maximum likelihood estimates were obtained by maximizing equation (1), and profile likelihood-based confidence intervals were computed.

The ascertainment rate of non-severe cases, $k$, was estimated at 0.44 (95% confidence interval (CI): 0.37, 0.50). The resulting estimate of non-severe cases is shown in Figure 1A, along with a reasonably good fit to the severe case data in Figure 1B. The age-specific pattern of estimated non-severe cases was similar to that among severe cases. The largest estimated numbers of non-severe cases were 80 (95% CI: 63, 98) among those aged 50-59 years and 78 (95% CI: 61, 95) among those aged 60-69 years. This adjustment yields an adjusted estimate of the total number of cases by age group.

The present study estimated the ascertainment-adjusted number of cases in Japan, using the age-specific severe fraction of cases. We assumed that the ratio of severe to non-severe cases in a given age group is constant and that the age-independent gap is explained by under-diagnosis and under-reporting; under these assumptions, the ascertainment rate of non-severe cases was estimated to be 0.44.

As a take-home message, severe cases are about twice as likely to be diagnosed and reported as other cases. Reported cases are usually dominated by non-severe cases, so the adjusted total number of cases is about double the observed count. Our finding is critical in interpreting the reported data, and it is advised to regard mild case data as always under-ascertained.
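To make the estimation step in the Methods concrete, the following is a minimal sketch of how $k$ could be fitted under the Poisson model described there. It rests on several assumptions that are not part of the original study: the nuisance parameters $p_{x,a}$ are profiled out analytically (for fixed $k$, the maximizing value is $(D_{s,x,a}+D_{ns,x,a})/(N_{x,a}(1+kf_a))$, so the population sizes drop out), the optimum is found by bisection on $\ln k$, and the function names and toy counts are hypothetical. It is one possible simplification, not necessarily the procedure the authors used.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95% quantile of chi-square with 1 d.f.


def profile_loglik(theta, f, d_s, d_ns):
    """Profile log-likelihood of theta = ln(k), up to an additive constant.

    Each list holds one entry per (prefecture, age group) cell:
    f    -- assumed ratio of non-severe to severe cases for the cell's age group
    d_s  -- observed severe counts
    d_ns -- observed non-severe counts
    """
    k = math.exp(theta)
    return sum(
        dns * math.log(k * fa) - (ds + dns) * math.log1p(k * fa)
        for fa, ds, dns in zip(f, d_s, d_ns)
    )


def _bisect(func, lo, hi, iters=200):
    """Root of func on [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_k(f, d_s, d_ns, bound=20.0):
    """Return (k_hat, (lower, upper)) with a 95% profile-likelihood interval.

    Assumes at least one severe and one non-severe case overall, so that the
    score changes sign on [-bound, bound].
    """
    def score(theta):
        # Derivative of the profile log-likelihood with respect to theta.
        k = math.exp(theta)
        return sum(
            dns - (ds + dns) * k * fa / (1.0 + k * fa)
            for fa, ds, dns in zip(f, d_s, d_ns)
        )

    theta_hat = _bisect(score, -bound, bound)
    ll_max = profile_loglik(theta_hat, f, d_s, d_ns)

    def drop(theta):
        # Positive once the log-likelihood has fallen past the 95% cutoff.
        return ll_max - profile_loglik(theta, f, d_s, d_ns) - 0.5 * CHI2_95_1DF

    theta_lo = _bisect(drop, -bound, theta_hat)
    theta_hi = _bisect(drop, theta_hat, bound)
    return math.exp(theta_hat), (math.exp(theta_lo), math.exp(theta_hi))


# Toy inputs (hypothetical, NOT the Japanese data), one entry per cell.
f_ratio = [10.0, 5.0]
severe = [10, 20]
non_severe = [50, 50]

k_hat, (k_lo, k_hi) = fit_k(f_ratio, severe, non_severe)

# Ascertainment-adjusted non-severe counts: reported count divided by k.
adjusted_non_severe = [d / k_hat for d in non_severe]
```

In this sketch the adjustment itself is a single division: with an ascertainment rate of 0.44, as estimated in the study, each reported non-severe case corresponds to roughly 1/0.44 ≈ 2.3 actual non-severe cases.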
In addition to the proposed adjustment, it should be noted that the ascertainment rate of severe cases also needs to be estimated, and such estimation requires direct measurement of the total number of cases or infected individuals by means of a seroepidemiological study or other methods that test all samples (Nishiura et al., 2020). That is, the actual total number of cases is greater than the adjusted number in the present study. Using seroepidemiological datasets, we plan to address relevant issues in the future. Other limitations include the following: (i) we did not explore detailed natural history, e.g. dynamically changing symptoms over the course of infection, and underlying comorbidities; (ii) we ignored right-censored data, e.g. the time delay from illness onset to severe manifestations, for simplicity, which led us to underestimate the ascertainment rate; and (iii) the data on age-dependent severity employed in our analysis are based only on observed data in China. Considering the possibility of underreporting or a biased age distribution, the nature of this age distribution may lead to underestimation.

Despite multiple future tasks, we believe that the present study successfully demonstrated that under-ascertainment can be partly adjusted for by examining the age-dependent number of cases, including severe cases. The proposed adjustment should be practiced in other country settings and also for other diseases.

The authors declare no conflicts of interest.

... analysis, decision to publish, or preparation of the manuscript.

This study was based on publicly available data and did not require ethical approval.

References

Guan et al., 2020. Clinical Characteristics of Coronavirus Disease 2019 in China.
Nishiura et al., 2020. The rate of underascertainment of novel coronavirus (2019-nCoV) infection: Estimation using Japanese passengers data on evacuation flights. Multidisciplinary Digital Publishing Institute.
Novel, 2020. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Zhonghua liu xing bing xue za zhi.
Ministry of Health, Labour and Welfare.