key: cord-0746740-ptvaopgj authors: Li, Jing; Wang, Lishi; Guo, Sumin; Xie, Ning; Yao, Lan; Cao, Yanhong; Day, Sara W.; Howard, Scott C.; Graff, J. Carolyn; Gu, Tianshu; Ji, Jiafu; Gu, Weikuan; Sun, Dianjun title: The Data set for Patient Information Based Algorithm to Predict Mortality Cause by COVID-19 date: 2020-04-24 journal: Data Brief DOI: 10.1016/j.dib.2020.105619 sha: c03623f7e5a36dd98c52bc35af23151d8789b37a doc_id: 746740 cord_uid: ptvaopgj Abstract The data on COVID-19 in China and then in South Korea were collected daily from several official websites. The collected data included 33 death cases in Wuhan city, Hubei province, during the early outbreak, as well as confirmed cases and death tolls in specific regions chosen as representative of the coronavirus outbreak in China. Data were copied and pasted into Excel spreadsheets for analysis. A new methodology, the Patient Information Based Algorithm (PIBA) [1], was adapted to process the data and to estimate the death rate of COVID-19 in real time. The assumption is that the number of days from hospital admission to death follows a normal distribution, and that the scores in this distribution can be obtained by analysing the 33 death cases [2]. We selected 5 scores in the normal distribution of these durations as lagging days, which are used in the subsequent estimation of the death rate. For each lagging day, we calculated a death rate on the accumulative confirmed cases from the current data, and then weighted every death rate by its corresponding probability to obtain the total death rate on each day. Because the trendlines of these death rate curves meet the curve of the current ratio between accumulative death cases and confirmed cases at some point in the near future, we considered these intersections to lie within the range of real death rates.
Six tables are presented to illustrate the PIBA method using data from China and South Korea. One figure shows the estimated rates of infection and of patients in serious condition, together with a retrospective estimation of the time at which COVID-19 initially occurred, based on PIBA.
Subject: Death rate estimation using normal distribution (mean, standard deviations and formulas).
The data estimation focuses on the early estimation of the death rate of infectious diseases, in particular COVID-19, the disease caused by 2019-nCoV. Collected data are formatted in Excel spreadsheets for analysis. Data include the total number of patients, the total number of deaths, and the daily numbers of new patients and new deaths, from the start date of official reporting to the time of presentation, e.g., March 22, 2020. Data were collected through the links of each official website, and the desired data were copied and pasted into Excel spreadsheets.

These data provide the scientific community with a new methodology to estimate the death rate and then predict the number of deaths during an epidemic.
Scientific researchers, CDC employees, government officers for disease control and management, and the general public will benefit from these data.
These data will be very useful for studies aimed either at disease control management or at preparing related resources to combat an outbreak.
Because of the limited number of data samples collected in this article, some factors that might affect the death rate of an epidemic, such as the phases of an outbreak and the measures issued by the department of disease control, could be taken into account for further insights and for the development of experiments with a larger amount of data.

CHD: coronary heart disease.

The data of the 33 death cases in Table 1 were collected from the official website of the Health Commission of Hubei Province in China. They include the date of symptom onset, the date the patient was admitted to the ICU, and the date of death. With these data, the number of days from symptom onset to death and from ICU admission to death can be calculated. Assuming a normal distribution, the mean μ and standard deviation σ can also be calculated.
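As a minimal sketch of this step, the durations and the normal-distribution parameters could be computed as below. The date pairs are hypothetical placeholders, not the 33 cases of Table 1; the helper names (`durations_in_days`, `normal_scores`) and the choice of the sample standard deviation are assumptions made for illustration only.

```python
from datetime import date
from statistics import mean, stdev

# Hypothetical (symptom onset, death) date pairs.
# These are NOT the 33 cases listed in Table 1.
cases = [
    (date(2020, 1, 1), date(2020, 1, 12)),
    (date(2020, 1, 3), date(2020, 1, 20)),
    (date(2020, 1, 5), date(2020, 1, 14)),
    (date(2020, 1, 6), date(2020, 1, 25)),
    (date(2020, 1, 8), date(2020, 1, 22)),
]


def durations_in_days(pairs):
    """Days elapsed between the first and second date of each pair."""
    return [(end - start).days for start, end in pairs]


def normal_scores(values):
    """Return mu, sigma and the scores mu-2s, mu-s, mu, mu+s, mu+2s."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    scores = [mu + k * sigma for k in (-2, -1, 0, 1, 2)]
    return mu, sigma, scores


days = durations_in_days(cases)
mu, sigma, scores = normal_scores(days)
# Round each score to a whole number of days so it can serve as a lag.
lagging_days = [round(s) for s in scores]
print(days, mu, round(sigma, 2), lagging_days)
```

The same functions apply unchanged to the ICU-admission-to-death durations by passing (ICU admission, death) date pairs instead.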
Thus the 5 selected scores in the normal distribution (μ − 2σ, μ − σ, μ, μ + σ and μ + 2σ) can be obtained as the basic elements for the following estimation and prediction of the death rate; they are 2, 8, 13, 19 and 25 days, respectively. The disease information in Table 2 was collected from public media before we resumed data analysis, applying in South Korea the same method of death rate estimation and prediction as in China [1]. We collected accumulative confirmed cases and deaths, and then new confirmed cases and new deaths, in South Korea.

Death rate 1 (from the date of symptoms), columns: 2020-03-15, 2020-03-14, 2020-03-13, 2020-03-12, 2020-03-11.

Each score we selected in the normal distribution has a specific probability when it is taken as a representative point on the bell curve [1]. When the death rates on a given day are weighted by their corresponding probabilities and then summed, the total death rate on that day is obtained. Each curve consisting of several death rates has a trendline, and thus a formula describing its trend; the same holds for the current ratio between accumulative death cases and confirmed cases on each day (Table 4).

Current ratio between accumulative death cases and confirmed cases, columns: 2020-03-15, 2020-03-14, 2020-03-13, 2020-03-12, 2020-03-11.

The current ratio between accumulative death cases and confirmed cases is calculated by dividing the accumulative death cases by the accumulative confirmed cases on each day.

The intersection points of the three trendlines:
                            Intersect 1   Intersect 2
Death rate in South Korea   0.92%         1.06%

The trendlines of death rate 1 and death rate 2 tend to intersect the trendline of the current ratio eventually, because the current ratio becomes the real death rate at the end of the epidemic. We considered that the intersection values of the three trendlines (death rate 1, death rate 2 and the current ratio) fall within the range of the real death rate.
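The weighting step and the current ratio described above can be illustrated with a short sketch. It rests on two assumptions that the text does not spell out: each lagged death rate is read as the accumulative deaths on a day divided by the accumulative confirmed cases `lag` days earlier, and the probabilities are taken as normal densities at the five scores, normalised to sum to 1. The exact weights used in PIBA are given in [1]; the function names and the synthetic series here are hypothetical.

```python
from math import exp, pi, sqrt

LAGS = (2, 8, 13, 19, 25)     # lagging days reported in the text
Z_SCORES = (-2, -1, 0, 1, 2)  # their positions in units of sigma


def normal_pdf(z):
    """Standard normal density at z."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)


def lag_weights(z_scores=Z_SCORES):
    """Normalised density weights (an assumed weighting scheme)."""
    densities = [normal_pdf(z) for z in z_scores]
    total = sum(densities)
    return [d / total for d in densities]


def current_ratio(cum_deaths, cum_confirmed, day):
    """Accumulative deaths divided by accumulative confirmed cases."""
    return cum_deaths[day] / cum_confirmed[day]


def weighted_death_rate(cum_deaths, cum_confirmed, day, lags=LAGS):
    """Weighted sum over lags of deaths(day) / confirmed(day - lag)."""
    if day < max(lags):
        raise ValueError("not enough history for the largest lag")
    total = 0.0
    for lag, weight in zip(lags, lag_weights()):
        total += weight * cum_deaths[day] / cum_confirmed[day - lag]
    return total


# Synthetic series for illustration only; not data from the article.
cum_confirmed = [100 + 50 * t for t in range(30)]
cum_deaths = list(range(30))
day = 29

weights = lag_weights()
ratio = current_ratio(cum_deaths, cum_confirmed, day)
total_rate = weighted_death_rate(cum_deaths, cum_confirmed, day)
print(f"current ratio: {ratio:.4f}, weighted death rate: {total_rate:.4f}")
```

Under this reading, the weighted rate is never below the current ratio while confirmed cases keep accumulating, because each lagged denominator is at most the current one. This is consistent with the two curves converging only toward the end of an epidemic.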
When we calculated the death rate separately with the corresponding trendline formulas, two intersections were obtained (Table 5-B). We pick the larger of the two values to predict new death cases in the following days (Table 6).

Tables are produced based on the Patient Information Based Algorithm (PIBA) [1]. PIBA was adapted to estimate the death rate of COVID-19 in real time with publicly posted data. Assuming a normal distribution, the different durations between symptom onset and death, with their different probabilities, were derived from the analysis of 33 death cases in Wuhan city, Hubei province, China [2]. Based on these results, the total death rate in a region can be calculated by combining the death rates for the different durations. Because the trendlines of these death rate curves meet the curve of the current ratio between accumulative death cases and confirmed cases at some point in the near future, we considered these intersections to lie within the range of real death rates. The data analysis followed the normal distribution throughout, both in calculating the probability of every selected score and in estimating the death rate.

After the data on COVID-19 patients from South Korea were collected, they were analysed with the PIBA method as indicated above (Table 2). The death rate was first estimated (Table 3). The death rate was then calculated (Table 4). Following these estimations, the PIBA method was used to predict the number of deaths in the following week (Table 5). The predicted numbers of deaths were then compared with the real numbers of deaths (Table 6).

considerably lower than expected. Prior expectations had been much higher, based on multiple infectious routes [3] [4]. Using our formula, the results indicate that the current infectious rate is even lower than the rate based on the total numbers (see Fig. 1A).
The infectious rate in Hubei province is currently around 4%, although previously the rate was as high as 39%. On average, the overall infectious rate in China is about 4%, while in Hubei it is 10%. In the rest of the country, it is 0.46%. Among the inpatients, the proportion in serious medical condition ranges from 10% to 30% (see Fig. 1B), averaging 18% in China, 19% in Hubei, and 13% in the rest of the country (excluding Hubei). Based on the estimated death rate, on January 22, there should have been a total of 150 to 300 inpatients (see Fig. 5C). Based on the rate of patients who are severely ill among all patients, on January 2, there should have been 216 to 315 patients. Based on the effective infection rate and on the assumption of one week or 14 days from close contact to the onset of symptoms, there might have been 2,160 to 68,478 people who were infected around December 20, 2019. If we assume that the epidemic doubling time is approximately 6 days, the initial infection source may date back to as early as November or October 2019.

[1] Dianjun Sun. Real-time Estimation and Prediction of Mortality Caused by COVID-19 with Patient Information Based Algorithm
[2] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[3] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[4] Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study

This work was partially supported by funding from merit grant I01 BX000671 to WG from the

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.