key: cord-0710007-u7d46q0u authors: Lee, Keng-Wei; Chien, Tsair-Wei; Yeh, Yu-Tsen; Chou, Willy; Wang, Hsien-Yi title: An online time-to-event dashboard comparing the effective control of COVID-19 among continents using the inflection point on an ogive curve: Observational study date: 2021-03-12 journal: Medicine (Baltimore) DOI: 10.1097/md.0000000000024749 sha: 03a30fbcdc119fe3c88c80c06c798721702e7d50 doc_id: 710007 cord_uid: u7d46q0u

BACKGROUND: During the COVID-19 pandemic, one of the most frequently asked questions is which countries (or continents) have been hit most severely. The number of confirmed cases and the fatality are commonly used to measure the impact of COVID-19. Few studies have used the inflection point (IP) to represent the capability to control COVID-19, and how to determine the IP days that reflect this capability remains unclear. This study aims to (i) build a predictive model based on item response theory (IRT) to determine the IP for each country, and (ii) compare which countries (or continents) were hit hardest.

METHODS: We downloaded COVID-19 outbreak data on the number of confirmed cases in all countries as of October 19, 2020. An IRT-based predictive model was built to determine the pandemic IP for each country. A model-building scheme was demonstrated to fit the number of cumulative infected cases. Model parameters were estimated using the Solver add-in in Microsoft Excel. The absolute advantage coefficient (AAC) was computed to locate the IP as the point with the minimum AAC on a given ogive curve. Time-to-event analysis (a.k.a. survival analysis) was performed to compare IPs among continents using the area under the curve (AUC) and the respective 95% confidence intervals (CIs). An online comparative dashboard was created on Google Maps to present the epidemic prediction for each country.

RESULTS: The top 3 countries hit most severely by COVID-19 were France, Malaysia, and Nepal, with IP days of 263, 262, and 262, respectively.
The top 3 continents hit hardest based on IP days were Europe, South America, and North America, with AUCs (95% CIs) of 0.73 (0.61–0.86), 0.58 (0.31–0.84), and 0.54 (0.44–0.64), respectively. An online time-to-event result was shown on Google Maps to compare the IP probabilities across continents.

CONCLUSION: An IRT modeling scheme fitted to the epidemic data was used to predict the length of IP days. Based on IP days, Europe, particularly France, was hit seriously by COVID-19. The IRT model combined with the AAC is recommended for determining the pandemic IP.

More than 87,186,540 confirmed cases and 1,883,761 deaths during the COVID-19 pandemic had been reported as of October 19, 2020. [1, 2] The total number of deaths (1,883,761) substantially surpassed those of severe acute respiratory syndrome in 2003 (final death toll of 774) and Middle East respiratory syndrome in 2012 (final death toll of 858). [3] [4] [5]

1.1. The inflection point on the ogive curve

When a new disease starts to spread, one of the most frequently asked questions is which countries have been hit most severely so far. [5, 6] The number of cumulative infected cases (NCIC) or the death toll is commonly reported to measure the impact of COVID-19. The inflection point (IP) has also been described as a proxy for effective control of COVID-19. [7] [8] [9] [10] Fewer IP days indicate more effective control, whereas more IP days reflect a longer struggle against the disease. More than 90,271 articles related to COVID-19 have been published in the PubMed database. [11] However, none described in detail how to determine the IP or how to use IP days to measure the impact of COVID-19. An IP is a point on a smooth plane curve at which the curvature changes sign, that is, where the curve changes from concave downward to concave upward, or vice versa.
[12] When the trend of NCICs is displayed as an ogive curve, an epidemic can be divided into 4 stages. [13] The IP lies between Stages II and III.
(1) Stage I: Initial stage. Few cases are diagnosed, and the curve is flat with a slow upward trend.
(2) Stage II: Outbreak stage. The virus spreads rapidly and widely, and the number of daily confirmed cases grows exponentially.
(3) Stage III: Post-peak stage. The number of daily confirmed cases begins to decrease after peaking in Stage II. The IP appears at the transition into this stage, where daily cases begin to trend downward.
(4) Stage IV: Rehabilitation stage. The number of newly confirmed cases continues to decline, and the epidemic is essentially under control.
Many researchers [14] [15] [16] [17] [18] [19] [20] [21] have proposed mathematical models to predict the number of COVID-19 cases, and a few have investigated the IP. [7] [8] [9] [10] Nonetheless, none considered using the IP as a measure of effective control of COVID-19. One possible reason is the difficulty of objectively determining the IP of countries. [8] Using the IP to represent the capacity to control COVID-19 requires a model that fits the data easily and accurately. IPs can then be compared between countries. In the past, most mathematical models assumed that their parameters remained constant across conditions. [13] However, the parameters that characterize the diagnosis rate, cure rate, and mortality rate vary across countries and change over time. That is, each country has its own model parameters rather than a single set of parameters shared across all epidemic situations. [9, 10] Moreover, numerous studies [22] [23] [24] [25] [26] [27] described only the mathematical methods without explaining parameter estimation in detail (e.g., by providing an MP4 video and the original dataset to verify the study results).
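The curvature definition of the IP and its position between Stages II and III can be illustrated on a discrete cumulative series. The discrete second difference stands in for the sign of the curvature. The IP is the first day on which it turns from positive (accelerating, Stage II) to non-positive (decelerating, Stage III), which is also where daily new cases peak. The sketch below is hypothetical and is not the method used in this study, which searches a fitted ogive curve with the AAC. It assumes noise-free (or already smoothed) cumulative counts, and the function name `inflection_day` is illustrative only.

```python
import math
from typing import Optional, Sequence


def inflection_day(cumulative: Sequence[float]) -> Optional[int]:
    """Return the first day index at which the discrete second difference of a
    cumulative case series turns from positive (accelerating, Stage II) to
    non-positive (decelerating, Stage III); None if no such turn exists."""
    def second_diff(i: int) -> float:
        # Discrete analogue of the curvature sign: C[i+1] - 2*C[i] + C[i-1].
        return cumulative[i + 1] - 2 * cumulative[i] + cumulative[i - 1]

    for i in range(1, len(cumulative) - 2):
        if second_diff(i) > 0 and second_diff(i + 1) <= 0:
            return i + 1
    return None


# Synthetic, noise-free ogive centred on day 50 (not real data).
ogive = [1000 / (1 + math.exp(-0.2 * (t - 50))) for t in range(101)]
print(inflection_day(ogive))

# A curve still in exponential growth (Stages I and II only) has no IP yet.
growing = [math.exp(0.1 * t) for t in range(30)]
print(inflection_day(growing))
```

For the symmetric synthetic curve, the reported day falls at (or, because of floating-point rounding, immediately after) day 50. On real, noisy daily data this raw second-difference test would flip sign repeatedly, which is one reason to search for the IP on a fitted curve instead.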
As such, it is challenging for epidemiologists to use the IP to compare the effective control of COVID-19 among countries, where fewer IP days indicate more effective control. We were therefore motivated to develop a modeling method that describes model building and parameter estimation in detail. The method also comes with an MP4 video and the original data showing how to predict the IP from the model parameters. Several methods, such as quadratic, cubic, and exponential functions, have been applied to predict epidemic cases. [9] Taylor's power law [28, 29] and Ma's population aggregation critical density [30, 31] have also been reported in the literature. [32] None of them used item response theory (IRT) [33, 34] to build a COVID-19 model for each country. The IRT probability model produces an ogive curve that resembles the epidemic trend of the NCIC. Only 3 parameters (guessing, discrimination, and item difficulty) are required in the IRT model. The infected days (treated as the ability parameter theta in IRT) are placed on a continuous scale along Axis X from left to right (e.g., between −5 and 5), expressed in either probit or logit units. [35, 36] Naturally, more infected days yield a larger NCIC. The corresponding probabilities are shown on Axis Y and can be converted to the expected NCIC using a transformation formula (see the following sections in Methods). The IP on a given ogive curve can then be determined to compare the capacity for effective control of COVID-19 between countries. The aims of the current study are to (i) build a model fitting the epidemic NCIC data, (ii) demonstrate a feasible way to estimate model parameters, and (iii) determine the IP for each country and then compare the differences in IPs among continents.
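Axis X may be expressed in either probit or logit units. It therefore helps to see how close the two curves are once the logistic curve is rescaled by the constant D = 1.7 used in the Methods. [35, 36] The small check below compares the standard normal ogive (probit) with the logistic function on the −5 to 5 range used for Axis X. It uses only Python's standard library (`math.erf`). The function names and the grid resolution are illustrative choices, not part of the original study.

```python
import math


def normal_ogive(x: float) -> float:
    """Standard normal CDF (probit model), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x: float, d: float = 1.7) -> float:
    """Logistic (logit) model with scaling constant D."""
    return 1.0 / (1.0 + math.exp(-d * x))


def max_gap(d: float, lo: float = -5.0, hi: float = 5.0, steps: int = 10_001) -> float:
    """Largest absolute difference between the two curves on an even grid over [lo, hi]."""
    grid = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return max(abs(normal_ogive(x) - logistic(x, d)) for x in grid)


print(f"D = 1.0: max gap {max_gap(1.0):.4f}")
print(f"D = 1.7: max gap {max_gap(1.7):.4f}")
```

Without rescaling (D = 1.0), the curves differ by more than 0.1 in probability. With D = 1.7 the gap stays below 0.01 everywhere, which is why the constant is described as adjusting between the probit and logit scales.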
On October 19, 2020, we downloaded COVID-19 outbreak NCICs for 296 countries/regions, including the United States and provinces in China, from the GitHub website [37] (see Supplemental Digital Content 1, http://links.lww.com/MD/F736). All downloaded data were publicly deposited on the website. [37] Ethical approval was not necessary for this study.

Lee et al. Medicine (2021)

The 3-parameter IRT model in Eq. (1) was simplified to Eq. (2), where parameter c equals zero because Eq. (3) compresses the NCIC to a percentage value. D (= 1.7) is the scaling factor between the probit and logit scales. [35, 36]

$$P(\theta) = c + \frac{1 - c}{1 + e^{-Da(\theta - b)}} \qquad (1)$$

$$P(\theta) = \frac{1}{1 + e^{-1.7a(\theta - b)}} = \frac{e^{1.7a(\theta - b)}}{1 + e^{1.7a(\theta - b)}} \qquad (2)$$

Parameters a and b represent the discrimination (i.e., the slope) and the item difficulty (i.e., the location on Axis X; the further to the left, the earlier the outbreak occurred), respectively. When modeling the epidemic situation for each item (i.e., each country/region in this study), a was constrained to a range between 0 and 4 and b to a range between −5 and 5. For each country/region, all NCICs were transformed into percentage values from 0 to 1 [38] [39] [40], shown on Axis Y (right in Fig. 1); see Eq. (3):

$$OP_i = \frac{O_i - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \qquad (3)$$

where O_i denotes the originally observed case number on the ith day, and Max and Min denote the maximum and minimum observed values, respectively. Consequently, parameter c is set at zero after the transformation by Eq. (3). The parameter theta in Eq. (2) (representing the ability to control COVID-19) was then transformed by Eq. (4) to a continuous scale (on Axis X) from −5 to 5. In Eq. (4), N is the number of infected days and n_i represents the ith day; more days correspond to a higher OP_i in Eq. (3). The probability (also called the expected percentage, EP) can be obtained from Eq. (2) once parameters a and b are known (see the next section on parameter estimation). For instance, Eq. (4) yields theta = 0 on the scale (Axis X) for day 50 when the total number of infected days is 100.
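The model of Eqs. (1) to (3) can be sketched in a few lines of Python. Two parts of this sketch are assumptions rather than the authors' method. First, Eq. (4) is not reproduced in the text, so a linear day-to-theta mapping is assumed that reproduces the stated example (day 50 of 100 gives theta = 0). Second, the parameters are estimated with a plain bounded, coarse-to-fine grid search that minimizes the sum of squared residuals. This stands in for the Excel Solver used in the study, whose own algorithm is not reproduced here. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

D = 1.7  # scaling constant in Eqs. (1) and (2)


def p_irt(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """Eq. (1); with c = 0 this is Eq. (2), the expected percentage EP."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def to_percent(observed: Sequence[float]) -> List[float]:
    """Eq. (3): min-max rescaling of cumulative counts to OP values in [0, 1]."""
    lo, hi = min(observed), max(observed)
    return [(o - lo) / (hi - lo) for o in observed]


def day_to_theta(day: int, total_days: int) -> float:
    """ASSUMED linear form of Eq. (4): days 0..N mapped onto theta -5..5."""
    return -5.0 + 10.0 * day / total_days


def sse(a: float, b: float, thetas: Sequence[float], ops: Sequence[float]) -> float:
    """Sum of squared residuals between observed OP and model EP."""
    return sum((op - p_irt(t, a, b)) ** 2 for t, op in zip(thetas, ops))


def fit_ab(thetas: Sequence[float], ops: Sequence[float],
           rounds: int = 4, steps: int = 41) -> Tuple[float, float]:
    """Bounded coarse-to-fine grid search (a stand-in for Excel Solver):
    a in [0.01, 4] (a = 0 excluded because it gives a flat curve), b in [-5, 5]."""
    a_lo, a_hi, b_lo, b_hi = 0.01, 4.0, -5.0, 5.0
    best = (a_lo, b_lo)
    for _ in range(rounds):
        grid = [(a_lo + (a_hi - a_lo) * i / (steps - 1),
                 b_lo + (b_hi - b_lo) * j / (steps - 1))
                for i in range(steps) for j in range(steps)]
        best = min(grid, key=lambda ab: sse(ab[0], ab[1], thetas, ops))
        # Halve the search window around the current best, staying within bounds.
        half_a, half_b = (a_hi - a_lo) / 4, (b_hi - b_lo) / 4
        a_lo, a_hi = max(0.01, best[0] - half_a), min(4.0, best[0] + half_a)
        b_lo, b_hi = max(-5.0, best[1] - half_b), min(5.0, best[1] + half_b)
    return best


# Synthetic check (not real data): 60 days generated from a = 1.2, b = 0.3.
N = 60
thetas = [day_to_theta(n, N) for n in range(N + 1)]
cases = [5000 * p_irt(t, 1.2, 0.3) for t in thetas]
a_hat, b_hat = fit_ab(thetas, to_percent(cases))
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

On the synthetic series, the search recovers values close to the generating a = 1.2 and b = 0.3. A gradient-based optimizer (as Solver's default engine is) would converge faster. The grid search was chosen here only because it is easy to verify and enforces the stated parameter bounds explicitly.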
The feature of an ogive curve. The IP lies between Stage II (outbreak, from points O to A) and Stage III (post-peak, from points B to Q). [13]

The relationship between IP days, the transformed OP_i (right, on Axis Y), and theta (on Axis X) is presented in Fig. 1, where EP_i denotes p(θ) in Eq. (2). The expected NCIC_i is obtained by reversing Eq. (3), NCIC_i = EP_i × (Max − Min) + Min, where the subscript i stands for the ith day on Axis X. For instance, a country's NCIC_i is estimated at 50 when EP_i = 0.5, Max is 100, and Min is 0. Note that EP_i is computed by Eq. (2) once parameters a and b are known. The following sections describe how to estimate parameters a and b and how to determine the IP for comparison among countries.

Several formulas and functions (based on Eqs. (1) to (5)) were implemented in Microsoft Excel as follows:
A. To minimize the total residuals using the Microsoft function (i.e., setting the objective):
B. To estimate model parameters (i.e., assigning variable cells): parameters a and b in Eq. (2) were estimated and constrained within the ranges (0, 4) and (−5, 5), respectively.
D. Data arrangement: the observed NCIC for a specific country was transformed into a percentage based on Eq. (3). The residuals across all countries/regions were summed (see Eq. (6)) to reach the best fit to the epidemic data.
E. To perform the Solver add-in tool: the Microsoft Solver add-in [41] [42] [43] was used to estimate model parameters; see Multimedia Appendix 2, http://links.lww.com/MD/F739.

The fitted ogive curve can be used to determine the IP and predict future NCICs (see the next section). The IP search on a given ogive curve is based on the model fitted to the data. The IP is determined by computing the absolute advantage coefficient (AAC) [44, 45] (also called the dimension coefficient, DC [46]) in Eq. (7). The AAC is computed from 3 consecutive EP_i values, denoted by g1, g2, and g3 in Eq. (7) (see Fig. 1).
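Eq. (7) is not reproduced in this text, so the AAC search below is a hypothetical sketch under a stated assumption. The AAC is taken to have the ratio form r = (g1/g2)/(g2/g3) and AAC = r/(1 + r). This form is assumed from the dimension coefficient in the cited works [44] [45] [46] and may differ from the authors' exact formula. Under this assumption, log r is the second difference of log EP. For the logistic curve of Eq. (2), that second difference is most negative where EP = 0.5 (θ = b), so the minimal AAC falls on the curve's inflection point, consistent with the text's rule that the IP sits at the minimum AAC. The reverse of Eq. (3) is included to convert an EP back to an expected NCIC. All function names are illustrative.

```python
import math
from typing import List, Tuple

D = 1.7  # scaling constant in Eq. (2)


def ep(theta: float, a: float, b: float) -> float:
    """Expected percentage (EP) from Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def aac(g1: float, g2: float, g3: float) -> float:
    """ASSUMED form of Eq. (7): r = (g1/g2)/(g2/g3), AAC = r / (1 + r)."""
    r = (g1 / g2) / (g2 / g3)
    return r / (1.0 + r)


def find_ip(thetas: List[float], a: float, b: float) -> Tuple[int, float]:
    """Index and theta of the middle point of the EP triplet with the minimal AAC."""
    eps = [ep(t, a, b) for t in thetas]
    scores = [aac(eps[i - 1], eps[i], eps[i + 1]) for i in range(1, len(eps) - 1)]
    k = min(range(len(scores)), key=scores.__getitem__)
    return k + 1, thetas[k + 1]


def ep_to_ncic(ep_value: float, max_obs: float, min_obs: float) -> float:
    """Reverse of Eq. (3): expected cumulative cases from an EP value."""
    return ep_value * (max_obs - min_obs) + min_obs


thetas = [-5.0 + 0.1 * k for k in range(101)]  # Axis X grid from -5 to 5
idx, theta_ip = find_ip(thetas, a=1.0, b=0.5)
print(idx, round(theta_ip, 2))
print(ep_to_ncic(0.5, max_obs=100, min_obs=0))
```

With a = 1.0 and b = 0.5, the minimal AAC is found at θ ≈ 0.5, the midpoint of the ogive. The back-transform reproduces the worked example in the text (EP = 0.5, Max = 100, Min = 0 gives 50 cases).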
The IP is located at the point with the minimum AAC across all possible AACs on an ogive curve. Some IP days could not be obtained from Eq. (7), and others fell only within the last week of the data. These were treated as censored, implying that the NCICs had not yet reached an appropriate IP (e.g., the epidemic was still at Stage I or II [13]). Accordingly, time-to-event analysis (a.k.a. survival analysis) was performed using the Kaplan-Meier method [47] across 6 continent groups. The area under the curve (AUC) was computed with Eq. (8), and the 95% confidence intervals (CIs = AUC ± 1.96 × SE) with Eq. (9). In these equations, P_i is the probability for a given country, h_pi is the distance between the 2 probabilities P_i and P_{i+1}, and n is the sample size. All IP days in countries/regions were compared on a choropleth map [48], where darker colors indicate more IP days in the epidemic.

During the COVID-19 epidemic, not all situations include all 4 stages (Fig. 1). For example, an epidemic might include only Stage I or II. In that case, the maximal OP_i reaches a percentage below 1.0, called the compressed rate (CR). For example, using Eq. (2), the maximal OP_i may be 0.50 on Axis Y when theta is 5.0 on Axis X. OP_i and the expected NCIC_i are then redefined by Eqs. (10) and (11). Incorporating the CR into the IRT model gives the IRT-CR model. We expected the IRT-CR model to yield significantly smaller residuals than the IRT model, because it accounts for epidemics that are still at Stage I or II (or III). A paired t test was performed to examine the difference in residuals between the 2 models (i.e., the IRT and IRT-CR models). Residuals differed significantly (t = 12.45, df = 295, P < .0001) between the IRT and IRT-CR models, with means (0.63 and 1.25) and variances (0.82 and 1.25), respectively. The CR is therefore critical for the IRT-CR model to achieve a smaller residual than the IRT model, and the following analyses were based on the IRT-CR model.
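The Kaplan-Meier step of the time-to-event analysis can be sketched with the standard product-limit estimator. Here the "event" is a country reaching its IP, and censored countries are those without an IP yet. Eq. (8) for the AUC is not reproduced in this text, so it is not sketched. The `ci95` helper only implements the interval stated for Eq. (9) (AUC ± 1.96 × SE) and expects the SE to be supplied. Function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the probability of NOT yet having reached the IP.
    events[i] is True if an IP was found at times[i] (IP days), False if censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        reached = censored = 0
        # Group all countries that share the same time value.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                reached += 1
            else:
                censored += 1
            i += 1
        if reached:
            surv *= 1.0 - reached / at_risk
            curve.append((t, surv))
        # Both events and censored countries leave the risk set after time t.
        at_risk -= reached + censored
    return curve


def ci95(auc: float, se: float) -> Tuple[float, float]:
    """Interval stated for Eq. (9): AUC +/- 1.96 * SE."""
    return auc - 1.96 * se, auc + 1.96 * se


# Toy IP days for five hypothetical countries; False marks a censored country.
ip_days = [10, 20, 20, 30, 40]
ip_found = [True, True, False, True, False]
for day, prob in kaplan_meier(ip_days, ip_found):
    print(day, round(prob, 3))
```

Censored countries still count in the risk set up to their last observed day. This is why the analysis can use countries whose epidemic has not yet passed Stage II instead of discarding them.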
Data from 296 countries/regions were collected, including 7 censored cases [e.g., Tibet (China), with only one case]. The top 3 countries hit most severely by COVID-19 were France, Malaysia, and Nepal, with IP days of 263, 262, and 262, respectively (Fig. 2). The IP days for each country/region are displayed when its area is clicked on the online map. [49] Most areas in China are colored light yellow, indicating earlier IPs in Chinese provinces. The difference in IPs across continents is analyzed in the next 2 sections. Readers are invited to click on a colored area of the choropleth map (Figure 2) to examine the ogive curve plotted on Google Maps. The epidemic in France is shown in Figure 3. Two waves are visible, and a single ogive curve fits them unacceptably (Panel A in Fig. 3). The daily epidemic is shown in Panel B in Figure 4. If the infected days in the first wave are restricted to January 24 through May 30, 2020, the model fits the data rather well (Panel C in Fig. 3). April 4 was identified as the IP on the ogive curve by locating the minimal AAC (i.e., the DC in Eq. (7); see Panel A in Fig. 4). [44] [45] [46] When the second wave was modeled (Panel B in Fig. 4), the IP at the minimal AAC was located on October 17, 2020 (Panel C in Fig. 4). The other 2 countries (i.e., Malaysia and Nepal) are presented in Figures 5 and 6. The epidemic trend in Malaysia (Fig. 5) is similar to that in France. However, the epidemic in Nepal differs markedly from those in France and Malaysia. Only one wave is present, and no IP was found because the epidemic is still at Stages I and II (Fig. 1). The probabilities from the time-to-event analysis are shown in Figure 7.
Based on IP days, the top 3 continents hit most severely were Europe, South America, and North America, with AUCs (95% CIs) of 0.73 (0.61 to 0.86), 0.58 (0.31 to 0.84), and 0.54 (0.44 to 0.64), respectively (see Table 1). An online time-to-event result was shown on Google Maps [50] to compare the IP probabilities across continents. The residuals differed between the IRT-CR and IRT models, indicating that the IRT-CR model fitted the epidemic data better than the IRT model. Seven equations were constructed in Microsoft Excel to build a model that fits the epidemic data for all countries/regions in the world. The Solver add-in, a tool familiar to researchers who use Microsoft Excel spreadsheets, was applied to estimate model parameters. The IPs for all 296 countries/regions were successfully constructed. A significant difference in IPs was found among continents, using Eqs. (8) and (9) to calculate AUCs and 95% CIs. We observed that the top 3 countries hit most severely by COVID-19 were France, Malaysia, and Nepal, with IPs of at least 262 days. The top 3 continents hit hardest were Europe, South America, and North America. During the COVID-19 outbreak, one of the most frequently asked questions is which countries (or continents) have been hit hardest. The NCIC and the fatality have been used to evaluate the struggle against the COVID-19 pandemic, but no study has applied IP days to measure the effective control of COVID-19. For instance, South Korea is one of the few countries in the world to have maintained a flat infection curve for more than 50 days. [51] However, readers were not told how the starting point of those 50 days (i.e., the IP) was determined.
Many mathematical models [14] [15] [16] [17] [18] [19] [20] [21] have been proposed to predict the NCIC, and some IP topics have been discussed. [7] [8] [9] [10] However, none applied IRT to construct a predictive model for each country during the COVID-19 pandemic, described the IP search in detail, or compared IPs across countries/regions/continents. In this study, the IRT-CR model fitted the pandemic NCIC data well. The IP search scheme [44] [45] [46] is unique and viable, even though determining IP days has been reported to be difficult and inconsistent. [8] The IP days can be objectively determined on the ogive curve. Furthermore, time-to-event analysis was applied to compare IPs among continents because censored data exist. The mean of the previous several days' case numbers (e.g., 2 or 7 days) can be used to obtain an approximate IP. [10, 52] However, this approach is unreliable: the number of days used to find the maximum mean of daily cases varies, resulting in disparate IP days. The comparison of IPs across countries/regions/continents is feasible on a choropleth map (Fig. 2). A visual time-to-event dashboard (Fig. 7) is also recommended to readers, as is the computation of AUCs and 95% CIs based on Eqs. (8) and (9). A search of the PubMed database for the keyword "survival analysis" in the title returned more than 2613 articles. [53] None of them presented comparisons with an online dashboard [50] and AUCs and 95% CIs of survival (or time-to-event) analysis as we did in this study. First, our suggested IP determination uses the AAC to search for the IP on a given ogive curve (e.g., Panels A and C in Fig. 4). This allows the effective control of COVID-19 to be compared across countries/regions/continents. By observing the most misfitting pattern (e.g., a large model residual), we found a second wave (or peak) in France and Malaysia, whereas the first wave is still ongoing in Nepal (e.g., Panel B in Fig. 5). Second, no previous study provided an MP4 video on how to model the NCIC and estimate parameters for readers interested in replicating the study. Third, the Solver add-in is commonly used, but few studies have illustrated its use in modeling epidemic situations. The data and model building in Microsoft Excel are provided in Supplemental Digital Contents 1, http://links.lww.com/MD/F736, and 3, http://links.lww.com/MD/F738, making it easy to understand how IP days are searched for countries affected by the COVID-19 pandemic. Notice that an error of executing the Solver add-in in a macro (e.g., coding the statement of SolverSolve UserFinish:=True,

Our study has some limitations. First, the data were downloaded from Google Sheets on a daily basis. Even so, we are concerned about whether data from different areas were properly combined into their respective countries/regions when collecting NCICs and searching for IPs. This should be considered in future research. In addition, many mild and asymptomatic cases in countries/regions were not detected and documented. [54] For instance, up to half of the infected cases in Iceland and on the Diamond Princess cruise ship were asymptomatic. [55] [56] [57] A large number of asymptomatic carriers of SARS-CoV-2 also existed after clinical cases of COVID-19 were eliminated in Wuhan City. [58] Therefore, SARS-CoV-2 may persist in a population without clinical cases for a long period. This was not taken into account in this study, so the model building and IP search may be biased. Second, the minimal AAC [44] [45] [46] defined as the location of the IP on an ogive curve should be verified in future studies. Similarly, the method using the mean of the previous several days' case numbers (e.g., 2 or 7 days) might be applicable, [10, 52] feasible, and simple. Its results should be compared with those of this study in the future.
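The sensitivity of the moving-average shortcut [10, 52] to the chosen window can be illustrated with a toy series. This sketch assumes that the approximate IP is the day on which the trailing mean of daily cases peaks. The cited works may define the mean or the window differently, and the function names are hypothetical.

```python
from typing import List, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[float]:
    """Mean of the `window` most recent days, starting on day window - 1."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def approx_ip_day(daily_cases: Sequence[float], window: int) -> int:
    """Day index at which the trailing mean of daily cases peaks."""
    means = trailing_mean(daily_cases, window)
    peak = max(range(len(means)), key=means.__getitem__)
    return peak + window - 1  # shift back to the original day index


# Toy daily counts (not real data): an early spike, then a broader plateau.
daily = [1, 2, 4, 8, 20, 9, 10, 12, 11, 5, 3]
for w in (2, 7):
    print(f"{w}-day mean peaks on day {approx_ip_day(daily, w)}")
```

On this series the 2-day mean peaks on day 5 (driven by the single spike), whereas the 7-day mean peaks on day 9 (driven by the plateau). This is the kind of window-dependent disagreement that makes the shortcut unreliable for comparing IP days across countries.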
Third, case numbers change from day to day. The model parameters for countries/regions during the COVID-19 pandemic should be re-optimized on a daily or weekly basis to make the prediction and the IP determination as accurate as possible. Fourth, several IP-related issues not discussed in this study need further research, such as whether:
(i) the ratio of IP to the total epidemic days is suitable for representing the effective control of COVID-19 (e.g., the IP and ratio are 150 (62.7%), 24 (8.8%), and 163 (97.4%) for the US, China, and France, respectively);
(ii) a shorter IP truly implies less suffering and struggle against the COVID-19 pandemic (e.g., Wuhan's IP = 18); and
(iii) an association exists between the IP and the corresponding NCIC (e.g., India's IP = 238 and NCIC = 102,526).
Fifth, the Solver add-in is not the only approach to estimating model parameters. Many other methods can be applied, such as Warm's Weighted Mean Likelihood Estimate, [59] Anchored Maximum Likelihood Estimation, [60] and Weighted Likelihood Estimation. [61] These merit comparison in future studies. Finally, time-to-event analysis was applied to compare IPs among continents because censored data exist, but whether another statistical method can be used to compare their IPs deserves discussion. For instance, if censored data are ignored, the bootstrapping method [62] can be used to examine the 95% CIs among continents and display them on a dashboard. [63]

An IRT model fitting the epidemic data was applied to determine the IP days on a given ogive curve. Europe, particularly France, was hit severely by COVID-19. The IRT model combined with the AAC is recommended for determining the pandemic IP, and its use is not limited to COVID-19 as illustrated in this study.
Coronavirus disease 2019 (COVOD-19) outbreak
Coronavirus disease 2019 (COVOD-19) outbreak
Risks to healthcare workers with emerging diseases: lessons from MERS-CoV, Ebola, SARS, and avian flu
Are coronavirus diseases equally deadly? Comparing the latest coronavirus to MERS and SARS
Estimation of MERS-coronavirus reproductive number and case fatality rate for the Spring 2014 Saudi Arabia Outbreak: insights from publicly available data
The computation of case fatality rate for novel coronavirus (COVID-19) based on Bayes theorem: an observational study
Predication of inflection point and outbreak size of COVID-19 in new epicentres
Treating Covid-19 at the inflection point
The inflection point about COVID-19 may have passed
SEIR-Based COVID-19 transmission model and inflection point prediction analysis
Articles related to 2019-nCoV in PubMed
Definition of inflection point
Analysis of second outbreak of COVID-19 after relaxation of control measures in India
Forecasting COVID-19
Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: a data-driven analysis
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Data-based analysis, modeling and forecasting of the COVID-19 outbreak
Modeling the epidemic dynamics and control of COVID-19 outbreak in China
Effect of delay in diagnosis on transmission of COVID-19
A model based study on the dynamics of COVID-19: prediction and control
Effects of control measures on the dynamics of COVID-19 and double-peak behavior in Spain
How data analytics and big data can help scientists in managing COVID-19 diffusion: modeling study to predict the COVID-19 diffusion in Italy and the Lombardy region
A data driven epidemic model to analyze the lockdown effect and predict the course of COVID-19 progress in India
Epidemic trend of COVID-19 transmission in India during lockdown-1 phase
Outbreak trends of coronavirus disease-2019 in India: a prediction
Modeling and forecasting the COVID-19 pandemic in India
The prediction for development of COVID-19 in global major epidemic areas through empirical trends in China by utilizing state transition matrix model
Aggregation, variance, and mean
Taylor's Power Law: Order and Pattern in Nature. The Netherland
Further interpreted Taylor's Power Law and population aggregation critical density
Power law analysis of the human microbiome
Predicting the outbreak risks and inflection points of COVID-19 pandemic with classic ecological theories
Practical applications of item characteristic curve theory
Applications of Item Response Theory to Practical Testing Problems
Logit and Probit: What are they?
Origin of the scaling constant d = 1.7 in item response theory
Novel Coronavirus (nCoV) Data Repository
Rasch analysis for continuous variables
Development of a Microsoft Excel tool for one-parameter Rasch model of continuous items: an application to a safety attitude survey
A Rasch model for continuous ratings
An app for classifying personal mental illness at workplace using fit statistics and convolution neural networks: survey-based quantitative study
An app for detecting bullying of nurses using convolution neural networks and web-based computerized adaptive testing: development and usability study
An app developed for detecting nurse burnouts using the convolution neural networks in Microsoft Excel: population-based questionnaire study
Using the separation index for identifying the dominant role in an organization: a case of publications in organization innovation
Using the separation index to identify the most dominant role: a case of application on COVID-19 outbreak
Cronbach's alpha with the dimension coefficient to jointly assess a scale's quality
Understanding Kaplan-Meier and survival statistics
Choropleth map legend design for visualizing the most influential areas in article citation disparities: a bibliometric study
Comparison of inflection points in epidemic around the world
Time-to-event analysis of inflection points in continents
Effective control of COVID-19 in South Korea: cross-sectional study of epidemiological data
Articles related to survival analysis in title in PubMed
Factors associated with the delayed termination of viral shedding in COVID-19 patients with mild severity in South Korea
Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship
COVID-19 in Iceland - Statistics. Reykjavik
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Seroprevalence and asymptomatic carrier status of SARS-CoV-2 in Wuhan City and other places of China
The efficacy of warm's weighted mean likelihood estimate (WLE) correction to maximum likelihood estimate (MLE) bias
Estimating Rasch measures with known polytomous (or rating scale) item difficulties: anchored maximum likelihood estimation (AMLE)
Weighted likelihood estimation of ability in item response theory
Using the bootstrapping method to verify whether hospital physicians have different h-indexes regarding individual research achievement: a bibliometric analysis
Bootstrapping method to compare inflection points in continents

We thank Chi Mei Medical Center (Taiwan) for grant funds CMFHR10936 and CMFHR10892, and Enago (www.enago.tw) for the English language review of this manuscript.

TWC developed the study concept and design. KW, WC, and YT analyzed and interpreted the data. SC monitored the process of this study and helped in responding to the reviewers' advice and comments. TWC drafted the manuscript, and all authors provided critical revisions for important intellectual content. The study was supervised by HY. All authors read and approved the final manuscript.