key: cord-0947432-grv1nvot
authors: Qu, Yuanke; Yin Lee, Chun; Lam, KF
title: A sequential test to compare the real-time fatality rates of a disease among multiple groups with an application to COVID-19 data
date: 2022-02-03
journal: Stat Methods Med Res
DOI: 10.1177/09622802211061927
sha: cb547c121a8a58970c01ee681989346f7baec2b0
doc_id: 947432
cord_uid: grv1nvot

Infectious diseases, such as the ongoing COVID-19 pandemic, pose a significant threat to public health globally. Fatality rate serves as a key indicator for the effectiveness of potential treatments or interventions. With limited time and understanding of novel emerging epidemics, comparisons of the fatality rates in real-time among different groups, say, divided by treatment, age, or area, have an important role to play in informing public health strategies. We propose a statistical test for the null hypothesis of equal real-time fatality rates across multiple groups during an ongoing epidemic. An elegant property of the proposed test statistic is that it converges to a Brownian motion under the null hypothesis, which allows one to develop a sequential testing approach for rejecting the null hypothesis at the earliest possible time when statistical evidence accumulates. This property is particularly important as scientists and clinicians are competing with time to identify possible treatments or effective interventions to combat the emerging epidemic. The method is widely applicable as it only requires the cumulative number of confirmed cases, deaths, and recoveries. A large-scale simulation study shows that the finite-sample performance of the proposed test is highly satisfactory. The proposed test is applied to compare the difference in disease severity among Wuhan, Hubei province (excluding Wuhan) and mainland China (excluding Hubei) from February to March 2020. The result suggests that the disease severity is potentially associated with the health care resource availability during the early phase of the COVID-19 pandemic in mainland China.

The incidence of emerging infectious diseases has increased worldwide in recent decades and has posed one of the greatest threats to public health globally. 1 In particular, the ongoing coronavirus pandemic (COVID-19), first identified in Wuhan city of China in December 2019, is affecting 217 countries and territories across the world with a death toll of over 1.7 million out of around 79 million cases by the end of 2020. 2 The COVID-19 crisis has become a public health emergency and has seriously disrupted every aspect of our life, economies, and societies. For this deadly infectious disease caused by a novel pathogen, lethality is one of the most important measures of virulence and a key input for evaluating the effectiveness of response strategies.

The case fatality rate (CFR) is one of the most essential epidemiological quantities to measure the virulence of an infectious disease, which is commonly defined as the proportion of deaths among all confirmed cases. The CFR has been adopted by health authorities in the current COVID-19 pandemic as a severity indicator. 3 However, it was reported that this simple estimator only performs well at the end of an epidemic when all the cases have been resolved (affected individuals either died or recovered), but may not be a reliable indicator during an ongoing epidemic. 4, 5 Various statistical approaches have been proposed to provide a more accurate estimate for disease severity by adjusting for the reporting delay from illness onset to death during an epidemic. [6] [7] [8] Among others, Yip et al. 9 suggested that the fatality rate of an emerging epidemic should be time-varying in nature, and a decreasing trend in fatality rate could be a reflection of an effective measure. To provide some critical guidance on developing prompt decisive policies during an outbreak, they proposed to use the real-time fatality rate (RTFR) to measure the severity of an epidemic as opposed to the traditional CFR. Specifically, the RTFR is defined as the probability of death conditioned on a transition to death or recovery based on a counting process approach. Relative to CFR, the RTFR was shown to be more sensitive to capture changes in fatality rate during the course of an epidemic. To detect a change in RTFR statistically, Lam et al. 10 developed a one-sample sequential test for the null hypothesis of constant fatality rate, which is applied to investigate the effectiveness of the interventions in Hong Kong and Beijing during the severe acute respiratory syndrome epidemic in 2003. Therein, the testing procedure starts before the implementation of a potential intervention. 
Under the null hypothesis, the RTFR remains constant, which means that the intervention is not effective in suppressing the fatality, at least over a short-term period of, say, two months. Hence, a significant reduction in RTFR in the short run can reasonably be attributed to an effective intervention. However, one should be cautious about the test results in the long run, as a progressive reduction in RTFR can be caused by other factors, such as a rise in temperature, improved medical care, and mutations that make the virus less lethal. A more promising way to identify a potential factor that affects the severity is to compare the RTFRs among multiple independent groups over time, so that the effects of the above-mentioned confounding factors are shared by all groups.
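To make the contrast between the two indicators concrete, the following minimal Python sketch computes the naive CFR (cumulative deaths over cumulative confirmed cases) alongside an interval-wise RTFR (deaths over deaths plus recoveries within each interval). All counts are hypothetical and the function names are illustrative; they are not taken from the original work.

```python
import math


def naive_cfr(cum_deaths, cum_confirmed):
    """Cumulative deaths divided by cumulative confirmed cases."""
    return cum_deaths / cum_confirmed


def interval_rtfr(deaths, recoveries):
    """Deaths among the cases resolved (died or recovered) in one interval."""
    resolved = deaths + recoveries
    return deaths / resolved if resolved > 0 else math.nan


# Hypothetical daily counts during an ongoing outbreak (illustration only).
daily_deaths = [4, 6, 5, 3, 2]
daily_recoveries = [6, 14, 20, 27, 38]
cum_confirmed = [200, 400, 600, 750, 850]

cum_deaths = 0
for day, (d, r, c) in enumerate(zip(daily_deaths, daily_recoveries, cum_confirmed), start=1):
    cum_deaths += d
    print(f"day {day}: naive CFR = {naive_cfr(cum_deaths, c):.3f}, "
          f"RTFR = {interval_rtfr(d, r):.3f}")
```

In this toy series the naive CFR stays low because most confirmed cases are still unresolved, whereas the interval-wise RTFR reacts directly to the shrinking share of deaths among resolved cases.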

There exists a modest statistical literature for comparing virulence among different subgroups. Reich et al. 11 defined the relative CFR as the CFR of one group divided by that of another reference group. They compared the group-specific fatality rates using a generalized linear model framework, which was adopted by Chen et al. 12 for estimating CFR based on the maximum profile-likelihood approach. However, the assumption of time-invariant CFRs in their approach is quite restrictive and is presumably more suitable for chronic diseases rather than novel emerging infectious diseases. Apart from that, most of the contemporary studies for epidemics compared the disease severity among different subgroups in a pre-specified study period and drew a conclusion on the performance of a certain intervention only at the end of the study. This approach fails to assess the efficacy of the implemented measures in a timely manner even if there is strong statistical evidence supporting differences in performance among subgroups during the observation period. Moreover, these conclusions may not be applied directly to future episodes of the same epidemic even if the viruses are of the same strain, because the characteristics of the viruses may change. During the outbreak of a novel infectious epidemic, there is an urgent need to identify an effective intervention at the earliest possible time so that prompt action can be taken to secure public health in an effective way. Also, the collection of complete and complex data is extraordinarily difficult due to various administrative reasons. For instance, information such as the times to death and recovery of patients is non-trivial and hard to assess. On the other hand, it is relatively easy to obtain the summary data on the number of confirmed cases, deaths, and recoveries from countries, such as those compiled daily during the ongoing COVID-19 pandemic. 
In addressing the aforementioned problems, a statistical test that provides a timely comparison of the fatality rates among multiple groups based on a simple data structure is warranted.

Motivated by the idea of Yip et al. 9 where they captured the progressive changes in disease severity efficiently using RTFR, we propose a sequential test for the null hypothesis of equal RTFRs among different subgroups over a time period [0, τ]. The null hypothesis can be rejected at time t where t ∈ [0, τ] as soon as statistical evidence accumulates. With the proposed method, one can test for the difference in RTFRs among neighboring areas, different age groups, different treatment arms of a clinical trial to inform and formulate public health strategies. For example, a single-arm clinical trial conducted in March 2020 found clinical improvement in patients with severe COVID-19 receiving Remdesivir, the first drug recommended for treating COVID-19. 13 To further study this potential antiviral agent, a growing number of controlled clinical trials are conducted to judge its efficacy. 14, 15 In this case, a potential usage of a multiple sub-group test is to compare the RTFRs of patients receiving Remdesivir and standard treatment over time. Essentially, a rejection of the null hypothesis before the end of the study indicates the superior effectiveness of one treatment over the other(s). Another example, as will be illustrated in the "Application" section, is to compare the difference in RTFRs across different neighboring areas to identify the target areas that need assistance in medical health care resources during public health emergencies.

The test statistic for two-sample comparison and its asymptotic properties are studied in section 2. The generalization of the two-sample case to the K-sample case (K > 2) is delineated in section 3. In section 4, a large-scale simulation study is carried out to evaluate its finite-sample performance in various scenarios. In section 5, the proposed test is applied to the COVID-19 epidemic data of mainland China to investigate the difference in disease severity among three separate area clusters: Wuhan, Hubei province (excluding Wuhan) and mainland China (excluding Hubei) during the disease outbreak. Discussions and recommendations are given in section 6.

We consider two populations, classified by age, treatment, or any other categories of interest, subject to infection during an epidemic. Some basic epidemiological data are collected in real-time. Very often, public health officials aim to examine the difference in disease severity between two subgroups. Typically, clinicians are interested in tackling the following questions:

• Is the newly proposed treatment more effective than the standard treatment (placebo) in treating the specific infectious disease?
• Compared with area A where no measures have been taken, is the fatality rate lower in area B with effective policies?
• Do patients from resource-poor area A have a higher fatality rate than those from area B?

These questions have primary importance to guide the decision-making process during the outbreak of an infectious disease.

We set the observation period to be $[0, \tau]$ in the hope that a reliable decision can be made at time $\tau$. Time 0 can be set as the day on which a certain intervention, treatment, or response strategy is implemented on a particular group. We partition $[0, \tau]$ into $H$ regular intervals (naturally in days or weeks), and the information regarding the numbers of inpatients, deaths and recoveries for the two subgroups is collected in sequence at the end of the $h$th interval, $h = 1, \ldots, H$. Denote the numbers of deaths and recoveries in group $k$ ($k = 1, 2$) in the $h$th interval by $n_{k,D}(h)$ and $n_{k,R}(h)$, respectively. We further denote the cumulative numbers of deaths and recoveries in group $k$ at the end of the $h$th interval by $N_{k,D}(h)$ and $N_{k,R}(h)$, respectively, where $N_{k,D}(h) = \sum_{j=1}^{h} n_{k,D}(j)$ and $N_{k,R}(h) = \sum_{j=1}^{h} n_{k,R}(j)$. Let $I_k(h-1)$ be the number of inpatients just before the start of the $h$th interval for group $k$. We assume that in the $h$th interval, each of the $I_k(h-1)$ inpatients will either die, recover or remain in the hospital with respective probabilities $p_{k,D}(h)$, $p_{k,R}(h)$ and $1 - p_{k,D}(h) - p_{k,R}(h)$. Conditional on the past information, we have

$$\{n_{k,D}(h),\, n_{k,R}(h)\} \mid \mathcal{F}_{k,h-1} \sim \text{Multinomial}\{I_k(h-1);\ p_{k,D}(h),\ p_{k,R}(h)\}. \tag{1}$$

Let $\mathcal{F}_{k,h} = \{I_k(j), n_{k,D}(j), n_{k,R}(j), N_{k,D}(j-1), N_{k,R}(j-1),\ j \le h\}$ be the filtration or history generated by the observed data, which satisfies the usual regularity conditions. 16 For group $k$, a discrete RTFR 17 for the $h$th interval, obtained by considering recovery and death as two competing risks, is defined as

$$\pi_k(h) = \frac{p_{k,D}(h)}{p_{k,D}(h) + p_{k,R}(h)}, \tag{2}$$

which can be treated as the probability of a death conditioned on an event of death or recovery. The maximum likelihood estimator (MLE) of $\pi_k(h)$ can be easily shown to be

$$\hat{\pi}_k(h) = \frac{\hat{p}_{k,D}(h)}{\hat{p}_{k,D}(h) + \hat{p}_{k,R}(h)} = \frac{n_{k,D}(h)}{n_{k,D}(h) + n_{k,R}(h)},$$

where $\hat{p}_{k,D}(h) = n_{k,D}(h)/I_k(h-1)$ and $\hat{p}_{k,R}(h) = n_{k,R}(h)/I_k(h-1)$ are the respective sample death and recovery proportions for group $k$ in the $h$th interval. The above framework, together with a smoothed version of the RTFR estimator, was summarized in Yip et al. 17 Because it is sensitive to changes in severity over time, the RTFR in (2) can be used to compare the virulence of the disease between two subgroups. That is, to test

which can be reformulated as

When the null hypothesis of equal RTFRs between two subgroups ($K = 2$) holds true, we expect the ratios $n_{1,D}(h)/n_{1,R}(h)$ and $n_{2,D}(h)/n_{2,R}(h)$ to be similar throughout the whole observation period. Therefore, we propose the following two-sample test statistic:

where $w(j)$ is a locally bounded, non-negative, $\mathcal{F}_{j-1}$-predictable weight process. The subscript 2 in $Z_2(h)$ corresponds to the two-sample case discussed in this section. The proposed test statistic has mean zero under $H_0$ for all $h \le H$, but has a positive expected value under $H_1$ for some $h \le H$. Let $Z_2^{\dagger}(h)$ be the test statistic in (4) with the typical weights $w(j) = 1$ for $j = 1, \ldots, h$, which represents the situation in which the contribution from every interval is weighted equally. One can introduce different sets of weights to the test statistic $Z_2(h)$ to allow extra flexibility. For example, the choice of weights $w(j) = I_1(j-1) + I_2(j-1)$ allocates a heavier weight to periods with more inpatients in the two groups and a lighter weight to periods with fewer inpatients, in which the fluctuations can be erratic. Another set of intuitive weights is $w(j) = 1/[I_1(j-1) \times I_2(j-1)]$, which makes the changes in fatality rate contribute equally to the test statistic throughout the whole study period regardless of the number of inpatients, and the resulting test statistic is denoted by $Z_2^{*}(h)$.

Denote T = {1, 2, . . . , H} as the time indices for the discrete time process. For j ∈ T,

and by the Delta method, 18 we have

When the null hypothesis of equal RTFRs between the two groups holds true, g(p j ) = 0 for all j ∈ T, which implies

Moreover, for any $s \ne t$, $g(\hat{p}_s)$ and $g(\hat{p}_t)$ are independent. Consequently, we have

where $\sigma^2(h) = \sum_{j=1}^{h} w(j)^2 \, \dfrac{\partial g(p_j)}{\partial p_j^{T}} \, \Sigma_j \, \dfrac{\partial g(p_j)}{\partial p_j}$, and it can be consistently estimated by

With the above definitions, it is easy to obtain the test statistic $Z_2(H)$ and its corresponding variance estimate $\hat{\sigma}^2(H)$ evaluated at the endpoint $\tau$. It follows that a straightforward test statistic for equal RTFRs between two subgroups is given by

$$V(H) = \frac{Z_2(H)}{\hat{\sigma}(H)}, \tag{7}$$

which asymptotically follows a standard normal distribution under the null hypothesis. Therefore, a decision can be made at the end of the observation period $\tau$, and one can reject $H_0$ at significance level $\alpha$ if $V(H) > \Phi^{-1}(1 - \alpha)$, where $\Phi(\cdot)$ is the distribution function of a standard normal random variable. Note that $\sigma^2(\cdot)$ in (6) is a non-decreasing function and, for all $s < t$, we have $\text{cov}\{Z_2(s), Z_2(t)\} = \min\{\sigma^2(s), \sigma^2(t)\} = \sigma^2(s)$. Therefore, $Z_2(h)$ is a Gaussian process with independent increments. Hence, we have

where $W \circ \sigma^2$ denotes a standard Brownian motion $W$ evaluated on the time scale $\sigma^2(\cdot)$. With these properties, the asymptotic normality of $\{Z_2(h), h \in T\}$ with variance estimate $\hat{\sigma}^2(h)$ can be applied to develop a sequential testing procedure, discussed in the next section.
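As a rough illustration of the quantities in this section, the sketch below computes interval-wise RTFR estimates and a two-sample statistic path from daily counts. The exact forms of (4) and (6) should be taken from the original displays. The code assumes a cross-product form, $\sum_j w(j)\{n_{1,D}(j)\,n_{2,R}(j) - n_{2,D}(j)\,n_{1,R}(j)\}$, which has mean zero when the two death-to-recovery ratios agree, together with a plug-in Delta-method variance under the multinomial model in (1). Both choices, the function names and the toy counts are assumptions made for illustration only.

```python
import numpy as np


def rtfr_hat(n_d, n_r):
    """Interval-wise RTFR estimate n_D / (n_D + n_R); NaN when nothing resolved."""
    n_d = np.asarray(n_d, dtype=float)
    n_r = np.asarray(n_r, dtype=float)
    resolved = n_d + n_r
    return np.divide(n_d, resolved, out=np.full_like(resolved, np.nan), where=resolved > 0)


def two_sample_path(n1d, n1r, n2d, n2r, i1, i2, w=None):
    """Return (Z, sigma2): cumulative statistic and plug-in variance for h = 1..H.

    ASSUMED form: increment_j = w_j * (n1D_j * n2R_j - n2D_j * n1R_j).
    i1 and i2 hold the inpatient counts I_k(h-1) at the start of each interval.
    """
    n1d, n1r, n2d, n2r, i1, i2 = (
        np.asarray(a, dtype=float) for a in (n1d, n1r, n2d, n2r, i1, i2)
    )
    w = np.ones_like(i1) if w is None else np.asarray(w, dtype=float)
    z = np.cumsum(w * (n1d * n2r - n2d * n1r))

    # Sample death and recovery proportions in each interval.
    p1d, p1r, p2d, p2r = n1d / i1, n1r / i1, n2d / i2, n2r / i2
    # Delta-method variance of g = p1D * p2R - p2D * p1R, using the multinomial
    # covariance of (p_D, p_R) within each group and independence across groups.
    var_g = (
        (p2r**2 * p1d * (1 - p1d) + p2d**2 * p1r * (1 - p1r) + 2 * p2r * p2d * p1d * p1r) / i1
        + (p1r**2 * p2d * (1 - p2d) + p1d**2 * p2r * (1 - p2r) + 2 * p1r * p1d * p2d * p2r) / i2
    )
    sigma2 = np.cumsum((w * i1 * i2) ** 2 * var_g)
    return z, sigma2


# Toy counts for two groups over three days (hypothetical numbers).
i1 = i2 = [100, 95, 90]
n1d, n1r = [2, 3, 2], [8, 7, 9]
n2d, n2r = [1, 1, 2], [9, 10, 9]

z, sigma2 = two_sample_path(n1d, n1r, n2d, n2r, i1, i2)
v_final = z[-1] / np.sqrt(sigma2[-1])  # standardized end-of-period statistic
print("RTFR group 1:", rtfr_hat(n1d, n1r))
print("RTFR group 2:", rtfr_hat(n2d, n2r))
print("Z path:", z, "V(H):", v_final)
```

The variance path is non-decreasing by construction, which is the property the sequential procedure relies on.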

Consider the above statistical test conducted over the observation period $[0, \tau]$ with a pre-specified value of $H$. The idea of the sequential test is to conduct a test at the end of each of the $H$ non-overlapping intervals until a decision is made. We can think of it as a test running for $H$ days or weeks, for example, $H = 50$. To be specific, for $h = 1, 2, \ldots, H$, a test statistic $Z_2(h)$ is calculated based on the filtration $\mathcal{F}_{k,h}$ and a simple rule is used to decide when to stop: if $Z_2(h)$ exceeds the corresponding critical value $b_h$, we reject the null hypothesis and stop the test at the end of the $h$th interval. To maintain the overall type I error rate in a sequential design, the cumulative type I error spent up to each interval is governed by a non-decreasing function $\alpha(h)$ such that $\alpha(0) = 0$ and $\alpha(H) = \alpha$, say $\alpha = 0.05$. Adjustment to the significance level for each interval can be made through the α-spending function approach as proposed in Gordon Lan and DeMets 19 with

Then, the set of rejection boundaries

Based on the α-spending function in (8), we have the recursive relationship

We have shown in Section 2.2 that the sequence of test statistics $\{Z_2(1), \ldots, Z_2(H)\}$ behaves asymptotically as a Brownian motion with independent increments under the null hypothesis. We have $Z_2(1) \sim N(0, \sigma^2(1))$ and, for each $h = 2, 3, \ldots, H$,

$$Z_2(h) - Z_2(h-1) \sim N\big(0,\ \sigma^2(h) - \sigma^2(h-1)\big),$$

independently of $Z_2(1), \ldots, Z_2(h-1)$. The calculation of the rejection boundaries can be simplified and obtained recursively by solving

and, for h > 1

The multiple integral is evaluated using a Gaussian quadrature that replaces each integral by a weighted sum. Details regarding the numerical computation for sequential methods are given in Chapter 19 of Jennison and Turnbull. 20 Therefore, we can compare the test statistic $Z_2(h)$ with its corresponding rejection boundary $b_h$ ($h = 1, \ldots, H$) and one can reject the null hypothesis of equal RTFRs at the end of the $h^{*}$th interval, where $h^{*} = \min\{j : Z_2(j) > b_j\}$. If $Z_2(h) \le b_h$ for all $h = 1, \ldots, H$, then one may conclude that the null hypothesis is not rejected at the end of the observation period.
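The procedure above obtains the boundaries by numerical integration. As a simpler stand-in, they can be approximated by Monte Carlo, using only the property that, under the null hypothesis, the statistic behaves like a Gaussian process with independent increments and variance $\sigma^2(h)$. In the sketch below the cumulative spending schedule is passed in as an argument. The linear schedule used in the example is a placeholder assumption and not necessarily the spending function in (8), and the function names are hypothetical.

```python
import numpy as np


def mc_boundaries(sigma2, alpha_spent, n_paths=100_000, seed=0):
    """Approximate one-sided boundaries b_1..b_H by simulating null paths.

    sigma2      : non-decreasing variances sigma^2(1..H) of the statistic
    alpha_spent : cumulative type I error allowed by the end of each interval
    """
    rng = np.random.default_rng(seed)
    sigma2 = np.asarray(sigma2, dtype=float)
    alpha_spent = np.asarray(alpha_spent, dtype=float)
    inc_sd = np.sqrt(np.diff(sigma2, prepend=0.0))  # sd of each independent increment
    z = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)            # paths not yet rejected
    bounds = np.full(len(sigma2), np.inf)
    spent = 0.0
    for h in range(len(sigma2)):
        z += inc_sd[h] * rng.standard_normal(n_paths)
        # Number of surviving paths that should cross for the first time now.
        k = int(round((alpha_spent[h] - spent) * n_paths))
        if k > 0:
            survivors = z[alive]
            bounds[h] = np.partition(survivors, -k)[-k]  # k-th largest survivor
            alive &= z < bounds[h]
            spent = alpha_spent[h]
    return bounds


def first_rejection(z_path, bounds):
    """Return h* (1-based) at which the path first reaches its boundary, else None."""
    hits = np.flatnonzero(np.asarray(z_path) >= np.asarray(bounds))
    return int(hits[0]) + 1 if hits.size else None


H = 10
sigma2 = np.arange(1, H + 1, dtype=float)        # toy variance path
alpha_spent = 0.05 * np.arange(1, H + 1) / H     # placeholder linear spending
bounds = mc_boundaries(sigma2, alpha_spent)
print(np.round(bounds, 2))
```

Given an observed path of statistics, `first_rejection(z_path, bounds)` then plays the role of $h^{*}$. Quadrature-based boundaries would be preferable when high accuracy is needed, since the Monte Carlo approximation carries simulation error.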

In the previous section, we proposed a sequential test for equal RTFRs between two subgroups. This test can be easily generalized to accommodate the K-sample case (K > 2) for handling complicated clinical issues during an epidemic. For instance, the test can be applied to identify the most effective drug among several candidate treatments; it is also useful when the health authority wants to determine whether the disease severity is associated with some continuous or ordinal scale measurements, such as age or the level of hospital-based health care technology. Moreover, it is of epidemiological importance to compare the severity of the disease among different areas or countries. For the ongoing COVID-19 pandemic, one can compare the RTFRs among different areas in China, or among different countries, to exchange information and learn from the experiences of areas with comparatively improved fatality rates. Analogous to the two-sample test, we consider the observation period [0, τ] that contains H equally spaced time intervals. We aim to test the null hypothesis that the RTFRs of a specific disease are equal across K (K > 2) independent subgroups against the alternative that the RTFRs increase with age or other measurements or factors of interest over time. To be more specific, the hypotheses are

The test statistic is proposed as follows:

where $n_{k,D}(j)$ and $n_{k,R}(j)$ are the numbers of deaths and recoveries in the $j$th interval for the $k$th subgroup, respectively. In particular, $w_{k,k+1}(h)$ is a set of predetermined weights regarding the size of the population in groups $k$ and $k+1$. For illustration, we set $K = 3$ in (12) to accommodate the comparison of RTFRs among three subgroups. The proposed test statistic becomes

Similarly, letting $\hat{p}^{*}_j = \{\hat{p}_{k,D}(j), \hat{p}_{k,R}(j);\ k = 1, 2, 3\}$ be the vector of the MLEs of the probabilities of death and recovery in the $j$th interval for the three groups, we can show that the test statistic can be rewritten as

where $I_k(j-1)$ denotes the number of inpatients for group $k$ at the start of the $j$th interval and $g^{*}(\hat{p}^{*}_j)$ is a function of $\hat{p}^{*}_j$. It follows that the asymptotic variance of the test statistic in (13) can be derived easily by the Delta method, which is given by

for k = 1, 2, 3 and j = 1, . . . , h.

Analogous to the two-sample case, we denote Z † 3 (h) as the test statistic in (13) with a typical set of weights w k,k+1 (h) = 1 for all k = 1, 2 and h = 1, . . . H. Another set of intuitive weights is w k,k+1 ( j) =

We can easily show that the test statistic in (13) enjoys the same asymptotic properties as in the two-sample test. It converges to a Brownian motion under the null hypothesis, and the sequential testing procedure mentioned in Section 2.3 can be readily adopted using (13) and (14). Therefore, the differences in the RTFRs among K > 2 independent groups can be identified at the earliest possible time when enough statistical evidence accumulates.
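For intuition, the K-sample statistic can be sketched as a sum of adjacent-pair comparisons, in line with the weights $w_{k,k+1}$ attached to groups $k$ and $k+1$. The cross-product form of each pairwise term is an assumption carried over from the two-sample illustration, not a transcription of (12) or (13), and the function name and counts are hypothetical.

```python
import numpy as np


def k_sample_path(n_d, n_r, weights=None):
    """Cumulative K-sample statistic from (K, H) arrays of daily deaths/recoveries.

    ASSUMED form: for each interval j, sum over adjacent pairs (k, k+1) of
    w_{k,k+1}(j) * (n_{k,D}(j) * n_{k+1,R}(j) - n_{k+1,D}(j) * n_{k,R}(j)).
    Groups are ordered so that group 1 is expected to have the highest RTFR
    under the alternative.
    """
    n_d = np.asarray(n_d, dtype=float)
    n_r = np.asarray(n_r, dtype=float)
    cross = n_d[:-1] * n_r[1:] - n_d[1:] * n_r[:-1]   # shape (K-1, H)
    if weights is None:
        weights = np.ones_like(cross)                  # unit weights
    return np.cumsum((np.asarray(weights, dtype=float) * cross).sum(axis=0))


# Three ordered groups observed over two days (hypothetical counts).
deaths = [[3, 4], [2, 2], [1, 1]]
recoveries = [[7, 6], [8, 8], [9, 9]]
print(k_sample_path(deaths, recoveries))
```

With K = 2 the function reduces to the two-sample cross-product path, and it returns zeros when all groups report identical counts.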

A large-scale simulation is carried out to assess the finite-sample performance of the proposed two- and three-sample sequential tests. We assume that surveillance data are routinely reported while the exact death and discharge times are generally unknown. This mimics real-world epidemiological data, for which only a summary of aggregated counts is available during the outbreak. We assume a 50-day observation period (i.e. $\tau = 50$), which is divided into $H = 50$ equal intervals, and the daily number of inpatients is set to be $I_k(h) = 3000 - 30h$ ($k = 1, 2, 3$ and $h = 1, 2, \ldots, 50$). We consider different scenarios that imitate how the RTFRs change over time in practice based on the prespecification of the death and recovery probabilities $p_{k,D}(h)$ and $p_{k,R}(h)$ on day $h$ for group $k$ ($k = 1, 2, 3$), respectively. The daily numbers of deaths and recoveries are then generated under the multinomial setting in (1). Based on the filtration $\mathcal{F}_{k,h}$ on day $h$, the test statistics $Z_2^{\dagger}$ and $Z_3^{\dagger}$ can be calculated and the sequential test can be conducted. The overall level of significance is set at $\alpha = 0.05$, and the α-spending function described in (8) is adopted throughout the simulation.
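The data-generating step described above can be sketched as follows. The probabilities are hypothetical placeholders rather than the values used in the reported scenarios, the indexing of the inpatient sequence (inpatients at the start of day $h$ taken as $3000 - 30(h-1)$) is an assumption, and `simulate_group` is an illustrative name.

```python
import numpy as np


def simulate_group(inpatients, p_d, p_r, rng):
    """Draw daily (deaths, recoveries) for one group from the multinomial model."""
    n_d = np.empty(len(inpatients), dtype=int)
    n_r = np.empty(len(inpatients), dtype=int)
    for h, size in enumerate(inpatients):
        probs = [p_d[h], p_r[h], 1.0 - p_d[h] - p_r[h]]  # die, recover, stay
        n_d[h], n_r[h], _ = rng.multinomial(int(size), probs)
    return n_d, n_r


H = 50
days = np.arange(1, H + 1)
# Inpatients at the start of day h (indexing assumption, see lead-in).
inpatients = 3000 - 30 * (days - 1)

# Hypothetical scenario: equal death probabilities, and the recovery probability
# of group 2 jumps by 0.02 from day 30, so its RTFR drops from 1/6 to 1/8.
p_d = np.full(H, 0.01)
p_r1 = np.full(H, 0.05)
p_r2 = np.where(days >= 30, 0.07, 0.05)

rng = np.random.default_rng(2022)
n1d, n1r = simulate_group(inpatients, p_d, p_r1, rng)
n2d, n2r = simulate_group(inpatients, p_d, p_r2, rng)
print(n1d[:5], n1r[:5])
print(n2d[-5:], n2r[-5:])
```

Repeating this draw many times and feeding each data set to the sequential test gives empirical rejection rates and average stopping days of the kind summarized in the tables.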

For each scenario, 10,000 independent simulated data sets were generated. Under $H_0$, various scenarios with equal RTFRs among subgroups were considered to evaluate the empirical rejection rates of the sequential test, and the results for the two- and three-sample tests are summarized in Tables 1 and 3, respectively. We can see that the empirical sizes for both tests match closely with the nominal level of 0.05 in all cases, suggesting that the proposed tests are empirically unbiased.

We consider 24 scenarios under the alternative hypothesis for the two-sample comparison. The results are summarized in Table 2, where $\bar{h}^{*}$ in the last column represents the empirical average of $h^{*}$, the day on which the null hypothesis is first rejected (among those replications with $H_0$ being rejected). In the first eight scenarios, the RTFRs of the two groups differ only within a specific interval, [25, 40] or [30, 40]. The next eight scenarios correspond to the situation in which the RTFRs of the two groups are the same at first, but the RTFR of group 2 drops suddenly at $h = 30$. One can see that the proposed test is reasonably powerful (with empirical powers over 95%) in detecting a sudden change in RTFR between groups. Also, the null hypothesis can be rejected within a short period of time, say 7 days, after a change is imposed on the RTFR of group 2. Nevertheless, we may observe a relatively small power in some cases where the change in RTFR in group 2 is modest or small. For example, the scenario in the seventh row of Table 2 only attains a power of 50.80% due to a relatively small jump size in the recovery probabilities in group 2. When we increase the jump size from 0.01 to 0.02 (the next row in Table 2), the empirical power increases from 50.80% to 95.56%, and $\bar{h}^{*}$ becomes closer to $h = 30$, where the change occurs. The remaining eight scenarios correspond to the situation in which the RTFR of group 1 is uniformly higher than that of group 2, and the empirical powers are high in general. Table 4 demonstrates the good performance of the proposed three-sample test under $H_1$. Specifically, the RTFR is always the highest in group 1 and the lowest in group 3 throughout the observation period. The empirical powers are close to 1 in all cases and the null hypothesis can be rejected quickly as soon as there is enough statistical evidence supporting the alternative hypothesis.

In addition to the results reported in Tables 1 to 4, we have tried different sequences of the daily number of inpatients in the simulation setup, such as $I_k(h) = 3000 + 30h$ and $I_k(h) = 3000 + 30h - 60(h - 15)_{+}$, where $u_{+} = \max(0, u)$, $k = 1, 2, 3$ and $h = 1, 2, \ldots, 50$, as well as a small sample size with $I_k(h)$ around 800. We also tried another weight function, corresponding to the test statistics $Z_2^{*}$ and $Z_3^{*}$ in place of $Z_2^{\dagger}$ and $Z_3^{\dagger}$. The results obtained in Tables 1 to 4 are quite robust to these changes, hence those findings are not reported here. Moreover, when compared with the non-sequential test $V(H)$ discussed in (7), the sequential test achieves the same level of power in all cases at no additional cost, but it allows a conclusion to be made at a much earlier time.

Table 3. Simulation results of the proposed three-sample test under different scenarios when $H_0$ is true.

In addition, to assess the effect of the choice of $\tau$, and hence the number of intervals $H$ (say, in days or weeks), on the performance of the proposed test, $\tau = 40$ and 60 days were also considered. For the cases with a sudden change in the RTFR under $H_1$, the power and $\bar{h}^{*}$ for different values of $\tau$ are virtually identical. For the cases with a gradual increase in the difference in RTFRs over time, it is natural to expect a higher power based on a larger value of $\tau$ as more statistical evidence would accumulate over time, but the difference is minimal. On the other hand, a larger value of $\tau$ also means that the significance level assigned to each interim analysis is smaller, which will also lead to a slight increase in $h^{*}$. In practice, we suggest setting $\tau$ to be reasonably large to allow accumulation of more statistical evidence, at the expense of a slight delay in the decision if there is a difference.

In December 2019, several cases of novel coronavirus infection, now known as COVID-19, were reported in Wuhan, Hubei province, China. Despite the implementation of strict lockdown in Wuhan on 23 January 2020, 21 this virus had rapidly spread from the epicenter to different regions across China. By the end of February 2020, 79,394 cases including 2838 deaths were reported in mainland China. 22 Thereof, 66,337 occurred in the Hubei province with a death toll of 2727, suggesting a CFR of 4.11% at first glance, which contrasts with 0.8% in other areas of mainland China. A study has suggested that the accessibility level of health care resources may be the cause of the considerable gap in mortality among different areas. 23 According to the level of medical resource availability during the outbreak, we partition mainland China into three clusters, namely Wuhan city, Hubei province excluding Wuhan city, and mainland China excluding Hubei province. The main objective of the analysis is to explore the difference in disease severity based on the RTFR among these clusters and to investigate the potential effects of medical resource availability (i.e. the numbers of doctors and hospital beds) on the fatality rate in China. The cumulative numbers of confirmed cases, deaths, and recoveries between 1 February and 31 March for each cluster were summarized and extracted from the public domain. 24 A smoothed version of the RTFR estimator 17 for the three separate clusters over the observation period is shown in Figure 1. We can see that there exist clear disparities in severity among areas in mainland China during the early phase of the COVID-19 epidemic. In this regard, we provide some explanatory notes to describe the observed pattern. As most of the cases were concentrated in Hubei province at the beginning of the outbreak, the hospitals and local health care systems were suddenly overwhelmed. 
Especially in Wuhan, many patients did not receive timely treatment, causing Wuhan to have the highest fatality rate, followed by the remaining cities in Hubei province. By contrast, the lockdown measures implemented in Wuhan city delayed the epidemic growth in other provinces and provided valuable time for them to prepare. Therefore, the number of infections grew at a much slower rate compared with the supply of health care resources, contributing to a lower fatality rate in the rest of mainland China. To meet the shortage of medical resources in the worst-hit areas, the Chinese government mobilized all the necessary resources nationwide to support virus control in Hubei and the city of Wuhan. Two new field hospitals, namely Huoshenshan and Leishenshan, were built in a few days and had been put into use in Wuhan in early February. Over 25,000 medical professionals from other provinces of China rushed to Wuhan for assistance as of 14 February. 25 The remaining 16 cities in Hubei province also received one-to-one paired assistance from other provinces. 26 In the meantime, a number of temporary hospitals, namely the Fangcang shelter hospitals, were constructed to provide enough beds to treat patients with mild to moderate symptoms. These temporary hospitals relieved the huge pressure on the health care system and allowed the designated hospitals to concentrate on treating patients with severe and critical conditions. 27 As of 28 February 2020, Wuhan had established 16 temporary hospitals, and the demand for hospital beds in Hubei was met. While the RTFR of mainland China stabilized at a low level, the RTFRs of Wuhan and Hubei declined continuously owing to increasing hospital beds and sufficient medical resources. Eventually, the RTFRs for the three clusters reached a similar level by the end of February and remained relatively low throughout March.

The proposed method is applied to examine the difference in RTFRs among areas over time. By treating a day as the unit, there are $H = 60$ intervals in total with day 0 being 1 February 2020. We conduct the two-sample test for $H_1: \pi_{\text{Wuhan}}(h) > \pi_{\text{Hubei}}(h)$, and the three-sample test for $H_1: \pi_{\text{Wuhan}}(h) > \pi_{\text{Hubei}}(h) > \pi_{\text{China}}(h)$, for some $h < H = 60$, respectively. The typical weight function $w(h) = 1$ is used and the overall significance level is set to be 0.05. Specifically, on day $h$ ($h = 1, 2, \ldots, H$), we compared the test statistics $Z_2^{\dagger}(h)$ and $Z_3^{\dagger}(h)$ with their corresponding critical values $b_h$, respectively. The sequential test is terminated at $h^{*}$ ($0 \le h^{*} \le H$) if the test statistic at $h^{*}$ exceeds the critical value $b_{h^{*}}$. In line with the considerable gap in fatality rates among areas shown in Figure 1, the null hypothesis of equal RTFRs is rejected quickly on the seventh day (7 February) and fourth day (4 February) based on the two- and three-sample tests, respectively. We then conduct the same pair of tests for the period from 1 March to 1 April with $H = 32$, during which the medical resource availability is more or less the same across different clusters. As expected, both the two- and three-sample tests fail to reject the null hypothesis, which is further supported by the similar fatality rates in March 2020 among the three areas as displayed in Figure 1. This example shows that our proposed tests are sensitive in picking up changes in RTFRs, and that they are useful for providing real-time signals at time $t \in [0, \tau]$ to the health authority on whether the existing measures or medical resources are adequate in some areas to contain the epidemic.

A statistical test is proposed in this paper to compare the RTFRs among independent groups during the course of an ongoing epidemic. As the implementation of an effective control measure can reduce disease severity and save more lives, our method can provide an evidence-based assessment of the effectiveness of the implemented intervention and inform the policy-making process during an emerging epidemic. The asymptotic Brownian motion of the test statistic under the null hypothesis allows one to adopt a sequential design naturally. Therefore, the null hypothesis of no difference in severity among subgroups can be rejected as soon as sufficient information has accumulated over time. This property is particularly useful during an emerging epidemic, as government officials can identify the effective control measures at the earliest time and issue recommendations for disease control promptly. A large-scale simulation study shows the good performance of our proposed test in two- and three-sample cases in terms of unbiasedness and the sensitivity in picking up the difference in severity among groups.

The proposed statistical test is applied to the COVID-19 data in mainland China to examine the difference in severity among three separate clusters. The results suggest that the severity of COVID-19 in mainland China is possibly associated with the accessibility of local health care resources. This emphasizes that medical supplies and resources play an important role in lowering the RTFR. As many countries are now struggling with the COVID-19 outbreak, these findings may offer suggestions for disease prevention and control worldwide. Resource-limited countries in particular should at least slow down the surge of infections to avoid overwhelming the local medical system. The illustrated example demonstrates that our method is simple to use and is widely applicable to emerging infectious diseases.

We have shown that the proposed two-sample test can be easily generalized to accommodate the K-sample situation. Essentially, this enables us to address more clinical questions in practice. For example, investigating the discrepancy in RTFRs among multiple age groups could help to minimize the confounding effect and to gain an in-depth understanding of the other factors that affect fatality. Most importantly, noting the discrepancy in fatality rates between different treatment arms in clinical trials helps clinicians to identify the most effective treatment for curbing the disease. Taking COVID-19 as an example, hundreds of clinical trials aiming to evaluate the performance of possible treatments have been registered on clinical trial registries worldwide so far. [28] [29] [30] Our proposed method can be one of the essential tools to evaluate the efficacy of different potential treatments, where superiority over other candidate treatments is indicated by a relatively lower fatality rate along the timeline.

The proposed tests have the advantage of using minimal information to gain a timely, quantitative assessment of the effectiveness of potential treatments or implemented measures. During an outbreak of an emerging epidemic, the surveillance data are always incomplete, and individual data such as the time-at-infection, time-to-recovery, and time-to-death are difficult to obtain. This is especially true for areas with low public health awareness and a poor health care system. For the ongoing COVID-19 pandemic, the epidemiological data are hard to obtain and, for most countries, only the cumulative counts of cases, deaths, and recoveries are recorded. It is important for public health officials to make use of this simplest data structure to gain more insight into the disease so that prompt actions can be taken to suppress the disease fatality at the earliest possible time.
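As a small illustration of how much can be recovered from this simplest data structure, the sketch below turns three cumulative series into daily deaths, daily recoveries, and the number of patients still under care at the start of each day (cumulative cases minus cumulative deaths minus cumulative recoveries). The function name `daily_counts` and the example numbers are hypothetical, and the sketch does not reproduce the RTFR estimator itself, which is defined earlier in the paper.

```python
from typing import List, Sequence, Tuple


def daily_counts(cases: Sequence[int],
                 deaths: Sequence[int],
                 recoveries: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Convert cumulative counts (day 0, 1, ..., H) into per-day quantities
    for days 1, ..., H."""
    if not (len(cases) == len(deaths) == len(recoveries)):
        raise ValueError("all three series must cover the same days")
    days = range(1, len(cases))
    # New deaths and recoveries are first differences of the cumulative series.
    new_deaths = [deaths[h] - deaths[h - 1] for h in days]
    new_recoveries = [recoveries[h] - recoveries[h - 1] for h in days]
    # Patients still under care at the start of day h (end of day h - 1).
    active = [cases[h - 1] - deaths[h - 1] - recoveries[h - 1] for h in days]
    return new_deaths, new_recoveries, active


# Made-up cumulative counts for illustration only (assumption, not real data).
cum_cases = [100, 120, 150, 160]
cum_deaths = [2, 3, 5, 6]
cum_recoveries = [10, 15, 25, 40]
d, r, n = daily_counts(cum_cases, cum_deaths, cum_recoveries)
print(d, r, n)
```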

Global trends in emerging infectious diseases

Coronavirus disease (COVID-19) weekly epidemiological update and weekly operational update

Estimating mortality from COVID-19

Methods for estimating the case fatality ratio for a novel, emerging infectious disease

Estimating mortality from COVID-19: Scientific brief

Case fatality rate for Ebola virus disease in west Africa

The many estimates of the COVID-19 case fatality rate

Estimating the risk of Middle East respiratory syndrome (MERS) death during the course of the outbreak in the Republic of Korea

A comparison study of realtime fatality rates: severe acute respiratory syndrome in Hong Kong

A test for constant fatality rate of an emerging epidemic: with applications to severe acute respiratory syndrome in Hong Kong and Beijing

Estimating absolute and relative case fatality ratios from infectious disease surveillance data

Estimating the case fatality rate using a constant cure-death hazard ratio

Compassionate use of remdesivir for patients with severe COVID-19

Remdesivir in adults with severe COVID-19: A randomised, double-blind, placebo-controlled, multicentre trial

Remdesivir for the treatment of COVID-19 preliminary report

Statistical models based on counting processes

A chain multinomial model for estimating the real-time fatality rate of a disease, with an application to severe acute respiratory syndrome

A note on the delta method

Discrete sequential boundaries for clinical trials

Group sequential methods with applications to clinical trials

The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China

Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China

Potential association between COVID-19 mortality and health-care resource availability

Tencent News

Modeling the control of COVID-19: Impact of policy interventions and meteorological factors

Pairing assistance the effective way to solve the breakdown of health services system caused by COVID-19 pandemic

Fangcang shelter hospitals: a novel concept for responding to public health emergencies

In vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

A trial of lopinavir-ritonavir in adults hospitalized with severe COVID-19

Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The authors received no financial support for the research, authorship, and/or publication of this article.

https://orcid.org/0000-0001-5453-994X Chun Yin Lee https://orcid.org/0000-0002-7207-2519

Table 4. Simulation results of the three-sample sequential test when the alternative hypothesis $H_1$ is true.