key: cord-0980283-bpnhcp5l
authors: Lee, Chaeyoung; Li, Yibao; Kim, Junseok
title: The Susceptible-Unidentified infected-Confirmed (SUC) epidemic model for estimating unidentified infected population for COVID-19
date: 2020-07-04
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.110090
sha: b58f98cb2e16574dde212385af98e100eaa960d6
doc_id: 980283
cord_uid: bpnhcp5l

In this article, we propose the Susceptible-Unidentified infected-Confirmed (SUC) epidemic model for estimating the unidentified infected population for coronavirus disease 2019 (COVID-19) in China. The unidentified infected population refers to people who are infected but not yet identified. They are not yet hospitalized and can still spread the disease to the susceptible. To estimate the unidentified infected population, we find the optimal model parameters that best fit the confirmed case data in the least-squares sense. Here, we use the time series data of the confirmed cases in China reported by the World Health Organization. In addition, we perform a practical identifiability analysis of the proposed model using Monte Carlo simulation. The proposed model is simple but potentially useful for estimating the unidentified infected population. Such estimates can help monitor the effectiveness of interventions and plan the supply of protective masks, COVID-19 diagnostic kits, hospital beds, medical staff, and so on. Therefore, to control the spread of the infectious disease, it is essential to estimate the number of the unidentified infected population. The proposed SUC model can be used as a basic mathematical building block for estimating the unidentified infected population.


Keywords: Epidemic model, least-squares fitting, COVID-19

The coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in December 2019 [1]. The numbers of COVID-19 confirmed cases in China from 21 January to 24 February 2020 are shown in Fig. 1. The data were reported by the World Health Organization (WHO) as of 24 February 2020 [2]. There is currently much active research on COVID-19. In [3], the authors presented the impact of reduced travel volume to and from China on the transmission dynamics of COVID-19 outside China. Roosa et al. [4] used phenomenological models to generate short-term forecasts of cumulative reported cases in Guangdong and Zhejiang, China. In [5], the authors presented the distribution of incubation periods estimated for travellers from Wuhan with confirmed COVID-19 infection in the early outbreak phase. Hellewell et al. [6] developed a stochastic transmission model to assess the effects of isolation and contact tracing.

In this paper, we propose the Susceptible-Unidentified infected-Confirmed (SUC) epidemic model for estimating the unidentified infected population for COVID-19 in China. In the SUC model, the total population N is divided into the susceptible S(t), unidentified infected U(t), and confirmed C(t) individuals at time t:

S(t) = susceptible; individuals who are not infected but are capable of contracting the disease and becoming infective.

U(t) = unidentified infected; individuals who are infected but have not yet been confirmed, and therefore are not isolated.

C(t) = confirmed; individuals who have been infected and confirmed, including all cases of recovery or death (i.e., the removed).

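Based on the assumptions above, the equations governing the SUC model are as follows:

$$\frac{dS(t)}{dt} = -\frac{\beta S(t)U(t)}{N}, \qquad (1)$$

$$\frac{dU(t)}{dt} = \frac{\beta S(t)U(t)}{N} - \gamma U(t), \qquad (2)$$

$$\frac{dC(t)}{dt} = \gamma U(t). \qquad (3)$$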

Here, N is the total population, and thus we assume that

$$N = S(t) + U(t) + C(t) \qquad (4)$$

is always satisfied. We disregard changes in population due to births and deaths unrelated to the infectious disease. Therefore, Eq. (3) can be replaced by Eq. (4).

The transmission is expressed by the standard incidence $\beta SU/N$, where $\beta$ represents the disease transmission rate [7]. We assume that the unidentified infected U(t) are not yet hospitalized and can still spread the disease to the susceptible.

The parameter γ is the rate (per day) at which cases among the unidentified infected are confirmed. We assume that the confirmed C(t) comprise all cases that have been confirmed to have COVID-19, including those who have recovered or died from the disease. That is, C(t) is a cumulative number. Once confirmed, patients can no longer spread the disease because they are completely isolated from the susceptible and the unidentified infected population. Furthermore, to reduce the complexity of the model, we ignore special cases such as infection in medical staff or confirmed patients who are not isolated. Figure 2 illustrates the transition diagram of the SUC model with its three states. The SUC model has the same structure as the SIR (Susceptible-Infected-Recovered) epidemic model [8], which is widely used to estimate transmission dynamics in emerging epidemics [9]. However, we assign different meanings to the epidemic variables. The susceptible, the unidentified infected, and the confirmed in the SUC model correspond to the susceptible, the infected, and the recovered in the SIR model, respectively. Various epidemic models have been proposed by modifying the SIR model. Examples include the SIRS (Susceptible-Infected-Recovered-Susceptible) [10], SIRD (Susceptible-Infected-Recovered-Dead) [11], SIS (Susceptible-Infected-Susceptible) [12], SEIR (Susceptible-Exposed-Infected-Recovered) [13], SIIR (a modified SIR with a latent period) [14], and SIR/V (Susceptible-Vaccinated-Infected-Recovered) [15] models. Moreover, fractional-order epidemic models have been studied as applications of classical models [16, 17]. We consider the epidemic within a similar framework but with a new interpretation. In this paper, we propose a simple model as a first step.
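The model's right-hand side can be sketched in a few lines. The following Python function is an illustration only; the paper's own code is in MATLAB. It assumes the SIR-like form implied by the text:

- the infection flow βSU/N moves individuals from S to U;
- the confirmation flow γU moves them from U to C.

Every outflow reappears as an inflow, so the three derivatives sum to zero. This is why N = S(t) + U(t) + C(t) stays constant. The function name `suc_rhs` and the state and parameter values in the usage part are hypothetical.

```python
def suc_rhs(S, U, beta, gamma, N):
    """Time derivatives (dS/dt, dU/dt, dC/dt) of the SUC model.

    S: susceptible, U: unidentified infected, N: total (effective) population,
    beta: transmission rate, gamma: confirmation rate per day.
    C does not appear on the right-hand side, so it is not an argument.
    """
    infection_flow = beta * S * U / N   # standard incidence, S -> U
    confirmation_flow = gamma * U       # U -> C (confirmed and isolated)
    dS = -infection_flow
    dU = infection_flow - confirmation_flow
    dC = confirmation_flow
    return dS, dU, dC


if __name__ == "__main__":
    # Hypothetical state and parameters, chosen only for illustration.
    N = 1e7
    dS, dU, dC = suc_rhs(S=N - 500.0, U=300.0, beta=0.5, gamma=0.2, N=N)
    # The flows cancel, so d(S + U + C)/dt is zero up to rounding.
    print(dS, dU, dC, dS + dU + dC)
```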

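Let $S^n = S(n\Delta t)$, $U^n = U(n\Delta t)$, and $C^n = C(n\Delta t)$, where $\Delta t$ is the time step. We solve the governing equations by discretizing time and applying the explicit Euler method. Then, we have the following equations:

$$S^{n+1} = S^n - \Delta t\,\frac{\beta S^n U^n}{N}, \qquad (5)$$

$$U^{n+1} = U^n + \Delta t\left(\frac{\beta S^n U^n}{N} - \gamma U^n\right), \qquad (6)$$

$$C^{n+1} = N - S^{n+1} - U^{n+1}. \qquad (7)$$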
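A minimal Python sketch of the explicit Euler scheme follows. It rests on three assumptions:

- S^0 = N - U^0 - C^0;
- 1/Δt is a whole number of steps per day;
- C is recovered from the conservation law N = S + U + C at each step. This is our reading of the statement that Eq. (3) can be replaced by Eq. (4).

The function name `simulate_suc` and all numbers in the example are hypothetical. The state is recorded once per day so that it can be compared with daily case counts.

```python
def simulate_suc(beta, gamma, U0, C0, N, dt, days):
    """Explicit Euler solution of the SUC model, sampled once per day.

    Assumptions for this sketch: S^0 = N - U0 - C0, 1/dt is a whole number
    of steps per day, and C is recovered from N = S + U + C at each step.
    Returns three lists of length days + 1 (day 0 .. day `days`).
    """
    steps_per_day = round(1.0 / dt)
    S, U, C = N - U0 - C0, U0, C0
    S_daily, U_daily, C_daily = [S], [U], [C]
    for _ in range(days):
        for _ in range(steps_per_day):
            incidence = beta * S * U / N                # beta * S^n * U^n / N
            S_next = S - dt * incidence                 # update for S
            U_next = U + dt * (incidence - gamma * U)   # update for U
            C = N - S_next - U_next                     # C from conservation
            S, U = S_next, U_next
        S_daily.append(S)
        U_daily.append(U)
        C_daily.append(C)
    return S_daily, U_daily, C_daily


if __name__ == "__main__":
    # Hypothetical parameters; dt = 0.001 gives 1000 steps per day.
    S, U, C = simulate_suc(beta=0.5, gamma=0.25, U0=100.0, C0=50.0,
                           N=1e6, dt=0.001, days=7)
    for day, (u, c) in enumerate(zip(U, C)):
        print(f"day {day}: U = {u:.1f}, C = {c:.1f}")
```

Updating C from the conservation law gives the same result, in exact arithmetic, as the direct update C^{n+1} = C^n + Δt γU^n. The conservation form simply keeps S + U + C = N satisfied by construction.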

Here, the unknown parameters are $\beta$, $\gamma$, and $U^0$. Once their values are known, we can solve the discrete system of equations (5)-(7). We seek the values of the parameters $(\beta, \gamma, U^0)$ that best fit the confirmed case data in the least-squares sense, that is,

$$\min_{\beta,\gamma,U^0} \sum_{i=1}^{p} \left(C^{n_i} - \hat{C}_i\right)^2, \qquad (8)$$

where $p$ is the number of given real data points $\hat{C}_i$ $(i = 1, 2, \ldots, p)$ and $C^{n_i}$ $(i = 1, 2, \ldots, p)$ are the numerical solutions of Eqs. (5)-(7) at the corresponding times. We use the MATLAB routine lsqcurvefit, a nonlinear curve-fitting solver that uses the trust-region-reflective algorithm in the least-squares sense [18]:

$$(\beta, \gamma, U^0) = \texttt{lsqcurvefit}(\texttt{@SUCmodel}, (\beta^0, \gamma^0, U^0_0), \texttt{Tdata}, \texttt{Cdata}, \texttt{lb}, \texttt{ub}), \qquad (9)$$

where:

- $\beta$, $\gamma$, and $U^0$ are the optimized parameters;
- $(\beta^0, \gamma^0, U^0_0)$ is the initial guess;
- SUCmodel is the SUC model function, which returns the numerical confirmed cases at times Tdata;
- Cdata is the confirmed real case data;
- lb and ub are the lower and upper bound vectors of the parameters.
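The least-squares fit can be illustrated without MATLAB. The Python sketch below simulates daily confirmed counts with the explicit Euler scheme. It then evaluates the objective, the sum over the data points of (C^{n_i} - Ĉ_i)^2, and picks the best parameter triple (β, γ, U^0).

This is not lsqcurvefit. A brute-force grid search stands in for the trust-region-reflective solver, and the grid plays the role of the bounds lb and ub. The sketch also assumes daily data whose first point is C^0 at t = 0, and that C follows from N = S + U + C. The helper names `simulate_confirmed`, `sum_of_squares`, and `grid_fit` are hypothetical.

```python
import itertools


def simulate_confirmed(beta, gamma, U0, C0, N, dt, days):
    """Daily confirmed counts from an explicit Euler solve of the SUC model.

    C is taken from the conservation law N = S + U + C (an assumption of this sketch).
    """
    steps_per_day = round(1.0 / dt)
    S, U = N - U0 - C0, U0
    confirmed = [C0]
    for _ in range(days):
        for _ in range(steps_per_day):
            incidence = beta * S * U / N
            # Both updates use the old S and U (explicit Euler).
            S, U = S - dt * incidence, U + dt * (incidence - gamma * U)
        confirmed.append(N - S - U)
    return confirmed


def sum_of_squares(params, data, N, dt):
    """Least-squares objective: sum over i of (C^{n_i} - C_hat_i)^2.

    data holds daily confirmed counts; data[0] is used as C^0 (assumption).
    """
    beta, gamma, U0 = params
    model = simulate_confirmed(beta, gamma, U0, data[0], N, dt, len(data) - 1)
    return sum((c - c_hat) ** 2 for c, c_hat in zip(model, data))


def grid_fit(data, N, dt, betas, gammas, U0s):
    """Return the grid point (beta, gamma, U0) with the smallest objective.

    A brute-force stand-in for a bounded solver such as lsqcurvefit.
    """
    candidates = itertools.product(betas, gammas, U0s)
    return min(candidates, key=lambda params: sum_of_squares(params, data, N, dt))


if __name__ == "__main__":
    # Synthetic "observations" from known (hypothetical) parameters, then refit.
    N, dt = 1e7, 0.01
    data = simulate_confirmed(0.4, 0.2, 300.0, 1000.0, N, dt, days=7)
    best = grid_fit(data, N, dt, betas=[0.2, 0.3, 0.4, 0.5],
                    gammas=[0.1, 0.2, 0.3], U0s=[100.0, 300.0, 500.0])
    print(best)
```

A grid only makes sense for illustration. For real data, a bounded solver is the better choice. In SciPy, `scipy.optimize.least_squares` with `method="trf"` implements a trust-region-reflective algorithm with box bounds, which makes it the closer analogue of lsqcurvefit.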

In this section, we estimate the number of the unidentified infected population using Eqs. (5)-(7) and lsqcurvefit, Eq. (9). We use the time series data of the confirmed cases listed in Table 1. For all numerical computations, we use the following values: $\Delta t = 0.001$; the initial guesses $\beta^0 = 1$, $\gamma^0 = 1$, and $U^0_0 = 0.1C^0$; and the bounds $\mathrm{lb} = (10^{-3}, 10^{-3}, 0.01C^0)$ and $\mathrm{ub} = (10, 10, 5C^0)$. Here, the time unit is one day, which corresponds to 1000 time steps when $\Delta t = 0.001$. Note that we perform a practical identifiability analysis of the parameters $\beta$ and $\gamma$ in Section 4.
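The settings above can be collected into a small helper. This Python sketch builds the initial guess and bounds for (β, γ, U^0) from the values stated in the text and converts Δt into steps per day. It assumes C^0 is the first confirmed count of the fitting window, and it checks that the initial guess lies inside the bounds. The name `fitting_setup` is hypothetical.

```python
def fitting_setup(C0, dt=0.001):
    """Initial guess and bounds for (beta, gamma, U^0), using the values in the text.

    C0 is the first confirmed count of the fitting window (assumed here).
    Returns (x0, lb, ub, steps_per_day).
    """
    x0 = (1.0, 1.0, 0.1 * C0)        # beta^0, gamma^0, U^0_0
    lb = (1e-3, 1e-3, 0.01 * C0)     # lower bounds
    ub = (10.0, 10.0, 5.0 * C0)      # upper bounds
    steps_per_day = round(1.0 / dt)  # one day = 1000 steps when dt = 0.001
    # Sanity check: the starting point should lie within the box [lb, ub].
    if not all(lo <= x <= hi for lo, x, hi in zip(lb, x0, ub)):
        raise ValueError("initial guess lies outside the bounds")
    return x0, lb, ub, steps_per_day


if __name__ == "__main__":
    print(fitting_setup(C0=1000.0))  # hypothetical C0
```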

Let $p$ be the number of data points in Cdata; we take the most recent $p$ data points in Table 1. Figure 3 shows the computational results with various $N$ and $p = 22$, 14, and 7. In this test, we consider three different values of $N$ (i.e., $N = 10^9$, $10^8$, and $10^7$) to use the effective population appropriate to each situation. When investigating actual cases of epidemic spread, we can see that most infections first occurred in certain areas, such as Wuhan in China, rather than across the whole country, and then spread across the country. Therefore, it is advisable to choose an effective population size that suits the situation. As the results in Fig. 3 show, using a smaller number of recent data points gives a better fit to the time series data. Furthermore, the number of the unidentified infected population decreases as time increases. Table 2 shows the computed numbers of the unidentified infected population of COVID-19 on 11 February 2020 and the ratio $\beta/\gamma$. Strictly speaking, this ratio is not equivalent to the basic reproduction number $R_0$ of the SIR model. Our model assigns different meanings to the variables and assumes that confirmed cases are isolated completely from the susceptible population. Therefore, we present the ratio for reference only.

Next, we perform the computational tests with various $N$ and $p = 8$, starting from 17 February 2020. Figure 4 shows the computational results on 24 February 2020 with various $N$ and $p = 8$. As shown in Fig. 4, the model fits the confirmed case data closely. Table 3 shows the computed numbers of the unidentified infected population of COVID-19 on 24 February 2020 and the ratio $\beta/\gamma$.

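We perform the practical identifiability analysis of our proposed model using Monte Carlo simulation (MCS) [19, 20]. We use the same data and parameter set as in Fig. 4. The procedure has three steps.

First, we solve the SUC model numerically with the obtained parameters $\beta$ and $\gamma$ and obtain the vector $C_i$ with $\Delta t = 0.001$ for $i = 0, 1, \ldots, 7000$.

Second, we generate $M$ parameter sets $(\beta_j, \gamma_j)$ for $j = 1, \ldots, M$; we take $M = 1000$. Here, $(\beta_j, \gamma_j)$ are the optimized parameters with which the SUC model best fits the randomly perturbed confirmed data $P_{i,j}$ obtained from $C_i$, where $P_{i,j} = C_i + C_i\epsilon_{i,j}$, $E(\epsilon_{i,j}) = 0$, and $\mathrm{Var}(\epsilon_{i,j}) = \sigma_0^2$ for each $j$. Here, $\sigma_0$ is the standard deviation of the measurement error.

Third, we compute the average relative estimation errors (AREs):

$$\mathrm{ARE}(\theta) = 100\% \times \frac{1}{M}\sum_{j=1}^{M}\frac{|\theta_0 - \theta_j|}{|\theta_0|},$$

where $\theta$ denotes $\beta$ or $\gamma$, $\theta_0$ is the value obtained from the original fit, and $\theta_j$ is its estimate from the $j$-th perturbed dataset.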

A parameter that is very sensitive to noise can produce a seriously large ARE even with a moderate and reasonable level of measurement error. Following [20], we regard a parameter as not practically identifiable if its ARE is higher than the measurement error $\sigma_0$. Table 4 lists the AREs for the parameters $\beta$ and $\gamma$ with respect to various noise levels $\sigma_0$. As expected, increasing $\sigma_0$ increases the AREs. Both parameters $\beta$ and $\gamma$ are practically identifiable because their AREs are smaller than the measurement error $\sigma_0$. Therefore, the proposed model is practically identifiable, which implies that its parameters can be estimated from real data.
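The Monte Carlo procedure and the identifiability criterion can be sketched as follows. This Python illustration makes these choices:

- Noise is normally distributed. The text fixes only E(ε) = 0 and Var(ε) = σ0^2, so this is an assumption.
- The ARE is computed as 100% × (1/M) Σ_j |θ0 - θj|/|θ0|.
- The refitting step is left to a caller-supplied function `fit`, which is hypothetical here. In practice it would run the least-squares fit on each perturbed dataset.
- A parameter is flagged as practically identifiable when its ARE (in percent) does not exceed σ0 expressed as a percentage.

All function names are hypothetical.

```python
import random


def perturb_confirmed(C, sigma0, rng):
    """One noisy copy P_i = C_i + C_i * eps_i of the model output C.

    eps_i is drawn as Normal(0, sigma0^2); the text fixes only the mean and
    variance, so the normal distribution is an assumption of this sketch.
    """
    return [c + c * rng.gauss(0.0, sigma0) for c in C]


def average_relative_error(theta0, estimates):
    """ARE in percent: 100 * (1/M) * sum_j |theta0 - theta_j| / |theta0|."""
    M = len(estimates)
    return 100.0 * sum(abs(theta0 - t) for t in estimates) / (M * abs(theta0))


def monte_carlo_are(C, theta0, fit, sigma0, M, seed=0):
    """AREs (percent) of each parameter over M perturbed datasets.

    fit is a caller-supplied (hypothetical) function mapping a perturbed
    dataset to a tuple of fitted parameters, e.g. (beta_j, gamma_j).
    """
    rng = random.Random(seed)
    estimates = [fit(perturb_confirmed(C, sigma0, rng)) for _ in range(M)]
    return tuple(average_relative_error(theta0[k], [e[k] for e in estimates])
                 for k in range(len(theta0)))


def practically_identifiable(are_percent, sigma0):
    """Identifiable if the ARE (percent) does not exceed sigma0 as a percentage."""
    return are_percent <= 100.0 * sigma0


if __name__ == "__main__":
    # Toy check: a "fit" that always returns the reference values gives zero ARE.
    C = [1000.0, 1100.0, 1250.0]
    are_beta, are_gamma = monte_carlo_are(C, (0.4, 0.2), lambda P: (0.4, 0.2),
                                          sigma0=0.05, M=10)
    print(are_beta, are_gamma, practically_identifiable(are_beta, 0.05))
```

The seed is explicit so that runs are reproducible. With M = 1000 and a full refit per sample, almost all of the cost lies in `fit`.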

We proposed a new approach for modeling an epidemic disease, COVID-19. However, the proposed model is kept as simple as possible under many constraints and assumes an ideal situation. In this paper, we reduced the complexity and focused on a basic building block. Thus, we excluded several realistic elements, for example, interventions, the latent period of the virus, changes in population due to birth and death, and infection in medical staff or confirmed patients who are not isolated. In future work, we will incorporate various conditions for specific and realistic situations not covered in this paper to improve the model.

The accurate estimation of the unidentified infected population using the proposed model depends on reliable and accurate confirmed data. We used the numbers of confirmed cases and deaths reported by WHO. There may be differences in how data are aggregated for each country or region. In fact, the criterion for classifying confirmed cases in China was changed twice, which led to a sharp increase in the number of confirmed cases on 17 February 2020.

Nevertheless, the proposed model can be modified to reflect the various systems and cultures of different countries. We used only data from China; however, if the model is supplemented, it can be applied to many different countries with a variety of spread patterns.

The proposed SUC epidemic model for computing the unidentified infected population is simple and can serve as a basic building block for such estimates. In the Appendix, we provide the source code so that interested readers can use and modify it for their own needs. In future work, we will improve the SUC model with more specific conditions. These include a latent period, changes in population due to birth and death, and infection in medical staff or confirmed patients who are not isolated. We will also develop a suitable index, analogous to the basic reproduction number used to investigate infectious diseases, to compare COVID-19 with other diseases.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The first author (C. Lee) …

The following code is the main program.

[1] Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak

[2] Novel Coronavirus (2019-nCoV) situation reports

[3] Assessing the Impact of Reduced Travel on Exportation Dynamics of Novel Coronavirus Infection (COVID-19)

[4] Short-term Forecasts of the COVID-19 Epidemic in Guangdong and Zhejiang

[5] Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China

[6] Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts

[7] Structural and practical identifiability analysis of outbreak models

[8] Estimation of basic reproduction number of the Middle East respiratory syndrome coronavirus (MERS-CoV) during the outbreak in South Korea

[9] Forecasting influenza epidemics in Hong Kong

[10] Complicated endemics of an SIRS model with a generalized incidence under preventive vaccination and treatment controls

[11] Characterization of the COVID-19 pandemic and the impact of uncertainties, mitigation strategies, and underreporting of cases in South Korea, Italy, and Brazil

[12] Nonlinear dynamical analysis and control strategies of a network-based SIS epidemic model with time delay

[13] Early dynamics of transmission and control of COVID-19: a mathematical modelling study

[14] Spread of Infectious Diseases with a Latent Period

[15] A game theoretic approach to discuss the positive secondary effect of vaccination scheme in an infinite and well-mixed population

[16] Optimal control of a fractional order epidemic model with application to human respiratory syncytial virus infection

[17] A fractional-order epidemic model with time-delay and nonlinear incidence rate

[18] Optimization techniques via the optimization toolbox

[19] On identifiability of nonlinear ODE models and applications in viral dynamics

[20] Structural and practical identifiability analysis of zika epidemiological models

Formal analysis, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Funding acquisition. Yibao Li: Validation, Investigation, Writing - Original Draft, Writing - Review & Editing

Writing - Original Draft, Writing - Review & Editing, Supervision, Funding acquisition

The following MATLAB codes are available from the corresponding author's webpage: http://elie.korea.ac.kr/~cfdkim/codes/

The following code is a function and should be saved with the file name 'SUCmodel.m' in the same folder as the main program.
