key: cord-0876356-jp6t796z authors: Yu, Xiang; Lu, Lihua; Shen, Jianyi; Li, Jiandun; Xiao, Wei; Chen, Yangquan title: RLIM: a recursive and latent infection model for the prediction of US COVID-19 infections and turning points date: 2021-05-31 journal: Nonlinear Dyn DOI: 10.1007/s11071-021-06520-1 sha: 368c09db90c5b8392b046c075efbb750e9d638e6 doc_id: 876356 cord_uid: jp6t796z
First identified in Wuhan, Hubei, and recognized by the WHO as a novel member of the coronavirus family, COVID-19 has spread worldwide at exponential speed, causing millions of deaths and widespread public fear. Currently, the USA, India, Brazil, and other parts of the world are experiencing a second wave of COVID-19. However, the medical, mathematical, and pharmaceutical aspects of its transmission, incubation, and recovery processes remain unclear. The classical susceptible–infected–recovered model has limitations in describing the dynamic behavior of COVID-19. Hence, a recursive, latent model is needed to predict the number of future COVID-19 infections in the USA. In this article, a dynamic recursive and latent infection model (RLIM) based on the classical SEIR model is proposed to predict the number of COVID-19 infections. Given COVID-19 infection and recovery data for a certain period, RLIM fits the current values and produces the set of parameters that minimizes the error with respect to the reported numbers. With these optimal parameters assigned, RLIM can then predict infection numbers over a given period. To locate the turning point of COVID-19 transmission, an initial value for the secondary infection rate is given to the RLIM algorithm. RLIM then calculates the secondary infection rates of a continuous time series with an iterative search strategy that speeds up the convergence of the predictions and minimizes the mean square error.
Compared with other forecast algorithms, RLIM adapts to the COVID-19 infection curve faster and more accurately and, more importantly, provides a way to identify the turning point in virus transmission by searching for the equilibrium between recoveries and new infections. Simulations of four US states show that, with the secondary infection rate ω initially set to 0.5 and a latent period of 14 days, RLIM drives this value down to 0.07 and reaches an equilibrium condition. A successful forecast is generated for New York state's COVID-19 transmission, in which a turning point is predicted to emerge on January 31, 2021. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11071-021-06520-1.
Since its first appearance in Wuhan, Hubei, China, in December 2019, the novel coronavirus disease COVID-19 has affected millions of people worldwide, causing unforeseen economic losses and public fear. To date, the origin, incubation time, and transmission speed of COVID-19 have not been clarified. Numerous attempts from medical, clinical, and mathematical perspectives have been made to analyze the dramatic increase in infections brought by COVID-19 and to predict its transmission trends. A number of COVID-19-related studies base their mathematical modeling on the susceptible-infected-removed (SIR) model, originally proposed by Kermack and McKendrick [13] to analyze the plague outbreaks in London, UK, and Mumbai, India, in 1665-1666 and 1906, respectively. Theoretically, this model divides the progress of virus transmission into three phases (susceptible, infected, and removed) and relates mathematical parameters to the characteristics of each stage. For example, the parameter β governs the transition from susceptible to infected, that is, the rate at which the healthy but vulnerable population becomes infected. β is associated with R0, the basic reproduction number, which clinical experts widely use to express the average transmissibility of a specific virus; in the classical SIR model, R0 = β/γ. Another important parameter, γ, describes the rate at which individuals move from infection to recovery or death. The reciprocal of γ is the mean infectious period; this quantity, together with the incubation period of COVID-19, has attracted much interest from the scientific community. Regarding the incubation period of COVID-19, a number of research findings have also been published: Yu et al.
[21] investigated COVID-19 infection cases reported in China and other countries and recorded incubation periods ranging from 7 to 14 days; Lai [14] collected exposure periods for 125 Chinese patients and estimated a median incubation period of 4.75 days. Zhu [22] assumed that the latent period and the infectious period are approximately equal to the incubation period and the length of hospital stay, respectively, and preliminarily concluded that the latent and infectious periods are 5 and 10 days, respectively. Adhikari [1] reported an average incubation period of 4.8 ± 2.6 days, ranging from 2 to 11 days (95% confidence interval, 4.1 to 7 days). Although numerous mathematical models have been developed to address the dynamics of COVID-19, very few focus on secondary infections among recovered individuals. Many of these models treat COVID-19 as a respiratory disease that requires immediate medical attention but does not last long or cause secondary effects. However, long-lasting illnesses and second outbreaks in the USA, UK, Brazil, and India all indicate that COVID-19 cannot be assumed to run its course and end the way the flu does. For example, Sabino et al. [18] observed the resurgence of COVID-19 in Manaus, Brazil, in January 2021 and argued that one of the main reasons behind it was that immunity against COVID-19 infection had already begun to wane by December 2020. Thus, members of the recovered group could still be infected or become virus carriers. According to the COVID Tracking Project [8], the definition of "COVID-19 recovery" varies among US regions, ranging, for example, from "symptom improvement" to "hospital discharge" or even "days since diagnosis". In addition, there is no clear evidence that "recovered" patients are subsequently immune to COVID-19.
Thus, it is reasonable and necessary to assume that a portion of them will, after a certain period of time, move from the immune group back to the susceptible group. A recent report by Gaebler et al. [9] indicates that the humoral memory response to COVID-19 lasts between 1.3 and 6.3 months after infection without vaccine support. Okhuese [16] attempted to estimate the probability of COVID-19 reinfection by searching for the equilibrium state of the SEIRUS model. In his simulation, the rates of recovery and infection meet and reach an equilibrium state after 12 days. However, his model considered only incorrectly executed PCR tests, which is not sufficient to describe current COVID-19 transmission in the USA. According to Altan and Karasu [2], X-ray images can provide better results than RT-PCR tests in diagnosing COVID-19. One of the essential questions a forecasting algorithm must answer is when and how a turning point will appear. A turning point in COVID-19 transmission carries valuable information that helps governments, clinical services, and scientists model the transmission and prepare for it. Yang et al. [20] successfully predicted the peak of COVID-19's first wave in China in late February under public health interventions. Many studies, such as [7] and [21], relate the turning point of COVID-19 transmission to political or public events, such as city lockdowns and school closures. Recently, some researchers have argued that the turning point and end of an expanding epidemic cannot be precisely forecast [3], because COVID-19 transmission is highly dynamic and unstable, and forecasting results are sensitive to small variations in parameters. The recursive and latent infection model (RLIM) algorithm proposed in this paper can provide a reliable estimate of the COVID-19 transmission turning point for three reasons.
First, the authors chose a prediction period of 14 days, which is not long enough for new effects to emerge and affect the results. Second, the turning point in our system was predicted from validated data records, whose trend was carefully inspected to ensure smoothness. Finally, state-level regulations were investigated in detail for the target states and dates to ensure that no political events (such as a state lockdown or hospital emergencies) occurred. In this article, we develop and present RLIM, a novel COVID-19 transmission model. The main contributions of this paper are as follows:
(1) A novel method to forecast the number of infections in the upcoming 14 days based on historical infection and recovery data. This method efficiently captures the relationship between historical records and infection data and optimizes the parameters of the RLIM model in a short time. Our experiments show that the key parameters of the optimized RLIM model converge within a certain period, which locates the optimal parameters.
(2) Given an infection-recovery dataset for COVID-19, the method promptly locates the turning point with an iterative search strategy. Our experiment shows that, within a period of 60 days, an RLIM model with a set of parameters optimized on historical data (see contribution (1)) can predict the secondary infection rates for the coming week, using an optimization strategy that minimizes the mean square error (MSE) between the reported number of infections and the RLIM predictions.
The remainder of this manuscript is organized as follows. Section 2 discusses the mathematical model, equations, and algorithms. Section 3 describes the simulation settings, software, and scientific packages used by the RLIM program.
Section 4 discusses the data and simulation results for COVID-19 in four US states and provides predictions of their infections between mid-January and mid-February. Section 5 summarizes the work and offers further discussion. This section discusses the algorithm in three steps. The first step introduces the mathematics behind RLIM in detail, explaining how it evolves from the classical SIR model and describing COVID-19's infection process in a series of equations. The next step assigns mathematical symbols to the parameters in RLIM and implements these equations as sequential procedures in our algorithm. Finally, a performance measure is proposed to evaluate how the algorithm performs on the COVID-19 dataset. In this paper, a modified COVID-19 transmission model is proposed based on the original SIR model of Kermack and McKendrick [13], who proposed the susceptible-infected-removed model and used it to explain the 1665-1666 plague in London and the 1906 plague in Mumbai, India. The SIR model is shown in Fig. 1. The transmission process is described by Eqs. (1), (2), and (3). Their SIR model is feasible only in an idealized epidemic environment, because it does not consider time variation in the infection rate β or the recovery rate γ. Additionally, it assumes no disease control: any political or clinical intervention is excluded, and such transmission behavior rarely occurs in practice. Building on these theoretical assumptions, however, researchers have proposed many revised models that account for different epidemic characteristics and human interventions, such as SEIR [19], SEIRUS [16], mechanistic-statistical SEIR [17], and deep learning SEIR [12]. A description of these models can be found in Hethcote's review [10]. RLIM is inspired by research from Jianping Huang's team at Lanzhou University [11].
Their model, named the Global Prediction System for COVID-19 Pandemic (GPCP), adds four disease states to the SIR model: an insusceptible state (P), a potentially infected state (E), a quarantined state (Q), and a mortality state (D). The GPCP disease transmission model is described by Eqs. (4)-(10). Based on our observations and on facts gathered from news reports, RLIM adds a parameter ω to the transmission loop. ω represents the probability that a patient who has recovered from COVID-19 for a certain period is again identified by respiratory or antibody tests as a virus carrier. Following this definition, ω links the states R and S_I, representing the transmission probability between the recovered group and the secondary infected group (Fig. 2). Under these assumptions, the equation series for RLIM is modified as Eqs. (11)-(15). Compared with the GPCP model, RLIM has the following advantages:
(1) RLIM simplifies the classical susceptible-infected-quarantined-immune process into a susceptible-infected process, owing to the maturity of the COVID-19 detection system based on PCR tests and other nucleic acid amplification tests approved by CDSE [4]. Given an accurate number of confirmed infections, RLIM focuses on distinguishing first-time infections from secondary infections arising in different groups, to achieve more accurate predictions.
(2) RLIM improves the GPCP model with the recursive state S_I and the parameter ω to avoid the problem of forward-only transmission. Without the recursive state and this parameter, the number of new infections would decrease regardless of the actions taken, which contradicts current US COVID-19 transmission records.
(3) RLIM introduces the latency parameter τ to denote the median reinfection period. In RLIM, τ is initialized to 14 according to WHO guidance and scientific reports.
This parameter correlates with the recovery policy in many US states: patients in hospital are automatically treated as recovered after a certain period. To apply Eqs. (11) and (12) in our algorithm, they are transformed into discrete data series, as given in Eqs. (16) and (17). Substituting the fourth-order expressions for I(t) and R(t) from Eqs. (16) and (17) into Eq. (14) yields Eq. (18). In Eq. (16), a relationship between the coefficients [a, b, c, d] and [e, f, g, h] is established; thus, RLIM can predict the number of infected cases given the historical number of recoveries, previous infections, and assumed values of the infection rate, recovery rate, and secondary infection period. The corresponding algorithm flow diagram is given in Fig. 3 and discussed in Sect. 2.2. The notations used throughout this article are described in Table 1. The RLIM algorithm calculates predicted infection numbers according to Eqs. (16)-(18) and minimizes the difference between the actual data recorded by the COVID Tracking Project and the numbers predicted by our model. Initially, the predicted recovery numbers R_1, R_2, ..., R_n are calculated by the fourth-order method from the real data series of a selected US state between November 2020 and January 2021.
The coefficients (e, f, g, h) associated with this recovery function are then transformed into the coefficients (a, b, c, d), using the preassigned recovery rate λ (default value 0.01) and secondary infection rate ω (default value 0.01). With coefficients (a, b, c, d) assigned, the number of newly infected cases I_1, I_2, ..., I_n in this state can be calculated. Comparing these predicted numbers with actual data shows whether this round of prediction is accurate. Our RLIM model continuously searches for the optimal infection series and then determines the optimal ω associated with this state. RLIM's performance is measured by the difference between its predicted output I_p and the actual value I_k. Three performance indicators are given: the mean square error (MSE), the root mean square error (RMSE), and the average forecasting error rate (AFER). Because the numbers of infections differ widely among US states, ranging from hundreds to thousands, these indicators are normalized to between 0 and 1 so that performance can be compared.
- Mean square error (MSE): the average squared difference between RLIM's predicted output I_p and the actual value I_k, calculated in Eq. (19).
- Root mean square error (RMSE): also used to evaluate RLIM's prediction quality; its formulation is given in Eq. (20).
- Average forecasting error rate (AFER): the percentage error, representing the relative difference between the predicted output I_p and the actual value I_k. It is a cumulative statistic capturing the deviation between two time series and is calculated in Eq. (21).
In this section, the data set, source code, and software packages used in RLIM are listed explicitly for researchers who are interested in our research and intend to reproduce our simulations. The data source directly applied in our simulation is [6].
This data set contains US state-level COVID-19 data from April 2020 to December 2020. In this article, New Jersey (NJ), New York (NY), South Dakota (SD), and Virginia (VA) are selected because they all report recoveries daily. In this section, we present our simulation results in figures and tables and discuss how RLIM located the turning points for four US states. Our discussion starts with New York state, where RLIM successfully located the turning point from COVID-19 epidemic data records. We then discuss New York's turning point in detail to show its relationship with the reinfection rate, which we believe is the key factor. Finally, we describe RLIM's performance on the other US states, with predictions of their COVID-19 transmission trends. Fig. 4 indicates that RLIM successfully fits the reported data records from mid-November to mid-January and provides predictions for the following weeks. Comparing the two columns of Fig. 4 shows that different values of the secondary infection rate lead to different prediction series. For example, ω = 0.2 produces an infection curve with a peak of 18,549, while ω = 0.3 yields a forecast peak of 24,866. The simulation results suggest that an iterative search strategy enables RLIM to match the reported COVID-19 infection numbers, thereby minimizing the relative error between predictions and observations. The other advantage of RLIM is its ability to foretell the turning point within a certain period. Forecasts of the turning point on January 31, 2021, are also marked in the left-column figures. (A discussion of the turning point calculation is presented in Sect. 4.2.) Table 2 lists the predicted number of new COVID-19 infections in New York State starting on November 15, 2020.
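The role of the iterative search can be illustrated with a minimal sketch. Because Eqs. (11)-(18) are not reproduced in this section, the code below replaces them with a hypothetical toy recursion in which new infections on day t are a baseline term plus ω times the recoveries reported τ days earlier. The function names, the synthetic series, and the search grid over [0, 0.5] in steps of 0.01 are likewise illustrative assumptions, not the authors' implementation. What the sketch shares with the text is the procedure: each candidate ω produces a different prediction series, each series is scored against the reported data by MSE, and the ω with the smallest error is kept. RMSE and AFER are also computed; AFER is assumed here to be the mean absolute relative error in percent, since the exact form of Eq. (21) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def predict_infections(baseline: Sequence[float], recovered: Sequence[float],
                       omega: float, tau: int = 14) -> List[float]:
    """Hypothetical toy recursion (NOT the paper's Eqs. (11)-(18)).

    New infections on day t = baseline[t] + omega * recovered[t - tau],
    i.e. a fraction omega of those who recovered tau days earlier is
    counted again as a secondary infection.
    """
    preds = []
    for t, base in enumerate(baseline):
        secondary = omega * recovered[t - tau] if t >= tau else 0.0
        preds.append(base + secondary)
    return preds


def mse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean square error between predicted and reported series."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(mse(pred, actual))


def afer(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed AFER form: mean absolute relative error, in percent."""
    return 100.0 * sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)


def search_omega(baseline: Sequence[float], recovered: Sequence[float],
                 actual: Sequence[float], tau: int = 14,
                 lo: float = 0.0, hi: float = 0.5,
                 step: float = 0.01) -> Tuple[float, float]:
    """Grid search: return (omega, MSE) for the omega in [lo, hi] with the smallest MSE."""
    best_omega, best_err = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        omega = round(lo + k * step, 10)  # avoid accumulated floating-point drift
        err = mse(predict_infections(baseline, recovered, omega, tau), actual)
        if err < best_err:
            best_omega, best_err = omega, err
    return best_omega, best_err


if __name__ == "__main__":
    days = range(60)
    baseline = [1000.0 + 50.0 * t for t in days]   # made-up first-time infections
    recovered = [800.0 + 10.0 * t for t in days]   # made-up daily recoveries
    reported = predict_infections(baseline, recovered, omega=0.22)  # synthetic "truth"
    best, err = search_omega(baseline, recovered, reported)
    fitted = predict_infections(baseline, recovered, best)
    print(best, err, rmse(fitted, reported), afer(fitted, reported))
```

A plain grid search is used here for transparency; when the error curve is smooth in ω, a finer grid or a one-dimensional optimizer would serve the same purpose with fewer model evaluations.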
The infection data records were collected from the COVID Tracking Project during November 2020. The daily infection numbers in November, December, and January range from 2,600 to 9,000, from 9,300 to 17,000, and from 17,300 to 18,500, respectively. Fig. 4 indicates that the value ω = 0.22 fits the infected cases well. In terms of the MSE, RMSE, and AFER indicators, RLIM reaches an MSE of 5.8 million over 60 days, with an AFER of 29.02%. In December 2020, it achieved a lower MSE of 4.6 million and a lower AFER of 14.97%. Since RLIM's objective is to fit the actual infection numbers and predict the trend, we conclude that RLIM achieves satisfactory results in fitting the actual data and reducing the MSE. RLIM's predictions suggest that COVID-19 transmission in New York state will reach an equilibrium after January 31, 2021, with new infections remaining at a level of 1.8k per day. New infections will not change abruptly, so clinical services such as hospitals will not require extra measures. Table 3 lists the predicted number of new recoveries in New York state for November, December, and January, with MSE, RMSE, and AFER calculated against the actual data records. In November 2020, the MSE of the recovery predictions was 27,893.26, with an AFER of 38.8%. For December 2020, the MSE increased to 43,394.77, but the AFER improved to 25.6%. The recovery predictions indicate that new recoveries will remain at about eight hundred per day, with a flattened tail after mid-January 2021. From RLIM's output for New York state's infection and recovery numbers, the turning point of this state's COVID-19 transmission occurs around January 30, 2021.
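The turning-point criterion used for New York, in which the fitted secondary infection rate ω declines from its initial value of 0.5 until it reaches the recovery rate λ = 0.07, can be sketched as a scan over a daily ω series. The function `find_turning_point`, the tolerance `tol`, and the requirement that ω stay within tolerance for `hold` consecutive days are hypothetical choices made for illustration, not details given in the text; the ω path in the demo is invented and does not reproduce Fig. 5.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def find_turning_point(omegas: Sequence[float], lam: float,
                       tol: float = 0.005, hold: int = 3) -> Optional[int]:
    """Return the first day index from which omega stays within `tol` of the
    recovery rate `lam` for `hold` consecutive days; None if it never does."""
    run = 0
    for i, w in enumerate(omegas):
        if abs(w - lam) <= tol:
            run += 1
            if run == hold:
                return i - hold + 1  # first day of the qualifying run
        else:
            run = 0  # omega left the band around lam: restart the count
    return None


if __name__ == "__main__":
    start = date(2020, 11, 15)  # date on which omega = 0.5 is assigned in the text
    # Made-up omega path: declines linearly from 0.5 and is floored at lam = 0.07.
    omegas = [max(0.07, 0.5 - 0.43 * d / 60) for d in range(90)]
    idx = find_turning_point(omegas, lam=0.07)
    if idx is not None:
        print("turning point:", start + timedelta(days=idx))
```

Requiring several consecutive days inside the band guards against declaring a turning point on the strength of a single noisy estimate of ω.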
Predictions indicate that from mid-January, New York's infections will
Fig. 5 illustrates how the RLIM algorithm starts from an initial value and evolves within one month to reach a turning point. The secondary infection rate ω is set to 0.5 on November 15, 2020, and the goal is to reach an equilibrium state in which ω decreases to 0.07, the same value as λ. In the optimistic case, ω would follow a straight path to 0.07 by a certain date. However, its path must be validated against the actual secondary infection data reported by New York state, shown as the series of blue points in Fig. 5. Our model produces a series of fitted values, marked in black in Fig. 5. These values are the optimal secondary infection rates chosen by an iterative search algorithm, with near-minimal error relative to the reported values. The model forecast starts on January 14, 2021, and reaches an equilibrium state after 2 weeks. It is therefore reasonable to conclude that RLIM can find an optimal curve for the secondary infection rate with an acceptable MSE and follow that curve to predict the forthcoming secondary infection rate. According to the simulation results in Table 2, RLIM reduces the error of its new-infection predictions from 5.8 million to 4.6 million and the AFER from 29% to 15%. The simulation results in Fig. 6 show three scenarios: a moderate increase (New Jersey, top), a moderate decrease (South Dakota, middle), and an exponential increase (Virginia, bottom). The optimal secondary infection rate ω for each state is marked above its panel (0.13 for New Jersey, 0.056 for South Dakota, and 0.19 for Virginia). The infection and recovery data of these states show no strong correlation between ω and COVID-19 transmission trends. Revisiting Eqs.
(11) and (15) shows that, in RLIM, ω affects the incremental steps of infection cases positively and those of recovery cases negatively. Nevertheless, ω remains useful for predicting the turning point, which occurs when ω approaches the value of the recovery rate. Thus, if the recovery rate λ remains stable over the period τ (in RLIM, τ equals 14), ω will approach it over time, and RLIM can pin down the turning point. This greatly reduces the time scientists need to characterize COVID-19's behavior. This research proposes a recursive, latent, dynamic virus transmission model based on the classical SEIR model. This model, named RLIM, fits the COVID-19 transmission data of the USA and efficiently locates the transmission turning point. By introducing a new parameter ω into the classical SEIR model, RLIM predicts newly infected cases from recovery data and historical COVID-19 records. Experimental results for New York, New Jersey, South Dakota, and Virginia show that, given a reasonable initial value of ω, RLIM can predict 30-day infections and recoveries with a reasonable error rate. RLIM also provides an estimate of ω in the time domain, suggesting that approximating ω is a valuable way to locate future turning points in COVID-19 transmission. Simulations for New York, from mid-November 2020 until the end of January 2021, provide valuable information on ω's curve and predict that it reaches an equilibrium state on January 31. Based on the RLIM results, we conclude that starting in February, New York state's COVID-19 transmission will enter an equilibrium state. One of RLIM's advantages is that it does not require environmental factors, such as weather changes, hospital capacity, or city lockdowns. Thus, it is suitable for predicting COVID-19 transmission without additional information.
Our model is effective for modeling virus transmission, including a second wave of a COVID-19 epidemic, when key factors such as the incubation period and infection rate are statistically determined in advance. Unlike other predictive algorithms, RLIM predicts infection numbers from an optimal parameter set derived from 14-day historical records. The authors believe that this time span helps avoid the data overfitting issue mentioned in [3]. In summary, RLIM offers a novel yet effective solution for COVID-19 prediction and turning point estimation. A promising direction is to integrate RLIM with machine learning techniques. RLIM's recursive, latent states lend themselves to description by back-propagation inside a neural network, so the model can easily be equipped with self-learning abilities. Another interesting yet unexplored subject is to use RLIM to predict the impact of COVID-19 vaccines. Researchers may use RLIM to evaluate the impact of different vaccines on COVID-19 transmission: RLIM can generate optimal parameter sets for pre- and post-vaccination groups, and with these parameters visualized, researchers and governments can assess a given vaccine's effectiveness against COVID-19 transmission.
Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review
Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique
The turning point and end of an expanding epidemic cannot be precisely forecast
Interim Case Definition
An interactive web-based dashboard to track COVID-19 in real time
White, Laurent Hébert-Dufresne: State-level variation of initial COVID-19 dynamics in the United States: the role of local government interventions
Recovery data and the COVID Tracking Project
Evolution of antibody immunity to SARS-CoV-2
The mathematics of infectious diseases
The amplified second outbreaks of global COVID-19 pandemic. medRxiv
Analysis of COVID-19 spread in South Korea using the SIR model with time-dependent parameters and deep learning. medRxiv
A contribution to the mathematical theory of epidemics
Asymptomatic carrier state, acute respiratory disease, and pneumonia due to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2): facts and myths
CORD-19 Transmission, incubation, environment
Estimation of the probability of reinfection with COVID-19 coronavirus by the SEIRUS model. JMIR Public Health Surveill
Using early data to estimate the actual infection fatality ratio from COVID-19 in France
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence
An SEIR model for assessment of current COVID-19 pandemic situation in the UK. medRxiv
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
COVID-19 asymptomatic infection estimation. medRxiv
Transmission dynamics and control methodology of COVID-19: a modeling study
Acknowledgements: This work is part of a project supported by the National Science Foundation for Young Scientists of China (Grant No. 61703267).
The data that support the findings of this study are available from The COVID Tracking Project: http://covidtracking.com. The authors declare that they have no conflict of interest.