key: cord-0947945-tusqf260
authors: Hu, Z.; Ge, Q.; Li, S.; Xu, T.; Boerwinkle, E.; Jin, L.; Xiong, M.
title: Spread of Covid-19 in the United States is controlled
date: 2020-05-11
journal: nan
DOI: 10.1101/2020.05.04.20091272
sha: 02336a02f8e7d85f34d2d992f309c59c8fe9492a
doc_id: 947945
cord_uid: tusqf260

As of May 1, 2020, the number of cases of Covid-19 in the US had passed 1,062,446, and interventions to slow down the spread of Covid-19 had curtailed most social activities. Meanwhile, an economic crisis and resistance to the strict intervention measures are rising. Some researchers proposed intermittent social distancing that may drive the outbreak of Covid-19 into 2022. Questions arise about whether we should maintain or relax quarantine measures. We developed novel artificial intelligence and causal inference integrated methods for real-time prediction and control of nonlinear epidemic systems. We estimated that the peak time of Covid-19 in the US would be April 24, 2020, and that its outbreak in the US would be over by the end of July, reaching 1,551,901 cases. We evaluated the impact of relaxing the current interventions for reopening the economy on the spread of Covid-19. We provide tools for balancing the risks to workers and reopening the economy.

Although the number of confirmed cases of Covid-19 in the US had passed 1,062,446 as of May 1, 2020, non-pharmaceutical interventions such as strict self-quarantine for families, maintaining social distancing, stopping mass gatherings, and closure of schools and universities, among others, have dramatically slowed down the spread of Covid-19 and saved a large number of lives. However, public health interventions have restricted economic activities and caused high unemployment. Some investigators who published their mathematical projection of the dynamics of Covid-19 in Science suggested "prolonged or intermittent social distancing", which may drive the outbreak of Covid-19 into 2022 (1). Meanwhile, MIT researchers questioned the "intermittent social distancing" policy and worried that relaxing public interventions may cause an exponential explosion of Covid-19 (2). We are now at a critical decision point as to whether public health intervention measures should remain in place or should be lifted to reopen the economy. Can we simultaneously improve both public health and the economy? A key to correctly answering this question is to reconstruct the complex epidemic dynamic systems from the data, precisely predict the extent and duration of Covid-19, and develop a causal inference framework for devising practical, implementable public health interventions to control the spread of Covid-19 in the US.

The basic mathematical models that underlie many statistical and computer methods for predicting the dynamics of Covid-19 are the susceptible-exposed-infected-recovered (SEIR) models and their variants (3) (4) (5) (6). Although these epidemiological models are useful for estimating the dynamics of transmission and evaluating the impact of intervention strategies, they have some critical limitations (7, 8). First, the SEIR models assume a homogeneous population that is evenly mixed. Second, the epidemiological models consist of ordinary differential equations that have many unknown parameters. These parameters are not identifiable (9), which leads to low accuracy and a wide range of predictions. Third, most models assume that some control parameters are constant rather than time varying and system dependent. This dramatically limits our ability to simulate interventions and improve prediction accuracy.

To overcome these limitations, we developed an artificial intelligence (AI) and causal inference integrated intervention auto-encoder (IAE) to reconstruct nonlinear time-varying epidemic dynamic systems, model health intervention plans and make multi-step predictions of the response trajectory of Covid-19 over time with multiple interventions (fig. S1) (10).

Interventions include strict travel restrictions, no large group gatherings, mandatory quarantine, restricted public transportation, and school closures. Similar to the reproduction number in the epidemiological models, the various interventions are quantified as a control variable taking values in the interval [0, 1]. A value of 1 indicates that the intervention is at its strongest and the reproduction number R is close to zero. A value of zero indicates that no restrictions on social-economic activities are imposed. We assume that the time-varying intervention variable is system dependent and can be automatically adjusted. As shown in Figure S1, the IAE determines the intervention response (similar to counterfactual outputs) for a set of time-varying and system-adjusted interventions, evaluates the impact of different intervention strategies and their implementation times on curbing the spread of Covid-19, and provides timely selection of an optimal sequence of intervention strategies to balance public health and reopening of the economy.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted May 11, 2020.

We first introduce the susceptible-exposed-infected-recovered (SEIR) model, which is a mathematical compartmental model based on the average behavior of a population under study (1). The SEIR model is defined as

$$\frac{dS(t)}{dt} = -\frac{\beta(t)\, S(t)\, I(t)}{N}, \qquad (1)$$

$$\frac{dE(t)}{dt} = \frac{\beta(t)\, S(t)\, I(t)}{N} - \sigma(t)\, E(t), \qquad (2)$$

$$\frac{dI(t)}{dt} = \sigma(t)\, E(t) - \gamma(t)\, I(t), \qquad (3)$$

$$\frac{dR(t)}{dt} = \gamma(t)\, I(t), \qquad (4)$$

where $S(t)$, $E(t)$, $I(t)$ and $R(t)$ are the numbers of susceptible, exposed, infected and recovered (recovery or death) individuals at time $t$, respectively, $N$ is the population size, and $\beta(t)$, $\sigma(t)$ and $\gamma(t)$ are the transmission, incubation and recovery rates at time $t$, respectively.

Solving the differential equations (1)-(4), we obtain the trajectories of S(t), E(t), I(t) and R(t) (5), where S(0), E(0), I(0) and R(0) are the initial values of S(t), E(t), I(t) and R(t).

The observed number of cases is a nonlinear function of its own history and of the model parameters. Public health interventions such as social distancing, regional lockdowns, quarantine and intensive testing can change these parameters. In the classical SEIR and SIR models, we define the basic reproduction number as the ratio of the transmission rate $\beta$ to the recovery rate $\gamma$, $R_0 = \beta / \gamma$, (6) which measures the transmission dynamic properties.
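As a minimal sketch of how the SEIR model described above behaves, the following Python code integrates a standard SEIR system with a forward Euler step and computes the basic reproduction number as the ratio of the transmission rate to the recovery rate. The function name `simulate_seir`, the constant rates and the initial values are illustrative assumptions, not quantities estimated in this study.

```python
def simulate_seir(beta, sigma, gamma, n_pop, e0, i0, days, dt=0.1):
    """Forward-Euler integration of a standard SEIR model with constant rates.

    beta, sigma, gamma: transmission, incubation and recovery rates (per day).
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s, e, i, r = float(n_pop - e0 - i0), float(e0), float(i0), 0.0
    trajectory = [(s, e, i, r)]
    steps_per_day = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n_pop * dt   # S -> E
            new_infected = sigma * e * dt             # E -> I
            new_recovered = gamma * i * dt            # I -> R
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


# Hypothetical parameter values chosen only for illustration.
BETA, SIGMA, GAMMA = 0.5, 0.2, 0.25
R0 = BETA / GAMMA  # basic reproduction number

traj = simulate_seir(BETA, SIGMA, GAMMA, n_pop=1_000_000, e0=0, i0=10, days=300)
peak_day = max(range(len(traj)), key=lambda d: traj[d][2])
print(f"R0 = {R0:.2f}, infections peak on day {peak_day}")
```

Lowering `BETA` in this sketch plays the role of a stronger intervention: it reduces `R0` and delays and flattens the peak.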


where is the number of time lags.

The intervention measure can also be written as .

A single-layer autoencoder (AE) is a three-layer feedforward neural network (2). The first layer is the input layer, the second layer is the hidden layer, and the third layer is the reconstruction layer. The input vector consists of the number of cases at each time point and the public health intervention measure variable. The input vector is mapped to the hidden layer to capture the features of the transmission dynamics of Covid-19 with public health intervention.

The AE attempts to generate an output that reconstructs its input by mapping the hidden vector to the reconstruction layer. The single-layer AE attempts to minimize the error between the input vector and the reconstruction vector. We develop stacked autoencoders with 4 layers that consist of two single-layer AEs stacked layer by layer (2). The dimensions of the input layer, the first hidden layer and the second hidden layer are 8, 32 and 4, respectively (Figure S1(a)). After the first single-layer AE is trained, we remove its reconstruction layer and keep its hidden layer as the input layer of the second single-layer AE. We then repeat the training process for the second single-layer AE. The output of the final node that fully connects to the hidden layer of the second single-layer AE is the predicted number of cases and intervention measure.
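The layer-by-layer training of the two stacked autoencoders can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the tanh encoder, linear decoder, plain gradient descent, learning rate and random input data are all hypothetical choices, and only the layer sizes 8, 32 and 4 come from the text.

```python
import numpy as np


def train_single_layer_ae(x, hidden_dim, epochs=300, lr=0.1, seed=0):
    """Train one autoencoder (tanh encoder, linear decoder) by gradient descent.

    Returns the trained encoder function and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_features, hidden_dim))
    b_enc = np.zeros(hidden_dim)
    w_dec = rng.normal(0.0, 0.1, (hidden_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        hidden = np.tanh(x @ w_enc + b_enc)           # input -> hidden
        err = hidden @ w_dec + b_dec - x              # reconstruction - input
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size               # d(loss)/d(reconstruction)
        grad_pre = (grad_out @ w_dec.T) * (1.0 - hidden ** 2)
        w_dec -= lr * (hidden.T @ grad_out)
        b_dec -= lr * grad_out.sum(axis=0)
        w_enc -= lr * (x.T @ grad_pre)
        b_enc -= lr * grad_pre.sum(axis=0)

    def encode(z):
        return np.tanh(z @ w_enc + b_enc)

    return encode, losses


# Hypothetical stand-in data: 200 normalized samples of dimension 8.
rng = np.random.default_rng(1)
segments = rng.random((200, 8))

# First AE: 8 -> 32. Its reconstruction layer is discarded after training.
encode1, losses1 = train_single_layer_ae(segments, hidden_dim=32)
hidden1 = encode1(segments)

# Second AE: 32 -> 4, trained on the hidden layer of the first AE.
encode2, losses2 = train_single_layer_ae(hidden1, hidden_dim=4, seed=2)
code = encode2(hidden1)
print(code.shape, round(losses1[0], 4), round(losses1[-1], 4))
```

A final fully connected output layer on top of `code` would be needed to produce predictions; it is omitted here to keep the sketch short.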

The potential outcome framework, also referred to as the Rubin Causal Model (3)

The IAE uses sequence-to-sequence multi-input/output architectures to model health intervention plans and make multi-step predictions of the response trajectory of Covid-19 over time with multiple interventions (Figure S1(b)). The algorithm for training and forecasting of the IAE is summarized as follows.

Step 1. Initialization.

Randomly select for samples with time points. Using the data for the US and all states and regions, we train the network. Repeat the above procedure five times.


Step 2. After the networks are trained, for each sample and each window, divide the range of the intervention measure into ten grid points. For each grid value, train the network. After the network is trained, calculate the prediction error for each sample, and select the grid value for which the error is the smallest.

Step 3. Define the equation that is implemented by neural networks. Train the network to estimate its parameters, assuming that the intervention measure is estimated in step 2. In other words, we solve the corresponding optimization problem over the network parameters.

Step 4. Use the trained autoencoder (1) as autoencoder (2) and predict the number of cases and the intervention measure with it.
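The grid search over the intervention measure in step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: `toy_one_step_model` is a hypothetical stand-in for the trained network, the absolute error is an assumed error measure, and the grid is assumed to span the interval [0, 1] in which the intervention variable takes its values.

```python
import numpy as np


def toy_one_step_model(prev_cases, intervention):
    """Hypothetical stand-in for a trained one-step predictor.

    A stronger intervention (closer to 1) gives a smaller growth factor.
    """
    growth = 1.4 - 0.8 * intervention
    return prev_cases * growth


def select_intervention(prev_cases, observed_next, model, n_grid=10):
    """Pick the grid value in [0, 1] whose prediction error is smallest."""
    grid = np.linspace(0.0, 1.0, n_grid)
    errors = np.array([abs(model(prev_cases, u) - observed_next) for u in grid])
    best = int(np.argmin(errors))
    return float(grid[best]), float(errors[best])


# Synthetic check: generate an "observation" from a known intervention value
# that lies on the grid, then recover it by grid search.
true_u = 2.0 / 3.0
observed = toy_one_step_model(1000.0, true_u)
u_hat, err = select_intervention(1000.0, observed, toy_one_step_model)
print(f"selected intervention measure: {u_hat:.3f}, error: {err:.2e}")
```

With only ten grid points the estimate is coarse; a finer grid or a refinement around the best grid value would reduce the discretization error at the cost of more model evaluations.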


The trained IAE was used to forecast the future number of new or cumulative cases of Covid-19 for the US and each state. Recursive multiple-step forecasting applied a one-step model repeatedly: the prediction for the preceding time step, together with the intervention strategy, was used as an input for predicting the following time step (Figure S1(b)). For example, to forecast the number of new confirmed cases on day 2, the number of new cases and the intervention measure predicted in the one-step forecast for day 1 were used as if they were observed inputs. Repeating this process yields forecasts for further steps ahead. The sum of the forecasted numbers of new or cumulative confirmed cases over all states was taken as the prediction of the total number of new or cumulative confirmed cases of Covid-19 in the US.
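The recursive forecasting scheme described above can be sketched as follows. The function `toy_one_step_model`, the two state names and all numbers are hypothetical; only the idea of feeding each one-step prediction back as input and summing state forecasts to obtain the national forecast is taken from the text.

```python
def recursive_forecast(cases, interventions, one_step_model, n_steps, window=8):
    """Apply a one-step model repeatedly, feeding predictions back as inputs."""
    cases, interventions = list(cases), list(interventions)
    forecasts = []
    for _ in range(n_steps):
        next_cases, next_u = one_step_model(cases[-window:], interventions[-window:])
        forecasts.append((next_cases, next_u))
        cases.append(next_cases)          # prediction becomes "observed" input
        interventions.append(next_u)
    return forecasts


def toy_one_step_model(case_window, u_window):
    """Hypothetical stand-in for the trained network (not the IAE itself)."""
    next_u = min(1.0, u_window[-1] + 0.02)
    next_cases = case_window[-1] * (1.3 - 0.6 * next_u)
    return next_cases, next_u


# Two hypothetical states with 8 days of history each.
state_a = recursive_forecast([100.0] * 8, [0.5] * 8, toy_one_step_model, n_steps=3)
state_b = recursive_forecast([40.0] * 8, [0.6] * 8, toy_one_step_model, n_steps=3)

# National forecast: sum of the state-level forecasts at each step.
national = [a[0] + b[0] for a, b in zip(state_a, state_b)]
print([round(v, 2) for v in national])
```

Because each step consumes the previous prediction, errors can accumulate with the forecast horizon, which is why multi-step errors are reported separately from one-step errors.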

The analysis is based on the surveillance data of confirmed and new Covid-19 cases in the US up to April 24, 2020. Data on the numbers of confirmed cases, new cases and deaths from Covid-19 from January 22, 2020 to April 24, 2020 were obtained from the Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/MAP.HTML).

An 8-day segment of the time series was viewed as one data sample, and such segments were taken as the training samples. One element of the time series and intervention data matrix was randomly selected as the start day of a segment, and its 7 successive days were selected as the remaining days of the segment. Each segment is identified by its segment index and by the column index of the matrix that was selected as its starting day. Data were normalized. The forecasting errors are reported in Table S1. The average errors of 1-step, 5-step and 10-step forecasting were 0.0035, 0.016 and 0.0012, respectively.
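The construction of 8-day training segments can be sketched as follows. The function names, the synthetic data matrix and the min-max normalization are assumptions made for illustration; the source does not preserve the normalization that was actually used.

```python
import numpy as np


def sample_segments(series, n_segments, length=8, seed=0):
    """Draw random segments of `length` consecutive days from a data matrix.

    series: array of shape (n_regions, n_days); each row is one region.
    A random row and a random start day define each segment.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    n_regions, n_days = series.shape
    rows = rng.integers(0, n_regions, size=n_segments)
    starts = rng.integers(0, n_days - length + 1, size=n_segments)
    return np.stack([series[r, s:s + length] for r, s in zip(rows, starts)])


def min_max_normalize(segments):
    """Scale all values to [0, 1] (an assumed normalization scheme)."""
    lo, hi = segments.min(), segments.max()
    if hi == lo:
        return np.zeros_like(segments)
    return (segments - lo) / (hi - lo)


# Synthetic stand-in: 3 regions, 94 days (January 22 to April 24, 2020).
data = np.arange(3 * 94).reshape(3, 94)
segments = sample_segments(data, n_segments=100)
normalized = min_max_normalize(segments)
print(segments.shape, normalized.min(), normalized.max())
```

Restricting the start day to at most `n_days - length` guarantees that every segment contains 8 observed days and never runs past the end of the series.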


To study the impact of relaxing intervention restrictions on the spread of Covid-19 in the US, we presented the results in Figure 1. We considered four scenarios of interventions after April 25, 2020: scenario 1 followed the current intervention measures, scenarios 2 and 3 relaxed the intervention measures by 20% and 40%, and scenario 4 increased the intervention measure by 20%. Figure 1 showed that if we relaxed the intervention measure by 40%, the spread of Covid-19 would be over on August 7, with 1,869,185 cumulative cases of Covid-19 in the US (an increase of 317,284 cases, or 20.4% more cumulative cases than if the current intervention measure was followed) (table S3). To avoid increasing the number of new cases, we can increase the number of coronavirus tests.

Public health interventions such as city lockdowns, traffic restrictions, quarantines, contact tracing, canceling gatherings and school closures will slow down the spread of Covid-19. Figure 3 plotted the intervention measure in the US under four scenarios of interventions as a function of time, starting on January 22, 2020 and ending at the end of September 2020.

The intervention measure is a metric that quantifies the degree of infection control. Figure 3 showed that the intervention curve started with a low intervention measure and then generally increased until the end of February 2020, when Spring break began. Spring break substantially reduced prevention measures and caused a large-scale outbreak of Covid-19 in the US. Then the government implemented strict quarantine and social distancing policies, and hence the intervention measure increased again. The average intervention measure of the US, 50 states and 5 other regions at the peak time was 0.54 (Table 1 and table S4). In other words, when the intervention measure was close to 0.5, interventions were sufficiently strong to decrease the number of new cases of Covid-19. Finally, the intervention measure steadily and quickly increased to 1 as the number of new cases rapidly decreased and the spread of Covid-19 was completely stopped. Figure 3 also showed that even if the intervention measure was assumed to decrease by 40%, it could still quickly and steadily increase to 1, and then the spread of Covid-19 would stop. Negative correlation coefficients indicated that increasing the intervention measure would decrease the number of new cases. To investigate the relationship between the intervention measure and the widely used reproduction number R(t), we first used the SIR model to calculate the reproduction number R(t) and then calculated the Spearman correlation coefficient between the intervention measure and the reproduction number R(t). We obtained a Spearman correlation coefficient of 0.585 between the intervention measure and the reproduction number, using the number of new cases in the US and 50 states from April 1 to April 29.
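For readers who want to reproduce this kind of rank correlation on their own series, a minimal sketch of the Spearman coefficient is given below. It computes the Pearson correlation of average ranks; the function names and the small example vectors are illustrative assumptions and are not data from this study.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical example vectors.
rho_mixed = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
rho_decreasing = spearman([0.1, 0.3, 0.5, 0.9], [3.0, 2.0, 1.5, 0.2])
print(round(rho_mixed, 3), round(rho_decreasing, 3))
```

Because only ranks enter the calculation, the coefficient captures any monotone relationship between the two series, not just a linear one.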

In summary, this report has addressed several important issues in forecasting the [...] require causal inference as a basic tool for forecasting and evaluating the dynamics of Covid-19 in the US. We used the counterfactual outcome as a general framework for modeling health intervention plans and making multi-step predictions of the response trajectory of Covid-19 over time with a sequence of public health interventions. As an illustration, we evaluated four scenarios of interventions and predicted that if we relaxed the intervention measure by 40%, the spread of Covid-19 would be over on August 7, with 1,869,185 cumulative cases of Covid-19 in the US (an increase of 317,284 cases, or 20.4% more cumulative cases than following the current intervention measure). However, if we increased the intervention measure by 20%, for example by increasing the ratio of coronavirus tests, the spread of Covid-19 would be over on July 23, with 1,296,487 cumulative cases (16.5% fewer cumulative cases than following the current intervention measure).
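The percentages quoted for the scenarios follow directly from the reported cumulative case counts, as the short check below shows. The variable names are illustrative; the three counts are the figures reported in this paper, with the baseline taken as the 1,551,901 cases forecast under the current intervention measure.

```python
# Cumulative case forecasts reported in the text.
BASELINE = 1_551_901       # current intervention measure
RELAXED_40 = 1_869_185     # intervention measure relaxed by 40%
INCREASED_20 = 1_296_487   # intervention measure increased by 20%

extra_cases = RELAXED_40 - BASELINE
pct_increase = 100.0 * extra_cases / BASELINE
pct_reduction = 100.0 * (BASELINE - INCREASED_20) / BASELINE

print(f"extra cases if relaxed by 40%: {extra_cases:,}")
print(f"relative increase: {pct_increase:.1f}%")
print(f"relative reduction if increased by 20%: {pct_reduction:.1f}%")
```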

The third issue is to simultaneously estimate the trajectory of the dynamics of Covid-19 and the intervention measure. We proposed to use the intervention measure as a control variable that comprehensively quantifies the public health interventions and incorporated the intervention measure as an input into the IAE model. Therefore, the IAE model jointly estimates the number of cases and the intervention measure.

The fourth issue is the interpretation of the intervention measure. We could not investigate the impact of all individual elements of the interventions because many were introduced simultaneously across the US. If individual intervention data are available, the IAE model can quantify the effect of a specific intervention on controlling the spread of Covid-19. The quantity widely used to characterize transmission dynamics is the reproduction number R. We found that the correlation coefficient between the intervention measure and the reproduction number was 0.585.


Figure 1. The reported and forecasted curves of newly confirmed cases of Covid-19 in the US under four scenarios of interventions as a function of time, starting from January 22, 2020. Scenario 1 followed the current intervention measure, scenarios 2 and 3 relaxed the intervention measure by 20% and 40%, and scenario 4 increased the current intervention measure by 20%.


Table S1. Errors of up to 10-step-ahead prediction of the number of cumulative cases of Covid-19 in the US.

1-step 2-step 3-step 4-step 5-step 6-step 7-step 8-step 9-step 10-step

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period

Quantifying the effect of quarantine control in Covid-19 infectious spread using machine learning

Early transmission dynamics in Wuhan, China, of novel Coronavirus-infected pneumonia

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study

A Time-dependent SIR model for COVID-19 with Undetectable Infected Persons

Reporting, epidemic growth, and reproduction numbers for the 2019 novel coronavirus (2019-nCoV) epidemic

Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model

An open challenge to advance probabilistic forecasting for dengue epidemics

Why is it difficult to accurately predict the COVID-19 epidemic?

A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines

Mathematical modeling of epidemic diseases

/2020 6809 461 374 1728 1129 734 4246 107 478 989 356
5/24/2020 6111 419 344 1634 1128 694 4100 95 412 871 321
5/25/2020 5478 379 314 1542 1125 655 3944 84 361 771 289
5/26/2020 4788 342 286 1452 1118 616 3780 74 319 684 259
5/27/2020 4119 307 260 1363 1107 578 3609 65 283 607