key: cord-1008495-boa0n9dz authors: Liu, Ziyue; Guo, Wensheng title: Government Responses Matter: Predicting Covid-19 cases in US under an empirical Bayesian time series framework date: 2020-03-30 journal: nan DOI: 10.1101/2020.03.28.20044578 sha: e3aafde5f8122763f92982f5b4361bab68348a63 doc_id: 1008495 cord_uid: boa0n9dz
Since the Covid-19 outbreak began, researchers have been predicting how the epidemic will evolve, especially the number of cases in each country, using parametric extrapolations of the case history. In reality, how the epidemic progresses in a particular country depends largely on its policy responses and interventions. Because the outbreaks in some countries began earlier than in the United States, predictions of US cases can benefit from incorporating the similarity in their trajectories. We propose an empirical Bayesian time series framework that predicts US cases using other countries as prior references. The resulting forecast is based on the observed US data and prior information from the reference country, while accounting for different population sizes. When Italy, whose trajectory the US data resemble most closely, is used as the prior, US cases are predicted to exceed 300,000 by the beginning of April unless strong measures are adopted.
When facing an epidemic, the people and government of a country may underestimate its seriousness at first but eventually step up their responses. Hence case numbers tend to increase exponentially in the early stage, after which the trends gradually bend and plateau. Similar case-number trajectories can therefore be observed in different countries, although their timing and severity can differ substantially because of different responses. Figure 1 displays the trajectories of total Covid-19 case numbers for China, S. Korea, Italy, France, Iran, Germany, Spain and the USA using Johns Hopkins data.
These countries have been past time zero for more days than the US, where time zero is defined as the first day with 100 or more (100+) cases, a heuristic but widely used choice 1 . The curve for South Korea rose rapidly early on but quickly bent and plateaued, which has been credited to S. Korea's swift and decisive policy responses 2 . China exhibits a similar but later flattening pattern, consistent with its missed early intervention window followed by extreme lockdown policies 3 . On the other hand, cases in Italy and France grew exponentially until recent days, which has partially been attributed to their late and weak policy responses 4 . The US trajectory is almost linear on the logarithmic scale. While the US government is catching up with policies such as working/studying from home, social distancing and self-quarantine, their effect has not yet appeared in the trajectory.
Existing Covid-19 forecasts are extrapolations into the future [5] [6] [7] [8] [9] [10] [11] . Their validity relies on the crucial but unrealistic assumption that future trajectories are completely determined by the history. By design, they cannot incorporate government responses that are yet to come. Not surprisingly, these predictions can be off target. For example, Fanelli and Piazza 7 These forecasts are mainly susceptible-infected-removed (SIR) models and their variants [5] [6] [7] [8] . Others include the state transition model 9 , parametric growth curve models such as logistic curves 10 , and autoregressive integrated moving average (ARIMA) models 11 .
We propose an empirical Bayesian time series framework to forecast the US trajectory by Based on the parameters estimated from the eight countries, our next task is to forecast US cases while incorporating one of the countries as prior information. This is done by constructing a conditional state space model from the functional mixed effects model, conditional on the observed data of the specified country 13 .
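As a minimal sketch of the alignment step, the snippet below shifts a country's daily cumulative counts so that index 0 is time zero (the first day with 100+ cases) and takes natural logarithms, the scale on which case numbers are modeled. The function name `align_to_time_zero` and the counts are hypothetical illustrations, not the authors' code or Johns Hopkins data.

```python
import math


def align_to_time_zero(cumulative_cases, threshold=100):
    """Return log cumulative counts from time zero onward.

    Time zero is the first day whose cumulative count is >= threshold
    (100+ cases by default). Returns an empty list if the threshold is
    never reached. `cumulative_cases` is ordered oldest first.
    """
    for day, count in enumerate(cumulative_cases):
        if count >= threshold:
            # Natural logs of the counts; index 0 of the result is time zero.
            return [math.log(c) for c in cumulative_cases[day:]]
    return []


# Hypothetical cumulative counts for illustration only.
series = [12, 35, 80, 104, 160, 250]
aligned = align_to_time_zero(series)
print(len(aligned))  # 3 days from time zero onward
```

Aligning every country at the same threshold is what makes trajectories that started weeks apart directly comparable.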
By running the Kalman filter forward on the conditional state space model with the US time series data and on into the future, we obtain the posterior prediction, which incorporates both the prior information from the specific country and the observed US data. Because the reference country is specified only as the prior, the posterior can differ substantially from the prior, which would suggest a strong deviation from the reference country. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In addition, the observed US data can differ substantially from the posterior prediction, indicating that US cases are following a different trajectory because of different policy responses. More technical details are given in the Supplement.
The Johns Hopkins University CSSE data were downloaded from its GitHub repository (https://github.com/CSSEGISandData/COVID-19). We modeled the natural logarithms of the case numbers as the outcome and used these data for prediction with the proposed method. After the posterior means and variances were calculated and the 95% prediction intervals were constructed, they were exponentiated to transform them back to the original scale. The whole data analysis, from reading in the data to plotting the results, took less than 10 seconds on a personal computer with an Intel® Core™ i7-6600U CPU @ 2.60GHz, 2801 MHz, 2 Cores, 4 Logical Processors.
Results based on US data up to March 26th, 2020 are shown in Figures 2-4. Two important observations can be made from these figures. There is no apparent slowing down yet in the US trajectory, based on either the observed or the predicted trend. This indicates that the US will remain in its exponentially increasing phase in the near future. Figure 2 displays the results using Italy as the prior.
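The forward filtering, forecasting and back-transformation steps can be sketched as below. This is a simplified stand-in, not the authors' model: it assumes a single-series local linear trend model for the log case numbers (state = level and slope), hypothetical noise variances `q_level`, `q_slope` and `r`, and a vague initial state, and it omits the functional mixed effects structure and the reference-country prior. The function `kalman_forecast` is hypothetical.

```python
import math


def kalman_forecast(y, horizon, q_level=1e-4, q_slope=1e-3, r=1e-2):
    """Forward Kalman filter on log counts y, then forecast `horizon` steps.

    Local linear trend model (an assumption, not the paper's model):
        level[t+1] = level[t] + slope[t] + w1,  w1 ~ N(0, q_level)
        slope[t+1] = slope[t] + w2,             w2 ~ N(0, q_slope)
        y[t]       = level[t] + e,              e  ~ N(0, r)
    Returns (median, lower95, upper95) on the original count scale per step.
    """
    a = [y[0], 0.0]               # state mean: [level, slope]
    P = [[1.0, 0.0], [0.0, 1.0]]  # vague initial covariance (assumption)

    def predict(a, P):
        # a <- F a and P <- F P F' + Q with F = [[1, 1], [0, 1]]
        a_new = [a[0] + a[1], a[1]]
        P_new = [
            [P[0][0] + P[0][1] + P[1][0] + P[1][1] + q_level, P[0][1] + P[1][1]],
            [P[1][0] + P[1][1], P[1][1] + q_slope],
        ]
        return a_new, P_new

    def update(a, P, obs):
        # The observation picks out the level: H = [1, 0]
        s = P[0][0] + r                 # innovation variance
        k = [P[0][0] / s, P[1][0] / s]  # Kalman gain
        v = obs - a[0]                  # innovation
        a_new = [a[0] + k[0] * v, a[1] + k[1] * v]
        P_new = [[P[i][j] - k[i] * P[0][j] for j in range(2)] for i in range(2)]
        return a_new, P_new

    # Forward filtering over the observed series.
    for t, obs in enumerate(y):
        if t > 0:
            a, P = predict(a, P)
        a, P = update(a, P, obs)

    # Forecast beyond the data, then exponentiate back to counts.
    forecasts = []
    for _ in range(horizon):
        a, P = predict(a, P)
        mean = a[0]
        sd = math.sqrt(P[0][0] + r)  # predictive s.d. of a new log observation
        forecasts.append((math.exp(mean),
                          math.exp(mean - 1.96 * sd),
                          math.exp(mean + 1.96 * sd)))
    return forecasts


# Hypothetical log series growing 0.2 per day from 100 cases.
log_cases = [math.log(100) + 0.2 * t for t in range(15)]
for median, lo, hi in kalman_forecast(log_cases, horizon=3):
    print(f"{median:,.0f}  [{lo:,.0f}, {hi:,.0f}]")
```

When the log-scale prediction is normal, exponentiating the posterior mean gives the median rather than the mean on the count scale; the interval bounds transform directly because the exponential is monotone.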
It shows that the US and Italy have similar patterns and that most of the observed US data fall within the 95% prediction intervals. This suggests that the trajectory in Italy serves as a good prior for the US prediction. Based on this prediction, on the next day, March 27th, 2020, the US may have as many as 108,595 cases. If US policy responses have effects similar to Italy's, the US case count will exceed 300,000 in about 10 days, around April 4th, 2020. Figure 3 displays the results using China as the prior. It shows that the observed US case numbers are already higher than the predicted values. Even if US policy responses have an effect similar to China's, US case numbers will exceed 150,000 around April 11th, 2020. The results using South Korea as the prior are displayed in Figure 4. US case numbers are predicted to exceed 200,000 around April 6th, 2020. Since the observed US data are already well above the upper bounds of the 95% prediction intervals, the data from China and South Korea are not good priors for the US prediction, which suggests that the situation in the US will be much worse than in China and South Korea.
We have proposed a new method for predicting total COVID-19 cases in the US by incorporating information from other countries. Although we demonstrated the method on national US cases, it can also be used to predict state-by-state and hospital-by-hospital data. Our prediction intervals are much narrower than those of most existing methods because of the additional information from the reference country. We show that the current trajectory in the US is most similar to that in Italy.
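A back-of-the-envelope check of what the Italy-prior prediction implies: going from 108,595 cases on March 27th to 300,000 around April 4th (8 days later) corresponds to roughly 13.5% growth per day, with cases doubling about every 5.5 days. The helper `implied_daily_growth` is hypothetical and assumes constant exponential growth between the two dates.

```python
import math


def implied_daily_growth(n_start, n_end, days):
    """Constant-growth summary between two case counts `days` apart.

    Returns (log-growth rate per day, fractional growth per day,
    doubling time in days).
    """
    rate = math.log(n_end / n_start) / days
    return rate, math.exp(rate) - 1.0, math.log(2) / rate


# 108,595 cases predicted for March 27th; 300,000 around April 4th (8 days later).
rate, daily_growth, doubling_days = implied_daily_growth(108_595, 300_000, 8)
print(f"rate {rate:.3f}/day, growth {daily_growth:.1%}/day, doubling {doubling_days:.1f} days")
```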
Italy's stronger response has slowed the spread in the last few days, while the effect of social distancing in the US has not yet shown in the observed data. It is well known that cases are seriously under-reported or under-detected in various countries, and under-reporting rates may differ greatly across countries. This can contribute to substantial differences in the trajectories. With advances in testing techniques, more and more people are being tested in the US. This may also explain why the reported case numbers in the US are substantially higher than those of other countries at the same stage.
The maximum likelihood estimates of the six model parameters were (0.34, 306.69, 0.09, 1.40, 5.81, 0.05). We adopt an empirical Bayes approach in which these parameters are treated as known in the following steps. For each reference country, the conditional SSM was constructed on a six-dimensional state vector made up of three functions of time, each paired with its first derivative, one of which is the US-specific component. The working data are the log case numbers minus an estimated multiple of a log-transformed term. The observation
References
1. These charts show how fast coronavirus cases are spreading - and what it takes to flatten the curve.
2. South Korea is reporting intimate details of COVID-19 cases: has it helped?
3. What China's coronavirus response can teach the rest of the world.
4. Italy's coronavirus response is a warning from the future. The Atlantic.
5. Prediction of the COVID-19 outbreak based on a realistic stochastic model.
6. An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China.
7. Analysis and forecast of COVID-19 spreading in China, Italy and France.
8. Estimation of the final size of the COVID-19 epidemic.
9. The prediction for development of COVID-19 in global major epidemic areas through empirical trends in China by utilizing state transition matrix model.
10. Probabilistic prediction of COVID-19 infections for China and Italy, using an ensemble of stochastically-perturbed logistic curves.
References for the supplement
Functional mixed effects models.
Functional models using smoothing splines, a state space approach.
A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem.
Dynamic state space models.
Step 1. The forward filtering: for = 1, … ,
Step 2