key: cord-1021996-ifkvqwr1 authors: Basu, Sayantani; Campbell, Roy H. title: Going by the Numbers : Learning and Modeling COVID-19 Disease Dynamics date: 2020-07-20 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110140 sha: 64eeabcfa44fa42afe958207341e40eab6ea4eca doc_id: 1021996 cord_uid: ifkvqwr1 The COrona VIrus Disease (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) has resulted in a challenging number of infections and deaths worldwide. In order to combat the pandemic, several countries worldwide enforced mitigation measures in the forms of lockdowns, social distancing, and disinfection measures. In an effort to understand the dynamics of this disease, we propose a Long Short-Term Memory (LSTM) based model. We train our model on more than four months of cumulative COVID-19 cases and deaths. Our model can be adjusted based on the parameters in order to provide predictions as needed. We provide results at both the country and county levels. We also perform a quantitative comparison of mitigation measures in various counties in the United States based on the rate of difference of a short and long window parameter of the proposed LSTM model. The analyses provided by our model can provide valuable insights based on the trends in the rate of infections and deaths. This can also be of help for countries and counties deciding on mitigation and reopening strategies. We believe that the results obtained from the proposed method will contribute to societal benefits for a current global concern. The outbreak of a novel coronavirus which spread from China in December 2019 led to severe health problems and fatalities worldwide, eventually turning into a global pandemic. The virus was named the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) and the disease caused due to it was named the COrona VIrus Disease . The most common symptoms are cough and fever, although symptoms range in severity including pneumonia-like symptoms and dyspnea [1] . It was confirmed that COVID-19 spread through infected respiratory droplets transmitted from an infected person to a healthy person [2] . Other symptoms in humans have been studied, including relations to the gastrointestinal system [3] and skin rashes [4] . More recent studies have focused on other possible zoonotic pathways, including other species that can possibly contract COVID-19 [5] . Till date, there is no definite medical treatment for COVID-19, although global efforts are in progress for development of a vaccine or specific drugs. Current methods for treatment have involved the use of antiviral medications commonly used for viral diseases [6, 7, 8] . Healthcare professionals were given special instructions and equipment in order to treat patients who tested positive for COVID-19. There was also a surge in demand for PPE (Personal Protective Equipment), especially for frontline healthcare workers as well as for ventilators in hospitals [9, 10] . Countries all over the world executed various mitigation measures for the COVID-19 pandemic, including social distancing, enforcing total lockdowns, and advising people to take precautions like wearing face masks and washing hands. These measures were implemented based on projection studies and statistical models that predicted the disease spread [11, 12, 13] . Usually individuals showing common COVID-19 symptoms were asked to self-isolate for two weeks and seek medical help if health conditions worsened [14, 15, 16] . COVID-19 was generally observed to affect specific groups of people in populations [17, 18] , as well as people who have pre-existing medical conditions [19, 20] . It was noted that people may be capable of transmitting the virus despite being completely asymptomatic or in other words, testing positive for COVID-19 without any visible symptoms [21] . This may have expedited the spread of the disease [22] as certain individuals may have been unaware of their conditions and symptoms and transmitted the disease while socializing before they medically tested positive for COVID. In the present work, we propose an LSTM based approach to understand and evaluate the effectiveness of mitigation measures at both the country and county levels. The rest of this paper is organized as follows: Section 2 discusses the related work, Section 3 discusses the proposed method, Section 4 discusses the results, and Section 5 concludes the paper and suggests future work. The spread of COVID-19 led to global research efforts to understand the disease and its possible implications. We focus on discussing related work with regard to COVID-19 disease dynamics, especially studies involving simulations and computational learning algorithms. Several studies have focused on projections of how the infections are likely to spread based on statistical analyses. Neher et al. [23] have proposed a seasonal transmissability model based on SIR [24] , a class of mathematical epidemiological models to predict the trajectory of the virus re-infecting sub-populations in the world in future years. They have taken into account the infection, emigration, and population turnover rates respectively as part of their proposed model. A refined version of their model [25] considers visualizing the projected hospital resources necessary in order to support a pandemic like COVID-19. Another similar hospital impact model CHIME (COVID-19 Hospital Impact Model for Epidemics) developed by Penn Medicine [26] suggests the projection of the hospitalized, ICU (Intensive Care Unit), and ventilated patients respectively for Penn hospitals. They have used a statistical model based on measures like social contact, hospitalization rate, and probability of detection. With regard to COVID-19, there have been several machine learning based models proposed by various research teams globally in an effort to understand the different facets of COVID-19. Hu et al. [27] proposed a machine learning model based on stacked auto-encoders to forecast cumulative COVID-19 cases in China till April 2020 based on past cumulative data. They also grouped the cities and provinces in clusters based on the features extracted from their proposed auto-encoder model. Similar LSTM approaches have also been explored in order to understand patient related data [28] as well as to study epidemic trends in China [29] . All COVID-19 forecast and projection models proposed so far have been more focused on graphical analyses and future curve prediction. However, in our proposed model we quantitatively evaluate the degree to which mitigation measures are working based on LSTM predictions. Wang et al. [30] have proposed an "Ontology-based Side-effect Prediction Framework (OSPF)" for evaluating the "Traditional Chinese Medicine (TCM)" model using deep learning. They have trained an Artificial Neural Network (ANN) model with an architecture composed of three hidden layers on the TCM prescriptions used for the treatment of COVID-19 patients. They have also tested the validity of their OSPF model to evaluate the effectiveness and safety of use of TCM prescriptions for other flu-like diseases as a potential source of treatments for COVID-19. Their proposed model classifies a given TCM prescription as 'Safe' or 'Unsafe'. They have identified seven TCM prescriptions that should take precedence over others while treating COVID-19 patients. Wang et al. [31] have proposed BI-AT-GRU, a GRU (Gated Recurrent Unit) based model with bidirectionality and attention in order to classify six different respiratory patterns. They have specifically observed that abnormal breathing patterns in patients diagnosed with COVID-19 arise significantly due to Tachypnea, which is characterised as abnormally rapid and shallow breathing. They show that their proposed model is able to distinguish respiratory patterns of 'Tachypnea', 'Central-Apnea', 'Bradypnea', 'Eupnea' , 'Cheyne-Stokes', and 'Biots' with high F-1 scores. Song et al. [32] have proposed a deep learning model for studying CT (Computer Tomography) images of COVID-19 patients. They collected lung CT images from COVID-19 patients, bacterial pneumonia patients, and healthy people JULY 18, 2020 from two hospitals in China. Their proposed framework DeepPneumonia has an AUC (Area Under Curve) metric of 0.99 and sensitivity of 0.93 with regard to distinction of COVID-19 from other diseases. Wang et al. [33] have proposed a CNN (Convolutional Neural Network) based model 'Inception Migration Neuro Network' to identify the radiographical features in volumetric chest CT images of patients obtained from two medical institutions in China. The aim of their proposed model is to identify whether COVID-19 is present or not in the CT images of the patients. They have evaluated AUC and F-1 metrics on their model and have claimed that their method is not invasive and is low cost. Li et al. [34] have proposed a deep learning framework 'COVID-19 detection neural network' or 'COVNet' to distinguish CT scans of COVID-19 patients from those of 'CAP (Community Acquired Pneumonia)' and other nonpneumonia diseases. They have proposed a three-dimensional model involving ResNet50, max-pooling, and a fully connected layer to identify the presence of COVID-19 in the images. Sethy and Behera [35] have proposed a similar machine learning model for X-Ray images based on ResNet50 and SVM (Support Vector Machines) in order to identify patients with COVID-19. They have evaluated the efficiency of their proposed model using False Positive Rate (FPR), F1 score, Kappa, and MCC (Matthews Correlation Coefficient) metrics. In the present work, we explore the possibilities of using an LSTM-based approach to learn patterns in COVID-19 data. Statistical learning models require thorough understanding of data and coefficient tuning in order to fit models suitable for prediction. However, compared to statistical learning, machine learning approaches involve allowing the model to automatically learn complex patterns from the data based on the model constructed and tuned hyperparameters. This is particularly useful for handling epidemiological time-series data like that of COVID-19 where the cumulative curves of infections and deaths vary with respect to time and place. Long Short-Term Memory Networks (LSTMs), originally proposed by Hochreiter and Schmidhuber [36] , are a special kind of Recurrent Neural Networks (RNNs) that have the ability of capturing long-term dependencies. In this paper, we apply LSTMs to study the disease dynamics of COVID-19, specifically in forecasting cumulative cases and deaths. Most importantly, the capabilities of LSTMs allow us to gain a deeper understanding of where the model stands in terms of future predictions and will serve as an aid in taking important healthcare and mitigation measures. For the purpose of training, validation and testing, we utilize data from the Johns Hopkins University repository [37] . For experimental purposes, we consider data over the period from January 22 to June 30. The dataset provides time-series data of COVID-19 confirmed cases and deaths. We separately consider each of these scenarios, as well as look into more fine-grained data at the county-level in order to analyse the effectiveness of mitigation measures. Separating time series data into train, validation, and test sets requires care since all the temporal relations have to be maintained. Since COVID-19 is an infectious viral disease, it spreads over time. This means that we can predict the infections/deaths at a particular point of time based on preceding values. It is also essential to monitor the generalization capabilities of the model and select hyperparameters for the model accordingly. In order to handle this, we partition the data based on time for training and validation as shown in Figure 1 . The data is separately trained on three different train and validation splits and hyperparameters with minimum validation root mean squared error (RMSE). The hyperparameters we specifically focus on tuning are the lookback and lookahead. The split of train and test data for the tuned model is approximately maintained at an 80:20 ratio based on the initial lookback and lookahead of 1 and 1 respectively, although the model is capable of handling other splits based on the data. The smaller splits of train and validation sets are approximately maintained at a 60:20 ratio. Intuitively, an increase in the lookback and lookahead parameters reduces the number of training samples and testing samples respectively and leads to erroneous results. For experimentation purposes, we test lookback in the range of 1 and 7 and lookahead in the range of 5 and 7 for reasonable future predictions, while adhering to the index constraints on individual train, validation, and test data. For the proposed LSTM model, Keras [38] was used as the framework with TensorFlow as backend. In all our experiments, we train the LSTM for 10,000 epochs with 10 layers and a batch size of 10. All normalization is performed on the log scale and Adam [39] is used as the optimizer with a learning rate of 0.001. The window parameters we specifically focus on in this case are the lookback and lookahead parameters which are tuned based on the data of a particular location. In this case, the lookback parameter is indicative of the window size in terms of the number of consecutive days the LSTM needs to "look back" at in order to make a reasonable prediction. The lookahead parameter indicates the number of days in advance that the system needs to "look ahead" in order to make a prediction for a future date. Both the lookback and lookahead parameters are used by carrying out index slicing on the respective training, validation, and test data and can be formulated as data[:-(lookahead)] split into sets of lookback days based on time for the model input and data[(lookback+lookahead):] for the model output. In general, the model is able to provide future predictions of cumulative confirmed cases/deaths. However, from a research perspective, we plot the predictions obtained after training on the suitable hyperparameters by comparing them with the actual values of confirmed cases/deaths observed on those specific days for obtaining useful interpretations. The detailed analyses of the results and corresponding figures are discussed in Section 4. Our analyses based on our proposed LSTM model are divided into two main parts: the prediction models that show predictions of COVID-19 confirmed cases/deaths and evaluations on how well social distancing/mitigation measures are working in the present scenario. In all our plots, dates in January, February, March, April, May, and June are prefixed with 'J', 'F', 'M', 'A', 'MA', and 'JU' respectively followed by the day of the month. As aforementioned, all modeling is carried out based on the data provided for a specific location. We discuss our results on two levels: country-level and county-level data on COVID-19 confirmed cases and deaths. In addition to plotting the predicted values and actual values, we also plot a baseline in all figures that is calculated as the moving average over the actual values in the lookback window. Overall, it was observed that it was harder to predict the number of comfirmed cases compared to the number of deaths. It was also observed in general that shorter values of lookback and lookahead resulted in lower values of both the train and test RMSE. However, in the present situation, it is of more interest to discuss the interpretation of what the LSTM predictions are indicative of instead of purely focusing on the values of the RMSE. In this section, we discuss our results on five globally affected locations: (a) the United States of America, (b) Italy, (c) India, (d) Japan, and (e) Hubei, China. Figure 2 shows the plot of predictions for COVID-19 confirmed cases in the United States of America. Figure 3 shows the plot of predictions for COVID-19 deaths in United States of America. The United States had different lockdown and phased reopening strategies for the various states [40] . Predictions on cases indicate the cumulative cases are still increasing. The actual COVID-19 cases are also rising. The predictions for deaths have a flattening trend. The trends indicate more mitigation measures may be needed. Figure 7 shows the plot of predictions for COVID-19 deaths in India. India enforced nationwide lockdowns in several phases following the order issued on March 24 [42] . COVID-19 predictions for cases indicate cumulative cases are still increasing, which is also a trend indicated by the actual cases. Predictions on deaths indicate a flattening trend. The trends show that stronger mitigation measures need to be enforced to reduce infections. Figure 8 shows the plot of predictions for COVID-19 confirmed cases in Japan. Figure 9 shows the plot of predictions for COVID-19 deaths in Japan. Japan implemented a series of measures in March 2020 to control the spread of COVID-19 [43] . The slight surge in actual cases compared to the flattening of the LSTM predictions indicates that the mitigation measures are yet to fully work. The LSTM predictions show a similar trend as the confirmed cases, however, here the COVID-19 deaths in Japan show a flattening in the curve that is also indicated by the LSTM predictions. Figure 11 shows the plot of predictions for COVID-19 deaths in Hubei, China. Wuhan in Hubei, China was the location where the first cases of COVID-19 were observed [23] . China enforced mitigation measures in January [44] . Based on both the LSTM predictions and actual values, the COVID-19 cases seem to have flattened, suggesting that mitigation measures have worked. The trends in the LSTM predictions as well as the actual number of deaths indicate a flattening in the number of deaths. However, the difference suggests that some enforcement may still be needed before the mitigation measures are completely lifted. In this section, we show predictions of our proposed LSTM model on eight affected counties within the United States of America: (a) Los Angeles, California, (b) Dallas, Texas, (c) Hillsborough, Florida, (d) Maricopa, Arizona, (e) King, Washington, (f) Fulton, Georgia, (g) Cook, Illinois, and (h) New York City, New York. Our discussion is based on various lockdown and phased reopening measures in such states [45] . Figure 12 and Figure 13 show predictions for COVID-19 cases and deaths respectively in Los Angeles, California. The cases and deaths depict an increasing trend that is also indicated by the predictions. In this situation however, there is a small divergence between the predicted and actual curves, which shows that more mitigation measures may need to be introduced. Figure 17 show the predictions cases and deaths respectively in Hillsborough, Florida. The cases depict an increasing trend shown by the actual as well as predicted values. In the case of deaths, significant divergence is observed between the predicted and actual plots, which denotes that more mitigation measures need to be enforced. Figure 23 show the predictions for cases and deaths respectively in Fulton, Georgia. As indicated by the graph, the mitigation measures are working well in terms of deaths but not in the case of infections which is shown by the divergence between the actual and predicted curves during the last week of June. We now aim at understanding how well the current mitigation measures are working. Various countries placed nationwide lockdowns, while others implemented phased strategies based on affected individual locations. The goal is to understand how social distancing had made the situation better or worse. The analyses provided are based on our proposed LSTM model. In order to do this, we consider a shorter and longer window size. In this context, we fix the lookback parameter as 1 and use the line of best fit on the rate of change of the difference between the smaller lookahead parameter of 1 and a larger lookahead parameter of 5. The analyses can also be carried out on individual parameters, however, for experimental purposes we choose to compute the rate on the difference in order to obtain a smoother curve for the purpose of this work. Again, the model can be used for this type of analyses for the near future as well, which will be valuable for nations or specific locations to decide whether to tighten or loosen restrictions based on the trend of how well their mitigation policies will work in the near future. In order to quantify and evaluate how well mitigation and social distancing is working, we compare the slope, intercept and RMSE values (with respect to of the line of best fit for each of the counties as shown in Table 1 . It is important to note that COVID-19 death rates are decreasing or nearly stabilized as shown by the eight counties in the United States. This happened due to the rapid increase in testing as well as improved treatment in hospitals, which reduced the mortality rate and increased the recovery rate. Table 2 evaluates the success of mitigation measures based on COVID-19 infections and deaths respectively, where indicates mitigation measures being successful and indicates mitigation measures not being successful. Table 2 provides an overview of the success of mitigation measures based on the analyses in Table 1 . Therefore, various effects of mitigation measures have been observed in different parts of the United States which may require additional measures like rapid testing, contact tracing, and phased reopening depending on the situation in various places before the entire nation resumes a normal lifestyle. The COVID-19 pandemic affected countries globally and led to an unprecedented number of infections and deaths. Countries took vital steps in mitigating the pandemic by enforcing lockdowns, social distancing, and a variety of other mitigation measures. In this paper, we propose an LSTM based learning model that can learn from the cumulative rise in COVID-19 confirmed cases and deaths and provide valuable insights on how well mitigation measures are working quantitatively in terms of the rate of infections and deaths. The predictions of the model can be helpful for countries deciding to make important decisions regarding the effects of currently implemented mitigation measures and aid in making plans for reopening various places. We provide analyses at both the country and county levels. Future extensions of this work include implementing and understanding how this model can be transferred to studying the disease dynamics of other similar pandemics. ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19)-China World Health Organization et al. Modes of transmission of virus causing COVID-19: implications for IPC precaution recommendations: scientific brief COVID-19: gastrointestinal manifestations and potential fecal-oral transmission Cutaneous manifestations in COVID-19: a first perspective Discovering drugs to treat coronavirus disease 2019 (COVID-19) COVID-19: combining antiviral and anti-inflammatory treatments Of chloroquine and COVID-19 Critical supply shortages-the need for ventilators and personal protective equipment during the COVID-19 pandemic World Health Organization et al. Rational use of personal protective equipment for coronavirus disease (COVID-19): interim guidance A simple planning problem for COVID-19 lockdown COVID-19 Projections (IHME) Critical community size for COVID-19-a model based approach to provide a rationale behind the lockdown COVID-19: a remote assessment in primary care Virtually perfect? Telemedicine for COVID-19 How will country-based mitigation measures influence the course of the COVID-19 epidemic? The Lancet Severe outcomes among patients with coronavirus disease 2019 (COVID-19)-United States COVID-19 in children: initial characterization of the pediatric disease The impact of COPD and smoking history on the severity of COVID-19: a systemic review and meta-analysis Cardiovascular disease and COVID-19 Presumed asymptomatic carrier transmission of COVID-19 Asymptomatic carrier state, acute respiratory disease, and pneumonia due to severe acute respiratory syndrome coronavirus 2 (SARSCoV-2): facts and myths Potential impact of seasonal forcing on a SARS-CoV-2 pandemic Contributions to the mathematical theory of epidemics-I COVID-19 Scenarios COVID-19 Hospital Impact Model for Epidemics Artificial intelligence forecasting of COVID-19 in China Machine Learning approach for confirmation of COVID-19 cases: positive, negative, death and release. medRxiv Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions Evaluating the traditional Chinese medicine (TCM) officially recommended in China for COVID-19 using ontology-based side-effect prediction framework (OSPF) and deep learning Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images. medRxiv A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). medRxiv Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest ct Detection of coronavirus disease (COVID-19) based on deep features Long short-term memory Johns Hopkins COVID-19 Data Repository The python deep learning library. ascl Adam: A method for stochastic optimization The COVID-19 pandemic: Government vs. community action across the United States COVID-19 in Italy: momentous decisions and many uncertainties. The Lancet Global Health India's lockdown Governance, technology and citizen behavior in pandemic: lessons from COVID-19 in East Asia Unprecedented disruption of lives and work: health, distress and life satisfaction of working adults in China one month into the COVID-19 outbreak US Reopening of 50 States This project has been funded by the Jump ARCHES endowment through the Health Care Engineering Systems Center. Code for the proposed method is available here : https://github.com/sayantanibasu/covid19-models