key: cord-0789396-8d366jiq authors: Bednarski, Bryan P; Singh, Akash Deep; Jones, William M title: On collaborative reinforcement learning to optimize the redistribution of critical medical supplies throughout the COVID-19 pandemic date: 2020-12-09 journal: J Am Med Inform Assoc DOI: 10.1093/jamia/ocaa324 sha: a132571e45494dcc8a85d948e301fd866e39849b doc_id: 789396 cord_uid: 8d366jiq

OBJECTIVE: This work investigates how reinforcement learning and deep learning models can facilitate the near-optimal redistribution of medical equipment in order to bolster public health responses to future crises similar to the COVID-19 pandemic.

MATERIALS AND METHODS: The system presented is simulated with disease impact statistics from the Institute for Health Metrics and Evaluation (IHME), the Centers for Disease Control and Prevention (CDC), and the Census Bureau [1, 2, 3]. We present a robust pipeline for data preprocessing, future demand inference, and a redistribution algorithm that can be adopted across broad scales and applications.

RESULTS: The reinforcement learning redistribution algorithm demonstrates performance optimality ranging from 93% to 95%. Performance improves consistently with the number of randomly selected states participating in exchange, with average shortage reductions rising from 78.74% (± 30.8) in simulations with 5 states to 93.50% (± 0.003) in simulations with 50 states.

CONCLUSION: These findings bolster confidence that reinforcement learning techniques can reliably guide resource allocation for future public health emergencies.

The COVID-19 pandemic has challenged nations with shortages of the medical supplies needed to combat the disease. In Northern Italy, doctors rationed equipment and decided which patients to save [4]. In the United States, a few public health officials worked together to share their resources. However, because these entities lacked a unified system for equipment redistribution, they resorted to rudimentary techniques such as phone calls and press releases to request help.
This reactive response strongly indicates a non-optimal redistribution of equipment [5, 6]. Here we investigate how officials can share resources more optimally when facing future public health crises.

To evaluate our redistribution system, we created a simulation environment with data from the Institute for Health Metrics and Evaluation (IHME), the Centers for Disease Control and Prevention (CDC), and the Census Bureau [1, 2, 3]. Preprocessed data is fed into a custom neural network inference model to predict the future demand for ventilators in each state. These predictions are fed into one of five redistribution algorithms to control daily actions. We evaluate our custom demand prediction model and use its output to test the redistribution algorithms. For each algorithm we determine the average performance over a series of simulations in which 5, 20, 35, or 50 participating states are selected at random. We find that the q-learning redistribution algorithm outperforms all other methods while maintaining a high degree of optimality. Additionally, as the number of participating states increases, the algorithm's performance improves consistently in terms of both shortage reduction and reliability.

We propose the three-stage pipeline shown in Figure 1. First, observed input data is preprocessed. Second, a deep learning inference model predicts future demands. Finally, a preselected redistribution algorithm interprets the demand predictions to determine daily actions. The second and third stages of the pipeline are optimized separately on each day of the simulation.

We frame this redistribution system as an optimization problem: minimize the total ventilator shortage accumulated over the simulation period. A state incurs a shortage on each day that its ventilator supply falls below its demand. Input parameters set the dates over which to run the simulation and the number of randomly selected states.
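The shortage objective described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `daily_shortage` and `total_shortage` and the dictionary-per-day data layout are hypothetical, and the authors' complete formulation is the one in their Supplementary Appendix A.

```python
def daily_shortage(supply, demand):
    """Sum of unmet demand across states for one day.

    A state contributes only when its demand exceeds its supply;
    a surplus in one state does not offset a shortage in another.
    """
    return sum(max(0, demand[state] - supply[state]) for state in demand)


def total_shortage(supply_by_day, demand_by_day):
    """Accumulate the daily shortage over the whole simulation period."""
    return sum(
        daily_shortage(supply, demand)
        for supply, demand in zip(supply_by_day, demand_by_day)
    )


# Hypothetical two-state, two-day example (numbers are illustrative only).
supply_by_day = [{"A": 10, "B": 5}, {"A": 8, "B": 7}]
demand_by_day = [{"A": 12, "B": 3}, {"A": 8, "B": 10}]
print(total_shortage(supply_by_day, demand_by_day))
```

In this toy example state A is short by 2 on the first day and state B is short by 3 on the second, so the accumulated shortage is 5. A redistribution algorithm would aim to choose transfers that make this total as small as possible.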
No new ventilators enter the simulation after initialization, though our system maintains the ability to distribute new supplies from any state or central agent as needed. (Supplementary Appendix A: complete mathematical formulation)

In the preprocessing stage, critical disease metrics are primarily drawn from the University of Washington Institute for Health Metrics and Evaluation's (IHME) widely cited COVID-19 tracking program [7, 8, 9]. We include biweekly CDC metrics for state-wise deaths above average to help our inference model overcome statistical biases that result from regional differences in the number of COVID-19 tests performed [10]. Furthermore, we include fixed 2017 Census Bureau values for state-wise rates of heart disease, asthma, COPD, and diabetes to account for varying comorbidities [3, 11]. We preprocess data between February 4th, 2020 and August 4th, 2020. Linear interpolation estimates missing fields and guarantees continuity [12]. (Supplementary Appendix B: input data field descriptions)

We make two statistical assumptions for a robust simulation environment. First, we assume that the number of available ventilators in a state is equivalent to the number of COVID-ICU beds. This proxy variable is necessary because no system exists for tracking and reporting ventilators in hospitals by state. Previous studies showed that approximately half of COVID-ICU patients required mechanical ventilation in the early stages of the pandemic [13, 14]. Because ventilator use and ICU-bed use are therefore of similar scale, we treat ICU-bed data as a proxy for ventilators and expect simulation performance to be an accurate indicator of the model's extensibility to the real world. The second statistical assumption defines the model for the logistics downtime of redistributed ventilators. Incurred delays are sampled i.i.d. from a Gaussian distribution (mean: 3 days, standard deviation: 0.5 days) and rounded to the nearest whole day, giving delays of two (~16% of total), three (~68%), or four (~16%) days.
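A minimal sketch of this delay model follows, using only Python's standard library. The function name `sample_delay_days` and the clamping of rare outliers into the two-to-four-day range are assumptions made for illustration; the text states only the Gaussian parameters and the three resulting delay values.

```python
import random


def sample_delay_days(rng, mean=3.0, stdev=0.5, low=2, high=4):
    """Draw one logistics delay in whole days.

    The raw Gaussian draw is rounded to the nearest day. Draws that would
    round outside [low, high] (roughly 0.3% of samples) are clamped, which
    is an assumption made here to match the two/three/four-day outcomes.
    """
    raw = rng.gauss(mean, stdev)
    return min(high, max(low, round(raw)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
delays = [sample_delay_days(rng) for _ in range(10_000)]
for days in (2, 3, 4):
    share = delays.count(days) / len(delays)
    print(f"{days} days: {share:.1%}")
```

The approximate 16% / 68% / 16% split follows from the rounding boundaries at 2.5 and 3.5 days, which sit exactly one standard deviation either side of the mean.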
The distribution's lower bound is based on the Health and Human Services report that ventilators from emergency stockpiles can be available nationwide within 24 to 36 hours. The upper bound includes a considerable time buffer for cleaning and logistics.

The pipeline's second stage infers daily future state-wise ventilator demands at the mean redistribution delay interval. The repeating, time-series nature of regional COVID-19 peaks makes recurrent neural networks (RNNs) a good fit for predicting equipment demands. Long short-term memory (LSTM) RNNs have been demonstrated to perform exceedingly well in similar non-seasonal, multivariate, time-series prediction applications [15, 16]. However, it is generally difficult to train a deep learning model to accurately predict future demands without training data from a previous pandemic. In response, we pretrain the LSTM before simulation with the small amount of available data and retrain it daily on new observations to achieve good online performance.

Primary simulations run for 154 days, from March 1st to August 1st, 2020. Data is available from February 4th, allowing 26 days of processed observations to be used for pretraining the LSTM, which takes series of 14-day observations from states to make predictions. From the available 26 days we compile a set of all unique 14-day series from each state and pretrain the LSTM for 150 epochs. Each day of operation adds a unique series from each state to the set used for retraining (100 epochs). Observations from all 50 states are considered even if fewer participate in exchange. In simulation the LSTM demonstrates a predictive root-mean-square error (RMSE) of 104.74 ventilators per day across all states with a prediction interval of 3 days into the future. The prediction interval is set to the mean logistics delay to provide our redistribution algorithm with the most accurate demand estimates for when its actions are expected to take effect.
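The windowing and error metric described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names `make_windows` and `rmse` are hypothetical, the series is univariate for brevity (the paper's model is multivariate), and pairing each 14-day window with the value 3 days after its last day is an assumption based on the stated prediction interval.

```python
import math


def make_windows(series, window=14, horizon=3):
    """Build (input_window, target) pairs from one state's daily series.

    Each input is `window` consecutive days; the target is the value
    `horizon` days after the last day of that window (assumed pairing).
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        pairs.append((series[start:end], series[end + horizon - 1]))
    return pairs


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# 26 days of placeholder observations for one state (values are day indices).
pretraining_days = list(range(26))
pairs = make_windows(pretraining_days)
print(len(pairs))  # number of training pairs under the assumed pairing
```

Under this assumed pairing, 26 days of data yield 10 window/target pairs per state, and each additional simulated day contributes one more pair per state for retraining.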
For comparison, we implemented a univariate Holt exponential smoothing model with the same prediction interval and observed an RMSE of 545.82 [17]. As a baseline, we present the naïve case in which the current demand in each state is used as the three-day prediction, and observe an RMSE of 1,186.13. (Supplementary Appendix C: inference model specifications and analysis)

The final stage is the redistribution algorithm, which determines daily actions (state-to-state transfers). Three intuition-based algorithms and two reinforcement learning (RL) algorithms are implemented. We compare algorithmic performance to a baseline case in which no ventilators are exchanged (each state begins and ends with its initial supply). The intuition-based approaches allocate surplus ventilators to states in order of the magnitude of their predicted needs under one of the following policies: maximum needs first, minimum needs first, or random order. These models demonstrate the gains over the status quo that would be achieved if the system consistently applied non-learning algorithms. The two RL algorithms are value iteration and q-learning. Both follow similar dynamic programming principles by iterating over possible actions and future supply distributions to optimize daily actions [18, 19]. The primary difference between these approaches is that the q-learning algorithm consults a look-up table (pre-defined and continually updated) to evaluate actions, while value iteration recursively explores all possible actions until convergence to provide a map of the most valuable action in every scenario.

This study bolsters confidence that deep learning and reinforcement learning models can be used to facilitate the efficient redistribution of medical supplies during public health crises. System performance improves with the number of participating states and is significant even with only a few states participating.
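The three intuition-based allocation policies described above can be sketched as a single greedy routine. This is a simplified illustration under stated assumptions: the function name `allocate`, the policy labels, and the rule that recipients draw from the donor with the largest remaining surplus are all hypothetical choices, and logistics delays are ignored.

```python
import random


def allocate(supply, predicted_demand, policy="max_first", rng=None):
    """Return a list of (donor, recipient, count) ventilator transfers.

    States whose supply exceeds predicted demand donate their surplus;
    states with predicted shortfalls are served in the order set by `policy`.
    """
    surplus = {s: supply[s] - predicted_demand[s]
               for s in supply if supply[s] > predicted_demand[s]}
    needs = {s: predicted_demand[s] - supply[s]
             for s in supply if predicted_demand[s] > supply[s]}

    order = list(needs)
    if policy == "max_first":
        order.sort(key=lambda s: needs[s], reverse=True)
    elif policy == "min_first":
        order.sort(key=lambda s: needs[s])
    elif policy == "random":
        (rng or random).shuffle(order)
    else:
        raise ValueError(f"unknown policy: {policy}")

    transfers = []
    for recipient in order:
        # Assumed donor rule: draw from the largest remaining surplus first.
        for donor in sorted(surplus, key=surplus.get, reverse=True):
            if needs[recipient] == 0:
                break
            give = min(surplus[donor], needs[recipient])
            if give > 0:
                transfers.append((donor, recipient, give))
                surplus[donor] -= give
                needs[recipient] -= give
    return transfers


# Illustrative numbers only: A has 40 spare, B needs 30, C needs 20.
supply = {"A": 100, "B": 20, "C": 50}
predicted = {"A": 60, "B": 50, "C": 70}
print(allocate(supply, predicted, "max_first"))
print(allocate(supply, predicted, "min_first"))
```

With only 40 surplus ventilators against 50 needed, the policies differ in who is left short: maximum-needs-first fully serves B and leaves C short, while minimum-needs-first fully serves C and leaves B short.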
Q-learning is outperformed by the Allocate Maximum First algorithm with five participating states because intuition-based algorithms are well suited to the logical simplicity of these simulations. In fact, the baselines demonstrate peak optimality in five-state simulations and degrade with task complexity, while q-learning improves. The high standard deviations in q-learning's shortage reduction with fewer states are attributable to simulations in which excessive demand from larger states cannot be satisfied by the supplies of smaller participating states. As the number of states increases, these adverse cases become less likely and performance deviations shrink dramatically. This conclusion is supported by q-learning's consistently high degree of optimality, which shows that the algorithm takes near-optimal actions even when the overall reduction is not exceedingly high.

This system could assist officials in managing future public health crises if safety controls are agreed upon and implemented prior to operation. Additionally, algorithm performance in future applications would improve as data collected from this pandemic is used to train that system's models. Future users of this system should consider the following extensions. First, users should verify a

This research project was not funded by any agency in the public, commercial, or not-for-profit sectors.

BB/ADS conceived the project idea and developed the system. ADS designed the redistribution algorithms. BB designed the simulation environment, prediction models, and optimizations, and was lead author on paper drafts. WJ provided medical background for the paper's statistical assumptions and contributed to citations and paper drafts.

Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months
CDC. COVID-19 Databases and Journals
There Aren't Enough Ventilators to Cope With the Coronavirus.
The New York Times
New York's Andrew Cuomo decries 'eBay'-style bidding war for ventilators. Guardian News and Media
California ventilators en route to New York
IHME COVID-19 health service utilization forecasting team. COVID-19 Estimate Downloads. Institute of Health Metrics and Evaluation
What 5 Coronavirus Models Say the Next Month Will Look Like. The New York Times
Learning as We Go: An Examination of the Statistical Accuracy of COVID19 Daily Death Count Predictions. The University of Sydney
Excess Deaths Associated with COVID-19
Where Chronic Health Conditions and Coronavirus Could Collide. The New York Times
Johns Hopkins Bloomberg School of Public Health - Center for Health Security
Intubation and Ventilation amid the COVID-19 Outbreak
A Novel Connectionist System for Unconstrained Handwriting Recognition
Long Short-Term Memory
The Holt-winters forecasting procedure
Bayesian Q-learning. In AAAI/IAAI 26
Statsmodels Python API. Statsmodels v0.12.0 - Exponential Smoothing - Holt's Method

The authors have no conflict of interest to declare.

Underlying data available at: https://doi.org/10.5068/D1K39S