key: cord-272727-a5ngjuyz
authors: Bertsimas, D.; Boussioux, L.; Cory Wright, R.; Delarue, A.; Digalakis, V.; Jacquillat, A.; Lahlou Kitane, D.; Lukin, G.; Li, M. L.; Mingardi, L.; Nohadani, O.; Orfanoudaki, A.; Papalexopoulos, T.; Paskov, I.; Pauphilet, J.; Skali Lami, O.; Stellato, B.; Tazi Bouardi, H.; Villalobos Carballo, K.; Wiberg, H.; Zeng, C.
title: From predictions to prescriptions: A data-driven response to COVID-19
date: 2020-06-29
journal: nan
DOI: 10.1101/2020.06.26.20141127
sha:
doc_id: 272727
cord_uid: a5ngjuyz
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach, combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab values. Third, we develop a novel epidemiological model to project the pandemic's spread and inform social distancing policies. Fourth, we propose an optimization model to reallocate ventilators and alleviate shortages.
Our results have been used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and redistribute ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and equitable vaccine distribution planning at a major pharmaceutical company, and have been integrated into the US Centers for Disease Control and Prevention's pandemic forecast.
[...] America is applying our infection risk calculator to determine how employees can safely return to work. A major hospital system in the United States planned its intensive care unit (ICU) capacity based on our forecasts, and leveraged our optimization results to allocate ventilators across hospitals when the number of cases was rising. Our epidemiological predictions are used by a major pharmaceutical company to design a vaccine distribution strategy that can contain future phases of the pandemic. They have also been incorporated into the US Centers for Disease Control and Prevention's forecasts (7).
Early responses to the COVID-19 pandemic have been inhibited by the lack of available data on patient outcomes. Individual centers released reports summarizing patient characteristics. Yet, this decentralized effort makes it difficult to construct a cohesive picture of the pandemic.
To address this problem, we construct a database that aggregates demographics, comorbidities, symptoms, laboratory blood test results ("lab values", henceforth) and clinical outcomes from 160 clinical studies released between December 2019 and May 2020, which we make available on our website for broader use. The database contains information on 133,600 COVID-19 patients (3.13% of the global COVID-19 patients as of May 12, 2020), spanning mainly Europe (81,207 patients), Asia (19,418 patients) and North America (23,279 patients). To our knowledge, this is the largest dataset on COVID-19.
A. Data Aggregation.
Each study was read by an MIT researcher, who transcribed numerical data from the manuscript. The appendix reports the main transcription assumptions.
Each row in the database corresponds to a cohort of patients: some papers study a single cohort, whereas others study several cohorts or sub-cohorts. Each column reports cohort-level statistics on demographics (e.g., average age, gender breakdown), comorbidities (e.g., prevalence of diabetes, hypertension), symptoms (e.g., prevalence of fever, cough), treatments (e.g., prevalence of antibiotics, intubation), lab values (e.g., average lymphocyte count), and clinical outcomes (e.g., average hospital length of stay, mortality rate). We also track whether the cohort comprises "mild" or "severe" patients (mild and severe cohorts are only a subset of the data).
Due to the pandemic's urgency, many papers were published before all patients in a cohort were discharged or deceased. Accordingly, we estimate the mortality rate from discharged and deceased patients only (referred to as "Projected Mortality").
Using a similar nomenclature, Figure 2A [...]
D. Discussion and Impact. Our database is the largest available source of clinical information on COVID-19 assembled to date. As such, it provides new insights on common symptoms and the drivers of the disease's severity. Ultimately, this database can support guidelines from health organizations, and contribute to ongoing clinical research on the disease.
Another benefit of this database is its geographical reach. Results highlight disparities in patients' symptoms across regions. These disparities may stem from (i) different reporting criteria; (ii) different treatments; (iii) disparate impacts across different ethnic groups; and (iv) mutations of the virus since it first appeared in China.
This information contributes to early evidence on COVID-19 mutations (14, 15) and on its disparate effects on different ethnic groups (16, 17).
The insights derived from this descriptive analysis highlight the need for personalized data-driven clinical indicators. Yet, our population-level database cannot be leveraged directly to support decision-making at the patient level. We have therefore initiated a multi-institution collaboration to collect electronic medical records from COVID-19 patients and develop clinical risk calculators. These calculators, presented in the next section, are informed by several of our descriptive insights. Notably, the disparities between severe patients and the rest of the patient population inform the choice of the features included in our mortality risk calculator. Moreover, the geographic disparities suggest that data from Asia may be less predictive when building infection or mortality risk calculators designed for patients in Europe or North America, which motivates our use of data from Europe.
Throughout the COVID-19 crisis, physicians have made difficult triage and care management decisions on a daily basis. Oftentimes, these decisions could rely only on small-scale clinical tests, each of which requires significant time, personnel and equipment and thus cannot be easily replicated. Once the burden on "hot spots" ebbed, hospitals began to aggregate rich data on COVID-19 patients. These data offer opportunities to develop algorithmic risk calculators for large-scale decision support, ultimately facilitating a more proactive and data-driven strategy to combat the disease globally.
We have established a patient-level database of thousands of COVID-19 hospital admissions. Using state-of-the-art machine learning methods, we develop a mortality risk calculator and an infection risk calculator.
Together, these two risk assessments provide screening tools to support critical care management decisions, spanning patient triage, hospital admissions, bed assignment and testing prioritization.
This version posted June 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Table 1. Count and prevalence of symptoms among COVID-19 patients, in aggregate, broken down into mild/severe patients, and broken down per continent (Asia, Europe, North America). Mild and severe patients only form a subset of the data, and so do patients from Asia, Europe and North America. A "-" indicates that fewer than 100 patients in a subpopulation reported on this symptom.
[...] discriminative ability of the proposed models. We report in the appendix average results across all random data partitions. We also report in the appendix threshold-based metrics, [...]
† HM Hospitals patients were not included since no negative case data was available.
C. Discussion and Impact. The models with lab values provide algorithmic screening tools that can deliver COVID-19 risk predictions using common clinical features. In a constrained healthcare system or in a clinic without access to advanced diagnostics, clinicians can use these models to rapidly identify high-risk patients to support triage and treatment decisions.
The models without lab values offer an even simpler tool that could be used outside of a clinical setting. In strained healthcare systems, it can be difficult for patients to obtain direct advice from providers. Our tool could serve as a pre-screening step to identify personalized infection risk without visiting a testing facility.
While the exclusion of lab values reduces the AUC (especially for infection), these calculators still achieve strong predictive performance.
Our models provide insights into risk factors and biomarkers related to COVID-19 infection and mortality. Our results suggest that the main indicators of mortality risk are age, blood urea nitrogen (BUN), C-reactive protein (CRP), aspartate aminotransferase (AST), and low oxygen saturation. These findings validate several population-level insights from Section 1 and are in agreement with clinical studies: prevalence of shortness of breath (23), elevated levels of CRP as an inflammatory marker (24, 25), and elevated AST levels due to liver dysfunction in severe COVID-19 cases (11, 26).
Turning to infection risk, the main indicators are CRP, leukocytes, calcium, AST, and temperature. These findings are also in agreement with clinical reports: an elevated CRP generally indicates an early sign of infection and implies lung lesions from COVID-19 (27), elevated levels of leukocytes suggest cytokine release syndrome caused by the SARS-CoV-2 virus (28), and lowered levels of serum calcium signal a higher rate of organ injury and septic shock (29). Since our findings agree with clinical observations, our calculators can be used to support clinical decision making, although they are not intended to substitute for clinical diagnosis or medical expertise.
When lab values are not available, the widely accepted risk factors of age, oxygen saturation, temperature, and heart rate become the key indicators for both risk calculators. We observe that mortality risk is higher for male patients (blue in Figure 3B) than for female patients (red), confirming clinical reports (30, 31). An elevated respiratory frequency becomes an important predictor of infection, as reported in (32). These findings suggest that demographics and vitals provide valuable information in the absence of lab values.
However, when lab values are available, these other features become secondary.
A limitation of the current mortality model is that it does not take into account medication and treatments during hospitalization. We intend to incorporate these in future research to make these models more actionable. Furthermore, these models aim to reveal associations between risks and patient characteristics but are not designed to establish causality.
Overall, we have developed data-driven calculators that allow physicians and patients to assess mortality and infection risks in order to guide care management, especially with scarce healthcare resources. These calculators are being used by several hospitals within the ASST Cremona system to support triage and treatment decisions, alleviating the toll of the pandemic. Our infection calculator also supports safety protocols for Banco de Credito del Peru, the largest bank in Peru, to determine how employees can return to work.
[...] The inverse tangent function provides a concave-convex relationship, capturing three phases of government response. In Phase I, most activities continue normally as people adjust their behavior. In Phase II, the infection rate declines sharply as policies are implemented. In Phase III, the decline in the infection rate reaches saturation. The parameters t0 and k can be thought of, respectively, as the start date and the strength of the response.
Ultimately, DELPHI involves 13 parameters that define the transition rates between the 11 states. We calibrate six of them from our clinical outcomes database (Section 1). Using [...] of the pandemic (7). It has also been used by the Hartford HealthCare system (the major hospital system in Connecticut, US) to plan its ICU capacity, and by a major pharmaceutical company to design a vaccine distribution strategy that can most effectively contain the next phases of the pandemic.
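As an illustration of the inverse tangent response described above, the following minimal Python sketch evaluates one functional form with the stated concave-convex shape. The specific expression (2/pi) * arctan(-(t - t0)/k) + 1, the function name government_response, and the values chosen for t0 and k are assumptions made for illustration; this excerpt does not reproduce the exact formula or the fitted parameters.

```python
import math


def government_response(t, t0, k):
    """Assumed inverse-tangent response curve (illustrative form only).

    t  : time in days since the start of the simulation
    t0 : inflection point, read here as the start date of the response
    k  : positive scale parameter of the response
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return (2.0 / math.pi) * math.atan(-(t - t0) / k) + 1.0


if __name__ == "__main__":
    # Hypothetical parameters: response centered on day 30 with scale 5 days.
    for day in (0, 20, 30, 40, 60, 120):
        print(day, round(government_response(day, t0=30.0, k=5.0), 3))
```

Under this assumed form, the curve decreases monotonically, equals 1 at t = t0, is concave before t0 and convex after it, which mirrors the slow start, sharp decline and saturation of Phases I to III.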
B. DELPHI-presc: Toward Re-opening Society. To inform the relaxation of social distancing policies, we link policies to the infection rate using machine learning. Specifically, we predict the values of γ(t), obtained from the fitting procedure of DELPHI-pred. For simplicity and interpretability, we consider a simple model based on regression trees (35) and restrict the independent variables to the policies in place. We classify policies based on whether they restrict mass gatherings, schools and/or other activities (referred to as "Others", and including business closures, severe travel limitations and/or closing of non-essential services). We define a set of seven mutually exclusive and collectively exhaustive policies observed in the US data: (i) No measure; (ii) Restrict mass gatherings; (iii) Restrict others; (iv) Authorize schools, restrict mass gatherings and others; (v) Restrict mass gatherings and schools; (vi) Restrict mass gatherings, schools and others; and (vii) Stay-at-home.
We report the regression tree in the appendix, obtained from state-level data in the United States. This model achieves an out-of-sample R² of 0.8, suggesting a good fit to the data. As expected, more stringent policies lead to lower values of γ(t). The results also provide comparisons between various policies: for instance, school closures seem to induce a stronger reduction in the infection rate than restricting "other" activities. More importantly, the model quantifies the impact of each policy on the infection rate. We then use these results to predict the value of γ(t) as a function of the policies (see appendix for details), and simulate the spread of the disease as states progressively loosen social distancing policies.
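To make the regression-tree step concrete, the sketch below fits a small greedy tree to binary policy indicators using only the Python standard library. Everything specific in it is an assumption for illustration: the encoding of the seven policies as four indicators, the target values (synthetic, not taken from the paper or its appendix), and the helper names fit_tree and predict. It is a toy example, not the fitted model reported in the appendix.

```python
from statistics import mean

# Assumed encoding: (mass gatherings, schools, others, stay-at-home) restricted?
POLICIES = {
    "No measure": (0, 0, 0, 0),
    "Restrict mass gatherings": (1, 0, 0, 0),
    "Restrict others": (0, 0, 1, 0),
    "Authorize schools, restrict mass gatherings and others": (1, 0, 1, 0),
    "Restrict mass gatherings and schools": (1, 1, 0, 0),
    "Restrict mass gatherings, schools and others": (1, 1, 1, 0),
    "Stay-at-home": (1, 1, 1, 1),
}
# Synthetic response values, one per policy (NOT results from the paper).
SYNTHETIC_RESPONSE = [1.00, 0.90, 0.85, 0.70, 0.60, 0.45, 0.25]


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(rows, targets, depth=0, max_depth=3):
    """Greedy regression tree on 0/1 features, minimizing squared error."""
    node = {"value": mean(targets)}
    if depth >= max_depth or len(set(targets)) == 1:
        return node
    best_cost, best_feature = sse(targets), None
    for j in range(len(rows[0])):
        off = [y for x, y in zip(rows, targets) if x[j] == 0]
        on = [y for x, y in zip(rows, targets) if x[j] == 1]
        if not off or not on:
            continue  # this indicator does not separate the samples
        cost = sse(off) + sse(on)
        if cost < best_cost:
            best_cost, best_feature = cost, j
    if best_feature is None:
        return node
    j = best_feature
    off_pairs = [(x, y) for x, y in zip(rows, targets) if x[j] == 0]
    on_pairs = [(x, y) for x, y in zip(rows, targets) if x[j] == 1]
    node["feature"] = j
    node["off"] = fit_tree([x for x, _ in off_pairs],
                           [y for _, y in off_pairs], depth + 1, max_depth)
    node["on"] = fit_tree([x for x, _ in on_pairs],
                          [y for _, y in on_pairs], depth + 1, max_depth)
    return node


def predict(node, x):
    """Follow the splits until a leaf is reached and return its mean."""
    while "feature" in node:
        node = node["on"] if x[node["feature"]] else node["off"]
    return node["value"]


tree = fit_tree(list(POLICIES.values()), SYNTHETIC_RESPONSE)

if __name__ == "__main__":
    for name, indicators in POLICIES.items():
        print(f"{name}: {predict(tree, indicators):.2f}")
```

In practice the tree would be fitted on many state-day observations rather than one value per policy, and its out-of-sample fit would be checked on held-out data, as the reported R² suggests.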
Figure 4D plots the projected case count in the State of New York (NY) for different policies (we report a similar plot for the death count in the appendix). The stringency of the policies has a significant impact on the pandemic's spread and ultimate toll. For instance, relaxing all social distancing policies on May 12 can increase the cumulative number of cases in NY by up to 25% by September.
Using a similar nomenclature, Figure 4E shows the case count if all social distancing policies are relaxed on May 12 vs. May 26. The timing of the policies also has a strong impact: a two-week delay in re-opening society can greatly reduce a resurgence in NY.
The road back to a new normal is not straightforward: results suggest that the disease's spread is highly sensitive to both the intensity and the timing of social distancing policies. As governments grapple with an evolving pandemic, DELPHI-presc can be a useful tool to explore alternative scenarios and ensure that critical decisions are supported with data.
We model ventilator pooling as a multi-period resource allocation over S states and D days. The model takes as input ventilator demand in state s and day d, denoted as v_{s,d}, as well as parameters capturing the surge supply from the federal government and the extent of inter-state collaboration. We formulate an optimization problem that decides on the number of ventilators transferred from state s to state s′ on day d, and on the number of ventilators allocated from the federal government to state s on day d. We propose a bi-objective formulation. The first objective is to minimize ventilator-day shortages; for robustness, we consider both projected shortages (based on demand forecasts) and worst-case shortages (including a buffer in the demand estimates).
The second objective is to minimize inter-state transfers, to limit the operational and political costs of inter-state coordination. Mixed-integer optimization provides modeling flexibility to capture spatial-temporal dynamics and the trade-offs between these various objectives. We report the mathematical formulation of the model, along with the key assumptions, in the appendix.
[...] put on a ventilator, which we use to estimate the demand for ventilators. We also obtain the average length of stay from our clinical outcomes database (Figure 2).
[...] We discuss these trade-offs further in the appendix.
A similar model has been developed to support the re-distribution of ventilators across hospitals within the Hartford HealthCare system in Connecticut, using county-level forecasts of ventilator demand obtained from DELPHI-pred. This model has been used by a collection of hospitals in the United States to align ventilator supply with projected demand at a time when the pandemic was on the rise.
Looking ahead, the proposed model can support the allocation of critical resources in the next phases of the pandemic, spanning ventilators, medicines, personal protective equipment, etc. Since epidemics do not peak in each state at the same time, states whose infection peak has already passed or lies weeks ahead can help other states facing immediate shortages at little cost to their constituents. Inter-state transfers of ventilators occurred in isolated fashion through April 2020; our model proposes an automated decision-making tool to support these decisions systematically. As our results show, proactive coordination and resource pooling can significantly reduce shortages, thus increasing the number of patients that can be treated without resorting to extreme clinical recourse with side effects (such as splitting ventilators).
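The two objectives described above can be illustrated by scoring a candidate transfer plan. The sketch below is not the mixed-integer formulation from the appendix: it does not optimize anything and only evaluates ventilator-day shortages and total inter-state transfers for a given plan. The function name evaluate_plan, the assumption that transferred ventilators are usable on the day of transfer, and all numbers are hypothetical.

```python
def evaluate_plan(initial_supply, demand, transfers, federal=None):
    """Score a ventilator transfer plan on the two objectives.

    initial_supply : dict state -> ventilators on hand before day 0
    demand         : dict state -> list of daily demand (same length per state)
    transfers      : dict (origin, destination, day) -> ventilators moved
    federal        : optional dict (state, day) -> ventilators from federal surge
    Returns (ventilator_day_shortage, total_ventilators_transferred).
    """
    federal = federal or {}
    supply = dict(initial_supply)  # copy so the caller's data is not mutated
    n_days = len(next(iter(demand.values())))
    shortage = 0
    for d in range(n_days):
        # Apply the day's inter-state transfers (assumed available same day).
        for (origin, dest, day), n in transfers.items():
            if day == d:
                supply[origin] -= n
                supply[dest] += n
        # Add the day's federal surge allocations.
        for (state, day), n in federal.items():
            if day == d:
                supply[state] += n
        if any(v < 0 for v in supply.values()):
            raise ValueError(f"plan ships more ventilators than available on day {d}")
        # Shortage is unmet demand, summed over states for this day.
        shortage += sum(max(0, demand[s][d] - supply[s]) for s in supply)
    return shortage, sum(transfers.values())


if __name__ == "__main__":
    # Hypothetical two-state example: A's demand is falling, B's is rising.
    supply = {"A": 100, "B": 50}
    demand = {"A": [60, 50, 40], "B": [50, 80, 90]}
    print(evaluate_plan(supply, demand, {}))                   # no pooling
    print(evaluate_plan(supply, demand, {("A", "B", 1): 40}))  # pool on day 1
```

In this toy instance, moving 40 ventilators from the state past its peak to the state approaching its peak removes the shortage entirely, which is the kind of trade-off between shortages and transfers that the bi-objective model is designed to explore.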
This paper proposes a comprehensive data-driven approach to address several core challenges faced by healthcare providers and policy makers in the midst of the COVID-19 pandemic. We have gathered and aggregated data from hundreds of clinical studies, electronic health records, and census reports. We [...]
Dimitris Bertsimas et al. PNAS | May 26, 2020 | vol. XXX | no. XX | 9
[...] Sophia Xing and Cynthia Zheng from our extended team for helpful discussions.
Persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents
High Contagiousness and Rapid Spread of Severe Acute Respiratory [...]
How will country-based mitigation measures influence the course of the COVID-19 epidemic?
Economic effects of coronavirus outbreak (COVID-19) on the world economy. Available at SSRN
The global macroeconomic impacts of COVID-19: Seven scenarios. CAMA Work [...]
COVID-19 Forecasts
Check if you have Coronavirus symptoms
Clinical characteristics of coronavirus disease 2019 in China
Clinical characteristics of COVID-19 in [...]
Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City
Phylogenetic network analysis of SARS-CoV-2 genomes
An 81 nucleotide deletion in SARS-CoV-2 ORF7a identified from sentinel surveillance in Arizona
Hospitalization rates and characteristics of patients hospitalized with laboratory-confirmed coronavirus disease 2019: COVID-NET, 14 states
Racial and ethnic disparities in SARS-CoV-2 pandemic: Analysis of a COVID-19 observational registry for a diverse US metropolitan population
Positive RT-PCR test results in patients recovered from COVID-19
Missing value estimation methods for DNA microarrays
XGBoost: A scalable tree boosting system, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A unified approach to interpreting model predictions, in Advances in Neural Information Processing Systems 30
From local explanations to global understanding with explainable AI for trees
Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures
The COVID-19 epidemic
Chest CT features of COVID-19 in Rome, Italy
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
C-reactive protein levels in the early stage of COVID-19
COVID-19 infection: the perspectives on immune responses
Serum calcium as a biomarker of clinical severity and prognosis in patients with coronavirus disease 2019: a retrospective cross-sectional study
COVID-19: risk factors for severe disease and death
Neutrophil-to-lymphocyte ratio as an independent risk factor for mortality in hospitalized patients with COVID-19
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in China: a retrospective cohort study. The Lancet
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Classification and regression trees
Effects of prone positioning on lung protection in patients with acute [...]
The standard of care of patients with ARDS: ventilatory settings and rescue therapies for refractory hypoxemia
Intubation and Ventilation amid the COVID-19 Outbreak: Wuhan's Experience
Critical supply shortages: the need for ventilators and personal protective equipment during the COVID-19 pandemic
Stockpiling ventilators for influenza pandemics
A model of supply-chain decisions for resource sharing with an application to ventilator allocation to combat COVID-19