key: cord-0755557-kvwgxosv authors: Fernandes, F. T.; Oliveira, T. A. d.; Teixeira, C. E.; Batista, A. F. d. M.; Costa, G. D.; Chiavegatto Filho, A. title: A multipurpose machine learning approach to predict COVID-19 negative prognosis in Sao Paulo, Brazil date: 2020-09-01 journal: nan DOI: 10.1101/2020.08.26.20182584 sha: b935cf441b3c29f95f980dadbe244c6968bcd465 doc_id: 755557 cord_uid: kvwgxosv Introduction The new coronavirus disease (COVID-19) is a challenge for clinical decision-making and the effective allocation of healthcare resources. An accurate prognostic assessment is necessary to improve survival of patients, especially in developing countries. This study proposes to predict the risk of developing critical conditions in COVID-19 patients by training multipurpose algorithms. Methods A total of 1,040 patients with a positive RT-PCR diagnosis for COVID-19 from a large hospital from Sao Paulo, Brazil, were followed from March to June 2020, of which 288 (28%) presented a severe prognosis, i.e. Intensive Care Unit (ICU) admission, use of mechanical ventilation or death. Routinely-collected laboratory, clinical and demographic data was used to train five machine learning algorithms (artificial neural networks, extra trees, random forests, catboost, and extreme gradient boosting). A random sample of 70% of patients was used to train the algorithms and 30% were left for performance assessment, simulating new unseen data. In order to assess if the algorithms could capture general severe prognostic patterns, each model was trained by combining two out of three outcomes to predict the other. Results All algorithms presented very high predictive performance (average AUROC of 0.92, sensitivity of 0.92, and specificity of 0.82). The three most important variables for the multipurpose algorithms were ratio of lymphocyte per C-reactive protein, C-reactive protein and Braden Scale. Conclusion The results highlight the possibility that machine learning algorithms are able to predict unspecific negative COVID-19 outcomes from routinely-collected data. The COVID-19 epidemic that first appeared in Wuhan, China, has rapidly spread worldwide and still continues to affect most countries, through late initial infections 1,2 and second waves 3, 4 . Currently, there have been more than 14 million cases and five hundred and ninety thousand confirmed deaths 5 . In critical patients, the disease has been shown to rapidly worsen a few days after infection 6, 7 requiring immediate clinical decisions, especially in developing countries with limited resources 8, 9 . An accurate COVID-19 prognosis assessment is crucial for screening and treatment procedures and may increase patient survival 10, 11 . The consequences of COVID-19 have been disastrous for health systems in middle and low-income countries (LMICs) 12 , especially in Brazil 13 . The lack of established knowledge about the disease has made it difficult to identify risk criteria to support clinical conducts and to allocate human and physical resources in health facilities and hospitals 14 . Currently, many Brazilian cities are at their saturation capacity for the provision of clinical care, especially regarding ICU beds and mechanical ventilators 15, 16 . Previous studies have used blood tests 17 , CT images 18, 19 , sociodemographic and comorbidities history 20 to develop COVID-19 diagnostic and prognostic models, including machine learning techniques [21] [22] [23] . Biomarkers from blood tests have emerged as important variables for poor prognostic factors 24 , which are a promising tool in poorer regions, due to its low cost and inclusion in standard protocols for clinical care. However, the majority of studies 25 rely on algorithms trained on a single prognostic outcome, which in theory require the training of specific algorithms for each distinct negative outcome. This study proposes to develop multipurpose machine learning algorithms to analyze if it is possible to predict overall poor prognosis for COVID-19 patients. We aim to test if the algorithms can generalize risk patterns for severe conditions, so they can be . CC-BY-NC-ND 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint used as tools to assist in the prognosis of distinct negative outcomes for COVID-19 patients. Individual patient data was collected from electronic medical records. We included as predictors only variables collected in early hospital admission, i.e. within 24 hours before and 24 hours after the RT-PCR exam. A total of 57 routinely-collected variables were used for the development of the predictive models, including demographic data, laboratory tests and vital signs (the complete list is described in Supplementary Table 1 ). Figure 1 illustrates the overall process. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint Five of the most popular machine learning models for structured data (artificial neural networks 27 , extra trees 28 , random forests 29 , catboost 30 , and extreme gradient boosting 31 ) were trained with 70% of the data, and tested in the other 30%, simulating new unknown data. All the results reported in this study are from the test set. K-fold cross-validation with 10 folds was used to adjust the hyperparameters with Bayesian optimization (HyperOpt). Due to the unbalanced nature of the outcomes, random undersampling was performed in the training set, by randomly selecting examples from the majority class for exclusion. This technique was implemented using the RandomUnderSampler imbalanced-learn class 32 . Variables with more than two categories were represented by a set of dummy variables, with one variable for each category. Continuous variables were standardized using the z-score. Variables with a correlation greater than 0.90 (mean arterial pressure, is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint total bilirubin, and creatine kinase) were discarded, and missing values were imputed by the median. To assess the performance of the models, measures such as accuracy, sensitivity (also known as recall), specificity, positive predictive value (PPV) (also known as precision), negative predictive value (NPV), and F1 score were analyzed. The value of the AUROC was used to select the best model. To understand the individual contribution of each variable to the predictive models, we calculated their respective Shapley values. All the analyzes were performed using the Python programming language with the scikit-learn library. Patients and the public were not directly involved in the design and conduct of this research. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint We analyzed the predictive performance of the algorithms for three negative prognostic outcomes: ICU admission (n=263, 25.5%), mechanical ventilation (MV) intubation (n=106, 10.2%) and death (n=92, 9.4%). We first tested the predictive performance of the machine learning algorithms for a specific individual outcome (e.g. death). We then used the other two outcomes (in this specific example, mechanical ventilation and ICU admission) to train another model, and then tested this model performance to predict the previous outcome (death). We then compared the performance of the two strategies using the 95% confidence interval of the area under the receiver operating characteristic curve (AUROC). . CC-BY-NC-ND 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint Table 2 shows the results of the models trained with the aggregated outcomes and the models with a single outcome. Every model, even the ones trained with different outcomes, presented high predictive performance, always with an AUROC over 0.91 in the test set. The individual models were overall better, but the difference between the aggregated and individual models were all within the 95% confidence intervals. Supplementary Figure 1 shows the AUROC for each model. The sensitivity and specificity of the machine learning algorithms were also very high, in most cases over 0.8, with an average sensitivity of 0.92 and specificity of 0.82. In Supplementary Table 2 we present the final hyperparameters for each model. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint The Braden score played an important role in the aggregated outcome algorithms, ranking as the most important predictor in two of three models. Also, the Creactive protein and ratio of lymphocytes per C-reactive protein proved to be good predictors, appearing in the top 5 in all three models. Urea, age, creatinine, and arterial lactate were important for only one of the aggregated models. We found that machine learning algorithms were able to predict with high overall performance negative prognostic outcomes for COVID-19, even when the specific outcome was not included in the training of the algorithms. All models presented an AUROC higher than 0.91 (average of 0.92) in the test set, with high sensitivity and specificity (average of 0.92 and 0.82, respectively). The results highlight the possibility that high-performance machine learning algorithms are able to predict unspecific negative COVID-19 outcomes using routinely-collected data. Brazil is currently the second country in the world in total number of cases and deaths from COVID-19 33 . There is a growing demand in Brazil, and in many other developing countries, for decision support in the allocation of scarce hospital resources, especially in relation to the availability of ICU beds and mechanical ventilators 34, 35 . From a clinical standard, knowledge about immediate risks of negative prognosis can also contribute to the early start of preventive measures and new interventions, and thereby increase patient survival 10, 11 . The results of this study highlight that it is possible to predict with high performance the risk of a negative prognosis in patients with COVID-19. Additionally, we found that it is also possible to predict the risk of negative outcomes well even when they were not used for training of the model. This result is promising for the prediction of specific outcomes that are not possible to be collected for model training, either due to is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. For every outcome, variable importance analysis identified that age, C-reactive protein (CRP), creatinine, urea and the Braden Scale were usually among the most important. While the age of the patient is widely found to be an important predictor for most negative health outcomes, CRP has been increasingly included among the main inflammatory biomarkers for the prognosis of cardiovascular 36 and respiratory diseases 37 . High levels of CRP have been also previously associated with individual severity of SARS-CoV-2 38, 39 . Interestingly, previous studies have also identified that chronic kidney disease is associated with developing severe conditions in COVID-19 patients [40] [41] [42] , where it has been observed that patients with higher levels of creatinine and urea are more at risk 43 . The Braden Scale is often used as a predictor for pressure ulcers, a common clinical classification scale for predicting pneumonia 44 during clinical reception, and in this study, it was an important predictor for negative prognosis in COVID-19 patients. The scale has a score between 1 (worst score) and 4 (best score) where the factors included are sensory perception, skin moisture, activity, mobility, nutritional status and friction 45 . The percentage of lymphocytes in the blood has been described as a strong predictor of prognosis for the severity of the new coronavirus. The randomized study by Lin Tan et al. 46 suggests that, in most confirmed cases, the . CC-BY-NC-ND 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint 13 percentage of lymphocytes was reduced to 5% in two weeks after the onset of COVID-19, in line with other studies findings 47 . The study has a few limitations that need to be mentioned. First, some of the outcomes overlap which may have helped the performance of the aggregated models, even though in the majority of cases the outcomes were independent. In the case of ICU admission, 55% of the patients did not die or used MV, while in the case of MV and death, 63% and 70% of their respective aggregated model was trained on other outcomes. Ideally, the outcomes would never overlap, but this is clinically unfeasible given the interlaced nature of negative prognostic outcomes. Another limitation is that we analyzed data from an urban COVID-19 hotspot in Brazil, in a period where clinical protocols for the disease were still being established, so this could affect the incidence of prognostic outcomes and may not directly generalize to other periods. In conclusion, we found that machine learning algorithms can predict severe outcomes in COVID-19 patients with high performance, including previously unobserved outcomes, using only routinely-collected laboratory, clinical and demographic data. The use of multipurpose algorithms for the prediction of overall negative prognosis is a promising new area that can support doctors with clinical and administrative decisions, especially regarding priorities for hospital admission and monitoring. The data comes from medical records from BP -A Beneficência Portuguesa de São Paulo Hospital in Brazil and it is not publicly available as it contains sensitive information of patients. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. All the code written to process and analyze the data can be made available upon request to the corresponding author. Correspondence should be adressed to Fernando T. Fernandes. The authors declare no competing interests. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint COVID-19 resurgence in Iran Recurrence of positive SARS -CoV-2 viral RNA in recovered COVID -19 patients during medical isolation observation transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment COVID-19: Are we ready for the second wave Coronavirus disease (COVID-19) Situation Report -181 Using Machine Learning to Predict ICU Transfer in Hospitalized COVID-19 Patients COVID-19: immunopathology and its implications for therapy The COVID-19 Pandemic: Effects on Low-and Middle-Income Countries Evolution and epidemic spread of SARS-CoV-2 in Brazil Ethnic and regional variations in hospital mortality from COVID-19 in Brazil : a cross-sectional observational study The COVID-19 pandemic in Brazil: Analysis of supply and demand of hospital and ICU beds and mechanical ventilators under different scenarios Demand for hospitalization services for COVID-19 patients in Brazil D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19 A predictive model and scoring system combining clinical and CT characteristics for the diagnosis of COVID-19 A Fully Automatic Deep Learning System for COVID-19 Diagnostic and Prognostic Analysis Building a COVID-19 Vulnerability Index An interpretable mortality prediction model for COVID-19 patients COVID-19 diagnosis prediction in emergency care patients : a machine learning approach Early risk assessment for COVID-19 patients from emergency department data using machine learning Hematological findings and complications of COVID-19 Prediction models for diagnosis and prognosis of COVID-19 infection: Systematic review and critical appraisal Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration Neural Networks for Pattern Recognition Extremely randomized trees Random forests CatBoost: gradient boosting with categorical features support Imbalanced learning: foundations, algorithms, and applications Coronavirus disease (COVID-19) Situation Report -192 Fair allocation of scarce medical resources during COVID-19 pandemic: ethical considerations Respiratory Support in COVID-19 Patients, with a Focus on Resource-Limited Settings Impaired cardiac function is associated with mortality in patients with acute COVID-19 infection Plasma C-reactive protein levels are associated with improved outcome in ARDS Plasma CRP level is positively associated with the severity of COVID-19 C-Reactive Protein Level May Predict the Risk of COVID The role of biomarkers in diagnosis of COVID-19 -A systematic review We would like to thank the BP -A Beneficência Portuguesa de São Paulo Hospital for its willingness to contribute to the research. It is made available under a perpetuity.is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprintThe copyright holder for this this version posted September 1, 2020. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprintThe copyright holder for this this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.26.20182584 doi: medRxiv preprint