key: cord-0572028-lqy14qbj authors: Makhija, Yukti; Bhatia, Samarth; Singh, Shalendra; Jayaswal, Sneha Kumar; Malik, Prabhat Singh; Gupta, Pallavi; Samaga, Shreyas N.; Johri, Shreya; Venigalla, Sri Krishna; Hota, Rabi Narayan; Bhatia, Surinder Singh; Delhi, Ishaan Gupta Indian Institute of Technology; Pune, Armed forces Medical College; Delhi, All India Institute of Medical Sciences; Education, Indian institute of Science; Bhopal, Research; Delhi, DGAFMS office Ministry of Defence title: Challenges in the application of a mortality prediction model for COVID-19 patients on an Indian cohort date: 2021-01-15 journal: nan DOI: nan sha: 6bb6761296e3a287ae5c83926052de4dcb64400c doc_id: 572028 cord_uid: lqy14qbj Many countries are now experiencing the third wave of the COVID-19 pandemic, which is straining healthcare resources with an acute shortage of hospital beds and ventilators for critically ill patients. The situation is especially severe in India, which has the second-largest load of COVID-19 cases and a relatively resource-scarce medical infrastructure. It is therefore essential to triage patients based on the severity of their disease and to devote resources to the critically ill. Yan et al. 1 published a highly pertinent study that uses machine learning (ML) methods to predict the outcome of COVID-19 patients from their clinical parameters on the day of admission. They used the XGBoost algorithm, a type of ensemble model in which the final classifier is built through the sequential addition of multiple weak classifiers, to build the mortality prediction model. The clinically operable decision rule was obtained from a 'single-tree XGBoost' and used lactic dehydrogenase (LDH), lymphocyte and high-sensitivity C-reactive protein (hs-CRP) values. This decision tree achieved 100% survival prediction accuracy and 81% mortality prediction accuracy.
However, these models face several technical challenges and do not provide an out-of-the-box solution that can be deployed for other populations, as has been reported in the "Matters Arising" section of Yan et al. Here, we show the limitations of this model by deploying it on one of the largest datasets of COVID-19 patients from India containing detailed clinical parameters.
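The three-parameter decision rule discussed above can be sketched as a short function. The cut-off values below are not given in this text; they are recalled from the tree published by Yan et al. and are an assumption that should be checked against that paper before any use. The function and constant names are hypothetical.

```python
# Minimal sketch of a single-tree decision rule on LDH, hs-CRP and lymphocyte %.
# ASSUMPTION: the thresholds are recalled from Yan et al.'s published tree and
# are not stated in this document; verify them against the original paper.
LDH_CUTOFF_U_PER_L = 365.0      # lactic dehydrogenase, U/L
HS_CRP_CUTOFF_MG_PER_L = 41.2   # high-sensitivity C-reactive protein, mg/L
LYMPHOCYTE_CUTOFF_PCT = 14.7    # lymphocytes, percent


def predict_outcome(ldh: float, hs_crp: float, lymphocyte_pct: float) -> str:
    """Return 'death' or 'survival' for one patient's admission values."""
    # First split: high LDH is classified as high risk outright.
    if ldh >= LDH_CUTOFF_U_PER_L:
        return "death"
    # Second split: low hs-CRP with low LDH is classified as survival.
    if hs_crp < HS_CRP_CUTOFF_MG_PER_L:
        return "survival"
    # Third split: lymphocyte percentage decides the remaining cases.
    if lymphocyte_pct > LYMPHOCYTE_CUTOFF_PCT:
        return "survival"
    return "death"


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only.
    print(predict_outcome(ldh=200.0, hs_crp=80.0, lymphocyte_pct=10.0))
```

Because the first split depends only on LDH, any systematic offset in LDH measurements between laboratories shifts patients across that boundary, which is why the measurement kit matters for this rule.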
Our dataset was collected as part of a retrospective study conducted at two COVID-19 centres in New Delhi: Sardar Vallabhbhai Patel Covid Hospital and PM Cares Covid Care Hospital. Sardar Vallabhbhai Patel Covid Hospital led the study and coordinated with the other centre for data collection and analysis. The study period was from 13th July 2020 to 14th October 2020. Every case admitted to either participating centre during this period for treatment of COVID-19 had the infection confirmed by a Rapid Antigen Test or by RT-PCR testing of a nasal/throat swab sample. Patients of all age groups were included in the study. Most of the data were retrieved retrospectively from medical records. The clinical classification of COVID-19 severity followed the definition of the Ministry of Health and Family Welfare (MoHFW), Government of India. All hospitalized cases were followed up until discharge or death during COVID-19 illness. The primary criteria for discharge of mild/moderate hospitalized cases were resolution of symptoms and a minimum stay of 10 days in the absence of follow-up RT-PCR negativity. Patients with severe infection were discharged only after clinical recovery and a negative RT-PCR on a repeat swab taken after resolution of symptoms. We tested how the model published by Yan et al. performs on this dataset of Indian patients. Although we collected data from 841 patients, most of them did not undergo all the clinical tests, and hence we did not have the clinical parameters needed to sort them using the decision rule presented in that paper. This missing data presented one of the biggest challenges in deploying ML models in a resource-scarce environment such as India, where such models could be most useful for managing patient load given the poor doctor-to-patient ratio of 1:1456 2.
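The restriction to patients with every required parameter measured amounts to a complete-case filter. A minimal sketch follows, assuming each patient is a dictionary with hypothetical field names and with `None` marking a test that was not performed; the records shown are invented for illustration and are not study data.

```python
# Complete-case selection: keep only patients with all required values present.
# ASSUMPTION: the field names and the use of None for "not measured" are
# illustrative choices, not the study's actual data format.
REQUIRED_FIELDS = ("ldh", "hs_crp", "lymphocyte_pct")


def complete_cases(records, required=REQUIRED_FIELDS):
    """Return the records in which every required field has a value."""
    return [
        record
        for record in records
        if all(record.get(field) is not None for field in required)
    ]


# Invented example records (not real patients).
EXAMPLE_RECORDS = [
    {"id": "A", "ldh": 310.0, "hs_crp": 12.5, "lymphocyte_pct": 22.0},
    {"id": "B", "ldh": 505.0, "hs_crp": None, "lymphocyte_pct": 9.0},
    {"id": "C", "ldh": None, "hs_crp": 60.1},  # lymphocyte_pct key absent
]

if __name__ == "__main__":
    usable = complete_cases(EXAMPLE_RECORDS)
    print([record["id"] for record in usable])
```

A filter of this kind discards every patient with even one missing value, which is how a cohort can shrink sharply when tests are not routinely ordered.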
Only 120 patients (Table 1) had all the parameters required by the Yan et al. model measured; of these, 95 (79.17%) were discharged and 25 (20.83%) died. Of the 95 patients who recovered, 23 (19.17% of the 120) experienced mild symptoms, 42 (35%) moderate, and 30 (25%) severe. The overall survival prediction accuracy was 65.26%, and the mortality prediction accuracy was 88%. Survival prediction accuracy for mild, moderate, and severe patients was 91.3%, 73.81%, and 33.33%, respectively (Figure 1). Besides the small number of parameters measured for each patient, we uncovered another major challenge. As reported previously by Reeve et al. 3, two kits are used in practice to measure LDH levels, based on the conversion of lactate to pyruvate or vice versa. We used the latter kit, which has a reference range of 240-480 U/L, whereas Yan et al. most likely used the former, with a reference range of 135-250 U/L. Hence, we had to normalize the LDH values in our dataset as shown previously 4. This reiterates earlier concerns 3 that deploying machine learning models may require knowledge of the reference ranges of the biochemical tests performed and appropriate normalization between datasets. We conclude that, on Indian patients, the decision rule by Yan et al. is a good predictor of mortality but underperforms in predicting survival of COVID-19 patients. Further, it particularly underperforms in predicting severity of infection, which is the key prediction necessary for effective patient triage and resource allocation. Our results show the opposite trend to a similar replication study by Quanjel et al. on a Dutch cohort 5, which reported good performance of the decision rule in predicting patient severity but not mortality.
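Two of the numerical steps above can be illustrated with a short sketch. The first function maps an LDH value from one kit's reference range onto the other's by simple linear interpolation; this is an assumed stand-in and not necessarily the calibrator-based procedure of reference 4. The second part recomputes the reported accuracies from per-group counts of correct predictions; those counts are not stated in the text and are inferred here from the rounded percentages, so they should be treated as assumptions.

```python
# Sketch 1: linear rescaling between two assay reference ranges.
# ASSUMPTION: linear interpolation between range limits is an illustrative
# choice; the authors' actual normalization may differ.
OUR_KIT_RANGE = (240.0, 480.0)   # U/L, reference range of the kit used here
YAN_KIT_RANGE = (135.0, 250.0)   # U/L, range presumed for Yan et al.


def rescale_to_reference(value, source_range, target_range):
    """Map a lab value from the source reference range onto the target range."""
    src_low, src_high = source_range
    tgt_low, tgt_high = target_range
    fraction = (value - src_low) / (src_high - src_low)
    return tgt_low + fraction * (tgt_high - tgt_low)


# Sketch 2: accuracy per severity group and overall.
# ASSUMPTION: (correct, total) counts are inferred from the reported
# percentages; only the totals 23, 42, 30 and 25 appear in the text.
SURVIVORS = {"mild": (21, 23), "moderate": (31, 42), "severe": (10, 30)}
DECEASED = (22, 25)


def accuracy_pct(correct, total):
    """Percentage of correct predictions, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def overall_accuracy_pct(groups):
    """Pool the (correct, total) counts of all groups into one percentage."""
    correct = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return accuracy_pct(correct, total)


if __name__ == "__main__":
    print(rescale_to_reference(360.0, OUR_KIT_RANGE, YAN_KIT_RANGE))
    for name, (correct, total) in SURVIVORS.items():
        print(name, accuracy_pct(correct, total))
    print("survival overall", overall_accuracy_pct(SURVIVORS))
    print("mortality", accuracy_pct(*DECEASED))
```

The pooled survival figure is a patient-weighted average of the three severity groups, so the poor result for severe cases pulls the overall value well below the accuracy seen in mild cases.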
The discrepancy in replicating the results of Yan et al. could possibly be explained by differences in access to medical infrastructure between the population used for training the model and our cohort, by demography (age, gender ratio distribution), or by differences in the reference ranges of the selected parameters due to population genetics. One observation further strengthens the conjecture regarding biases due to population genetics: the minor allele frequencies of rs7305678, a variant significantly associated with LDH serum levels 6, differ dramatically between European, East Asian, and South Asian (Indian) populations, at 0.48, 0.13, and 0.21, respectively, according to the 1000 Genomes Project 7, 8. We propose that other, as yet unidentified, genetic factors may also influence quantitative traits such as biochemical parameter values between different populations. These differences may also manifest as differences in patient mortality rates, which vary considerably between countries: India has a mortality rate of about 1.5%, while China has a mortality rate of about 3.8% 9. Therefore, our results suggest that machine learning models developed to predict patient outcomes, especially those predicting large-scale clinical manifestations in pandemics, need to take into account biases in the collected feature values, such as technical variability in biochemical parameters, population genetics, demography, and other socio-economic factors in healthcare.
An interpretable mortality prediction model for COVID-19 patients
Press Information Bureau. pib.gov.in/Pressreleaseshare.aspx?
Consider laboratory aspects in developing patient prediction models
Commutable calibrator with value assigned by the IFCC reference procedure to harmonize serum lactate dehydrogenase activity results measured by 2 different methods
Replication of a mortality prediction model in Dutch patients with COVID-19
Common and rare variants associating with serum levels of creatine kinase and lactate dehydrogenase
Introduction to 1000 Genomes Project and IGSR data resources
We would like to acknowledge the nursing staff and medical professionals who have tirelessly worked to alleviate the suffering caused by this pandemic and facilitated the collection of good quality data for this publication. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors have no conflict of interest.