key: cord-0874773-h3y10tow
authors: Alle, S.; Siddiqui, S.; Kanakan, A.; Garg, A.; Karthikeyan, A.; Mishra, N.; Waghdhare, S.; Tyagi, A.; Tarai, B.; Hazarika, P. P.; Das, P.; Budhiraja, S.; Nangia, V.; Dewan, A.; Sethuraman, R.; Subramanian, C.; Srivastava, M.; Chakravarthi, A.; Jacob, J.; Namagiri, M.; Konala, V.; Dash, D.; Jha, S.; Pandey, R.; Agrawal, A.; Vinod, P. K.; Priyakumar, U. D.
title: COVID-19 Risk Stratification and Mortality Prediction in Hospitalized Indian Patients
date: 2020-12-22
journal: nan
DOI: 10.1101/2020.12.19.20248524
sha: bded0862b1a190a23f5e4beb9eb3f3c5db55d69d
doc_id: 874773
cord_uid: h3y10tow

The clinical course of coronavirus disease 2019 (COVID-19) infection is highly variable, with the vast majority of patients recovering uneventfully but a small fraction progressing to severe disease and death. Appropriate and timely supportive care can reduce mortality, and it is critical to develop better patient risk stratification based on simple clinical data so as to perform effective triage when the healthcare infrastructure is under strain. This study presents risk stratification and mortality prediction models based on routine clinical data from 544 COVID-19 patients from New Delhi, India, using machine learning methods. A Random Forest classifier yielded the best performance on risk stratification (F1 score of 0.81). A logistic regression model yielded the best performance on mortality prediction (F1 score of 0.71). Significant biomarkers for predicting risk and mortality were identified. The data were compared with a similar dataset from a Wuhan cohort of 375 patients to understand the much lower mortality rates in India and their possible reasons. The comparison indicated a higher survival rate in the Delhi cohort even when patients had parameters similar to those of Wuhan patients who died. Steroid administration was very frequent in Delhi patients, especially in surviving patients whose biomarkers indicated severe disease. This study helps identify the high-risk patient population and suggests treatment protocols that may be useful in countries with high mortality rates.

The World Health Organization (WHO) declared the outbreak of coronavirus disease 2019 (COVID-19) a public health emergency of international concern. Originating in Wuhan, China, the disease has spread to the rest of the world. As of December 17, 2020, WHO reports about 10 million confirmed cases of COVID-19 in India alone. This amounts to 15% of all cases worldwide and makes India the second most affected nation after the United States. Because of the sudden spike in the number of cases, healthcare systems across the world, including India's, are under tremendous pressure to make tough decisions about allocating resources among affected patients. Early risk stratification through the identification of key biomarkers is essential because it helps clarify the relative severity among infected patients and hence guides decisions in settings where medical resources are scarce. COVID-19 is a highly contagious respiratory infection with symptoms that include fever, dry cough, nasal congestion and breathing difficulties [1, 2]. In more severe cases, it can cause pneumonia, severe acute respiratory syndrome, cardiac arrest, sepsis, kidney failure and death [3, 4]. WHO classifies the risk into the following categories: critical, severe, and moderate/mild. By definition, critical patients require ventilation, severe patients require supplemental oxygen, moderate patients have pneumonia but do not require oxygen, and mild patients only have upper respiratory tract disease. The cause of death is generally respiratory failure, but a few deaths have been caused by multiple organ failure (MOF) or chronic comorbidities [2, 5]. Those at a higher risk are the elderly and people with comorbidities, such as cardiovascular diseases and diabetes [6, 7].
However, symptoms at onset are relatively mild, and a significant proportion of patients do not show apparent symptoms prior to the development of respiratory failure [2, 5]. Clinically, this makes it difficult to predict the progression of severity in patients until respiratory failure develops. Early risk prediction and effective treatment can reduce mortality and morbidity as well as relieve resource shortages [8]. Artificial intelligence based solutions may help in clinical decision-making by providing predictions that are accurate, fast, and interpretable. Recent studies have used various machine learning algorithms to analyse COVID-19 patients' clinical data and provide disease prognosis [9, 10]. Hao et al. [11] examined COVID-19 patients admitted in Massachusetts to predict level-of-care requirements based on clinical and laboratory data. They compared machine learning algorithms (such as XGBoost, Random Forests, SVM, and Logistic Regression) and predicted the need for hospitalization, ICU care, and mechanical ventilation. The most effective features for hospitalization were vital signs, age, BMI, dyspnea, and comorbidities. Opacities on chest imaging, age, admission vital signs and symptoms, male gender, admission laboratory results, and diabetes were the most effective risk factors for ICU admission and mechanical ventilation. Xie et al. [12] used multivariable logistic regression for the classification task, identifying SpO2, lymphocyte count, age and lactate dehydrogenase (LDH) as the set of important features. A nomogram was created based on these features to deliver the probability of mortality. Ji et al. [13] built a scoring model, named CALL, for predicting progression risk in COVID-19 patients from Chinese hospitals.
They used multivariate Cox regression to identify risk factors associated with progression, which were then incorporated into a nomogram to establish a prediction scoring model. Comorbidity, older age, lower lymphocyte count, and higher lactate dehydrogenase were found to be independent high-risk factors for COVID-19 progression. Yan et al. proposed an interpretable mortality prediction model for COVID-19 patients [14]. They analysed blood samples of 485 patients from Wuhan, China, and created a clinically operable single tree through XGBoost. The model used three crucial features: lactate dehydrogenase (LDH), lymphocyte (%) and C-reactive protein (CRP). The decision rules with the three features and their thresholds were devised recursively. This provided an interpretable machine learning solution with at least 90% accuracy. Karthikeyan et al. [15] analysed the same dataset by comparing various machine learning algorithms. XGBoost feature selection and neural network classification yielded the best performance, with neutrophil (%), lymphocyte (%), LDH, CRP and age selected as the important biomarkers. However, no detailed studies on risk stratification have been done on Indian cohorts.

Most machine learning based risk stratification and mortality prediction algorithms analysed patients from China or the United States of America. Studies have suggested that the virus has different strains around the globe due to mutations [16] [17] [18] [19] [20]. Moreover, the physiologic response to the virus and the eventual course of disease depend on regional factors such as population characteristics and hospital practices. Hence, the studies are not universally applicable, and it is critical to examine cohorts from India to aid the Indian healthcare systems. In this study, patients with confirmed COVID-19 infection from the MAX group of hospitals in New Delhi, India were examined to identify the key features affecting severity and mortality. The machine learning models built using these key features aid in risk stratification and mortality prediction. The mortality rate in the Indian population is low compared to China and other countries. Hence, a comprehensive comparison between the cohorts from New Delhi and Wuhan [14] was carried out, and treatment protocols were analysed to identify possible factors for such a difference.

The data in this study were collected from patients with a confirmed diagnosis of COVID-19 at the MAX group of hospitals in New Delhi, India between June 3rd and October 23rd, 2020. The patient records were collected and anonymized at the data warehouse of CSIR-IGIB. A total of 544 patients with a clear final outcome were considered in our study. Among these patients, diagnostic lab reports were available as a time series of test results. The data collected contain 357 distinct parameters (or biomarkers) that include vitals, symptoms, comorbid conditions and lab reports from 161 different tests, along with the medicines administered for treatment. Multiple tests were recorded for each patient during their stay at the hospital, varying from 1 to 134 records per patient.

Patients were categorized into risk levels based on the severity of their condition during their stay at the hospital. A description of the severity of each patient is not available. Since COVID-19 is a respiratory disorder that affects the lungs, the amount of respiratory support provided to a patient was considered an indicator of severity. Considering the size of the dataset and the levels of respiratory support provided, all patients were categorized into two levels, mild and severe. All patients who died, who were under some form of respiratory support, or whose condition was specifically mentioned to be severe were placed in the severe/high-risk group, and all remaining patients were placed in the mild/low-risk group. The resulting dataset follows the data distribution shown in Table 1.
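The labelling rule above can be written as a small function. This is a minimal sketch: the record fields `died`, `respiratory_support` and `marked_severe` are hypothetical names, not the study's actual data schema.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical fields; the study's actual data schema is not described.
    died: bool
    respiratory_support: bool  # any form of respiratory support during the stay
    marked_severe: bool  # condition explicitly recorded as severe


def risk_label(p: PatientRecord) -> str:
    """Binary risk label following the rule in the text: death, any
    respiratory support, or an explicit 'severe' note -> high risk."""
    if p.died or p.respiratory_support or p.marked_severe:
        return "severe"
    return "mild"
```

Any one of the three conditions is enough to place a patient in the high-risk group. All other patients default to mild.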

The 15 most frequent tests, corresponding to 38 biomarkers, were selected for analysis based on the availability of data. Five biomarkers were manually calculated from the available blood cell counts, owing to their reported importance in predicting mortality due to COVID-19 [21, 22]: WBC count, neutrophil lymphocyte ratio (NLR), lymphocyte monocyte ratio (LMR), neutrophil monocyte ratio (NMR) and platelet to lymphocyte ratio (PLR). A total of 209 unique comorbid conditions were observed in patients in our study. To analyse them without exploding the number of features, and to avoid the increased chance of overfitting that comes with higher dimensionality, we grouped all the encountered comorbid conditions into 7 groups based on the area or organ that the condition affects, as shown in Table S1. Four more groups were considered due to their reported importance in mortality risk due to COVID-19 [7]: diabetes, hypertension, hyperlipidaemia, and cancer. This information was encoded into 11 binary features, each representing one group. A sample takes the value one for a group if the patient has one or more comorbid conditions that fall into that group. To incorporate and analyse the effects of medical prescriptions, the information regarding the prescription of steroids and antiviral drugs was encoded into two binary features.
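The derived ratios and the grouped comorbidity flags described above can be sketched as follows. This is an illustrative sketch only. The condition-to-group map and group names are hypothetical stand-ins for the study's Table S1. Letting one condition count towards both an organ group and a dedicated group (diabetes here) is an assumption, not something the text confirms.

```python
# Hypothetical condition-to-group map. The study's 7 organ-based groups are
# listed in its Table S1; these names are illustrative only. Mapping diabetes
# to both an organ group and its own group is an assumption.
CONDITION_GROUPS = {
    "coronary artery disease": ("cardiac",),
    "hypothyroidism": ("endocrine",),
    "chronic kidney disease": ("renal",),
    "type 2 diabetes": ("endocrine", "diabetes"),
    "hypertension": ("hypertension",),
}
GROUP_ORDER = ["cardiac", "endocrine", "renal", "diabetes", "hypertension"]


def derived_ratios(neutrophils, lymphocytes, monocytes, platelets):
    """Cell-count ratios computed from absolute counts given in the same units."""
    return {
        "NLR": neutrophils / lymphocytes,
        "LMR": lymphocytes / monocytes,
        "NMR": neutrophils / monocytes,
        "PLR": platelets / lymphocytes,
    }


def encode_comorbidities(conditions):
    """One binary flag per group: 1 if any of the patient's conditions maps to it.
    Conditions missing from the map are ignored."""
    present = set()
    for condition in conditions:
        present.update(CONDITION_GROUPS.get(condition.lower(), ()))
    return [1 if group in present else 0 for group in GROUP_ORDER]
```

With 7 organ groups plus the 4 dedicated groups, the same encoding yields the 11 binary features described in the text.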

This leads to 70 unique parameters: 11 grouped comorbid conditions, 14 clinical parameters, 2 RT-PCR genomic parameters and 43 lab test results. An exhaustive list of categorical parameters can be found in Table S1, and of continuous parameters in Table S2. To evaluate the significance of each parameter considered for risk stratification and mortality prediction, we calculated p values. We used the Chi-Squared test [23] for the categorical features and the ANOVA F-test for the continuous features.

In our study, we evaluated how machine learning models trained on non-Indian cohorts perform in predicting mortality on the Indian cohort. We used the best performing model reported by Karthikeyan et al. [15] for predicting mortality using data from Wuhan, China [14] to examine its applicability to the Indian cohort. The Wuhan cohort comprises data collected from 375 patients who were admitted to Tongji Hospital, Wuhan. The model evaluated is a neural network trained to predict mortality from CRP, LDH, neutrophil (%), lymphocyte (%) and age. To predict mortality in the Indian cohort using the same model, we selected 3092 datapoints in which at least 3 of the required 5 features were present. KNN imputation was used to fill in the missing features.

We also explored the differences between the Wuhan and New Delhi cohorts in key biomarkers across survivors and the dead [14, 15, 24]. We chose mortality as the indicator for comparison because it does not depend on subjective labelling. The feature density histograms were analysed to examine the variations in biological parameters across survivors and the dead between the Wuhan and Delhi cohorts. The Kolmogorov-Smirnov test (K-S test) [25] was used to analyse variations in the density distributions of the important biomarkers between both classes across cohorts.
The K-S test is a non-parametric test that quantifies the distance between the empirical distribution functions of two samples.
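As a concrete illustration, the two-sample K-S statistic is the largest absolute gap between the two empirical CDFs. The sketch below computes it in plain Python, together with a pairwise matrix over named groups such as survivors and the dead in each cohort. The function names are illustrative rather than the authors' code. `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|,
    where F_x and F_y are the empirical CDFs of samples x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples together; the ECDF gap only changes at data points.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:  # step past ties in x
            i += 1
        while j < m and ys[j] == v:  # step past ties in y
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_matrix(groups):
    """Pairwise K-S distances between named samples, keyed by (name_a, name_b)."""
    names = list(groups)
    return {(a, b): ks_statistic(groups[a], groups[b]) for a in names for b in names}
```

The loop can stop once either sample is exhausted. From that point the remaining gap only shrinks towards zero, so the maximum has already been recorded.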

Machine Learning Pipeline

Figure 1 depicts the overall pipeline used in this study for performing the risk stratification and mortality prediction tasks. We compared the predictive performance of several machine learning algorithms: XGBoost, random forests, Support Vector Machine (SVM) and logistic regression. A detailed account of the step-by-step procedure is presented in the following sections.

Data Pre-processing

For each patient in the dataset, multiple lab test results were recorded on different days before the outcome. We considered each individual recorded test result as a unique data point for training and testing, as has been done before [14, 15]. Each sample has a dimensionality equal to the number of unique parameters measured across all lab tests considered for the analysis. The values

All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this this version posted December 22, 2020.

Figure 1. Machine learning pipeline for the development of the risk stratification and mortality prediction tasks.

To build and validate the machine learning models, we split patients by the day of outcome. The 429 patients with a clear outcome by 11 September 2020 were used for model development, and the remaining 115 patients formed a holdout test set. This splitting method was adopted because the models will be used to aid future patients, and COVID-19 and the responses of infected patients are known to change over time [16] [17] [18] [19]. The day-wise distribution of samples in the train and test sets for risk stratification and mortality prediction is shown in Figures S1 and S2, respectively.
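A patient-level temporal split of this kind might look like the sketch below. The `outcome_date` field name is hypothetical. Treating "by 11 September 2020" as inclusive is an assumption.

```python
from datetime import date


def temporal_split(patients, cutoff):
    """Split at patient level by outcome date: outcomes on or before `cutoff`
    go to development, later outcomes go to the holdout set. All records of a
    patient stay on the same side, so no patient appears in both sets."""
    train = [p for p in patients if p["outcome_date"] <= cutoff]
    holdout = [p for p in patients if p["outcome_date"] > cutoff]
    return train, holdout


CUTOFF = date(2020, 9, 11)  # "clear outcome by 11 September 2020" (inclusive: assumption)
```

Splitting patients rather than individual lab records matters here. Each patient contributes many samples, so a record-level split would let the same patient appear in both sets.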

Among the 70 features chosen for analysis, selecting the most influential biomarkers for risk stratification and mortality prediction is crucial to avoid overfitting when the dataset is small. This is done by eliminating redundant or unimportant parameters. Moreover, fewer features mean cheaper and faster tests for efficient risk profiling, given the high daily influx of patients. This in turn makes the decision-making process of healthcare systems more efficient.

The relative importances provided by an XGBoost classifier fit on the training data for a particular task are used as the measure of importance for selecting features. XGBoost is a powerful decision-tree-based ensemble algorithm that uses a gradient boosting framework and estimates which features are the most discriminative of model outcomes [26]. The relative importance of each feature is determined by its accumulated use in the decision steps of each tree in the ensemble.

The number of features to use for model training was obtained by iteratively training an XGBoost model on the top K most important features, increasing K by 1 in each iteration. The feature set that achieved the best performance under 5-fold cross-validation on the training set was taken as the set of key features for training the final models. The feature importances were obtained separately for the binary risk stratification and mortality prediction models. The criterion for selecting the optimal feature set was the AUC score for risk stratification and the average precision score for mortality prediction. Average precision is used for mortality prediction because fatal cases are heavily under-represented among the samples.
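The top-K search can be sketched independently of any particular model. In this sketch, `cv_score` is a hypothetical callable standing in for "train XGBoost on these columns and return the mean 5-fold score". In scikit-learn, such a callable might wrap `cross_val_score(model, X[cols], y, cv=5, scoring="roc_auc").mean()`, with `scoring="average_precision"` for the mortality task.

```python
def select_top_k(ranked_features, cv_score):
    """Forward top-K selection over an importance ranking.

    ranked_features: feature names sorted by decreasing importance
        (in the study, XGBoost relative importances).
    cv_score: hypothetical callable mapping a feature list to a
        cross-validated score (e.g. mean 5-fold AUC or average precision).
    Returns the best-scoring prefix and its score; ties keep the smaller K.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = cv_score(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score
```

Using a strict `>` means that, when adding a feature does not improve the score, the smaller set wins. This matches the stated goal of keeping tests cheap.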

After obtaining the collection of important features, duplicate samples that arose from eliminating the less important features were removed from the train set. The set was then normalized to a range of 0-1 using a min-max scaler to avoid biases due to differences in scale across parameters. The train set was then resampled using the SMOTE algorithm to reduce the bias that may arise from the observed class imbalance. SMOTE was chosen to generate synthetic samples of the minority class because of its good performance. Various algorithms were trained on the resampled dataset and compared on each task, risk stratification or mortality prediction, using that task's feature set. We also built another set of models trained on only patient vitals. These gauge the prediction performance that can be achieved with data acquired before blood test results are available.
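The core idea of SMOTE is to interpolate between a minority sample and one of its nearest minority neighbours. A minimal plain-Python sketch of that idea follows. The text does not say which implementation the authors used. In practice the `SMOTE` class from imbalanced-learn (`k_neighbors=5` by default) is a common choice. The function below is a hypothetical illustration, applied only to the (already scaled) training set.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours
    (squared Euclidean distance). Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the region spanned by the minority class rather than duplicating existing rows.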

Testing

The holdout test data of 115 patients were normalized to a range of 0-1 using the min-max statistics obtained from the training set. The models built were then evaluated on the test set. We report each algorithm's F1-score as the mean and standard deviation, on the test set, of the models trained in 5-fold cross-validation. The F1-score is preferred over AUC and accuracy because it better measures performance when the data are imbalanced. The best performing model was then tested and analysed on the set of samples corresponding to each individual day over the 14 days before the final outcome to observe relevant trends.
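Scaling the test set with training statistics, as described above, can be sketched as below. scikit-learn's `MinMaxScaler` (fit on train, then transform test) behaves the same way by default for non-constant columns. Mapping constant columns to 0.0 is an arbitrary choice made in this sketch.

```python
def fit_min_max(rows):
    """Per-column minimum and maximum, computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform_min_max(rows, mins, maxs):
    """Scale with training statistics. Test values outside the training range
    fall outside [0, 1]; constant columns map to 0.0 to avoid division by zero."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


mins, maxs = fit_min_max([[0.0, 10.0], [10.0, 20.0]])
scaled_test = transform_min_max([[5.0, 25.0]], mins, maxs)
```

Here the second test value (25) exceeds the training maximum (20), so it scales to 1.5. Fitting the scaler on the test set instead would leak information about the holdout patients.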

The following metrics were recorded to assess the predictive performance of the supervised models. Formulae for the calculation of the metrics are given below. Here, TP, TN, FP, and FN stand for the numbers of true positives, true negatives, false positives and false negatives, respectively.

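The F1 score is the harmonic mean of precision and recall and is often preferred to accuracy when the data have imbalanced classes:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad \text{and} \quad \mathrm{Recall} = \frac{TP}{TP + FN}$$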
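These metrics can be computed directly from predictions. The sketch below assumes binary labels with 1 as the positive class. Average precision here follows the step-wise definition AP = sum_n (R_n - R_{n-1}) P_n used by scikit-learn's `average_precision_score`. The text does not state exactly how the authors computed it, so treat this as an illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n over decreasing score thresholds.

    Tied scores share one threshold. Assumes at least one positive label.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = fp = 0
    prev_recall, ap, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every sample at this threshold before computing P and R.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap
```

Unlike F1, which needs a fixed decision threshold, average precision summarizes performance across all thresholds. This is why it suits feature selection on the heavily imbalanced mortality task.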

A comparison of different clinical features between low- and high-risk patients was carried out. Tables S1-S4 show the differences in categorical and continuous features between the high- and low-risk groups, and between survivors and the dead. The K-S test showed that none of the continuous features followed a normal distribution, so medians and interquartile ranges are reported. Patient ages ranged from 9 to 98 years, with a median age of 58 (48-66) years. The median age was 61 (53-68) years for high-risk patients and 53 (41-64) years for low-risk patients. Of the 544 patients, 164 (30.15%) were female and 380 (69.85%) were male. The blood clotting

Machine learning models for predicting mortality based on patients' blood parameters have been reported; however, it is necessary to ascertain whether these models are generalizable. Karthikeyan et al. [15] built a neural network that predicted mortality in the Wuhan cohort with an accuracy of 96.5%±0.6%. It used only five parameters: age, lymphocyte (%), neutrophil (%), LDH and CRP. The same model, when tested on the Indian cohort (current dataset), predicted mortality with an accuracy of only 58%. This drop in performance on the Indian cohort shows that there is a significant difference between the two cohorts. Figure 3 demonstrates that the neural network performed much better at identifying patients who died (precision 84.85%) than those who survived (precision 49.54%). This suggests that patients who were expected to die based on the findings from the Wuhan data actually survived in the Indian cohort.
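Before the Wuhan-trained model could be applied, the Indian datapoints were filtered to those with at least 3 of the 5 required inputs, and the remaining gaps were filled by KNN imputation. The sketch below shows one way to do this in plain Python, with `None` marking a missing value. The choice of `k` and the absence of feature scaling before computing distances are assumptions, since the text specifies neither. scikit-learn's `KNNImputer` (`n_neighbors=5` by default) offers the same technique.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows, scaled up
    for the missing ones (the weighting used by scikit-learn's KNNImputer)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(len(a) / len(shared) * sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=5, min_present=3):
    """Keep rows with at least `min_present` observed values (None = missing),
    then fill each gap with the mean of that feature over the k nearest rows
    that have it observed."""
    kept = [r for r in rows if sum(v is not None for v in r) >= min_present]
    result = []
    for idx, row in enumerate(kept):
        filled = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [r for d, r in enumerate(kept) if d != idx and r[j] is not None]
                donors.sort(key=lambda r: nan_euclidean(row, r))
                nearest = donors[:k]
                if nearest:  # leave the gap if no row has this feature observed
                    filled[j] = sum(r[j] for r in nearest) / len(nearest)
        result.append(filled)
    return result
```

Each row would hold the five model inputs in a fixed order, for example age, lymphocyte (%), neutrophil (%), LDH and CRP.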

To understand the difference between the cohorts, we compared the feature density histograms of the Indian and Wuhan cohorts (Figure 4). The survival of patients with LDH in the range 500-1000 is much higher in Delhi than in Wuhan. There are almost no survivors with an LDH value greater than 800 in the Wuhan cohort, while patients with LDH values of about 1000 have survived in the Delhi cohort. The survivability of patients with CRP greater than 50 is also higher in the Indian cohort than in Wuhan. Similar conclusions can be drawn for Indian patients with relatively lower lymphocyte (%) and higher neutrophil (%). This is notable because the likelihood of survival with higher neutrophil (%) or lower lymphocyte (%) is much lower [27].

Figure 5 shows matrices of two-sample K-S statistics that measure pairwise distances between the distributions of important biomarkers of survivors and the dead across the Indian and Wuhan cohorts. For all five biomarkers, the distance between the distributions of the Indian Recovered (IR) and Indian Dead (ID) is significantly lower than that between the Wuhan Recovered (WR) and Wuhan Dead (WD). This is mainly due to differences between the distributions of the recovered across Delhi and Wuhan: the distance between the cohorts of the dead is low, while the distance between the cohorts of the recovered is high. This indicates that many Indian patients who were at high risk of death according to insights from other cohorts survived.

The distance between the WD and ID distributions is low, especially for neutrophils (%), LDH and lymphocytes (%). The high survivability of patients with extreme neutrophil and lymphocyte percentages is consistent with the lower mortality rates observed in the Indian population compared to several other countries. Identifying the possible reasons for this phenomenon could help minimize mortality in other countries as well.

XGBoost was used to rank features based on the contribution of each feature to the performance in risk stratification. Figure S3 shows the top 25 features sorted in descending order of their relative importance in risk stratification. The 11 features selected to train the models, in order of importance, are absolute neutrophil count, LDH, lymphocyte (%), neutrophil (%), record of diabetes comorbidity, ferritin, INR, interleukin-6 (IL-6), oxygen saturation level, absolute eosinophil count and packed cell volume. Figure S4 shows the density distributions for the top 4 features identified.

A comparison of the various algorithms showed that XGBoost performed the best, with an F1-score of 0.810±0.01 (Figure 6). The model also yielded better AUC (0.833±0.01) and average precision (0.891±0.01) (Table S5). The confusion matrix of predictions from an XGBoost model trained on the entire train set is shown in Figure S5. We also evaluated how the performance of the model changes with days to outcome, where the day of outcome is either the day of discharge from the hospital or the day of death. Figure 7 shows that the performance of the risk stratification model decreases as the samples approach the day of outcome. This suggests that the feature differences between low-risk patients and those high-risk patients who are recovering shrink towards the day of outcome. However, the performance of the mortality prediction model increases towards the day of outcome. Hence, selectively using these two models depending on the number of days from infection may be effective.
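The day-wise evaluation behind Figure 7 amounts to grouping test samples by days to outcome and scoring each group separately. A minimal sketch follows. The `(days_to_outcome, y_true, y_pred)` tuple layout and the 0-13 day window are assumptions about how "14 days before the final outcome" is counted.

```python
from collections import defaultdict


def per_day_scores(samples, metric, max_days=14):
    """Group test samples by days to outcome (0 = day of discharge or death)
    and score each day separately.

    samples: iterable of (days_to_outcome, y_true, y_pred) tuples.
    metric: callable(list_of_true, list_of_pred) -> float, e.g. an F1 function.
    Only days 0..max_days-1 are kept (window definition is an assumption).
    """
    by_day = defaultdict(lambda: ([], []))
    for day, y_true, y_pred in samples:
        if 0 <= day < max_days:
            by_day[day][0].append(y_true)
            by_day[day][1].append(y_pred)
    return {day: metric(t, p) for day, (t, p) in sorted(by_day.items())}
```

Anchoring on the day of outcome rather than the day of admission is what makes the per-day groups comparable across patients who were admitted at different stages of the disease.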

Figure 4. Comparison of the normalized histogram plots of important features useful for predicting mortality from the Wuhan and Indian cohorts.

Further, we trained and evaluated models with only patient vitals, comorbidities, and medication information to evaluate the predictive performance that can be achieved without lab test results. Figure S6 shows the F1 scores of the various models built using only this patient information. The random forests algorithm performed the best, with an F1 score of 0.76±0.02. The important features selected were administration of steroids, oxygen saturation levels, record of diabetes, thyroid problems, presence of any other comorbidities, weight, temperature, respiration rate, hypertension, and BMI.

Mortality Prediction

Figure S7 shows the top 25 features sorted in descending order of their relative importance in mortality prediction. The 9 features selected to obtain the results, in order of importance, are D-Dimer, ferritin, lymphocyte (%), neutrophil to lymphocyte ratio (NLR), WBC, Trop I, INR, IL-6 and LDH. Figure S8 shows the density distributions for the top 4 features identified.

The logistic regression model performed the best, with an F1-score of 0.710±0.02 (Figure 6). The model also yielded better AUC (0.927±0.01) and average precision (0.801±0.02) (Table S6). The performance of the model increases as the samples approach the day of outcome, as seen in Figure 7. We also trained and evaluated models with only patient vitals, comorbidities, and medication information to evaluate the predictive performance that can be achieved without lab test results. Figure S6 shows the F1-scores of the various models built using this patient information. SVM performed the best, with an F1 score of only 0.34±0.03. The important features selected were hypertension, record of any comorbidities related to the liver, record of cancer, oxygen saturation, administration of antivirals and respiration rate.

As part of the study, we also compared neutrophil and lymphocyte percentages between patients who were administered steroids and patients who were not. The aim was to understand whether the treatment protocols followed in the Indian medical system affect the lower mortality rates. Of the 544 patients involved in the study, 338 (62.13%) were administered steroids. Methylprednisolone was the most widely administered steroid, given to 262 patients, followed by Dexamethasone, given to 89 patients. Prednisolone was administered to 11 patients, while Hydrocortisone and Triamcinolone were each given to only one patient. Some patients were administered more than one of these steroids. Figure 8 shows the density histograms of neutrophil and lymphocyte percentages for survivors and mild patients. A higher proportion of the survivors and mild patients who were administered steroids had extreme neutrophil and lymphocyte percentages. This indicates that the administration of steroids may have had an impact on patient outcome.

Figure 8. Distribution plots of lymphocytes (%) and neutrophils (%). The plots compare surviving patients who were administered steroids with surviving patients who were not. They also compare patients with mild severity who were administered steroids with patients with mild severity who were not.

COVID-19 has spread around the globe and the need for fast and effective resource allocation is urgent, but very few studies have examined Indian cohorts. Studies show that regional factors such as patient characteristics and methods of treatment affect patients' response to the infection. Hence, it is critical to analyse Indian patients to aid the healthcare systems of the second most COVID-19-affected nation in the world. In this study, we analysed 15648 samples from 544 patients with a confirmed diagnosis of COVID-19 at the MAX group of hospitals in New Delhi, India. Each sample contains 70 unique parameters, including the grouped comorbid conditions, patient vitals, patient demographic information, and lab test results. We found that existing mortality prediction models trained on the Wuhan cohort cannot be directly used for mortality prediction on the Indian cohort, because of cohort-specific differences in the response to COVID-19. We observed greater overlap between the biomarker distributions of the dead and the survivors in the Indian cohort than in the Wuhan cohort. The K-S distance between the WR and IR distributions of neutrophil and lymphocyte percentages is comparatively high. In contrast, the distance between the distributions of the dead (WD, ID) across the cohorts is low. This shows that the increased overlap in the distributions in the Indian cohort is primarily due to survivors. Patients in India recovered even when their neutrophil and lymphocyte percentages reached levels similar to those of patients who died in Wuhan.
The observed differences could be due to better healthcare or a less severe immune response. Another possible reason for the low mortality in the Delhi cohort is that the Government of India included steroids and immunosuppressant drugs in its treatment protocols early in the pandemic. Studies have shown that steroids such as dexamethasone significantly lowered COVID-19 fatalities when administered to patients who required supplemental oxygen [28-31]. We observed a relation between the use of these drugs and the survival of patients with extreme lymphocyte and neutrophil counts, which are associated with mortality (Figure 8) [14, 15, 24, 32, 33].
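The paper shows the relation between steroid use and survival through distribution plots (Figure 8). A complementary check would tabulate steroid administration against outcome among patients with extreme lymphocyte or neutrophil values and then apply a Pearson chi-square test of independence. The sketch below assumes that setup. The 2x2 layout, the function name `chi_square_2x2`, and the example counts are hypothetical, not results from this cohort.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence for the 2x2 table

                  survived  died
        steroid       a       b
        no steroid    c       d

    (no Yates continuity correction). Returns (statistic, p_value).
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is the square of a standard
    # normal variable, so p = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    stat, p = chi_square_2x2(10, 20, 30, 40)
    print(f"chi2 = {stat:.4f}, p = {p:.3f}")
```

`scipy.stats.chi2_contingency` with `correction=False` computes the same statistic for a 2x2 table. Because steroid use in this cohort was not randomized, an association in such a table would not by itself show that steroids caused survival.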

Machine learning models for risk stratification and mortality prediction were developed using features extracted from the Indian cohort. The important features for risk stratification included blood parameters, diabetes as a comorbid condition, and oxygen saturation level. Mortality prediction, on the other hand, depended only on blood parameters. Blood coagulation parameters (ferritin, D-dimer and INR) and immune and inflammation parameters (IL-6, LDH and neutrophil (%)) are common features for both risk and mortality prediction. The features for mortality prediction also included the neutrophil-to-lymphocyte ratio (NLR), white blood cell count (WBC) and troponin I (Trop I). Several of these features have previously been identified as predictors of COVID-19 progression [12, 14, 15, 24, 33-39].
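NLR appears among the mortality features, but this section does not restate how it is computed. The sketch below assumes the common definition: the absolute neutrophil count divided by the absolute lymphocyte count. Both differential percentages are fractions of the same white blood cell (WBC) count, so the ratio of percentages gives the same value. The helper names are hypothetical.

```python
import math


def nlr_from_percentages(neutrophil_pct: float, lymphocyte_pct: float) -> float:
    """Neutrophil-to-lymphocyte ratio from differential percentages."""
    if lymphocyte_pct <= 0:
        raise ValueError("lymphocyte percentage must be positive")
    return neutrophil_pct / lymphocyte_pct


def nlr_from_counts(wbc: float, neutrophil_pct: float, lymphocyte_pct: float) -> float:
    """NLR from absolute counts derived from the total WBC count.

    The WBC factor cancels, so this equals nlr_from_percentages().
    """
    neutrophils = wbc * neutrophil_pct / 100
    lymphocytes = wbc * lymphocyte_pct / 100
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


if __name__ == "__main__":
    # Hypothetical differential: 80% neutrophils, 10% lymphocytes, WBC 10 x 10^9/L
    a = nlr_from_percentages(80, 10)
    b = nlr_from_counts(10.0, 80, 10)
    print(a, b, math.isclose(a, b))  # 8.0 8.0 True
```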

The best-performing model for risk stratification on the Indian dataset was the XGBoost classifier, which achieved an F1-score of 0.81±0.01, while logistic regression yielded the best performance for mortality prediction, with an F1-score of 0.71±0.02. We also examined the performance of these algorithms when trained on a dataset comprising only vitals and clinical attributes, because these features can be acquired quickly and may aid the initial decision-making process. The best-performing models gave F1-scores of 0.76±0.02 for risk stratification and 0.34±0.3 for mortality prediction. The lower performance of these models, particularly for mortality prediction, shows the importance of blood parameters in describing the progression of COVID-19.

We observed that the progression of COVID-19 infection is accompanied by hemocytometric changes with respect to the number of days to outcome (Figures 9, 10). The day of outcome was used as the reference point because it is more stable than the day of admission: a patient may be identified and admitted late in the progression of the disease. Patients who died showed elevated levels of D-dimer, ferritin and NLR, while their lymphocyte (%) levels dropped. The separation between the biomarker values of the two classes remains consistent throughout the course of the disease, which shows their significance for prediction. Interestingly, the mortality prediction model performed better closer to the day of outcome, whereas the performance of the risk stratification model decreased as the day of outcome approached. The differences between survivors and patients who died grow over time, because survivors recover while patients who die do not, which makes it easier for any predictive model to separate the two classes.
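Two computations underlie the results in this paragraph: the F1-score and the alignment of each sample to the number of days remaining before the patient's outcome. The plain-Python sketch below evaluates a classifier separately for each days-to-outcome group and summarises repeated runs as mean ± standard deviation. It rests on three assumptions: binary labels (1 = high risk or died), ± values that come from repeated train/test splits, and a hypothetical record layout. The function names are also hypothetical, and none of this is the study's pipeline.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev
from typing import Dict, Iterable, List, Sequence, Tuple

# (sample date, outcome date, true label, predicted label); labels are 0/1
Record = Tuple[date, date, int, int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_days_to_outcome(records: Iterable[Record]) -> Dict[int, float]:
    """Group samples by days remaining until the outcome (discharge or death)
    and compute F1 within each group, ordered by days remaining."""
    buckets: Dict[int, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for sample_day, outcome_day, y_true, y_pred in records:
        days_left = (outcome_day - sample_day).days
        buckets[days_left][0].append(y_true)
        buckets[days_left][1].append(y_pred)
    return {d: f1_score(t, p) for d, (t, p) in sorted(buckets.items())}


def summarize(scores: Sequence[float]) -> str:
    """Format repeated-run scores as 'mean±sd' (sample standard deviation)."""
    return f"{mean(scores):.2f}±{stdev(scores):.2f}"


if __name__ == "__main__":
    records = [  # hypothetical samples
        (date(2020, 6, 1), date(2020, 6, 3), 1, 1),
        (date(2020, 6, 2), date(2020, 6, 3), 1, 1),
        (date(2020, 6, 1), date(2020, 6, 3), 0, 1),
        (date(2020, 6, 2), date(2020, 6, 3), 0, 0),
    ]
    print(f1_by_days_to_outcome(records))
    print(summarize([0.80, 0.82, 0.81]))
```

Grouping by days remaining, rather than days since admission, mirrors the reference point chosen in the text, since admission can occur at different stages of the disease.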
The performance of risk stratification decreases as the day of outcome approaches because, as patients recover, the differences between low-risk and high-risk patients diminish, which makes classification more difficult. The proposed models are based on data collected in the Delhi region of India. This may introduce regional biases, so the models need to be validated across multiple centres. Our study provides a preliminary assessment of the clinical course and outcome of patients in Delhi. In the future, we intend to test these models on larger datasets collected from multiple hospitals in different geographic locations in India. As more data becomes available, the whole procedure can easily be repeated to obtain better models and more insights. Although we had a pool of about 70 clinical measurements, our modelling principle is a trade-off between a minimal number of features and good predictive capacity, which helps avoid overfitting. Notably, studies on other cohorts have also identified these features as key predictors [33].

The major strength of our study is the inclusion of a relatively large group of confirmed COVID-19 cases from India. The findings from this study can help clinical decision-making in the Indian healthcare setting and can also help healthcare systems worldwide understand the progression of severity and the role of steroids in patient survival. This study is a step towards building accurate risk and mortality prediction models, identifying significant trends in the clinical course, and exploring the impact of individual steroids on COVID-19 patients.

Accurate risk stratification and mortality prediction models based on vitals, comorbidities and blood parameters will help in the rapid screening of infected patients and hence in the optimal use of healthcare infrastructure. Cohort-specific differences are likely to emerge because of differences in demographic conditions and healthcare settings, which necessitates the development of population-specific solutions. There is also a need to study how specific treatment protocols affect mortality. Our study presents the first data collection effort to develop predictive models and to study feature differences and the effect of steroids in the Indian population. The risk stratification and mortality prediction models performed well, with F1-scores of 0.81 and 0.71, respectively. Haematological parameters are important features for both risk stratification and mortality prediction. The analysis showed that steroids might have played a role in the survival of patients with extreme neutrophil or lymphocyte levels. This may indicate that steroids are effective in managing COVID-19, and it could explain the effectiveness of the treatment protocols followed by the Indian medical system. This study could help accelerate decision-making in healthcare systems for focused and efficient medical treatment.

medRxiv preprint doi: https://doi.org/10.1101/2020.12.19.20248524

Figure 9. Progression of biomarkers by risk.

Figure 10. Progression of biomarkers by mortality.

Coronavirus disease 2019 (COVID-19): a perspective from China

Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet

High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2. Emerging Infectious Diseases

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China

Association between age and clinical characteristics and outcomes of COVID-19

Prevalence of Comorbidities Among Individuals With COVID-19: A Rapid Review of Current Literature

Potential association between COVID-19 mortality and health-care resource availability. The Lancet Global Health

Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19

Predictors for Severe COVID-19 Infection

Early prediction of level-of-care requirements in patients with COVID-19

Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19

Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score

An interpretable mortality prediction model for COVID-19 patients

Machine learning based clinical decision support system for early COVID-19 mortality prediction. medRxiv


A virus that has gone viral: amino acid mutation in S protein of Indian isolate of Coronavirus COVID-19 might impact receptor binding, and thus, infectivity. Bioscience Reports

Mutations on COVID-19 diagnostic targets

Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv

Dynamic correlation between intrahost HIV-1 quasispecies evolution and disease progression

The diagnostic and predictive role of NLR, d-NLR and PLR in COVID-19 patients. International Immunopharmacology

The Neutrophil-to-Monocyte Ratio and Lymphocyte-to-Neutrophil Ratio at Admission Predict In-Hospital Mortality in Mexican Patients with Severe SARS-CoV-2 Infection (Covid-19). Microorganisms

Chi-square test and its application in hypothesis testing

An early warning tool for predicting mortality risk of COVID-19 patients using machine learning

Abnormal immunity of non-survivors with COVID-19: predictors for mortality

Remdesivir for the Treatment of Covid-19: Preliminary Report. The New England Journal of Medicine

Dexamethasone and remdesivir: finding method in the COVID-19 madness. The Lancet Microbe

Dexamethasone in Hospitalized Patients with Covid-19: Preliminary Report. The New England Journal of Medicine

Remdesivir for the Treatment of Covid-19: Final Report

Leukoerythroblastic reaction in a patient with COVID-19 infection

A novel haemocytometric COVID-19 prognostic score developed and validated in an observational multicentre European hospital-based study

Higher procoagulatory potential but lower DIC score in COVID-19 ARDS patients compared to non-COVID-19 ARDS patients

Immune dysfunction leads to mortality and organ injury in patients with COVID-19 in China: insights from ERS-COVID-19 study

Interleukin-6 as a potential biomarker of COVID-19 progression. Médecine et Maladies Infectieuses

Serum ferritin as an independent risk factor for severity in COVID-19 patients

D-dimer as a biomarker for disease severity and mortality in COVID-19 patients: a case control study

Blood routine test in mild and common 2019 coronavirus (COVID-19) patients

Intel Commits $50 Million with Pandemic Response Technology Initiative to Combat Coronavirus
