key: cord-0950062-ddl8n52t authors: Zhou, Kai; Sun, Yaoting; Li, Lu; Zang, Zelin; Wang, Jing; Li, Jun; Liang, Junbo; Zhang, Fangfei; Zhang, Qiushi; Ge, Weigang; Chen, Hao; Sun, Xindong; Yue, Liang; Wu, Xiaomai; Shen, Bo; Xu, Jiaqin; Zhu, Hongguo; Chen, Shiyong; Yang, Hai; Huang, Shigao; Peng, Minfei; Lv, Dongqing; Zhang, Chao; Zhao, Haihong; Hong, Luxiao; Zhou, Zhehan; Chen, Haixiao; Dong, Xuejun; Tu, Chunyu; Li, Minghui; Zhu, Yi; Chen, Baofu; Li, Stan Z.; Guo, Tiannan title: Eleven routine clinical features predict COVID-19 severity uncovered by machine learning of longitudinal measurements date: 2021-06-17 journal: Comput Struct Biotechnol J DOI: 10.1016/j.csbj.2021.06.022 sha: e0a6b3f8757fafbeccf5ecf9a8282c83a27ddd31 doc_id: 950062 cord_uid: ddl8n52t Severity prediction of COVID-19 remains one of the major clinical challenges for the ongoing pandemic. Here, we have recruited a 144 COVID-19 patient cohort, resulting in a data matrix containing 3,065 readings for 124 types of measurements over 52 days. A machine learning model was established to predict the disease progression based on the cohort consisting of training, validation, and internal test sets. A panel of eleven routine clinical factors constructed a classifier for COVID-19 severity prediction, achieving accuracy of over 98% in the discovery set. Validation of the model in an independent cohort containing 25 patients achieved accuracy of 80%. The overall sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 0.70, 0.99, 0.93, and 0.93, respectively. Our model captured predictive dynamics of lactate dehydrogenase (LDH) and creatine kinase (CK) while their levels were in the normal range. This model is accessible at https://www.guomics.com/covidAI/ for research purpose. ized and monitored with intensive care to prevent deterioration of the disease that may lead to fatality without timely diagnosis and treatment. 
Currently, the diagnosis of COVID-19 mainly depends on the viral RNA test for SARS-CoV-2 [2, 3]. This is a qualitative test showing whether the patient is infected by the virus [4]. Computed tomography (CT) is a complementary strategy for COVID-19 diagnosis [5]. CT images have been used to facilitate severity determination of COVID-19 patients in an artificial intelligence (AI)-assisted model [5]. However, about 20% of COVID-19 patients showed no obvious imaging changes in the lung [6], making it difficult for physicians to decide on suitable clinical treatments. To better evaluate the disease conditions of COVID-19 patients, physicians regularly order routine laboratory tests, including but not limited to complete blood cell count, blood biochemistry, and immune tests. Physicians then make clinical decisions and prescribe treatments accordingly. However, this is laborious and sometimes biased, depending on the physician's own experience, especially when facing such a heavy medical burden during the pandemic. Therefore, automatic and unbiased integration and interpretation of routine laboratory indexes will benefit severity stratification and prognosis evaluation for COVID-19 patients. Machine learning has been applied in medicine to clinical data such as images for disease diagnosis and classification. During the COVID-19 pandemic, machine learning has been widely applied in the diagnosis, prognosis, and vaccine development of COVID-19 [7]. Studies focusing on building predictive models to support clinical decision-making have been reported, including prediction of outcome [8], evaluation of mortality risk [9], development of critical illness [10], and monitoring of severity [11, 12]. Risk factors associated with the severity of COVID-19 have been widely reported [7].
Several clinical features, such as age, gender, lactic dehydrogenase (LDH), C-reactive protein (CRP), and lymphocyte count, have been reported to be highly correlated with the severity of COVID-19 patients [11] . Recently, a Chinese team found that three key features (LDH, CRP, and lymphocyte) can be used to predict the mortality of COVID-19 with over 90% accuracy [13] . These models were built from a limited amount of data and might be biased [11] . In a paper published by Liang et al., a risk-scoring algorithm based on some key characteristics of COVID-19 patients at the time of admission to the hospital was developed, which may assist in predicting a patient's risk of developing into critical illness [14] . More recently, they also built a survival model to predict critical cases by comorbidity and several laboratory indexes [10] . These models are built on datasets containing static information at certain time points, for example, upon admission or discharge, mostly when the patients were in severe or critical status. However, the time from symptom onset to admission varied in a large range due to different local medical resources and personal medical treatment intention. The time of onset of symptoms as a starting point is more objective and reasonable. As the disease progresses, the physiological condition and respective laboratory indicators of the patient are constantly changing. However, few models make use of the longitudinal data to predict disease severity due to the lack of dynamic data. Therefore, it remains challenging to stratify the COVID-19 patients when they are in the transient stage from mild to severe. In the present study, we built a machine learning model based on the longitudinal measurement of a panel of 124 clinical indicators in a retrospective patient cohort containing 144 COVID-19 patients to predict the disease severity. Eleven key clinical factors were prioritized to be highly associated with COVID-19 severity. 
The model achieved 93% accuracy in distinguishing severe patients among the infected cases in the whole datasets. From January 17 to March 10 in 2020, 841 patients have been screened by SARS-CoV-2 nucleic acid test in Taizhou Hospital. From them, 144 patients were diagnosed as COVID-19 patients by reverse transcriptase-polymerase chain reaction (RT-PCR) and chest CT according to the Pneumonia Diagnosis and Treatment Scheme for New Coronary Virus Infections (Trial Edition 5, Revision). A total number of 124 indicators from 17 categories of laboratory tests have been regularly monitored over 52 days. Clinical data for these patients were curated from the hospital information system (HIS), included epidemiology, gender, age, BMI, underlying diseases, chest CT, presenting symptoms, length of stay (LOS). Laboratory data included complete blood count parameters, blood biochemistry and immune index, blood coagulation indicator, lymphocyte subsets, cytokines, and arterial blood gas (ABG). Another twenty-five independent test readings were collected from Shaoxing People's Hospital following the same criteria with the Taizhou test set. This study was approved by the Medical Ethics Committee of Taizhou Hospital, Shaoxing People's Hospital, and Westlake University. The informed consent was obtained from each enrolled subject. Besides, the case of minors enrolled in the study was approved by parents and/or legal guardians. Blood samples were collected at each time point since admission. 2 mL EDTA-K2 anticoagulant peripheral blood samples were measured for complete blood count using a Sysmex 2100D routine hematology analyzer (Kobe, Japan). Erythrocyte sedimentation rate (ESR) was calculated using the Alifax Test 1 automatic ESR analyzer (UDINE, Italy). Cytokines and lymphocyte subsets were measured using BD FACSCantoTM II within 6 h. Sodium citrate plasma samples were centrifuged at 1500 g for 15 min. 
Coagulation parameters were determined using a Sysmex CS 5100i automatic hemagglutination analyzer (Kobe, Japan). ABG analysis was performed using GEM Premier 3500 (Instrumentation Laboratory, US). Serum samples were centrifuged at 1500 g for 10 min for measuring biochemical data including electrolyte, liver and kidney functional proteins, immunoglobulin serial index, blood lipid and glucose, myocardial enzymes (except myoglobin, creatine kinase MB and troponin-I), and infection index (except procalcitonin) using Beckman automatic biochemical analyzer (AU5821). Myoglobin, creatine kinase MB and troponin-I were detected by Beckman automatic immunological analyzer (UniCel DXI-800). Procalcitonin (PCT) levels were determined using the Roche Cobas e411 electrochemiluminescence analyzer (Basel, Switzerland). Throat swab and sputum specimens collected during hospitalization were sent to the PCR laboratory (BIOSafety Laboratory II) in a biosafety transportation box. Total nucleic acid extraction from the samples was performed using Nucleic Acid Extraction Kit (Shanghai Zhijiang) and RT-PCR was performed using a commercial kit specific for 2019-nCoV detection (triple fluorescence PCR, Shanghai Zhijiang, China, NO. P20200105) approved by the China Food and Drug Administration (CFDA). The patients (except pregnant) had a chest CT examination at the time of admission. For each patient, the chest CT was performed according the needs of the disease changes during the period of her/his hospitalization. A CT examination was also operated at the time of her/his discharge. The points of CT score are defined by the following rules. Based on the lesion involvement and lesion properties, each infected lobe adds one point. Presence of ground-glass opacity adds one point. Two points were added in the presence of consolidation lesions, while three points were added in case of fibrosis lesions. The score was reduced by 0.5 point if the CT is improved compared to the previous CT scan. 
Otherwise, the score was increased by 0.5 point. The abbreviations, symbols and markings that we used throughout the text are provided in Supplementary Table 1. The data matrix from Taizhou Hospital was divided into a discovery dataset and a test dataset. This method defined the samples collected before February 2, 2020 as the discovery dataset (228*124) and the samples collected after February 2, 2020 as the test dataset (130*124). Feature selection and model training were performed in the discovery dataset. The model was then tested in the test dataset. We also included an independent test data matrix (25*11) from Shaoxing People's Hospital for further testing. The data analysis included four steps: data preprocessing, feature selection, model training and testing. In the data preprocessing step, all missing values were first filled with a median value of all patients. Then as normalization, the mean value of all features in the discovery set was converted to 0 and the standard deviation was converted to 1. The same normalization parameters were applied to the test set. In the feature selection step, the standard genetic algorithm (GA) method in the Python deap library was used. We set the gene locus in the method as the index of the feature, and set the length of the gene chain to 20 (this indicates that the method can select up to 20 different features). Then we set the crossover probability to 0.3, the mutation probability to 0.5, the number of genes in the population to 500, and the number of iterations to 30. The accuracy rate of the 10-fold cross-validation on the verification set was taken as the fitness of the gene chain. Finally, we selected eleven non-repeated and effective features. In the model training step, the discovery dataset was divided into training set and the validation set. The training set was used to train the model, while the validation set was used to optimize the model parameters. 
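The genetic-algorithm feature selection described above can be sketched as follows. This is a minimal illustration in plain Python, not the deap-based implementation used in the study. The chain length (20), crossover probability (0.3), mutation probability (0.5), population size (500), and number of iterations (30) come from the text; the tournament selection, one-point crossover, single-locus mutation, the function name `select_features_ga`, and the `toy_fitness` function are assumptions made for the example. In the study, the fitness of a gene chain was the 10-fold cross-validation accuracy, which the toy fitness only stands in for.

```python
import random


def select_features_ga(n_features, fitness, chain_len=20, pop_size=500,
                       n_gen=30, cx_prob=0.3, mut_prob=0.5, seed=0):
    """Return the best feature subset (sorted unique indices) found by a simple GA."""
    rng = random.Random(seed)

    def decode(chain):
        # Repeated loci collapse, so a chain encodes at most chain_len features.
        return sorted(set(chain))

    def score(chain):
        return fitness(decode(chain))

    # Each locus of a gene chain holds the index of one feature.
    pop = [[rng.randrange(n_features) for _ in range(chain_len)]
           for _ in range(pop_size)]
    best = max(pop, key=score)

    for _generation in range(n_gen):
        scores = [score(chain) for chain in pop]
        # Tournament selection of size 3 (an assumption; the source does not say).
        offspring = []
        for _slot in range(pop_size):
            contenders = rng.sample(range(pop_size), 3)
            winner = max(contenders, key=lambda i: scores[i])
            offspring.append(list(pop[winner]))
        # One-point crossover on consecutive pairs.
        for a, b in zip(offspring[0::2], offspring[1::2]):
            if rng.random() < cx_prob:
                cut = rng.randrange(1, chain_len)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: replace one random locus with a random feature index.
        for chain in offspring:
            if rng.random() < mut_prob:
                chain[rng.randrange(chain_len)] = rng.randrange(n_features)
        pop = offspring
        candidate = max(pop, key=score)
        if score(candidate) > score(best):
            best = candidate
    return decode(best)


# Hypothetical stand-in for cross-validated accuracy: reward three
# "informative" feature indices and lightly penalize large panels.
INFORMATIVE = {3, 17, 42}


def toy_fitness(features):
    hits = len(INFORMATIVE.intersection(features))
    return hits - 0.01 * len(features)


selected = select_features_ga(124, toy_fitness)
print(selected)
```

Because duplicate loci collapse when a chain is decoded, the selected panel can contain fewer than 20 features, which is in line with the eleven non-repeated features reported in the text.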
Thereafter, the test dataset was used to test the model for prediction accuracy. The support vector machine (SVM) model with the 'rbf' kernel in Python's scikit-learn library was employed. Hyperparameters, including the regularization parameter C, the kernel coefficient, and the threshold T, were optimized. Regularization parameter C was selected from [1.0, 1. Table 2 ). The above parameters were optimized and evaluated according to AUC. Let $D_{i,f}$ denote the measurement of patient $i$ on a certain feature $f$; the data composed of the selected features of the patient are then $D_i = \{D_{i,f_1}, D_{i,f_2}, D_{i,f_3}, \cdots, D_{i,f_N}\}$, where $N$ is the number of selected features. Entering $D_i$ into the trained model $M(\cdot)$ yields a predicted score $S_i = M(D_i)$. We made the diagnosis $Y_i$ based on the predicted score $S_i$: $Y_i = 1$ if $S_i > T$, and $Y_i = 0$ otherwise, where $T$ is the threshold for diagnosis. When $Y_i$ is 1, patient $i$ is diagnosed as severe. When $Y_i$ is 0, patient $i$ is diagnosed as non-severe. Due to the heterogeneity of the positive and negative samples, $T = 0.5$ cannot be used directly for diagnosis. The threshold $T$ was therefore determined by maximizing the correct rate of diagnosis in the validation set, and the same $T$ was applied to the test set. Statistical analysis of clinical characteristics was performed with SPSS (version 19.0). Continuous variables were represented by median and range, and the Kruskal-Wallis H test was used for comparisons among multiple groups. Categorical variables were expressed as numbers (percentages), and comparisons between groups were made by the chi-square test. Further investigations were performed with R software (version 3.6.3). The comparison of continuous variables between two groups was performed using Student's t-test for normally distributed variables and the Mann-Whitney U test for non-normally distributed variables. P-values in boxplots were calculated by the unpaired two-tailed Student's t-test, and p-values in violin plots were adjusted by the Benjamini-Hochberg method.
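The Benjamini-Hochberg adjustment mentioned above can be illustrated with a short sketch. It assumes the standard step-up procedure, which to our understanding is what R's `p.adjust` computes for `method = "BH"`; the function name `bh_adjust` and the example p-values are hypothetical and are not taken from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four features.
example = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(example)
```

Each raw p-value is scaled by m/rank, and the running minimum taken from the largest rank downward keeps the adjusted values monotone in the raw ones.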
The smooth plot was fitted by locally estimated scatterplot smoothing (LOESS) using the geom_smooth function in ggplot2. PPV and NPV were adjusted by the ratio of severe cases following the published formula [15]. A total of 841 patients were screened with the SARS-CoV-2 nucleic acid test from January 17 to March 10, 2020 in Taizhou Hospital, including 144 patients with positive virus RNA (COVID-19) and 697 non-COVID-19 individuals (Fig. 1A). From the non-COVID-19 group, outpatients and patients lacking chest CT results were excluded. 65 patients were recruited as the control group, with their epidemiological information collected. Meanwhile, the 144 COVID-19 patients were stratified into severe (N = 36) and non-severe (N = 108) patients based on the clinical diagnosis guideline [6] (Fig. 1A). 124 types of measurements from 17 categories over 52 days were collected and manually curated for these 144 COVID-19 patients, resulting in a data matrix containing 3,065 readings for 124 types of measurements (3065*124). The 17 categories of data, which were regularly recorded during hospitalization, included basic information, clinical symptoms and signs, chest CT results, and laboratory tests, as detailed in Supplementary Table 3. In this cohort, 55.6% of the severe COVID-19 group and 52.8% of the non-severe COVID-19 group were male. The median age was 55.0 years for the severe group and 44.5 years for the non-severe group. The median BMI was 25.5 kg/m² in the severe group and 23.8 kg/m² in the non-severe group (Table 1). These parameters are consistent with previous observations [16]. The most common symptom at disease onset was fever (64.8% in the non-severe group and 94.4% in the severe group), followed by cough and pharyngalgia (Table 1). Before admission, 40.2% of patients had underlying diseases, and the ratio was substantially higher in the severe group (50.0%) than in the non-severe group (37.0%).
Hypertension (15.3%) and diabetes (9.7%) were common comorbidities. Upon admission, 142 (98.6%) COVID-19 patients had image changes in chest CT. Pulmonary plaque was the most common abnormal pattern (52.8%), seen in 66.7% of severe patients and 48.2% of non-severe patients. The second most common abnormal pattern was ground-glass opacity (41.7% in severe vs. 31.5% in non-severe patients). Ratios of pulmonary fibrosis and consolidation did not show a statistical difference between severe and non-severe patients. In severe patients, the median intervals from onset and from admission to the diagnosis of severe disease were 9 days and 2 days, respectively. All 144 patients were followed up during their entire course in hospital. In this study, only one patient was admitted to the intensive care unit (ICU) and underwent invasive mechanical ventilation. All patients were cured and eventually discharged. The median LOS was 20 days for the non-severe group and 23 days for the severe group. In summary, this is a well-annotated and curated COVID-19 cohort with comprehensively and systematically recorded information from disease onset until convalescence and discharge, which provided the basis for subsequent model construction. To establish a model for severity prediction, we filtered the matrix of 3065*124 readings for all patients over 52 days in total (Fig. 1B). The readings with a definite diagnosis of severe COVID-19 were excluded. Then the readings recorded from symptom onset to the 12th day for all patients were included for severity prediction by machine learning, in which the ratio of severe versus non-severe was close to the 20% prevalence of severe cases reported in previous studies [1]. At this time point (12th day), no patient had been clinically diagnosed as a severe COVID-19 case. After removing readings with more than 90% missing values, a much smaller data matrix containing 358*124 readings remained.
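The filtering steps above can be sketched as follows. This is an illustrative sketch only: the record layout (the keys `day`, `diagnosed_severe`, and `values`), the function name `filter_readings`, and the toy readings are assumptions, since the source does not describe how the data matrix was stored.

```python
def filter_readings(readings, feature_names, max_day=12, max_missing=0.9):
    """Keep readings taken up to max_day after onset, before any severe
    diagnosis, and with at most max_missing of their features missing."""
    kept = []
    for reading in readings:
        if reading["diagnosed_severe"]:
            continue  # exclude readings with a definite severe diagnosis
        if reading["day"] > max_day:
            continue  # keep only onset through the 12th day
        n_missing = sum(1 for name in feature_names
                        if reading["values"].get(name) is None)
        if n_missing / len(feature_names) > max_missing:
            continue  # drop readings with more than 90% missing values
        kept.append(reading)
    return kept


# Toy readings with two hypothetical features.
feature_names = ["AST", "GGT"]
readings = [
    {"day": 3, "diagnosed_severe": False, "values": {"AST": 30.0, "GGT": None}},
    {"day": 15, "diagnosed_severe": False, "values": {"AST": 28.0, "GGT": 40.0}},
    {"day": 5, "diagnosed_severe": True, "values": {"AST": 55.0, "GGT": 70.0}},
    {"day": 4, "diagnosed_severe": False, "values": {"AST": None, "GGT": None}},
]
kept = filter_readings(readings, feature_names)
print(len(kept))
```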
The discovery dataset contains 228*124 readings collected from patients admitted before 1 February 2020, while the test dataset includes 130*124 readings from patients admitted after 1 February 2020. Based on the discovery dataset, a machine learning model was built by cross training and validation (Fig. 1B). To further test our model, we also collected an independent test dataset from another clinical center, employing the same criteria as the previous one (Fig. 1C). The modeling contains three parts: feature selection, model training, and prediction (Fig. 2A). First, the missing values for each test item in the discovery dataset were filled with the median value for the corresponding gender (female or male). We randomly generated feature combinations and loaded the data with the selected features. Feature selection was then performed using GA [17], one of the most advanced and widely used algorithms for feature selection, assisted by 10-fold cross-validation in the discovery dataset, including both training (9/10) and validation (1/10) sets. By evolving a panel of features, GA can avoid being trapped in a locally optimal solution. As a result, a panel of eleven key clinical factors associated with COVID-19 severity was evolved from 124 characteristics and included in the 'Active Feature Pool'. We then applied the selected features and established the classifier using the SVM [18]. In this step, we set up ten random seeds, and the SVM model was trained in the training set (8/10) and then validated in the validation set (2/10). The hyper-parameters of the model were evaluated by AUC, and the threshold was optimized to maximize accuracy. The threshold was set at 0.45, which was determined by maximizing the correct diagnosis rate in the validation set.
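The threshold selection described above can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: the candidate grid (0.01 to 0.99 in steps of 0.01), the function names, and the toy validation scores are hypothetical; only the rule "score greater than the threshold means severe" and the accuracy criterion come from the text.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of readings classified correctly when score > threshold means severe."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest validation accuracy.

    On ties, the smallest candidate is returned (an arbitrary choice).
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Toy validation scores and labels (1 = severe, 0 = non-severe).
val_scores = [0.10, 0.20, 0.40, 0.50, 0.90]
val_labels = [0, 0, 0, 1, 1]
T = best_threshold(val_scores, val_labels)
print(T, accuracy_at(T, val_scores, val_labels))
```

The selected threshold would then be reused unchanged on the test data, as the text describes for the value 0.45.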
In this way, a trained SVM model including the eleven active features and classifier was generated for predicting severe cases in the test dataset (130*124) and an independent test dataset (25*11) using the same threshold of 0.45 (Fig. 2A). We further tested six other classifiers: LightGBM, CatBoost, Random Forest, AdaBoost, Nearest Neighbors and Decision Tree. Four of the six achieved an AUC of around 90%; the other two achieved more than 70% (Supplementary Table 4). These results indicate that our selected features are robust. The SHapley Additive exPlanations (SHAP) algorithm [19, 20] was used to interpret the eleven features in this model, namely oxygenation index, basophil counts (BASO#), aspartate aminotransferase (AST), gender, magnesium (Mg), gamma-glutamyl transpeptidase (GGT), platelet counts, activated partial thromboplastin time (APTT), oxygen saturation (SaO2), body temperature and days of symptom onset. Their importance to the model was evaluated by SHAP values (Fig. 2B). The oxygenation index was the most important one, with a SHAP value of 0.94. In clinical practice, the oxygenation index is also a critical factor for evaluating disease severity. Furthermore, nine features with continuous records (12 days) were selected and compared between the severe and non-severe groups. Their median values (discovery dataset) were calculated and shown in boxplots (Fig. 2C). Six of the nine features were significantly dysregulated (p-value less than 0.05). Among them, oxygen index and SaO2 were decreased in severe cases, which was directly associated with pulmonary function. Two selected features, AST and GGT, were up-regulated, indicating that COVID-19 may induce slight hepatic injury in the early stage. Although within the normal range, the platelet count was reduced significantly in severe cases.
Three characteristics, namely APTT, Mg, and BASO#, showed no significant change between severe and non-severe COVID-19 patients, so a more comprehensive comparison was performed. As shown in Supplementary Fig. 1, readings from four groups of individuals (severe COVID-19, non-severe COVID-19, non-COVID-19 patients with flu-like symptoms, and healthy people undergoing physical examination) were included for systematic comparison. The data showed that APTT was substantially up-regulated in the severe COVID-19 group in the entire dataset, including the discovery and the test datasets, and the count of basophils was decreased in all COVID-19 patients compared with the healthy group. However, Mg showed no difference across the four groups. To validate the importance of Mg in the model, we used the remaining ten features without Mg and investigated the performance of the model by ROC analysis (Supplementary Fig. 2). Without Mg, the AUC values were unchanged in the training and validation datasets and decreased by only 0.01 in the test dataset, which indicated that Mg contributed little to the model. The remaining two indicators not shown in the boxplots are days of symptom onset and gender. The selection of days of symptom onset is not surprising because the symptoms grew worse as patients evolved from non-severe to severe status. Gender was selected as another important feature. In our dataset, male patients were more likely to be infected with SARS-CoV-2 than female patients in both non-severe and severe groups, consistent with the literature reporting that male COVID-19 patients had a worse outcome and that 70% of patients who died of COVID-19 were male in an Italian cohort [21]. The vulnerability of males was also found in the SARS-CoV epidemic in 2003 [22]. It has been recently found that the plasma concentration of ACE2, a functional receptor for SARS-CoV-2 infection, is higher in men than in women, as detected in two independent cohorts [23].
This may explain the association between gender and the fatality rate of COVID-19. Meanwhile, females develop stronger innate and adaptive immune responses than males, and are thus less susceptible to various infections of bacterial, viral, parasitic, and fungal origin and to malignancies [24]. In fact, the clinical manifestations of infectious or autoimmune diseases and malignant tumors differ between men and women. The model assigned a score (from 0 to 1) to indicate the likelihood of disease severity, and a higher score indicated greater severity. The model is described in detail in Methods, while the data are shown in the scatter plot (Fig. 3A). Samples with a score greater than 0.45 were classified as severe. The receiver operating characteristic (ROC) analysis yielded AUC values of 1.00 and 0.98 for the training and validation datasets, respectively (Fig. 3B). 224 out of 228 samples were correctly identified, giving an accuracy of 0.98 for the discovery dataset. The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were 0.90, 0.99, 0.96 and 0.99, respectively (Fig. 3C). To further evaluate the performance of the eleven-clinical-feature classifier, we analyzed 130*124 readings in the test dataset and 25*11 readings in the independent test dataset from a different hospital. The classifier achieved an AUC value of 0.89 (Fig. 3B) and correctly classified 112 of 130 readings, with accuracy, sensitivity, specificity, PPV, and NPV of 0.86, 0.56, 0.98, 0.91, and 0.85, respectively. Sixteen of the 18 incorrectly classified readings were from the severe COVID-19 group, especially from the early stage of COVID-19, which showed clinical signs similar to those of the non-severe group. Most of the incorrect predictions (10 of 18 readings) were from four COVID-19 patients, indicating an individual effect on the model.
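The performance figures above can be related to a confusion matrix with a short sketch. The counts used below (20 true positives, 16 false negatives, 2 false positives, 92 true negatives) are not stated in the source; they are inferred here from the reported 112 of 130 correct readings and the 16 misclassified severe readings, and should be read as an assumption. The prevalence-adjusted predictive values use the standard Bayes formula, which we assume corresponds to the published formula cited in the Methods; the function names and the prevalence value of 0.2 are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def adjusted_predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV recomputed for a given prevalence of severe cases."""
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
    return ppv, npv


# Inferred (assumed) confusion counts for the 130 test readings.
test_metrics = diagnostic_metrics(tp=20, fp=2, tn=92, fn=16)
print({name: round(value, 2) for name, value in test_metrics.items()})

# Illustrative adjustment with a hypothetical prevalence of 0.2.
ppv_adj, npv_adj = adjusted_predictive_values(0.90, 0.99, 0.2)
print(round(ppv_adj, 3), round(npv_adj, 3))
```

Under the inferred counts, the rounded metrics agree with the values reported for the test dataset; the adjusted values depend on the prevalence assumed and are shown only to illustrate the formula.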
As to the independent test dataset of much smaller size, the classifier achieved an AUC value of 0.75 and accuracy, sensitivity, specificity, PPV, and NPV of 0.80, 0.64, 0.93, 0.69, and 0.91, respectively (Fig. 3C) . All incorrectly identified cases in this dataset belonged to the early stage of COVID-19 with days of symptom onset less than eight days. We also examined the performance of the model from the longitudinal perspective. The test dataset was divided into three parts according to the length of symptom lasting period since disease onset. In general, the longer the disease progressed, the better the model predicted. As the onset time increased, the accuracy of prediction elevated from 0.79 (1-4 days), 0.82 (5-8 days) to 0.91 (9-12 days) in the test set (Fig. 3D ). There were two incorrectly identified readings in non-severe patients. They were both from the initial three days since onset, probably due to the low oxygen index (less than 380 mmHg), which was out of the normal range (400-500 mmHg). As to the part of 9-12 days, three readings of the severe group were classified into the non-severe group, while two of them were scored with 0.42 and 0.44, which were very close to the threshold. The unique curated longitudinal data permitted detailed temporal investigation of specific parameters. We investigated the temporal changes of eight out of eleven critical clinical features in both severe and non-severe COVID-19 patients during the first 40 days since disease onset. In addition, we also analyzed 18 more clinical features (Fig. 4) that were found to be related to the progression of COVID-19 disease [11, 25, 26] . Based on longitudinal changes of blood cell counts (such as platelet), our data showed that the severity of the disease intensified in the second week since symptom onset (Fig. 4) . The counts of platelets, lymphocytes, eosinophils, and basophils were reduced in the severe group, while neutrophils and WBC were dramatically elevated (Fig. 4) . 
Decreased number of lymphocytes has been reported in association with the COVID-19 severity [27] , probably due to apoptosis and necrosis of lymphocytes [28] . Dysregulation of neutrophils was another predominant alteration in the complete blood count. Neutrophils are recruited early to sites of infection where they kill pathogens (bacteria, fungi, and viruses) by oxidative burst and phagocytosis [29] . A significant rise in neutrophil counts features the severity of COVID-19. An increase of neutrophils may induce the formation of neutrophil extracellular traps (NETs) which could trigger a cascade of inflammatory reactions [30] . The latter induces damages to surround tissues, facilitates micro thrombosis, and results in permanent organ injuries to the pulmonary, cardiovascular, and renal systems [29] , which are three commonly affected organ systems in severe COVID-19 [13] . Eosinophils were drastically reduced after SARS-CoV-2 infection and were merely detectable in the acute stage (Fig. 4) . The increase of eosinophils in non-severe patients happened earlier than that in severe cases, suggesting that the increase of eosinophils can be the signs for COVID-19 recovery. The predictive value of eosinophil count observed in our study is supported by a recent independent study [10] . The count of basophils is another predictive feature of our model. Basophils are the least abundant granulocytes, representing less than 1% of peripheral blood leukocytes. Recent studies have shown that the count of basophils was reduced in COVID-19 patients [31] , agreeing with our data. Moreover, an independent study that proposed a multivariate Cox regression model exploring risk factors for lethal COVID-19 patients also nominated progressive increase of basophils [32] . Coagulation dysfunction has been found in multiple epidemiologic studies of COVID-19 [13, 33] . Our data showed that platelet counts in blood decreased in severe patients significantly (Fig. 
4) , consistent with a previous meta-analysis reporting low platelet count as a risk factor for COVID-19 severity and mortality [34] . Furthermore, coagulation-related parameters such as D-dimer elevated too, especially in death cases [33] . Thrombin time (TT) and APTT are often used to monitor the coagulation function of patients, which are mostly increased in the presence of heparin or heparin-like substances. CRP is an acute-phase protein, the expression level of which in the blood is associated with the severity of inflammation [35] . As an indicator of infection, in our dataset, CRP was significantly increased in both mild and severe cases, far beyond the normal range. CRP decreased as the disease progressed and recovered. We also evaluated the functions of organs since it has been reported that SARS-CoV-2 infection systematically induced multiorgan dysfunctions, such as lung, liver, kidney, and heart, particularly in severe and critical cases [36, 37] . The commonly used indicators in clinic include pulmonary function indicators including SaO2, blood-gas lactate (LAC) and oxygen index, liver function indicators (GGT, AST, and ALT), renal function indicators (eGFR and creatinine), and cardiac function indicator (LDH), were monitored. ABG analysis is a key approach to evaluate the function of the lungs by measuring acidity and the level of oxygen and carbon dioxide, which provides important indexes and direct evidence for indicating pulmonary function and the severity of COVID-19. Our data showed that oxygen-related indexes, namely SaO2 and oxygen index, decreased in severe COVID-19 patients (Fig. 4) , suggesting a high degree of sensitivity for severity prediction and thus selected as pivotal features in our model. The level of blood-gas LAC reached a peak in the first ten days in the severe group, agreeing with the literature [25] . 
Regarding liver functions, the dynamic changes of GGT, AST and ALT showed that hepatic function worsened around the third week (Fig. 4). Moreover, levels of the three indexes were higher in severe patients than in non-severe patients from the beginning of the disease. Judging from the ALT and GGT levels, this may be partly due to the side effects of multidrug administration. AST, a predominant feature in our predictive model, was elevated in the severe group. A multicenter retrospective study including 5771 adult patients reported that AST increased first, followed by ALT, in severe cases with liver injury. This alteration was associated with dysregulation of lymphocyte and neutrophil counts [38]; the latter was selected by our model as well. The estimated glomerular filtration rate (eGFR) is a direct indicator of renal function, and its level in severe patients was significantly lower than that in non-severe patients. In contrast, the level of urea was elevated in the severe group in the early stage (first ten days) while it remained stable in the non-severe group. Electrolytes are also closely related to renal function, and most of them were within the reference interval except for sodium and potassium in the severe group (Fig. 4C). LDH is widely expressed in various organs, and it has been reported as a key cardiac marker closely associated with COVID-19 severity [39]. In our study, LDH peaked around the 10th day in severe COVID-19 patients compared with non-severe cases, although the peak value did not exceed the upper limit of the normal range of LDH for males. Another cardiac indicator, CK, was also in the normal range; however, its dynamics differed between severe and non-severe patients. The major limitation of this study is the relatively small size of the patient cohort, although some interesting clues have been uncovered. Therefore, the model still requires further validation in sizeable cohorts from multiple clinical centers.
In summary, we built a customized machine learning model for COVID-19 severity prediction based on longitudinal measurements of clinical factors over time. The model, composed of eleven widely available routine clinical features, might be further developed into a practical tool for COVID-19 management.
Question: 1) To predict, using machine learning on routine clinical information, whether a COVID-19 patient will progress to severe disease. 2) To monitor how critical clinical parameters change over the disease course of COVID-19.
Findings: We present a novel support vector machine model for evaluating the severity of COVID-19 patients based on routine clinical parameters. Among its features, days after symptom onset, basophil counts, magnesium, and gamma-glutamyl transpeptidase have not previously been reported as severity predictors. Our model is freely available at the webserver https://www.guomics.com/covidAI/. In addition, we depicted the dynamics of key factors over the disease course.
Meaning: The model, composed of eleven widely available routine clinical features, might be further developed into a practical tool for COVID-19 management.
T.G. and Y.Z. are shareholders of Westlake Omics Inc. W.G., Q.Z., and H.C. are employees of Westlake Omics Inc. The other authors declare no competing interests in this paper.
Y.S., L.L., Y.Z., and T.G. wrote the manuscript with input from all co-authors.
This study was approved by the Medical Ethics Committees of Taizhou Hospital, Shaoxing People's Hospital, and Westlake University, Zhejiang Province, China, and informed consent was obtained from each enrolled subject. In addition, the enrollment of minors was approved by their parents and/or legal guardians.
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
[1] Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention
[2] Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
[3] COVID-19: Discovery, diagnostics and drug development
[4] CRISPR-Cas12-based detection of SARS-CoV-2
[5] Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
[6] Clinical characteristics of coronavirus disease 2019 in China
[7] Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review
[8] Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning
[9] Developing a COVID-19 mortality risk prediction model when individual-level data are not available
[10] Early triage of critically ill COVID-19 patients using deep learning
[11] Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal
[12] Predictors of COVID-19 severity: A literature review
[13] An interpretable mortality prediction model for COVID-19 patients
[14] Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19
[15] Diagnostic tests 2: Predictive values
[16] Clinical characteristics of Covid-19 in China
[17] Elements of Genetic Algorithms. An Introduction to Genetic Algorithms
[18] A GA-based feature selection and parameters optimization for support vector machines
[19] The Shapley value: essays in honor of
[20] Notes on the n-Person Game -II: The Value of an n-Person Game
[21] Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy
[22] Do men have a higher case fatality rate of severe acute respiratory syndrome than women do?
[23] Circulating plasma concentrations of angiotensin-converting enzyme 2 in men and women with heart failure and effects of renin-angiotensin-aldosterone inhibitors
[24] Sexual dimorphism in innate immunity
[25] Early predictors of clinical deterioration in a cohort of 239 patients hospitalized for Covid-19 infection in Lombardy, Italy
[26] The Lancet Global H. Decolonising COVID-19
[27] COVID-19, ECMO, and lymphopenia: a word of caution
[28] Lymphopenia predicts disease severity of COVID-19: a descriptive and predictive study
[29] Targeting potential drivers of COVID-19: Neutrophil extracellular traps
[30] An emerging role for neutrophil extracellular traps in noninfectious disease
[31] Dysregulation of immune response in patients with COVID-19 in Wuhan, China
[32] Longitudinal hematologic and immunologic variations associated with the progression of COVID-19 patients in China
[33] Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
[34] Thrombocytopenia is associated with severe coronavirus disease 2019 (COVID-19) infections: A meta-analysis
[35] Role of C-reactive protein at sites of inflammation and infection
[36] COVID-19 with different severities: A multicenter study of clinical features
[37] The science underlying COVID-19: Implications for the cardiovascular system
[38] Longitudinal association between markers of liver injury and mortality in COVID-19 in China
[39] Clinical characteristics of refractory COVID-19 pneumonia in Wuhan, China
This work is supported by grants from the National Key R&D Program of China (No. 2020YFE0202200), the National Natural Science Foundation of China (81972492, 21904107), the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR19C050001), the Hangzhou Agriculture and Society Advancement Program (20190101A04), and the Tencent Foundation (2020).
We thank the patients enrolled in this study, and the physicians, nurses, and secretaries of the Taizhou Hospital of Zhejiang Province and Shaoxing People's Hospital for their critical contributions.