key: cord-0881668-0a2kzq6s
authors: Sun, Chaoyang; Bai, Yong; Chen, Dongsheng; He, Liang; Zhu, Jiacheng; Ding, Xiangning; Luo, Lihua; Ren, Yan; Xing, Hui; Jin, Xin; Chen, Gang
title: Accurate classification of COVID‐19 patients with different severity via machine learning
date: 2021-02-26
journal: Clin Transl Med
DOI: 10.1002/ctm2.323
sha: e21cda5ca67c1e041af24eb7722699ed946d3370
doc_id: 881668
cord_uid: 0a2kzq6s

[...] validated, resulting in a micro-average AUROC and micro-average AUPR of 0.9941 and 0.9837, respectively, on an independent testing set (Figure 1E and F). The confusion matrix (Figure 1G) showed that all patients in the independent testing set were correctly identified, except for two mild patients who were predicted as severe. For further validation, we trained separate XGBoost models with the same training protocol on each single-omics dataset. The results demonstrated that the multi-omics XGBoost model outperformed the models trained on single-omics features (Figure 1I and J, Figure S5). Furthermore, we trained an additional XGBoost model based on the 24 features identified in Guo's method [5] (two proteins and three metabolites were not detected in our experiment), which yielded a micro-average AUROC of 0.9305 and a micro-average AUPR of 0.8300 on the independent testing set (Figure S5). This lower performance may be partly due to the different purposes of the two models: Guo's method sought to distinguish severe patients from nonsevere patients, whereas we attempted to identify four severity groups of COVID-19 patients. The uniform manifold approximation and projection (UMAP) plot showed distinct separation of the four severity groups (Figure 1H).

Furthermore, we calibrated our model using the Platt scaling method in a one-versus-rest fashion. The expected calibration error (ECE) and Brier score (BS) were computed to evaluate calibration (see Methods in the Supporting Information).
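As a rough illustration of the two calibration metrics named above, the following Python sketch computes a multiclass Brier score and a top-label ECE with equal-width confidence bins. The exact definitions used in this study are given in its Supporting Information and may differ; the function names, the choice of 10 bins, and the averaging of the Brier score over classes are assumptions made here for illustration only.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and one-hot labels.

    Assumption: the score is averaged over both samples and classes,
    which is only one of several multiclass conventions.
    """
    n_classes = len(probs[0])
    total = 0.0
    for p, y in zip(probs, labels):
        for k in range(n_classes):
            target = 1.0 if k == y else 0.0
            total += (p[k] - target) ** 2
    return total / (len(probs) * n_classes)


def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE: weighted mean gap between accuracy and confidence per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)            # confidence of the predicted class
        pred = p.index(conf)     # predicted class index
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece
```

Both metrics are zero for perfectly confident, perfectly correct predictions, and lower values indicate better calibration. Each function takes a list of per-class probability rows and a list of integer class labels.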
As a result, the ECE for the uncalibrated model and the calibrated model was 0.0773 and 0.0996, respectively, whereas the BS for the uncalibrated model and the calibrated model was 0.0312 and 0.0353, respectively (Figure S6). These results suggested that the output probabilities of our model can represent the uncertainty of its predictions. Together, our results implied that the XGBoost model based on the top 60 multi-omics features could precisely differentiate COVID-19 patient severity status.

Many machine learning-based models have been developed to predict outcomes of patients with COVID-19. Nevertheless, most of those models were built on computed tomography images or on a few diagnostic predictors such as age, body temperature, clinical signs and symptoms, complications, epidemiological contact history, pneumonia signs, neutrophils, lymphocytes, and C-reactive protein (CRP) levels. Recently, COVID-19 patients who may become severe were identified by a prediction model developed using proteomic and metabolomic measurements [5]. Although all these models reported promising predictive performance with high C-indices, they carried a high risk of bias according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [6]. This was because most prediction models did not exclude patients with severe comorbidities, had a high risk of bias in the participant group, or used nonrepresentative controls, making the prediction results unreliable. Here, we minimized selection bias by using strict inclusion and exclusion criteria (see Methods in the Supporting Information).

According to the results of the prediction model, most mRNAs were highly correlated with asymptomatic patients (Figure 1K; Figure S1).
Multi-omics features such as TBXA2R, ALOX15, IL1B, IFIT2, BCL2A1, LSP1, glycyl-L-leucine, and L-aspartate were highly expressed in the asymptomatic group, and thus may yield crucial diagnostic biomarkers for identifying asymptomatic COVID-19 patients. In the critical group, besides CRP, which has already been used to monitor the severity of COVID-19, some immune-related features such as EEF1A1, FGL1, LRG1, CD99, COL1A1, cholinesterase(18:3), monoacylglyceride(18:1), cannabidiolic acid, and beta-asarone were found to be highly expressed. Moreover, two transcription factor-encoding genes closely associated with the immune response, ZNF831 and RORC, were expressed at low levels in critical patients. Using these features, we could optimize existing approaches to improve the accuracy and sensitivity of detection based on nucleic acid testing and predict the prognosis of asymptomatic patients more accurately. With the assistance of this machine-learning model, we could help identify individuals at high risk of poor prognosis in advance and prevent progression in time to minimize individual, medical, and social costs.

In summary, we developed an XGBoost-based model by integrating multi-omics data to dissect subtle changes in gene expression and pathways of COVID-19 patients with different severity levels. Our model reached a micro-average AUROC and micro-average AUPR as high as 0.9941 and 0.9837, respectively, and could therefore clearly distinguish patients from different severity groups and accurately predict pathological status. In addition, analysis of the top 60 multi-omics features demonstrated that our model has the potential to discover molecules associated with the pathogenesis of COVID-19. Overall, the methodology employed in this study could be widely applied to the study of other diseases and provide clues for the control and treatment of patients suffering from COVID-19 and many other infectious diseases.
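The micro-average AUROC and AUPR quoted above pool every (patient, severity class) pair into a single binary problem before the curve areas are computed. The following pure-Python sketch shows that pooling under stated assumptions: the function names are hypothetical, AUROC is computed as the pairwise ranking probability, and AUPR is computed as step-wise average precision, which is only one of several conventions and may not match the one used in this study.

```python
def flatten_one_vs_rest(probs, labels):
    """Pool every (sample, class) pair into one binary scoring problem."""
    scores, targets = [], []
    for p, y in zip(probs, labels):
        for k, s in enumerate(p):
            scores.append(s)
            targets.append(1 if k == y else 0)
    return scores, targets


def auroc(scores, targets):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of pairs,
    which is acceptable for a small illustration.
    """
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, targets):
    """Step-wise area under the precision-recall curve.

    Simplification: tied scores are ranked in input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(targets)
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at this recall step
    return ap / n_pos


def micro_average_metrics(probs, labels):
    """Return (micro-average AUROC, micro-average AUPR) for multiclass output."""
    scores, targets = flatten_one_vs_rest(probs, labels)
    return auroc(scores, targets), average_precision(scores, targets)
```

Here `probs` is a list of per-class probability rows (four columns for four severity groups) and `labels` is a list of integer class indices. Because all classes are pooled, larger classes contribute more to the micro-average than smaller ones.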
The authors declare no conflict of interest.

1. Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients
2. Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans
3. Omics-driven systems interrogation of metabolic dysregulation in COVID-19 pathogenesis
4. The trans-omics landscape of COVID-19. medRxiv
5. Proteomic and metabolomic characterization of COVID-19 patient sera
6. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal