key: cord-318044-podm4mjd authors: Wu, P.; Chen, D.; Ding, W.; Hou, H.; Bai, Y.; Zhou, Y.; Li, K.; Xiang, S.; Liu, P.; Ju, J.; Guo, E.; Liu, J.; Yang, B.; Fan, J.; He, L.; Sun, Z.; Feng, L.; Wang, J.; Wu, T.; Wang, H.; Cheng, J.; Xing, H.; Meng, Y.; Li, Y.; Zhang, Y.; Luo, H.; Xie, G.; Lan, X.; Tao, Y.; Yuan, H.; Huang, K.; Sun, W.; Qian, X.; Li, Z.; Huang, M.; Ding, P.; Qiu, J.; Wang, F.; Wang, S.; Zhu, J.; Ding, X.; Chai, C.; Liang, L.; Wang, X.; Luo, L.; Sun, Y.; Yang, Y.; Zhuang, Z.; Li, T.; Tian, L.; Zhang, S.; Zhu, L.; Chen, L.; Wu, Y.; Ma, X.; Chen, F.; Ren, Y.; Xu, X.; Liu, S.; Yang, H.; Wang, title: The Trans-omics Landscape of COVID-19 date: 2020-07-22 journal: nan DOI: 10.1101/2020.07.17.20155150 sha: doc_id: 318044 cord_uid: podm4mjd System-wide molecular characteristics of COVID-19, especially in those patients without comorbidities, have not been fully investigated. We compared extensive molecular profiles of blood samples from 231 COVID-19 patients, ranging from asymptomatic to critically ill, importantly excluding those with any comorbidities. Amongst the major findings, asymptomatic patients were characterized by highly activated anti-virus interferon, T/natural killer (NK) cell activation, and transcriptional upregulation of inflammatory cytokine mRNAs. However, given very abundant RNA binding proteins (RBPs), these cytokine mRNAs could be effectively destabilized hence preserving normal cytokine levels. In contrast, in critically ill patients, cytokine storm due to RBPs inhibition and tryptophan metabolites accumulation contributed to T/NK cell dysfunction. A machine-learning model was constructed which accurately stratified the COVID-19 severities based on their multi-omics features. Overall, our analysis provides insights into COVID-19 pathogenesis and identifies targets for intervening in treatment. ( Figure 3C) , supporting the observed thrombocytopenia and coagulopathy in critically 287 ill patients. 288 Metabolites also showed distinct profiles in the different clusters. In particular, 290 compared to other severities, increase in phenylalanine and tryptophan biosynthesis 291 was observed in critical COVID-19 patients (Figures 3D-E, Table S9 .3). Tryptophan 292 metabolism was considered a biomarker and therapeutic target of inflammation 293 (Sorgdrager et al., 2019) and changes in tryptophan metabolism were reported to be 294 correlated with serum interleukin-6 (IL-6) levels (Moffett and Namboodiri, 2003) . 295 Consistently, we detected enrichment of IL-6 in critical patients using enzyme-linked 296 immunosorbent assay (ELISA) ( Figure 3E , Table S2) . Figure S9) . Notably, 304 we found two groups of lipids showing specific expression patterns, the 305 phosphatidylethanolamine lipids belonging to the gradually increasing cluster (cluster 306 1), and triglyceride lipids presented in the U-shaped cluster with elevated levels in the 307 asymptomatic and critical patients. Consistent with previous findings that RNA virus 308 replication is dependent on the enrichment of phosphatidylethanolamine distributed at 309 the replication sites of subcellular membranes (Xu and Nagy, 2015) , we observed that 310 the expression patterns for a cohort of 16 phosphatidylethanolamine lipids positively 311 correlated with COVID-19 severities ( Figure 3F) . Furthermore, we identified that a 312 repertoire of 36 triglyceride lipids were relatively low in both mild and severe COVID-313 19 patients ( Figure 3G ). In addition to the above abundant structural lipid classes, 314 several bioactive lipids also changed significantly in the symptomatic groups, including 315 lysophosphatidylcholine (inhibiting endotoxin-induced release of late proinflammatory cytokine) (Yan et al., 2004) and lysophosphatidyliositol (an endogenous agonist for 317 GPR55 whose activation regulates several pro-inflammatory cytokines) Cancino et al., 2017), suggesting that lipidome changes that interfere with cell 319 membrane integrity and normal functions or disturb inflammatory and immune states 320 may play important and complex roles in COVID-19 disease development ( Figure S9) . 321 322 Based on multi-omics analysis, we found that the mild and severe groups shared many 324 similar characteristics. However, it is important to predict these two groups, which is 325 important for early intervention and then preventing disease progress. We therefore 326 developed an XGBoost (extreme gradient boosting) machine learning model by 327 leveraging multi-omics data. We randomly stratified samples for the training set (80%) 328 and the independent testing set (20%) (Figure 4A, the 60 features such as C-reactive protein (CRP) and EEF1A1 were expressed highly 372 in critical patients ( Figure 4K) . These results demonstrated that our model could not 373 only be employed to stratify COVID-19 patients, but also discover molecular associated 374 with pathogenesis of COVID-19. 375 With the global outbreak of SARS-CoV-2, COVID-19 has become a serious, worldwide 378 and public health concern. However, comprehensive analysis of multi-omics data 379 within a large cohort remains lacking, especially for patients with various severity 380 grades, i.e., asymptomatic across the course of the disease to critically ill. To the best 381 of our knowledge, this is the first trial designed to systematically analyze trans-omics 382 data of COVID-19 patients with grade of clinical severity. Furthermore, it is worth 383 emphasizing that we excluded all patients of extreme age or with comorbidities, to 384 minimize bias due to confounding factors related to severity. 385 386 Asymptomatic patients have drawn great attention as these silent spreaders are hard to 387 identify and cause difficulties in epidemic control (Long et al., 2020) . Here, we 388 demonstrated an unexpected transcriptional activation of the pro-inflammatory 389 pathway and inflammatory cytokines (Figure 2B and 5A) . However, consistent with a 390 recent report (Long et al., 2020) , secretion of inflammatory cytokines such as IL-6 and 391 IL-8 was extremely low in sera from the asymptomatic population (Table S2 ). In 392 contrast, critically ill patients were characterized with excessive inflammatory cytokine 393 production ( Figure 3C , Table S2 ), whereas their transcription levels were only 394 modestly elevated (Figures 2B, 5A) . Recently, by applying a proteomic and metabolomic measurement prediction model, 493 COVID-19 patients that may become severe cases were identified (Shen et al., 2020) . 494 Although these models all report promising predictive performance with high C-indices 495 (Wynants et al., 2020) , they also carry a high risk of bias according to the PROBAST 496 bias assessment tool (Moons et al., 2019) . This is because most prediction models have 497 not excluded patients with severe comorbidities and had a high risk of bias for the 498 participant group or used non-representative controls, making the prediction results 499 unreliable. Here, using strict inclusion and exclusion criteria, we minimized selection 500 bias. According to the results of the prediction model, most mRNAs were highly 501 correlated with asymptomatic patients (Figures 4K, S10) . Multi-omics features such as 502 TBXA2R, ALOX15, IL1B, IFIT2, BCL2A1, LSP1, glycyl-L-leucine and l-aspartate were 503 highly expressed in the asymptomatic group, and thus may potentially yield crucial 504 diagnostic biomarkers for identifying asymptomatic COVID-19 patients. In the critical 505 illness group, beside CRP which has already been used to monitor the severity of 506 COVID-19, some immune-related features, such as EEF1A1, FGL1, LRG1, CD99, 507 COL1A1, cholinesterase(18:3), monoacylglyceride(18:1), Cannabidiolic acid and beta-508 asarone were found to be highly expressed in the critical group. Using these features, 509 we could optimize existing approaches to improve the accuracy and sensitivity of 510 detection based on nucleic acid testing and predict asymptomatic patient prognosis 511 more accurately. With the assistance of this machine learning model, we could help 512 identify individuals with a high risk of poor prognosis in advance, and prevent 513 progression in time to minimize individual, medical and social costs. 514 515 The limitations of this research are as follows: (1) We did not enroll a healthy population 517 as a control group, so conclusions made in this study are only limited to differences in 518 COVID-19 severity. However, it is worth emphasizing that our research focused on the 519 diversities and similarities in consecutively severe COVID-19. All stages of COVID-520 19 were included in our study design, in an attempt to identify key clues or biomarkers 521 to distinguish disease severity and help prevent disease progression. (2) We excluded 522 patients with comorbidities. As aforementioned, comorbidities including cancer history, 523 hypertension, diabetes, cardiovascular disease, and respiratory diseases can affect 524 COVID-19 progression. However, it remains unclear how these comorbidities affect 525 COVID-19 progression and the relative weight of these comorbidities to progression. 526 Thus, it would be risky to apply a specific prediction model to all COVID-19 patients. 527 In conclusion, our study presented a panoramic landscape of blood samples within a 529 large cohort of COVID-19 patients with various severities from asymptomatic to 530 critically ill. Through trans-omics analyzing, we uncovered multiple novel insights, 531 biomarkers and therapeutic targets relevant to COVID-19. Our data provided valuable 532 clues for deciphering COVID-19, and the underlying mechanism warrant further 533 pursuits. 534 535 Acknowledgements 536 The study was supported by funding from National University Basic Scientific Table S1 and S2. 582 The mean age of the patients was 46.7 years old (Standard Deviation=13.5), and the 583 ratio of male to female was 1.12:1. All these patients were diagnosed following the 584 guidelines for COVID-19 diagnosis and treatment (Trial The whole genome data was generated through the following steps: 1) DNA was 615 randomly fragmented by Covaris. The fragmented genomic DNA were selected by 616 Magnetic beads to an average size of 200-400bp. 2) Fragments were end repaired and 617 then 3' adenylated. Adaptors were ligated to the ends of these 3' adenylated fragments. whole genome sequencing was conducted on MGI2000 PE100 platform with 100bp 620 paired end reads. 621 Transcriptome RNA data was generated through the following steps: 1) rRNA was 622 removed by using RNase H method, 2) QAIseq FastSelect RNA Removal Kit was used 623 to remove the Globin RNA, 3) The purified fragmented cDNA was combined with End 624 Repair Mix, then add A-Tailing Mix, mix well by pipetting, incubation, 4) PCR 625 amplification, 5) Library quality control and pooling cyclization, 6) The RNA library 626 was sequenced by MGI2000 PE100 platform with 100bp paired-end reads. 627 Small RNA data was generated through the following steps: 1) Small RNA 628 enrichment and purification, 2) Adaptor ligation and Unique molecular identifiers 629 (UMI) labeled Primer addition, 3) RT-PCR, Library quantitation and pooling 630 cyclization, 4) Library quality control, 5) Small RNAs were sequenced by BGI500 631 platform with 50bp single-end reads resulting in at least 20M reads for each sample. PCA was performed using PLINK (v1.9) (Chang et al., 2015) . Bi-allelic SNPs were 649 selected based on the following criteria: minor allele frequency (MAF) ≥ 5%; 650 genotyping rate ≥ 90%; LD prune (window = 50, step = 5 and r2 ≥ 0.5). A subset of 651 605,867 SNPs was used to perform PCA on the 203 unrelated individuals. We used 652 rvtest (Zhan et al., 2016) to perform genotype-phenotype association analysis for 653 5,082,104 bi-allelic common SNPs with MAF > 5%. Gender, age and top 10 principal 654 components were used as covariates for all the association tests. The qqman (Turner, 655 2014) and CMplot R packages (Yin, 2020) were applied to generate the Manhattan plot 656 and quantile-quantile plot. We defined genome-wide significance for single variant 657 association test as 5e -8 , suggestive significance as 1e -6 . 658 We obtained matched proteomics, lipidomics, metabolomics, gene expression and SNP 661 genotyping data for COVID-19 patients (n = 132). For the genotyping data, we removed 662 outlier SNPs with MAF < 0.05. The QTL analysis (cis-eQTL analysis [local, distance 663 < 10kb] for gene expression data, QTL analysis for proteomics, lipidomics, 664 metabolomics data) was conducted using linear regression as implemented in 665 MatrixEQTL (Shabalin, 2012) . In this analysis, age and gender (1 for male and 2 for 666 female) were considered as covariates. Associations with a p value less than 0.001 were 667 kept, followed by FDR estimation using the Benjamini-Hochberg procedure as 668 implemented in Matrix-QTL. QTL associations with an FDR-corrected p value < 5e -8 669 were considered significant (Frochaux et al., 2020) . 670 671 Gene Expression Analysis 672 RNA-seq raw sequencing reads were filtered by SOAPnuke (Li et al., 2008) to remove 673 reads with sequencing adapter, with low-quality base ratio (base quality < 5) > 20%, 674 and with unknown base ('N' base) ratio > 5%. Reads aligned to rRNA by Bowtie2 675 (v2.2.5) (Langmead and Salzberg, 2012) were removed. Then, the clean reads were 676 mapped to the reference genome using HISAT2 (Kim et al., 2015) . Bowtie2 (v2.2.5) 677 was applied to align the clean reads to the transcriptome. Then the gene expression level 678 (FPKM) was determined by RSEM (Li and Dewey, 2011) . Genes with FPKM > 0.1 in 679 at least one sample were retained. Differential expression analysis was performed using 680 DESeq2 (v1.4.5) with gender and age as confounders. Differential expressed genes 681 were defined as those with Benjamini Hochberg adjusted p value < 0.05 and fold 682 change > 2. GO enrichment analysis was performed using clusterProfiler (Yu et al., 683 2012) . GO BP terms with an FDR adjusted p value threshold of 0.05 were considered 684 as significant (Abdi, 2007) . 685 Small RNA raw sequencing reads with low quality tags (which have more than 686 four bases whose quality is less than ten, or have more than six bases with a quality less 687 than thirteen.), the reads with poly A tags, and the tags without 3' primer or tags shorter 688 than 18nt were removed. After data filtering, the clean reads were mapped to the 689 reference genome and other sRNA database including miRbase, siRNA, piRNA and 690 snoRNA using Bowtie2 (Langmead and Salzberg, 2012) . Particularly, cmsearch 691 (Nawrocki and Eddy, 2013) (Table S7 ) and mRNA-lncRNA were calculated (Table S8) . 702 Correlation pairs with coefficients < -0.5 in mRNA-miRNA or < -0.6 in mRNA-703 lncRNA were retained. MultiMiR was used to confirm the top pairs of mRNA-miRNA 704 by performing miRNA target prediction (Ru et al., 2014) . The mRNA-miRNA and 705 mRNA-lncRNA networks were visualized using Cytoscape ( Figure 2D ) (Shannon et 706 al., 2003) . 707 The sera samples were inactivated at 56°C water bath for 30min and followed by 710 For each patient in the cohort, we computed intensity for a given lipid complex 794 class by summing up intensity of each lipid in the class. For each lipid complex class, 795 the intensity value of each patient was further scaled by median value of intensity from mild patient group. We applied Mann-Whitney U-test (multiple comparisons correction The Bonferonni and Šidák Corrections for Multiple Comparisons. 941 Encyclopedia of measurement and statistics 3 A global 944 reference for human genetic variation Presumed 946 Asymptomatic Carrier Transmission of COVID-19 Imbalanced Host Response 949 to SARS-CoV-2 Drives Development of COVID-19 Proteomics of SARS-CoV-2-infected host cells reveals therapy 952 targets Host-Viral Infection Maps Reveal Signatures of 955 Severe COVID-19 Patients Type III interferons disrupt 958 the lung epithelial barrier upon viral recognition. Science Post-transcriptional regulation of gene expression in innate immunity Cellular mRNA decay protein 963 AUF1 negatively regulates enterovirus and human rhinovirus infections A familial cluster of pneumonia associated with the 2019 967 novel coronavirus indicating person-to-person transmission: a study of a family cluster Second-generation PLINK: rising to the challenge of larger and richer datasets XGBoost: A Scalable Tree Boosting System MSstats: an R package for statistical analysis of quantitative mass 975 spectrometry-based proteomic experiments The metabolite BH4 controls T 978 cell proliferation in autoimmunity and cancer Discovery 981 of a Novel and Selective Indoleamine 2,3-Dioxygenase (IDO-1) Inhibitor 3-(5-Fluoro-982 1H-indol-3-yl)pyrrolidine-2,5-dione (EOS200271/PF-06840003) and Its 983 Characterization as a Potential Clinical Candidate Reduction and Functional Exhaustion of T Cells in Patients With 986 Coronavirus Disease 2019 (COVID-19). Front Immunol 11 Epidemiology 988 of COVID-19 Among Children in China Genomewide Association 991 Study of Severe Covid-19 with Respiratory Failure Exploring regulation in tissues 994 with eQTL networks Tools -A fast and accurate solution to variant calling from next-generation sequence 997 data. bioRxiv cis-regulatory 1000 variation modulates susceptibility to enteric infection in the Drosophila genetic 1001 reference panel MCPIP1 Endoribonuclease 1004 Activity Negatively Regulates Interleukin-17-Mediated Signaling and Inflammation. 1005 Immunity Targets of T 1008 Cell Responses to SARS-CoV-2 Coronavirus in Humans with COVID-19 Disease and 1009 Unexposed Individuals Comorbidity and its impact on 1590 patients 1012 with COVID-19 in China: a nationwide analysis Limitations and Off-Target Effects of 1014 Tryptophan-Related IDO Inhibitors in Cancer Treatment HISAT: a fast spliced aligner with 1016 low memory requirements Adaptive 1018 immune cells temper initial innate responses Asymptomatic and Presymptomatic 1021 SARS-CoV-2 Infections in Residents of a Long-Term Care Skilled Nursing Facility -1022 King County Feature Selection with the Boruta Package Structure of the SARS-CoV-2 spike receptor-binding domain bound 1027 to the ACE2 receptor Fast gapped-read alignment with Bowtie 2 RSEM: accurate transcript quantification from RNA-1031 Seq data with or without a reference genome Fast and accurate short read alignment with Burrows-1033 Wheeler transform CT image visual quantitative evaluation and clinical classification of 1036 coronavirus disease (COVID-19) SOAP: short oligonucleotide 1038 alignment program Evaluation 1040 and minimization of nonspecific tryptic cleavages in proteomic sample preparation A FOXO3-IRF7 gene 1044 regulatory circuit limits inflammatory sequelae of antiviral responses Genomic Analyses from Non-invasive Prenatal Testing Reveal 1048 Genetic Associations, Patterns of Viral Infections, and Chinese Population History Clinical and immunological assessment of asymptomatic 1052 SARS-CoV-2 infections Accurate Identification of SARS-CoV-2 from Viral Genome 1055 Sequences using Deep Learning. bioRxiv Moderated estimation of fold change and 1057 dispersion for RNA-seq data with DESeq2 Genomic characterisation and epidemiology of 2019 novel 1060 coronavirus: implications for virus origins and receptor binding SARS-CoV-2 Infection in Children A Unified Approach to Interpreting Model 1064 Predictions From Local Explanations to Global 1067 Understanding with Explainable AI for Trees FoxO3 controls autophagy 1070 in skeletal muscle in vivo Advances in the Physiology of GPR55 in the Central 1073 Nervous System Post-transcriptional regulation of immune responses 1075 by RNA binding proteins Tryptophan and the immune response PROBAST: A Tool to Assess Risk of 1080 Bias and Applicability of Prediction Model Studies: Explanation and Elaboration IDO takes a blow GCN2 kinase in T cells mediates proliferative arrest and anergy induction 1085 in response to indoleamine 2,3-dioxygenase Infernal 1.1: 100-fold faster RNA homology 1087 searches Cytokine mRNA Degradation in 1093 Cardiomyocytes Restrains Sterile Inflammation in Pressure-Overloaded Hearts Case-Fatality Rate and Characteristics 1096 of Patients Dying in Relation to COVID-19 in Italy The therapeutic potential of targeting tryptophan catabolism 1099 in cancer Asymptomatic cases in a family cluster with SARS-CoV-2 infection Targeting indoleamine-2,3-dioxygenase 1105 in cancer: Scientific rationale and clinical evidence The multiMiR R package and 1108 database: integration of microRNA-target interactions along with their disease and drug 1109 associations Auf1/Hnrnpd-deficient mice develop pruritic 1111 inflammatory skin disease Matrix eQTL: ultra fast eQTL analysis via large matrix 1113 operations Cytoscape: a software environment for 1116 integrated models of biomolecular interaction networks Proteomic and Metabolomic Characterization of COVID-19 Patient 1119 Sera IL-6 in inflammation, immunity, 1124 and disease A 1127 pathogenetic role for TNF alpha in the syndrome of cachexia, arthritis, and 1128 autoimmunity resulting from tristetraprolin (TTP) deficiency qqman: an R package for visualizing GWAS results using Q-Q and 1130 manhattan plots From 1133 FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices 1134 pipeline A guide to cancer 1136 immunotherapy: from T cell basic science to clinical practice Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Intestine-Specific Homeobox Gene ISX Integrates IL6 Signaling Tryptophan Catabolism, and Immune Suppression metaX: a flexible and comprehensive 1144 software for processing metabolomics data Coronavirus (COVID-19) Mortality Rate. Last updated Risk Factors Associated With Acute Respiratory Distress 1150 Syndrome and Death in Patients With Coronavirus Disease 2019 Pneumonia in Wuhan, 1151 China Plasma Metabolomic and Lipidomic Alterations Associated 1154 with COVID-19 Characteristics of and Important Lessons From 1156 the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report 1157 of 72314 Cases From the Chinese Center for Disease Control and Prevention Prediction models 1160 for diagnosis and prognosis of covid-19: systematic review and critical appraisal Transcriptomic characteristics of bronchoalveolar lavage fluid 1164 and peripheral blood mononuclear cells in COVID-19 patients RNA virus replication depends on enrichment of 1167 phosphatidylethanolamine at replication sites in subcellular membranes Therapeutic effects of lysophosphatidylcholine in 1171 experimental sepsis clusterProfiler: an R package for 1174 comparing biological themes among gene clusters RVTESTS: an efficient 1176 and comprehensive tool for rare variant association analysis using sequence data Viral and host factors related to the clinical outcome of COVID-19 Risk factors of critical & mortal COVID-19 cases: A systematic 1182 literature review and meta-analysis 1186 1187 1188 1189 Figure 1. Patient Enrollment, Study Design and Trans-omics Profile of COVID-19 1190 Severity 1191 (A) Overview of patient enrollment criteria and the study design including multi-omics profiling from 1192 blood samples of COVID-19 patients spanning four disease severities including Asym (short for 1193 asymptomatic), Mild, Severe and Critical. Venn diagram showing the overlapping of final samples pass 1194 QC for each high throughput method Gene Expression Changes through Disease Severity 1200 (A) Venn diagram showing the overlapping of genes that are significantly altered in three symptomatic 1201 groups (mild, severe and critical) compared to asymptomatic group B) GO enrichment analysis using genes in each cluster showing different changing patterns through 1204 progressive disease severity. GO direction is the median log2 fold change relative to mild of significant 1205 genes in each GO term (blue, downregulated; red, upregulated). The dot size represents GO significance Gene expression changes across four severity groups showing dynamic genes in T cell activation, 1207 interferon-gamma production, regulation of inflammatory response, regulation of inflammatory response 1208 and protein K48-linked ubiquitination D) miRNA-mRNA and lncRNA-mRNA interaction networks showed regulations of miRNAs and 1210 lncRNAs of the dynamic genes See also Figures S1-S2 and Tables S3.2, S6.1-S6 Multi-omic Changes in Proteins, Metabolites and Lipids across Four 1214 Severity Groups Clustering of longitudinal trajectories using circulating plasma analytes including proteins, 1216 metabolites and lipids Enriched GO Biological Process (BP) terms for proteins in seven clusters Each column indicates a 1219 patient sample and row representes proteins. Color of each cell shows Z-score of log2 protein abundance 1220 in that sample. 1221 (D) Enriched KEGG pathway for metabolites in seven clusters Heatmap of representing metabolites expression in Phenylalanine and Tryptophan metabolism Dynamic expression changes in Phosphatidylethanolamine lipids across four disease severity groups The dots represent the log2 fold change relative to asymptomatic for each lipid in the group. 1225 (G) Dynamic expression changes in Triglyceride lipids across four disease severity groups See also Figures S7-S9 and Table S2, S9.1-S9.3 and S10.1-10.2. 1227 1228 Figure 4. Performance of Machine Learning Model to Predict COVID-19 Patient 1229 Severity of Asymptomatic, Mild, Severe and Critical using Multi-omics Data The model was trained with cross 1231 validation using training set (n=108) after normalization and feature selection, and re-trained with the 1232 identified top 60 important features. The re-trained model was further applied to assess generalization 1233 and performance using independent testing set (n=27) Performance of the model learned in training set in terms of mean micro-average AUROC (B) and 1235 mean micro-average AUPR (C) Top 60 important features (mRNA, n=23; protein, n=19; metabolite, n=11; lipid, n=7) ranked by 1238 SHAP value. The stacked bar indicated the average impact of the feature on the model output magnitude 1239 for different classes Performance of XGBoost model based on the top 60 features for distinguishing the 4 groups of 1241 COVID-19 severity in independent testing set in terms of AUROC (E) and A UPR (F) Confusion matrix for predicting COVID-19 severity in independent testing set(n=27) UMAP plot based on the top 60 features showing the distinct separation among the 4 types of 1244 COVID-19 severity in the whole data set Comparison of performance of models learned by each single-omics data with that of multi-omics 1246 data in independent testing set in terms of AUROC (I) and AUPR (J) Heatmap of demonstrating the top 60 features profiles of the 4 groups of COVID-19 severity in the 1248 whole dataset -adamantylcarbonyl)hydrazine-1-carbothioamide, 3-(*) See also Figures S10~S14 1252 1253 Figure 5. The novel insight of COVID-19 through Progressive Disease Severity 1254 (A) Heatmap of demonstrating cytokines and chemokines DEGs expression across four disease 1255 severity Heatmap of demonstrating the expression levels of DEGs involved in RNA-binding proteins 1257 (RBP) across four disease severity The variation patterns of gene expression of cytokines and RBP, and immunological parameters 1259 across four disease severity Heatmap of demonstrating the expression levels of DEGs involved in IFN-1 responses across 1261 four disease severity Relative expression abundance of exhaustion marker genes CTLA4, BTLA, HAVCR2, ICOS and 1263 PDCD1 in T cells. The relative expression abundance of the exhaustion marker genes was defined 1264 as their expression level dividing the expression level of T cell marker gene CD3E Summary of the pathways of tryptophan metabolism 3-dioxygenase. Box 1267 plots in this panel showed the expression level change (log2-scaled original value) of selected 1268 regulated metabolites across four disease severity with Bonferroni) to test statistically significant difference of scaled intensity of each 798 lipid complex class between severity groups ( Figure S9) . 799 800 Expression data was first adjusted using robust linear model (RLM) for gender and age. 802The residuals following RLM were analyzed by Two-sided Mann-Whitney rank test 803 for each pair of comparing group and p values were adjusted using Benjamini & 804Hochberg. Differentially expressed proteins, metabolites or lipids were defined using 805 the criteria of adjust p value < 0.05 and fold change > 1.5. 806 807 Clustering 808Clustering was performed using the R package 'Mfuzz' after log2-transformation and 809 Z-score scaling of the data. For mRNA from whole blood, genes differentially 810 expressed in at least three out of the six comparison groups were clustered. For proteins, 811 metabolites, lipids from sera, all the three analytes were clustered together. 812 Pathway analysis 814To annotate the proteins and metabolites in 7 clusters, gene ontology (GO) enrichment 815 analysis were performed to obtain the enriched GO Biological Process terms of proteins 816 in different clusters by clusterProfiler (Yu et al., 2012 with AUPR values for overall. In addition, confusion matrices (predicted label as the 915 index of maximum value of the predicted probability vector) and UMAP plots (with 916 parameters of the number of neighbors being 10, the minimum distance between points 917 being 0.5 and the distance metric being Manhattan) were also generated for evaluating 918 the performance. 919To compare the performance of model based on multi-omics data to that based on 920 single-omics data, we trained XGBoost model for single-omics data using the same 921 training protocol as multi-omics data, except that we only empirically picked top 30 922 important features to train the final single-omics based XGBoost model. Moreover, we 923 selected 20 proteins and 4 metabolites mentioned in Guo's method (Shen et al., 2020) , 924where 2 proteins and 1 metabolite were not found in our data set while 2 metabolites 925 were greater than level 3 that were removed from our analysis. We trained XGBoost 926 model using these 24 features with the same training protocol. Those models were 927 evaluated on the unseen 20% independent testing set and calculated the same the 928 performance matrices as mentioned above (Figure S14) . 929To further investigate the top 60 important features, we applied Mann-Whitney U-test 930 (multiple comparisons correction with Bonferroni) to test statistically significant 931 difference of each normalized features between severity groups (Figure S10-S13) . 932 933 The data that support the findings of this study, including the genome-wide association 935 test summary statistics, expression matrices for multi-omics have been deposited in 936 CNSA (China National GeneBank Sequence Archive) in Shenzhen, China with 937 accession number CNP0001126 (https://db.cngb.org/cnsa/). 938 939