key: cord-1047905-44h511ln authors: Haiminen, Niina; Utro, Filippo; Seabolt, Ed; Parida, Laxmi title: Functional profiling of COVID-19 respiratory tract microbiomes date: 2020-09-10 journal: bioRxiv DOI: 10.1101/2020.05.01.073171 sha: 18c9079769522e4935c7b1224de435c8ee4567cd doc_id: 1047905 cord_uid: 44h511ln In response to the ongoing global pandemic, progress has been made in understanding the molecular-level host interactions of the new coronavirus SARS-CoV-2 responsible for COVID-19. However, when the virus enters the body it interacts not only with the host but also with the micro-organisms already inhabiting the host. Understanding the virus-host-microbiome interactions can yield additional insights into the biological processes perturbed by the viral invasion. Here we present a framework for comparative functional analysis of microbiomes, the results from which can lead to new hypotheses on the host-microbiome interactions in healthy versus afflicted cohorts. We carry out a functional analysis of previously published RNA sequencing data of bronchoalveolar lavage fluid from eight COVID-19 patients, twenty-five community-acquired pneumonia patients, and twenty healthy controls. The resulting microbial functional profiles and their top differentiating features clearly separate the cohorts. We also detect differentially abundant microbiome pathway signatures in COVID-19 compared to both the community-acquired pneumonia and healthy cohorts. By utilizing randomization to evaluate the significance of the findings, we highlight pathways with altered abundance in COVID-19 respiratory tract microbiomes, including decreased lipid and glycan metabolism. When clustering the samples using the top differentiating features, the COVID-19 samples are grouped together and separate from the other cohorts, except for sample 4 (Fig. 2) . While the examined 44 features were selected as those differentiating COVID-19 from CAP and healthy controls, they also separate the healthy control samples from all others; the healthy controls cluster tightly together. The results also demonstrate that the samples do not merely cluster by the total number of input reads or the fraction of functionally annotated microbial reads, since those measures vary within cohorts (Supplemental Fig. S1 ). The CAP patient samples were collected from different hospital sources, prior to the current pandemic, and represent pneumonia cases with various viruses detected in the sequencing data 5 (e.g. enterovirus, influenza virus, rhinovirus), possibly contributing to the greater variability between their microbiome functional profiles. The COVID-19 samples have more abundant EC features including (see bottom left feature cluster in Fig. 2 O-acyltransferase", 2.6.1.1 "aspartate transaminase", 3.4.11.2 "membrane alanyl aminopeptidase", 3.4.11.5 "prolyl aminopeptidase", 3.6.1.12 "dCTP diphosphatase", mapping to amino acid, carbohydrate, cofactor and vitamin, lipid, and nucleotide metabolism pathways. See also Supplemental Fig. S3 for a scatter plot of the average change per EC in COVID-19 compared to CAP and healthy cohorts. In order to systematically examine the detected features against functional pathways, all the EC features were considered against their corresponding pathways from the KEGG metabolic pathway mapping. 15 . Pathway scores (mean abundance change in COVID- 19) were computed using all the detected EC features per pathway (see Supplemental Fig. S4 and Methods). To contrast the observed scores against a background distribution, the sample labels were randomly permuted multiple times and the resulting pathway scores computed, see Fig. 3 . An empirical p-value per pathway was then computed using the obtained random background distribution. Sixteen pathways were detected as significantly lower (p < 0.01) and two pathways as significantly higher in COVID-19. Among the pathways that were lower in COVID-19, see Fig. 3 , several are related to glycan biosynthesis and metabolism (e.g. glycosaminoglycan degradation) and lipid metabolism (e.g. sphingolipid metabolism). A connection between decreased presence of host glycosaminoglycan heparan sulfate modifying bacteria and increased COVID-19 susceptibility has recently been suggested. 18 Sphingolipids are important components of biomembranes, mediating signal transduction and immune activation processes, and they have been shown to decrease in COVID-19 patient sera. 19 The differentially abundant functions and pathways can suggest novel host-microbiome interactions to be examined further. Analyzing COVID-19 respiratory tract metatranscriptomes from a functional perspective offers an additional view into their differences from healthy controls and from subjects with community-acquired pneumonia. The comparative approach taken here aims to mitigate possible experimental variation and resulting biases within individual samples, by focusing on detecting robust and consistent differences in the detected microbial functions. As a result, we identify both enzyme code features and metabolic pathways that differentiate COVID-19 respiratory tract microbiomes from healthy control and community-acquired pneumonia samples. The resulting functional profiles distinguish COVID-19 samples from others, similarly to the original taxonomy-based analysis of the community members. 5 By employing a randomization approach, we evaluated the significance of the detected differentially abundant pathways. Sixteen pathways showed significantly (p < 0.01) lower abundance in COVID-19 compared to the other cohorts, and two pathways higher abundance. Even with a more stringent threshold (p < 0.0001) four pathways appear down in COVID-19. Several decreased pathways were related to glycan and lipid metabolism, including sphingolipid metabolism; sphingolipids have been shown to decrease in COVID-19 patient sera. 19 Additionally, related to the decreased glycosaminoglycan degradation pathway, a link between decreased presence of host glycosaminoglycan heparan sulfate modifying bacteria and increased COVID-19 susceptibility has recently been suggested. 18 The differentially abundant functions and pathways can suggest novel host-microbiome interactions to be examined further. This computational study has limitations including small sample size and overall variation in the sampled microbiomes, possibly due to subject lifestyles, location, and clinical characteristics including different stages and severity of disease. Without related clinical data, connections between administered therapies and detected microbiome differences also remain elusive. The observations from this functional analysis call for further in-depth research on microbial functions and host-microbiome interactions, by connecting clinical data with microbiome sequencing. This protein domain focused amino acid matching approach supports the profiling of functions performed by known and potentially unknown organisms yet to be characterized. 20 Examining metatranscriptomic sequencing reads with the lens of functional annotation can yield additional insights into disease-related microbiome alterations. The recently published bronchoalveolar lavage fluid (BALF) metatranscriptomic sequencing data 5 with BBsplit 23 (ambiguous=random, ambiguous2=split). The paired reads were processed separately, individual reads that did not match the human, PhiX, or SARS-CoV-2 genomes (278k to 50.7M reads per sample) were retained for the microbial community functional annotation (see Supplemental Fig. S1 for the filtering results). The KEGG Enzyme Nomenclature (EC) reference hierarchy 15 was used as the functional annotation tree. The EC numbers define a four-level hierarchy. For example, 1.5.1.3. = "Dihydrofolate reductase" is a fourth (leaf) level code linked to top level code 1 = "Oxidoreductases", via 1.5. = "Acting on the CH-NH group of donors" and 1.5.1 = "With NAD+ or NADP+ as acceptor". A PRROMenade 13 database search index was constructed using the KEGG hierarchy and a total of 21.2M bacterial and 53k viral annotated protein domain sequences (of minimum length 5 AA), obtained on June 6, 2020 from the IBM Functional Genomics Platform 14 (previously known as OMXWare). An earlier release of the bacterial domain data has been discussed previously in conjunction with PRROMenade indexing. 13 Metatranscriptomic sequencing reads were annotated with PRROMenade by locating the maximal length exact match for each read, and processed as described previously. 13 Minimum match length cutoff of 11 AA (corresponding to 33 nt) was employed. Classified (non-root) read counts (6.8k to 11.5M per sample, see Supplemental Fig. S1 -S2) were post-processed to summarize the counts at the leaf level of the functional hierarchy. Leaf nodes contributing ≥ 0.05% of total annotated reads in at least one sample were retained, resulting in 633 leaf nodes to include as the features of the functional profiles. Multidimensional scaling (Matlab function cmdscale, p = 2) and permutational multivariate analysis of variance (f_permanova, iter = 100, 000, from the Fathom toolbox 24 for Matlab) were applied on pairwise Spearman's distances (Fig. 1B) . Subsequently, the profiles were processed with RoDEO 16 (P = 10, I = 100, R = 10 7 ) for robust comparability. The per-sample parameter P was determined according to the number of annotated reads as previously described (in Supplementary File 2 by Klaas et al. 25 ). A two-sample Kolmogorov-Smirnov test (kstest2 in Matlab) was applied to identify differentially abundant features between COVID-19 samples and CAP, healthy control samples. Features were ordered by p-value and top features selected for average linkage hierarchical clustering using the Euclidean distance (Fig. 2) . The KEGG 15 metabolic pathway maps were utilized to link functions with pathways, and the pathways were analyzed for changes between COVID-19 and CAP, HC. The pathways were evaluated for average abundance change as follows. Let a i be the mean RoDEO abundance of EC feature i for COVID-19 samples, b i for CAP samples, and c i for HC samples. The feature score is defined as f s i = ((a i − b i ) + (a i − c i ))/2, positive values indicating higher abundance in COVID-19. Pathway score ps j = mean{ f s EC j (1) , f s EC j (2) , ..., f s EC j (k) } was computed using the set of features, EC j , that map to the pathway j (considering only pathways with k ≥ 2). To evaluate significance of the results, the sample labels were randomly permuted 10,000 times and ps j of each pathway computed. The pathway scores across permutations were combined as the background distribution (Supplemental Fig. S4 ). An empirical p-value for pathway score ps j = x was computed as p(x) = (r + 1)/(n + 1), where n is the total number of pathway scores across all the randomizations, and r is the number of pathway scores with values < x (for low pathway scores) or > x (for high pathway scores). A pneumonia outbreak associated with a new coronavirus of probable bat origin The proximal origin of SARS-CoV-2 Depicting SARS-CoV-2 faecal viral activity in association with gut microbiota composition in patients with COVID-19 Gnotobiotic Rats Reveal That Gut Microbiota Regulates Colonic mRNA of Ace2, the Receptor for SARS-CoV-2 Infectivity Genomic Diversity of Severe Acute Respiratory Syndrome-Coronavirus 2 in Patients With Coronavirus Disease RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak A new coronavirus associated with human respiratory disease in China Respiratory viral infection-induced microbiome alterations and secondary bacterial pneumonia Lung inflammation and disease: A perspective on microbial homeostasis and metabolism Lung microbiota predict clinical outcomes in critically ill patients Application of probiotics for acute respiratory tract infections Proteomic and Metabolomic Characterization of COVID-19 Insular microbiogeography: Three pathogens as exemplars Fathom Toolbox for MATLAB: software for multivariate ecological and oceanographic data analysis Transcriptome characterization and differentially expressed genes under flooding and drought stress in the biomass grasses Phalaris arundinacea and Dactylis glomerata The PRROMenade methodology is associated with patent applications currently pending review at the USPTO.