key: cord-1004379-m0fjt483 authors: Peddu, Vikas; Shean, Ryan C; Xie, Hong; Shrestha, Lasata; Perchetti, Garrett A; Minot, Samuel S; Roychoudhury, Pavitra; Huang, Meei-Li; Nalla, Arun; Reddy, Shriya B; Phung, Quynh; Reinhardt, Adam; Jerome, Keith R; Greninger, Alexander L title: Metagenomic analysis reveals clinical SARS-CoV-2 infection and bacterial or viral superinfection and colonization date: 2020-05-07 journal: Clin Chem DOI: 10.1093/clinchem/hvaa106 sha: 6fca64fc4a6ac016e333ddf9fac87a1bf222917e doc_id: 1004379 cord_uid: m0fjt483 BACKGROUND: More than two months separated the initial description of SARS-CoV-2 and discovery of its widespread dissemination in the United States. Despite this lengthy interval, implementation of specific quantitative reverse transcription (qRT)-PCR-based SARS-CoV-2 tests in the US has been slow, and testing is still not widely available. Metagenomic sequencing offers the promise of unbiased detection of emerging pathogens, without requiring prior knowledge of the identity of the responsible agent or its genomic sequence. METHODS: To evaluate metagenomic approaches in the context of the current SARS-CoV-2 epidemic, laboratory-confirmed positive and negative samples from Seattle, Washington were evaluated by metagenomic sequencing, with comparison to a 2019 reference genomic database created before the emergence of SARS-CoV-2. RESULTS: Within 36 hours our results showed clear identification of a novel human Betacoronavirus, closely related to known Betacoronaviruses of bats, in laboratory-proven cases of SARS-CoV-2. A subset of samples also showed superinfection or colonization with human parainfluenza virus 3 or Moraxella species, highlighting the need to test directly for SARS-CoV-2 as opposed to ruling out an infection using a viral respiratory panel. Samples negative for SARS-CoV-2 by RT-PCR were also negative by metagenomic analysis, and positive for Rhinovirus A and C. 
Unlike targeted SARS-CoV-2 qRT-PCR testing, metagenomic analysis of these SARS-CoV-2 negative samples identified candidate etiological agents for the patients’ respiratory symptoms. CONCLUSION: Taken together, these results demonstrate the value of metagenomic analysis in monitoring and responding to this and future viral pandemics.

On January 20, 2020, less than one month after the initial reports of a series of viral pneumonia cases in Wuhan, China, the first case of infection with the novel SARS-CoV-2 was confirmed in the United States (1). Rapid person-to-person transmission has resulted in 614,482 total cases and 27,085 deaths within the United States as of April 15, 2020 (2). Epidemiological analyses have shown increased mortality risk in elderly patients above 65 years of age, especially those with underlying comorbidities (3) (4). Reported clinical complications include sepsis in 59% of cases and acute respiratory distress syndrome in 17-29% of cases, often progressing to require mechanical ventilation (5) (6) (7). For rapidly emerging infectious diseases, metagenomic next-generation sequencing (mNGS) offers an opportunity both to recover whole viral genomes for epidemiological purposes and to agnostically determine co-infections that may be associated with increased morbidity and mortality (8). Here, we evaluated the performance of metagenomic sequencing on eight samples sent for SARS-CoV-2 diagnostic testing. mNGS was performed in under 36 hours from sample collection to analysis, and the results were confirmed using validated qRT-PCR-based methods.

Eight nasopharyngeal swabs in viral transport medium were sent to the University of Washington Clinical Virology laboratory for diagnostic or confirmatory testing.
qRT-PCR was performed using a modified protocol of the World Health

Eight unique patient samples consisting of six positive and two negative cases of suspected SARS-CoV-2 were sequenced using RNA extracted for a qRT-PCR diagnostic assay. In parallel, we created mNGS sequencing libraries following a previously published protocol: ds-cDNA synthesis, followed by Nextera XT tagmentation and 20 cycles of PCR amplification (10). These libraries were sequenced on an Illumina MiSeq using a 1x185 run with the MiSeq Reagent Kit v3 (150-cycle). Reads per million (RPM) calculations and inter-sample comparisons were performed using the RPM_summary.r script (11). Output from the pipeline was visualized using the Pavian metagenomics data explorer and interpreted by a bioinformatician as well as two board-certified pathologists, who were blinded to clinical information on the samples prior to interpretation (Table 1).

Reads that aligned to neither HG38 nor the NT database were re-trimmed using Trimmomatic (13). Mitochondrial reads were depleted prior to assembly by alignment to the human mitochondrial genome (MN540528.1) using Bowtie2 with default options (15). Aligned reads were removed using Samtools (17), and the remaining reads were converted back to FASTQ format using Samtools fastq. BBTools bbfakereads.sh was used to split the single-end reads into pseudo-paired-end reads for assembly with metaSPAdes (14, 18).

All available SARS-CoV-2 sequences from the Global Initiative on Sharing All Influenza Data were downloaded on March 18, 2020, consisting of 806 unique samples. Sequences with more than 5% N content were manually removed. Genome alignment was done using MAFFT with the default settings. Phylogenetic trees were built with RAxML using the GTRCATI model with 1000 bootstrap replicates.

Any taxon with an RPM < 10 was filtered out in order to exclude misclassified reads, possible water contaminants, and nasal flora.
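The RPM calculation and the RPM < 10 cutoff described above can be illustrated with a minimal Python sketch. This is not the authors' RPM_summary.r script: the function names `reads_per_million` and `filter_taxa` are hypothetical, the counts are invented for illustration, and using the total number of sequenced reads in the sample as the denominator is an assumption, since the text does not say which read total the pipeline uses.

```python
from typing import Dict

RPM_THRESHOLD = 10.0  # cutoff stated in the text; taxa below it are dropped


def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Scale a taxon's read count to reads per million total reads.

    Assumption: total_reads is the sample's total sequenced reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return taxon_reads * 1_000_000 / total_reads


def filter_taxa(counts: Dict[str, int], total_reads: int,
                threshold: float = RPM_THRESHOLD) -> Dict[str, float]:
    """Return RPM per taxon, keeping only taxa at or above the threshold."""
    rpm = {taxon: reads_per_million(n, total_reads)
           for taxon, n in counts.items()}
    return {taxon: value for taxon, value in rpm.items() if value >= threshold}


# Hypothetical counts for illustration only (not data from the study).
example_counts = {"taxon_a": 4000, "taxon_b": 5}
example_rpm = filter_taxa(example_counts, total_reads=1_000_000)
```

In this toy example `taxon_b` falls below the cutoff and is excluded, which mirrors how low-abundance classifications such as likely contaminants would be removed before interpretation.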
Independent blinded analysis by both a bioinformatician and a board-certified pathologist arrived at a concordant interpretation of the results described. Despite our reference database not containing any SARS-CoV-2 genomes, the six samples that were positive for SARS-CoV-2 by qRT-PCR had reads classified to Table 2). We were able to similarly detect and assemble SARS-CoV-2 from a sample with a CT of 29.5. As expected, our approach scaled with CT (R² = 0.80) (Table 2, Figure 2).

Sample WA6-UW3 showed substantial evidence of human parainfluenza virus 3 (HPIV3) infection, with an RPM of 4,002 consisting of 4,027 unique reads. Reads from this sample aligning to HPIV3 were also successfully de novo assembled using the Geneious 9.1.8 assembler (19). From this assembly we were able to recover the full HPIV3 genome with a mean depth of coverage of 66.4. The two SARS-CoV-2 negative samples, SC5683 and SC5698, contained reads classifying to rhinovirus species A and C, respectively. SC5683 contained reads classifying to both rhinovirus A71 (RPM = 1,592) and human rhinovirus spp. (RPM = 19,061). SC5698 contained reads classifying only to rhinovirus C3 (RPM = 454) (Figure 3). Common skin flora Cutibacterium acnes were present to some degree in nearly

Out of the eight total samples, the six with SARS-CoV-2 detected by metagenomic sequencing had CTs below 30 for both the E and RdRp genes by qRT-PCR for SARS-CoV-2. In contrast, the two samples negative for SARS-CoV-2 by metagenomic sequencing, SC5683 and SC5698, had no amplification of either gene (Table 2). The EXO internal control was successfully amplified in all tested samples.

Phylogenetic analysis revealed that the six SARS-CoV-2 sequences clustered within two clades representing the Washington state and European outbreaks. WA3-UW1, from a traveler from Korea to Washington state, was the only sequence to cluster in the European clade. All genomes were over 99.5% identical by nucleotide relative to the reference strain (NC_045512.1).
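As an illustration of how a nucleotide identity figure like the one quoted above can be computed from two sequences in the same alignment (for example, rows of a MAFFT alignment), here is a minimal Python sketch. The function name `percent_identity` is hypothetical, and skipping columns that contain a gap or an ambiguous N is an assumption of this sketch; the text does not state how the authors handled such columns.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns at which two sequences match.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. That rule is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    skip = {"-", "N"}
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences for illustration only (not data from the study).
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

In the toy example one column is skipped because of the gap, and eight of the nine remaining columns match.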
WA3-UW1 contained 3 amino acid mutations in the

Using mNGS, we were able to successfully detect SARS-CoV-2 in six out of six positive samples, which were also confirmed by qRT-PCR. In addition, we were able to recover nearly full SARS-CoV-2 genomes from taxonomically unassigned reads (20). The total time required for this testing was approximately 36 hours from receiving the sample to taxonomic assessment. Such rapid turnaround could prove invaluable in the future when presented with an unknown infectious agent.

The six SARS-CoV-2 sequences we present here represent two distinct clades from the pandemic: one European and one from Washington state. Sample WA3-UW1, the only sample to cluster within the European clade, was derived from a traveler from South Korea. This sample diverged early within the clade and seems to be the terminal isolate within the United States. All other samples clustered with others from Washington state and are representative of the larger Washington state outbreak.

A consequence of our reference database having been built from the 2019 GenBank NT database is that it does not contain any SARS-CoV-2 sequences. Despite this, sequence homology to SARS-CoV allowed SARS-CoV-2 reads to be classified to the taxon "Severe acute respiratory syndrome-related coronavirus" (National Center for Biotechnology Information Taxid 694009). This was confirmed with assembly of unassigned reads, from which we were able to retrieve nearly the whole SARS-CoV-2 genome for all positive samples. We also demonstrate that with as few as 5 million reads we can de novo assemble a full SARS-CoV-2 genome from a sample with a CT as high as 29.5 (Figure 2). Of note, the number of SARS-CoV-2 reads is driven not only by viral load, but also by the number of reads going to bacterial or human sequence (online Supplemental Table 1). Additionally, reference genome length is not taken into account in this implementation of our pipeline.
As a result, a bias is introduced: RPM scales with the abundance of the organism and also increases linearly with the length of its genome.

In addition to being positive for SARS-CoV-2, samples WA6-UW3, WA8-UW5, and WA9-UW6 also showed positive metagenomic results for M. catarrhalis, a gram-negative diplococcus that colonizes the nares of up to 75% of children, but only 0-4% of adults (21) (22). The RPMs of M. catarrhalis in WA8-UW5 and WA9-UW6 were 100x higher than that in WA6-UW3. These results further illustrate the ability of mNGS to detect bacterial infection and/or colonization on a patient-by-patient basis.

First Case of 2019 Novel Coronavirus in the United States
An interactive web-based dashboard to track COVID-19 in real time
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention
CoV-2: virus dynamics and host response
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Metagenomic analysis identified co-infection with human rhinovirus C and bocavirus 1 in an adult suffering from severe pneumonia
Comparison of Real-Time PCR Assays with Fluorescent-Antibody Assays for Diagnosis of Respiratory Virus Infections in Children
Rapid Metagenomic Next-Generation Sequencing during an Investigation of Hospital-Acquired Human Parainfluenza Virus 3 Infections
Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification
Trimmomatic: a flexible trimmer for Illumina sequence data
Fast gapped-read alignment with Bowtie 2
Faster and More Accurate Sequence Alignment with SNAP
The Sequence Alignment/Map format and SAMtools
metaSPAdes: a new versatile metagenomic assembler
Geneious | Bioinformatics Software for Sequence Data Analysis
A Metagenomic Analysis of Pandemic Influenza A (2009 H1N1) Infection in Patients from North America
Moraxella catarrhalis: from Emerging to Established Pathogen
Co-infection with SARS-CoV-2 and Influenza A Virus in Patient with Pneumonia, China. Emerg Infect Dis

The authors declare they have no competing interests.