key: cord-1002569-0yf4fqo7 authors: Chan, A. P.; Siddique, A.; Desplat, Y.; Choi, Y.; Ranganathan, S.; Choudhary, K. S.; Diaz, J.; Bezney, J.; DeAscanis, D.; George, Z.; Wong, S.; Selleck, W.; Bowers, J.; Zismann, V.; Reining, L.; Highlander, S.; Hakak, Y.; Brown, K.; Armstrong, J.; Schork, N. J. title: A Universal Day Zero Infectious Disease Testing Strategy Leveraging CRISPR-based Sample Depletion and Metagenomic Sequencing date: 2022-05-16 journal: nan DOI: 10.1101/2022.05.12.22274799 sha: dc2e6ffa689769fdbf24a9eedc4961d1796584e6 doc_id: 1002569 cord_uid: 0yf4fqo7 The lack of preparedness for detecting the highly infectious SARS-CoV-2 pathogen, the pathogen responsible for the COVID-19 disease, has caused enormous harm to public health and the economy. It took ~60 days for the first reverse transcription quantitative polymerase chain reaction (RT-qPCR) tests for SARS-CoV-2 infection developed by the United States Centers for Disease Control (CDC) to be made publicly available. It then took >270 days to deploy 800,000 of these tests at a time when the estimated actual testing needs required over 6 million tests per day. Testing was therefore limited to individuals with symptoms or in close contact with confirmed positive cases. Testing strategies deployed on a population scale at "Day Zero" i.e., at the time of the first reported case, would be of significant value. Next Generation Sequencing (NGS) has such Day Zero capabilities with the potential for broad and large-scale testing. However, it has limited detection sensitivity for low copy numbers of pathogens which may be present. Here we demonstrate that by using CRISPR-Cas9 to remove abundant sequences that do not contribute to pathogen detection, NGS detection sensitivity of COVID-19 is comparable to RT-qPCR. In addition, we show that this assay can be used for variant strain typing, co-infection detection, and individual human host response assessment, all in a single workflow using existing open-source analysis pipelines. This NGS workflow is pathogen agnostic, and therefore has the potential to transform how both large-scale pandemic response and focused clinical infectious disease testing are pursued in the future. The current COVID-19 pandemic has exposed the threat of infectious diseases to human health and safety and the essential role that pandemic preparedness can play in combatting such threats(1). The deaths directly attributed to COVID-19 exceed 6.1M globally with over 927,000 deaths in the United States (US), making COVID-19 the third most lethal viral pandemic in the past century, behind the Spanish Flu of 1918 and HIV (2) . National and international pandemic preparedness plans are essential, as COVID-19 is likely not the last pandemic in our future. As an indication of why pandemic preparedness is so important, consider that the first confirmed case of SARS-CoV2, the cause of the COVID-19 pandemic, occurred around December 1, 2019 (3) . From that time, termed 'Day Zero,' it took ~42 days before the SARS-CoV-2 viral genome sequence was publicly released (4). One month later, the CDC produced 90 initial testing kits, which unfortunately had documented contamination issues (5) . Over the course of the next 6-8 months, the US was able to produce approximately 800,000 tests per day, when the estimated . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint need at the time was an ability to test six million individuals per day (6) . Because of limited capacity, testing was recommended only to those individuals with symptoms or in close contact with confirmed positive COVID-19 cases, leaving asymptomatic carriers as a major source of transmission. By July 2020, over 350,000 US residents had confirmed infections with over 4,500 confirmed deaths (2) . With these facts in mind, the US government has proposed a 10-year set of activities and $65B in funding for pandemic preparedness balanced with a belief of a strong return on investment given the ~$16-Trillion economic impact in the US of the COVID-19 pandemic over the past 24 months (1). With a growing global population, increased access to global travel, encroachment on previously less populated locations, the increasing number of labs researching infectious disease, and the potential for nefarious intent to weaponize biological pathogens, this modest investment seems more than justified. What is necessary to enable the proposed vision of pandemic preparedness are testing and response strategies that are deployable at Day Zero to combat any future pathogen outbreak before it progresses to a pandemic. Such an approach, by necessity, needs to be pathogen agnostic and, ideally, would provide more detailed information about an individual's reaction than the mere presence of a pathogen. NGS can fulfill those requirements. Shotgun NGS generates sequence information from every molecule in a sample (e.g., nasal swab, saliva, blood), and has shown to be applicable for human infectious disease diagnostic tests (7) (8) (9) , respiratory infections (10) , and universal pathogen detection (11) (12) (13) (14) (15) . Prior to the pandemic, an NGS-based study involving a population of ~1,000 pangolins from a wet market in Wuhan China, identified coronavirus infections in over 70% of the pangolin samples tested (16) . The authors sequenced total RNA from pangolin samples and achieved similar genome characterizations using de novo assembly and reference-based methods, ultimately showing that new viruses can be identified by . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint such an approach. Advances in sequencing technology, reductions in sequencing cost and increases in instrument throughput, the advent of portable instruments, and internet connectivity and cloud-based data analysis means that NGS technology has reached a point where it is capable of the speed and advanced data processing necessary for Day Zero deployment and of handling population-scale throughput in response to a pandemic. However, there are caveats to the use and deployment of NGS as part of a pandemic preparedness strategy. First, NGS has not been applied for routine clinical use for infectious disease diagnosis. This is because many sequencers have been housed for the most part in academic research institutions and private research organizations that lack the logistical infrastructure necessary for sample attainment, upstream processing, return of results and biosecurity level clearance to handle samples with active virus. However, with the acceptance of 'at home' sample collection and telehealth consultation via the internet, spurred by the need to keep individuals distanced from others, particularly from overburdened hospitals, the logistical infrastructure is far better than it has been before. Second, there has been a reluctance to implement NGS in hospital settings because of the amount of data generated and the computational infrastructure required to run sequence data processing pipelines. However, complex computational analysis is only needed until the pathogen of interest is identified and a reference genome of that pathogen is assembled, since mapping reads to a reference for identification purposes is relatively straightforward. Third, traditional NGS-based metagenomic tests for COVID-19 are inefficient because of relatively large sample input requirements and lower sensitivity for the virus due to overabundant, uninformative nucleic acid molecules from the human host and/or common commensal microbes. Here we present a molecular enrichment strategy to overcome these limitations in the use of NGS protocols, by using a Clustered Regularly Interspersed Short Palindromic Repeats (CRISPR-. CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint Cas9) system to specifically target and remove abundant host and common human microbial sequences (17) (18) (19) . This CRISPR-NGS technology has been patented and commercialize under the CRISPRclean ® name by Jumpcode Genomics, Inc. The technology uses a multiplexed pool of single guide RNAs (sgRNA) in a ribo-nucleoprotein complex (RNP) formed with the Cas9 enzyme. Double stranded cuts are generated in undesired NGS molecules, which is then followed by adapter specific PCR to enrich for un-cut NGS library molecules for enhanced sequencing. To evaluate the performance of the technology, we performed two critical assessment steps of COVID-19 clinical specimens (i.e., nasal swabs): (1) we defined the human and bacterial ribosomal RNA (rRNA) compositions to set baselines for the assessment of rRNA depletion efficiency, and (2) we established the pathogen-microbiome compositions, using orthogonal bioinformatics protocols to assess taxonomic classification confidence. We show that SARS-CoV-2 detection sensitivity using this method is comparable to RT-qPCR based detection for samples with Ct values up to 35. We also demonstrate that the CRISPR-NGS strategy enables variant strain typing, detection of co-infecting agents, identification of antimicrobial resistance genes (AMR), and the reporting of human host responses to infection. Furthermore, using contrived samples containing viral nucleic acid, we show that the CRISPR-NGS approach can successfully detect other pathogens (e.g., Zika virus). A long-term vision is that the application of CRISPR-NGS technology accelerates patient diagnosis and preventive strategies of existing and future pathogens. Two types of samples were analyzed in this study: clinical specimens and contrived samples. For clinical specimens, human nasal swabs with COVID-19 infection status determined by RT-qPCR . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) and SARS-CoV-2) was spiked at various concentrations into human lung total RNA samples. The samples contained a 10-fold dilution series of the viral reference mix, with titers ranging from an estimated 20 copies of each viral RNA genome to 20,000 copies. Various negative controls were also prepared, including human lung total RNA only (a no-viral RNA control) and a water-only sample (no-template control). The latter was included to monitor background contaminants that may originate from molecular reagents and NGS workflows. An overview of the workflow is shown in Figure 1 . Both mock depletions (depletion without Cas9 and guide RNA) and Cas9 depletions were performed with site A specimens. Mock-depletion was not performed with site B specimens. A summary of sequencing statistics is provided in Supporting Table 1 . A key aspect of our NGS-based strategy is the use of CRISPR-Cas9 to remove ribosomal RNA sequences. To determine the effectiveness of rRNA removal, we set out to first establish the rRNA composition of the clinical nasal swab specimens by classifying sequence reads as bacterial, . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint viral, or eukaryotic, and rRNA or non-rRNA (presumably mRNA), using Kraken2. For site A clinical specimens, the average content of human and bacterial rRNA in non-depleted NGS libraries was 63% and 0.85%, respectively, of total reads. After CRISPR-based rRNA depletion, human rRNA was detected at 0.74% and bacterial rRNA at 0.02% of total reads (Figure 2 , Supporting Table 2 ). This indicates that 98% of bacterial rRNA and 99% of human rRNA were successfully depleted as a result of CRISPR-based depletion. Because of depletion, human non-rRNA, bacterial non-rRNA, and viral sequences were enriched by an average of 3.6-fold, 3-fold, and 6-fold, respectively, across all CRISPR-depleted samples. Each of these enriched sequence categories is important because they contribute to the downstream genomic characterization of SARS-CoV-2, co-infection pathogens, and host gene expression as discussed below. However, nonribosomal human RNA molecules that have little to do with the patient response to infection may be candidates for further depletion. Interestingly, the post-depletion human and bacterial rRNA profiles of site B samples were distinct from those of site A. The rRNA content of the non-depleted libraries from site B could not be determined because mock-depletions were not performed at site B. However, the library composition of human rRNA, bacterial rRNA, and virus after depletion was 4.8%, 16%, and 11%, respectively (Supporting Table 3 ). The reason for the higher proportion of bacterial rRNA after depletion in site B libraries relative to site A libraries is unclear but could be related to sample type or different methods of specimen collection, processing, and RNA extraction. Further investigation is warranted, for example, to determine whether the non-depleted bacterial rRNA sequences fall within or outside of the CRISPR guide RNA set of sequences used for rRNA depletion. In summary, the CRISPR-based rRNA depletion method presented here can achieve near-complete removal of human and bacterial rRNAs from clinical nasal swab specimens. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint Next, we determined the pathogen-microbiome composition of the COVID-19 positive clinical specimens by focusing on the microbiome sequence read space of each specimen. Two independent and orthogonal bioinformatic approaches, the alignment/assembly-based Chan Zuckerberg ID (CZID) workflow (formerly known as IDSeq) (20) and the kmer-based Kraken2-Bracken workflow (21) (22) (23) , were used for taxonomic classifications. For this analysis, all sequence reads generated for each specimen were analyzed without subsampling. The concordance of taxonomic assignments by the two different bioinformatics methods were also determined to identify taxa with confidence. Analysis of the nasal swab clinical specimens using CZID showed that SARS-CoV-2 sequences were detected on average at 88,831 RPM and 192,582 RPM from site A and B, respectively (i.e., 8 .8% and 19% of the nasal microbiome; Supporting Table 4 . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In summary, we have detected multiple microbial species, including viral pathogens and commensal bacteria, from the clinical nasal swabs analyzed in this study, with classification supported by two orthogonal methods. By generating contrived samples with mock communities of viral pathogens, we were able to detect additional viral pathogens of interest. We performed an assessment of the CRISPR-NGS approach by comparing diagnostic results of SARS-CoV-2 against RT-qPCR-derived Ct data and clade assignments for the virus. We also aimed to determine the potential of pathogen-host genetic information that can be extracted for reporting and downstream analysis, such as data regarding co-infections, the nasal microbiome profile, antimicrobial resistance (AMR), and host gene expression at the collection site (i.e., the . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Table 5 . Sensitivity and specificity of the CRISPR-NGS assay with respect to SARS-CoV-2 detection were measured using two genome coverage metrics (number of uniquely aligned reads and genome breadth). Thresholds for detection were determined empirically and defined as follows: genome breath coverage >=3%, and number of uniquely aligned reads >=20 per 40M read pairs sequenced. For site A specimens with Ct values up to 35 (93 libraries), the sensitivity and specificity of SARS-CoV-2 detection were 98.4% and 100%, respectively (Table 1) . For site B specimens with Ct values up to 32 (24 libraries), sensitivity of SARS-CoV-2 detection was 95% and specificity was 100%. To evaluate clade assignment accuracy, we compared our consensus genome-based NextClade approach against clade identification via an independent PCR-based method (Variant-seq, PerkinElmer) performed for site A specimens. We applied a 10% genome breadth cut-off as the minimum requirement for accepting clade assignments from Nextclade. Using this criterion, the clade assignment accuracy is 100% for all samples that met these criteria. (i.e., 56 out of 56 libraries). . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) We also investigated whether the proposed sequencing strategy could provide information on SARS-CoV-2 functional variants, AMR sequences, and host transcriptome responses. With respect to SARS-CoV-2, we identified the spike protein L452R mutation in one specimen. This is a mutation in the receptor-binding domain of the spike protein and is implicated in antibody resistance and immune escape (24, 25) . We also identified AMR gene sequences amongst the assembled sequence contigs generated through the CZID workflow. Using AMRFinderPlus (NCBI), AMR genes were detected in a subset of the clinical specimens (n=15) from site A and B (Table 2 ). To identify differentially expressed (DE) host genes, we compared confirmed COVID-19 negative specimens to COVID-19 positive specimens with moderate to high viral loads (i.e., Ct value < 21) from site A. We identified a total of 77 up-regulated genes and 5 down-regulated genes in COVID-19 positive specimens (abs(log2FC) >= 1.5, and adj P < 0.05) (Supporting Table 6 ). Of the 82 DE genes, 19 genes overlapped with a previously identified blood-derived, interferon-stimulated gene (ISG) signature of SARS-CoV-2 infection consisting of 23 genes (26). Two interferon-inducible genes (IFI6, IFI27) from the 82 DEG list also overlapped with both the ISG signature and a blood-derived COVID-19 infection signature comprising 139 genes (26) . Novel technologies and strategies are sorely needed to combat infectious disease outbreaks. We believe strongly that NGS-based strategies not only have Day Zero deployment capabilities but are essential for a robust public health response at Day Zero and beyond because of the comprehensive nature of the data generated by such approaches. The issue with a NGS approach focused on RNA content is that human host derived RNA molecules, including ribosomal RNA, dominates the sequencer output and must be removed prior to sequencing. Our strategy is based on the in vitro application of CRISPR technology, which, because of the programmability of the CRISPR-Cas9 system, can be used to remove any known abundant and . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint uninformative molecules from NGS libraries. The workflow involved is fast, robust, and compatible with low biomass inputs. The programmability of the CRISPR system also means that new CRISPR targets can be easily added to remove different or additional molecules and increase sensitivity. This could have considerable benefit to clinical assay development because new targets can be added with minimal alteration to the assay. In the context of infectious disease, removal of additional known human host and common bacterial sequences or contaminants should continue to improve performance up until molecular diversity is exhausted or the impact on sensitivity becomes minimal. Although detection of the SARS-CoV-2 genome can be accomplished with legacy technology such as RT-qPCR, it is our assertion that generating many independent sequencing reads across the breadth of a pathogen genome provides more confident detection than a 100-bp amplified fragment using, for example, a fluorescent signal as output. Actual sequence information to characterize strains, clinically relevant sequence variants, co-infections, or host responses is unobtainable with focused RT-qPCR testing alone. Furthermore, although both RT-qPCR and amplicon-based targeted sequencing technologies are essential tools for detecting and tracking pathogens as they evolve, neither can meet the Day Zero requirement for a novel zoonotic pathogen. As emphasized throughout, the mNGS strategy we have outlined in this study can be used to rapidly identify novel pathogen strains, co-infections, and host response. Several of the non-SARS-CoV-2 viruses we identified in our samples are associated with respiratory illness. One report suggests that rhinovirus will block or inhibit SARS-CoV-2 replication in lung epithelial cells by triggering an interferon response (27) . This information could be critical to predict outcome or severity of disease. Mycobacterium phage was also highly prevalent in COVID-19 positive samples. Several reports suggest that severity of infection with SARS-CoV-2 could be influenced . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint by phage therapy or stimulate trained immunity from previous mycobacterium infections and the interactions with mycobacterium phage influenced tuberculosis (28, 29) . In a recently published study on co-infections and superinfections based on a literature review of papers published between October 2019 and February 2021, COVID-19 positive patients with co-infections (viral, bacterial, or fungal) had higher odds of mortality than those with COVID-19 alone (30) . Many of the identified co-infections had effective treatments available. There is still a need to validate our proposed CRISPR-NGS strategy across multiple laboratories, in multiple geographical regions, and in urban, rural, and remote settings. In this study, using samples collected and processed across multiple sites, human and bacterial RNA profiles differed between sites. These differences could be related to different methods of specimen collection, processing, and RNA extraction. To mitigate these effects, standardized methods and protocols need to be employed across labs and regions to generate comparable data. Also, if it is to be considered as deployable at scale, relevant databases harboring pathogen sequences and analysis pipelines will need to be standardized and validated. Ultimately, however, the ability of the CRISPR-NGS approach to provide a more comprehensive exploration of an individual's infection status provides benefits that no other single approach can. We see at least six crucially important capabilities enabled as a result of this approach. First, mNGS can generate whole genome information from an RNA virus and whole transcriptome information from an active DNA viral, bacterial, or fungal infection which provides advantages over a metagenomics approach. Whole genome information enables one to track the origin and the spread of any outbreak, regardless of the sequence similarity (or lack thereof) of the target pathogen to other pathogens. Second, it can be used to detect co-infections that have the potential to increase morbidity or mortality. This includes other viral pathogens, as well as bacterial and fungal pathogens. Third, the host immune profile could be used to predict a patient's ability to overcome infection (tolerance) or the risk of lethality from infection. Fourth, it could enable more rapid and effective . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint identification of targets for therapeutic development. Fifth, it leverages existing and established technology with proven performance and available high throughput processing protocols. Sixth, since it has Day Zero capabilities, it will have utility immediately after the first reported case of infection and potentially mitigate the effects of the next pandemic without the need to scramble national public and private resources to meet the demands of a panicked population. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05. 12.22274799 doi: medRxiv preprint in Arizona by TGen's infectious disease testing facility in Flagstaff (31, 32) . The remaining unused specimens were used for this study. The Ct values of the samples were between 14.17 and 32.02. Total RNA extraction from the nasal swab specimen was performed using the Quick-DNA/RNA MagBead kit (Zymo Research). A summary of the site A and B specimens is provided in Figure 1 and Supporting Table 1 . CRISPR-based depletion of abundant nucleic acids was performed using CRISPRclean Plus Kit (Jumpcode Genomics, Inc., San Diego) following the manufacturer's protocols. The human rRNA CRISPR guide RNA set was designed to deplete the human mitochondrial 12S and 16S genes and human nuclear 5S, 5.8S, 18S and 28S rRNA genes, as well as the 45S precursor rRNA transcripts. The accompanying pan-bacterial rRNA CRISPR guide RNA was designed to the 5S, 16S and 23S rRNA sequences of 212 bacterial species. For site A specimens, 10ng of each RNA sample was used as input in the PerkinElmer NEXTFLEX Rapid Directional RNA-Seq v2.0 library prep. Key steps in the library prep include first strand synthesis using random priming, second strand synthesis with uracil incorporation, fragment end-repair, adapter ligation and PCR. Prior to PCR amplification, the library is treated with Cas9 pre-complexed with guide RNA targeted to bacterial rRNA for 1 hour at 37 o C followed immediately by a similar treatment with Cas9 and guide RNA targeted to human rRNA. The treatments result in the cleavage of library fragments containing rRNA sequences. A subsequent Ampure XP bead-based size selection step removes cleaved fragments and excess adapter sequences. This is followed by the standard library prep PCR to amplify the remaining (uncleaved) . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint library. Due to material constraints, only two CRISPR-treated libraries and one mock-treated library were generated from each sample. A total of 180 libraries were produced from 60 samples. Libraries were combined into pools and loaded on 4 lanes of an Illumina NovaSeq S4 flow cell. Sequencing was performed with 2 x 150 cycles. As for site B clinical specimens, due to limitations on RNA availability 1ng was used for library construction following the same procedure as described for site A specimens using NEXTFLEX . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint Illumina reads generated from site A and B specimens and samples were analyzed using a unified workflow described as follows. Microbiome taxonomy abundance. The 150bp paired reads were demultiplexed according to sample barcodes. The Illumina sequencing adapters were removed, and low-quality bases were trimmed using AdapterRemoval (v2.3.1). After the trimming, any reads shorter than 75bp were discarded along with their mate reads. Prior to running Kraken2-Bracken for microbial taxonomy classification, a human genome reference was built by combining GRCh38 with alternate contigs, CHM13 T2T genome (GCA_009914755.3), and the non-reference unique insertions (NUIs) identified in Wong et al (33) . All trimmed reads were mapped to the human genome reference to pre-filter host reads with 95% sequence identity and 50% read length coverage as the mapping criteria. All remaining reads after host filtering were assigned taxonomy using Kraken2 (v2.1.1) with PlusPF database (release date: 1/27/2021). The domain-level and species-level taxonomic abundance values were estimated using Bracken (v2.6.0 (22)) based on the read counts from Kraken2 (21) . For calculating the relative abundances among the microbial species (i.e., the microbiome space) via Kraken2-Bracken, the reads assigned to "human" were excluded from the denominator. For rRNA content estimation using Kraken2, a Kraken database (containing rRNA sequences from prokaryotes and eukaryotes) was built from the rRNA sequences collected from NCBI Nucleotide database using the following query: "biomol_rrna[PROP]" (as of March 17, 2021) . For CZID-based taxonomy classification, the raw reads were uploaded to the CZID public server (pipeline v6.8), which includes its own read quality control steps. The CZID workflow performs read mapping to the NCBI non-redundant protein and nucleotide sequence databases NR and NT, respectively (the NT read mapping results were used in this analysis) and read . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022. ; https://doi.org/10.1101/2022.05.12.22274799 doi: medRxiv preprint assembly to build assembled contigs. Both sets of information were used to assign taxonomy to the input sequence read (20) . The identification of microbial taxa that are likely contaminants (e.g., molecular reagents) was guided by the water blank control included in this study. A few environmental species such as Achromobacter sp. and others were considered as contaminants. Genome breadth and depth coverage. Genome sequences were collected from NCBI Genbank for the top 20 species from each of the two sites based on the Kraken2-Bracken approach and the expected ATCC viral pathogen references from the contrived samples. The 40M subsampled read pairs were mapped to the combined genome sequences using BWA-MEM (v0.7.17). Only the reads with high mapping quality were kept for the downstream analysis (i.e., 95% sequence identity and 80% read length coverage). For each species, the number of mapped reads and the number of total bases mapped were collected using Bedtools (v1.9) "multicov" and Samtools (v1.9) "depth" commands, respectively, with optional parameters "-d 0 -aa" being used for Samtools "depth" command to accurately report the depths in deeply covered regions. The genome breadth coverage was calculated by the number of genome positions covered by at least one read by the total genome size. The genome depth coverage was calculated by averaging read depths across the genome. For the samples with 10% or higher SARS-CoV-2 genome breadth coverage, the nucleotide variants were identified, and the genome sequence was built using Bcftools (v1.9) based on the identified variants having N's for the regions not covered by reads. The reconstructed genome sequence was used to identify SARS-CoV-2 clades using Nextclade (v1.9.0) with the downloaded dataset (tag: 2022-01-05T19:54:31Z). AMR gene identification. The assembled contigs from the CZID workflow with 40M subsampled read pairs were retrieved and searched against AMR genes using NCBI AMRFinderPlus . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 16, 2022 Host transcriptome response. The differential gene expression and ontology enrichment analysis between COVID-19 positive (Ct value below 21) and confirmed negative samples from site A was done using DEGenR, an interactive Shiny app that provides integrated tools for performing differential gene expression, ranked-based ontological gene set and pathway enrichment analysis (34) . Within DEGenR, the raw read counts were imported, filtered, normalized using edgeR Rpackage to filter out any low-expressed genes. This was followed by differential gene expression analysis using the Empirical Bayes method (eBayes) (35) . Functional enrichment analysis of human host response genes. To identify significant biological processes associated with the COVID-19 positive samples, we use Gene Ontology (GO) databases to assess the coherence of Differentially Expressed Genes (DEGs) (36) . The Enrichr R package (37) was used to rank enriched terms among DEGs using different databases and resources, including GO biological processes. We used the Enrichr overrepresentation analysis (ORA) test incorporated (37) within DEGenR to associate biological functions to DEGs. AMR genes (Gene Symbol and Sequence Name) were identified in the assembled contigs from CZID output. The antibiotic class of each gene was determined and the number of specimens exhibiting the gene was calculated. The average percent sequence coverage and identity is shown in columns 5 and 6. American Pandemic Preparedness: Transforming Our Capabilities COVID-19) Dashboard A pneumonia outbreak associated with a new coronavirus of probable bat origin Chinese laboratory that first shared coronavirus genome with world ordered to close for 'rectification', hindering its Covid-19 research Summary of the Findings of the Immediate Office of the General Counsel's Investigation Regarding CDC's Production of COVID-19 Test Kits S. sets record with over one million coronavirus tests in a day Evaluation of Unbiased Next-Generation Sequencing of RNA (RNAseq) as a Diagnostic Method in Influenza Virus-Positive Respiratory Samples Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection Clinical Metagenomic Sequencing for Diagnosis of Meningitis and Encephalitis Metatranscriptomics to characterize respiratory virome, microbiome, and host response directly from clinical samples Viral genomic, metagenomic and human transcriptomic characterization and prediction of the clinical forms of COVID-19 A metagenomics workflow for SARS-CoV-2 identification, co-pathogen detection, and overall diversity Metagenomic Analysis Reveals Clinical SARS-CoV-2 Infection and Bacterial or Viral Superinfection and Colonization Metagenomic Sequencing To Detect Respiratory Viruses in Persons under Investigation for COVID-19 Coronavirus discovery by metagenomic sequencing: a tool for pandemic preparedness Viral Metagenomics Revealed Sendai Virus and Coronavirus Infection of Malayan Pangolins (Manis javanica) Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications Acute West Nile Virus Meningoencephalitis Diagnosed Via Metagenomic Deep Sequencing of Cerebrospinal Fluid in a Renal Transplant Patient Integrating central nervous system metagenomics and host response for diagnosis of tuberculosis meningitis and its mimics IDseq-An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring Improved metagenomic analysis with Kraken 2 Bracken: estimating species abundance in metagenomics data Efficient and Quality-Optimized Metagenomic Pipeline Designed for Taxonomic Classification in Routine Microbiological Clinical Tests Predominance of antibody-resistant SARS-CoV-2 variants in vaccine breakthrough cases from the San Francisco Bay Area Acquisition of the L452R Mutation in the ACE2-Binding Interface of Spike Protein Triggers Recent Massive Expansion of SARS-CoV-2 Variants Dysregulated transcriptional responses to SARS-CoV-2 in the periphery Human Rhinovirus Infection Blocks Severe Acute Respiratory Syndrome Coronavirus 2 Replication Within the Respiratory Epithelium: Implications for COVID-19 Epidemiology Phages in the fight against COVID-19? Association of the past epidemic of Mycobacterium tuberculosis with mortality and incidence of COVID-19 Transcriptomics-based drug repositioning pipeline identifies therapeutic candidates for COVID-19 Methods for sequencing the pandemic: benefits of rapid or highthroughput processing Normalization of SARS-CoV-2 viral load via RT-qPCR provides higher-resolution data for comparison across time and between patients De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations DEGenR: An R Shiny app for differential gene expression and enrichment analysis limma powers differential expression analyses for RNA-sequencing and microarray studies From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasilikelihood pipeline Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool The authors would like to thank Michelle Fraser, Kerry Gunning and Amanda Cramer for providing clinical samples and Variant-Seq information for the study and Terese Hammond and Scott Layne for commenting on the approach discussed. We also want to acknowledge funding from the National Science Foundation awarded to NJS: FAIN number 2031819.