key: cord-0757162-yfh90cld authors: Rothman, Jason A.; Loveless, Theresa B.; Kapcia, Joseph; Adams, Eric D.; Steele, Joshua A.; Zimmer-Faust, Amity G.; Langlois, Kylie; Wanless, David; Griffith, Madison; Mao, Lucy; Chokry, Jeffrey; Griffith, John F.; Whiteson, Katrine L. title: RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants date: 2021-11-10 journal: Applied and environmental microbiology DOI: 10.1128/aem.01448-21 sha: 36781b1eae8425aadaf9e3330cc0b2fb8fc9f6be doc_id: 757162 cord_uid: yfh90cld Municipal wastewater provides an integrated sample of a diversity of human-associated microbes across a sewershed, including viruses. Wastewater-based epidemiology (WBE) is a promising strategy to detect pathogens and may serve as an early warning system for disease outbreaks. Notably, WBE has garnered substantial interest during the coronavirus disease 2019 (COVID-19) pandemic to track disease burden through analyses of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA. Throughout the COVID-19 outbreak, tracking SARS-CoV-2 in wastewater has been an important tool for understanding the spread of the virus. Unlike traditional sequencing of SARS-CoV-2 isolated from clinical samples, which adds testing burden to the health care system, in this study, metatranscriptomics was used to sequence virus directly from wastewater. Here, we present a study in which we explored RNA viral diversity through sequencing 94 wastewater influent samples across seven wastewater treatment plants (WTPs), collected from August 2020 to January 2021, representing approximately 16 million people in Southern California. Enriched viral libraries identified a wide diversity of RNA viruses that differed between WTPs and over time, with detected viruses including coronaviruses, influenza A, and noroviruses. 
Furthermore, single-nucleotide variants (SNVs) of SARS-CoV-2 were identified in wastewater, and we measured proportions of overall virus and SNVs across several months. We detected several SNVs that are markers for clinically important SARS-CoV-2 variants along with SNVs of unknown function, prevalence, or epidemiological consequence. Our study shows the potential of WBE to detect viruses in wastewater and to track the diversity and spread of viral variants in urban and suburban locations, which may aid public health efforts to monitor disease outbreaks. IMPORTANCE Wastewater-based epidemiology (WBE) can detect pathogens across sewersheds, which represents the collective waste of human populations. As there is a wide diversity of RNA viruses in wastewater, monitoring the presence of these viruses is useful for public health, industry, and ecological studies. Specific to public health, WBE has proven valuable during the coronavirus disease 2019 (COVID-19) pandemic to track the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) without adding burden to health care systems. In this study, we used metatranscriptomics and reverse transcription-droplet digital PCR (RT-ddPCR) to assay RNA viruses across Southern California wastewater from August 2020 to January 2021, representing approximately 16 million people from Los Angeles, Orange, and San Diego counties. We found that SARS-CoV-2 quantification in wastewater correlates well with county-wide COVID-19 case data, and that we can detect SARS-CoV-2 single-nucleotide variants through sequencing. Likewise, wastewater treatment plants (WTPs) harbored different viromes, and we detected other human pathogens, such as noroviruses and adenoviruses, furthering our understanding of wastewater viral ecology.
KEYWORDS COVID-19, coronavirus, microbial ecology, SARS-CoV-2, viruses, wastewater
Municipal wastewater represents a matrix containing a wide diversity of microbes and is representative of the collective waste of a human population across a catchment area (1). The microbial content of wastewater can be useful in determining the levels of biological contamination of an area, including the presence of human and animal feces, antimicrobial resistance genes, pathogenic bacteria, and viruses (1-6). Regarding viruses specifically, wastewater often contains high titers of bacteriophages and plant-infecting viruses, along with generally smaller proportions of viruses that infect animals, including humans (7-10). Many studies have used metagenomics to characterize the viral content of wastewater, but these studies typically rely on extracted DNA, which is unable to capture the wide diversity of RNA-based viruses (11-13). As RNA viruses can be important pathogens of humans and agricultural organisms, using metatranscriptomic sequencing to study these diverse viruses in wastewater is relevant to public health and industry and may allow for a greater understanding of the ecological processes that occur in wastewater (2, 8, 14-16). Wastewater-based epidemiology (WBE) is a useful method to detect the presence of human pathogens and may serve as an early warning system for disease outbreaks (17). For example, WBE has been used to track the prevalence of viruses such as norovirus, rotavirus, adenovirus, poliovirus, influenza, and severe acute respiratory syndrome (SARS) coronaviruses (18-23), with the added benefit that WBE does not rely on public health and clinical resources (17). Aside from basic detection, it has been shown that the viral load of wastewater often precedes clinical outbreaks and may offer a forecast of the severity of localized disease outbreaks (18, 24-26).
Infectious diseases are often underreported in clinical settings, likely due to asymptomatic cases, avoidance of health care, and incorrect diagnoses, which prevents accurate disease surveillance and incidence reporting, potentially hampering public health responses (27, 28). While not all pathogens are excreted into wastewater at detectible levels, the presence of detectible viruses in wastewater and human fecal samples indicates that WBE may still provide useful information in monitoring disease (21, 29, 30). The coronavirus disease 2019 (COVID-19) pandemic has placed an intense strain on health care systems worldwide and has resulted in over 4 million human deaths (31). Caused by the 2019 emergence of the enveloped positive-sense single-stranded RNA (+ssRNA) severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (32, 33), this virus has been reported in direct nasal swab patient samples, human feces, and wastewater samples (15, 23-25, 29, 34-36). While detection and quantification of SARS-CoV-2 are clearly important, the predominant method of detection, reverse transcription-quantitative PCR (RT-qPCR), does not allow for the characterization of viral variants, which is critical to monitor during the COVID-19 pandemic to track the evolution of SARS-CoV-2 (37, 38). In light of the limitations of RT-qPCR, thousands of patient-derived SARS-CoV-2 genomes have been sequenced, and important viral variants have been discovered (37, 39-41). However, the sequencing of patient samples relies mainly on clinical samples (42), which were difficult to obtain during the COVID-19 pandemic (43, 44). As wastewater represents a composite of human waste from the total catchment area, metagenomic sequencing of these samples may allow for the detection of viral variants and pathogen diversity across larger populations and regions without further burdening health care workers (11, 13, 45, 46).
Given the importance of monitoring the COVID-19 pandemic and exploring RNA viral diversity, in this study, metatranscriptomic sequencing and droplet digital PCR (ddPCR) were conducted in parallel to characterize the RNA viromes and viral load of SARS-CoV-2 of 94 influent samples from seven wastewater treatment plants (WTPs) representing a total population of 16 million individuals across Southern California. Through our study, we investigated several lines of inquiry. First, what is the diversity of RNA viruses across Southern California wastewater, and does viral abundance change longitudinally? Second, can we detect human-infecting viruses, including coronaviruses, in wastewater, and how does SARS-CoV-2 quantification from wastewater samples compare to county-level COVID-19 case counts? Third, can we use metatranscriptomics to detect and track the emergence of SARS-CoV-2 strain variants in wastewater over time? ddPCR quantification of SARS-CoV-2 and correlation with daily countywide COVID-19 cases. We quantified SARS-CoV-2 viral load in 85 influent wastewater samples from August 2020 to early January 2021 across seven WTPs in Los Angeles, Orange, and San Diego counties in California. Because of COVID-19 reporting at the county level, we correlated SARS-CoV-2 N1 gene copies per liter of influent wastewater with cases within counties only (i.e., the Hyperion [HTP] WTP was correlated with Los Angeles county COVID-19 cases) ( Table S1 in Sequencing library characteristics. We obtained a total of 1,119,674,084 quality-filtered deduplicated paired reads across 180 libraries (90 unique samples), and, as we used two different library preparation strategies, we report the statistics separately. For Viral composition of wastewater and detection of selected pathogens. As enrichment changed the viral composition of wastewater samples, we analyzed the unenriched and enriched sample data separately. 
Unenriched samples contained sequences from 2,495 viruses, and the top 10 most proportionally abundant viruses accounted for an average of 97.6% of the total abundance. The average relative abundances ± standard deviation of these "top 10" viruses were as follows: tomato brown rugose fruit virus (66. We analyzed the data from Illumina respiratory virus (IRV)-enriched samples as above and found sequences from 2,215 viruses, with the top 10 most proportionally abundant viruses accounting for an average of 97.2% of total abundance. The average relative abundances ± standard deviation of these "top 10" viruses were as follows: tomato brown rugose fruit virus (60.2 ± 8.6%), pepper mild mottle virus (13.0 ± 4.4%), cucumber green mottle mosaic virus (11.5 ± 4.3%), tomato mosaic virus (5.0 ± 3.0%), tobacco mild green mosaic virus (2.6 ± 1.6%), tropical soda apple mosaic virus (1.8 ± 1.4%), tomato mottle mosaic virus (1.5 ± 1.4%), SARS-CoV-2 virus (0.9 ± 2.3%), crAssphage (0.4 ± 0.7%), and opuntia virus 2 (0.3 ± 0.6%) (Fig. 2). We were able to detect many more reads of SARS-CoV-2 with respiratory virus enrichment, as there were only 337 SARS-CoV-2 reads (an average proportional abundance of 0.0004%) in unenriched samples, while across enriched samples, we detected 124,135 SARS-CoV-2 reads. We note that IRV enrichment did not have a large impact on our ability to detect the most abundant viruses; rather, it allowed us to detect the less abundant respiratory viruses relevant to public health. The Illumina respiratory virus oligonucleotide panel enriches for 40 viruses (including SARS-CoV-2), so we were able to compare viral detection in 86 enriched and unenriched samples, along with two nonrespiratory viruses often detected in wastewater, norovirus and pepper mild mottle virus (PMMoV).
In enriched libraries, we detected the presence of SARS-CoV-2 in 68 samples, human coronavirus (HCoV)-OC43 in 22 samples, HCoV-229E in one sample, influenza A in 10 samples, human adenoviruses in 15 samples, human bocaviruses in 13 samples, and noroviruses in 52 samples. In unenriched samples, we detected SARS-CoV-2 in 24 samples, HCoV-OC43 in two samples, and noroviruses in 59 samples. Likewise, we detected PMMoV in all samples regardless of enrichment (Fig. 3), indicating that sequencing was successful. Wastewater viral ecology. We analyzed the Shannon indexes for unenriched samples and found that overall alpha diversity was significantly different between wastewater sampling sites (F[6, 87] = 7.5, P < 0.001) and then used Tukey's honestly significant difference (HSD) post hoc pairwise comparison testing to show that only NC-HTP, PL-HTP, PL-JWPCP, PL-OC, and PL-SJ alpha diversities were different from each other (adjusted P value [Padj] < 0.05) (Fig. 4). We also compared the Bray-Curtis dissimilarities of the samples with Adonis and found that overall treatment plants' beta diversity values were significantly different (R² = 0.41, P < 0.001) (Fig. 4). Additionally, we used analysis of compositions of microbiomes (ANCOM) to test for differential abundance of viruses at greater than 0.0001 average relative abundance between treatment plants. This resulted in 16 viruses being differentially abundant between treatment plants (W > 46, Padj < 0.05 each) (Fig. 4). In enriched samples, we compared the proportional abundances of human respiratory viruses at greater than 0.0001 average relative abundance and showed that SARS-CoV-2 and HCoV-OC43 were significantly different across treatment plants (W > 46, Padj < 0.05 each) (Fig. 4). As we sampled the treatment plants multiple times over our study (up to 145 days from first sample), we were able to study how the viromes changed longitudinally.
We compared diversity measures over time with linear mixed effects (LMEs) using treatment plant as a random effect and show that both alpha and beta diversity remained stable across the sampling periods in unenriched samples (Shannon: t = 0.21, P = 0.84; Bray-Curtis: t = -0.31, P = 0.76). We also used LME on viruses present at greater than 0.0001 average relative abundance with treatment plant as a random effect and showed that 19 viruses' relative abundances changed over time (Padj ≤ 0.05) (Fig. S1). We specifically were interested in SARS-CoV-2 in enriched samples and used LMEs with treatment plant as a random effect and found that the relative abundance of this virus increased between August 2020 and January 2021 (t = 4.0, P < 0.001) (Fig. S1). Sequencing SARS-CoV-2 single-nucleotide variants in wastewater samples. We were interested in reads mapping specifically to the SARS-CoV-2 genome as a way to detect single-nucleotide variants (SNVs) in wastewater. Across the 68 enriched samples that had detectible SARS-CoV-2 reads, we obtained an average breadth of genomic coverage of 24.0% (range: 0.2% to 99.8%) per sample at an average sequencing depth of 8.1 reads per base (range: 0.002 to 177.6) (Fig. 5; Table S2). After masking the likely problematic nucleotide sites (as suggested by https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480/14, March 2021 update), we obtained 2,558 SNVs (2,002 unique) across all samples; however, due to the low breadth of coverage in many samples, many of these sites may be spurious or unresolved. After applying a more stringent cutoff of 50% breadth of coverage across the SARS-CoV-2 genome, we obtained 2,060 SNVs (1,656 unique) across 14 samples (Fig. 5; Table S3).
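To make the breadth and depth figures above concrete, the following is a minimal Python sketch and not the pipeline used in this study, which relied on mosdepth and bedtools. It assumes that per-base read depths are already available as a list of integers, that breadth means the fraction of positions covered by at least one read (the depth threshold is not stated in the text), and that the 50% cutoff is inclusive. The function names are hypothetical.

```python
def coverage_stats(depths):
    """Return (breadth, mean_depth) for a list of per-base read depths.

    Assumption: a position counts as covered when its depth is at least 1.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d > 0)
    breadth = covered / len(depths)          # fraction of the genome covered
    mean_depth = sum(depths) / len(depths)   # average reads per base
    return breadth, mean_depth


def passes_breadth_cutoff(depths, min_breadth=0.5):
    """True when breadth of coverage meets the cutoff (inclusive by assumption)."""
    breadth, _ = coverage_stats(depths)
    return breadth >= min_breadth


# Toy 10-base "genome" with illustrative depths (not data from this study).
toy_depths = [0, 0, 3, 5, 5, 0, 1, 0, 0, 0]
print(coverage_stats(toy_depths))
print(passes_breadth_cutoff(toy_depths))
```

Under these assumptions a sample with few reads can show a high depth at a handful of positions while still failing the breadth cutoff, which is why SNV calls from low-breadth samples are treated cautiously.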
As we took samples at multiple time points per treatment plant, we were able to track the proportion of SNVs per nucleotide position over time, most notably through samples taken from the Hyperion (HTP) facility, likely due to higher viral load leading to higher sequencing depth. Within HTP, we plotted the proportion of SNVs (compared to the reference strain) over time at nucleotide positions obtained from at least three samples with greater than 50% breadth of genomic coverage. Three of the detected SNVs are apparently fixed in the viral population (sites 241, 14408, and 23403 are all 100% SNV), while the other 17 SNVs appear to vary widely over the sampling dates (overall average percent coefficient of variation [%CV] = 103%; range: 0% to 161%) with no apparent directionality (Fig. 6 ). Our composite wastewater samples contained a diversity of RNA viruses (mainly plant-infecting viruses along with lower relative abundances of animal-infecting viruses) that differed between wastewater treatment plants (WTPs), supporting previous studies indicating that location affects the presence of viruses (13) . Furthermore, several individual viruses varied over time while overall diversity remained unchanged, likely due to localized infections within WTP catchment areas (3, 9, 47) . We detected several human-pathogenic viruses across all WTPs, supporting the hypothesis that wastewater-based epidemiology (WBE) has the potential to inform public health and researchers about viral presence and distribution without relying on standard health care practices (17) . Through respiratory-virus-enriched library preparation and sequencing, we detected the presence of SARS-CoV-2 at every WTP, along with several SNVs across the SARS-CoV-2 genome. This suggests that WBE can reveal the pool of potential viral variants across large geographic areas, again without adding stress to health care systems, with the benefit that composite sampling can collect wastewater 24 h per day (45, 46, 48) . 
Lastly, we show that SARS-CoV-2 viral load at WTPs generally correlates with county-level COVID-19 case counts, indicating that WBE can be useful in monitoring the severity and dynamics of disease spread (18, 19, 23-25, 49). The vast majority of viruses present in our samples were plant viruses, mainly those infecting tomatoes, peppers, and cucumbers in the genus Tobamovirus. This result is consistent with other studies, which suggests that these viruses are diverse and widespread in wastewater, likely originating from agricultural runoff or human feces (3, 7, 50). Even though these viruses were ubiquitous throughout our samples, WTPs had significantly different overall viromes, indicating that there may be signatures of location and wastewater catchment throughout Southern California, which has been suggested from other sampling locations (13). Aside from overall diversity, several plant- or arthropod-infecting viruses were differentially abundant between WTPs, possibly due to differences in people's diets (and thus viral excretion), infected plant growth, and localized infections of arthropods (i.e., Hubei picorna-like viruses and Beihai permutotetra-like viruses) (3, 9, 47). Likewise, we found that several viruses' relative abundances varied over time, regardless of treatment plant, suggesting that there are infection/subsidence dynamics or seasonal trends as previously suggested (7, 10, 49, 51). As the day-to-day relative abundance of viruses varied, we suggest that future wastewater surveys incorporate longitudinal and composite sampling to accurately capture viral diversity. We detected several viruses pathogenic to humans across WTPs, including noroviruses, adenoviruses, bocaviruses, coronaviruses, and influenza A, which agrees with other studies, indicating that WBE is robust and applicable to multiple viruses (18-23).
By applying metagenomic or metatranscriptomic sequencing to wastewater, we can simultaneously detect a diversity of viruses, potentially alerting public health to unknown or underreported infections or new viral strains (27, 28, 46, 52). As we prepared sequencing libraries using two methods (unenriched and Illumina respiratory virus oligonucleotide panel-enriched shotgun metatranscriptomics), we could compare the effects of enrichment on virus detection and report that viral enrichment greatly improved our ability to detect influenza A and coronaviruses (especially SARS-CoV-2). We suggest that if researchers and the public health community are specifically interested in respiratory viruses, enrichment and the proper wastewater concentration/extraction methods should be used for appropriate sensitivity and specificity to viruses of interest (23, 25, 34, 46, 53). Alongside sequencing, we also used ddPCR to quantify SARS-CoV-2 in wastewater and show that our results generally correlated well with county-reported 7-day rolling average COVID-19 cases, which agrees with previous studies (23, 24, 54). We note that SARS-CoV-2 viral load and case counts were not significantly correlated at every WTP, indicating that there is likely unknown variability in the survival of RNA within wastewater, or that there may be variable influent flow or water quality affecting the viral load at specific WTPs (55, 56). We also note that our comparison is limited to county-level data rather than being a comparison of the areas contributing to the influent stream, and this could be obscuring the relationship to case data at some treatment plants. Our respiratory virus-enriched metatranscriptomic sequencing detected the presence of SARS-CoV-2 at every WTP and in most of our samples, although we note that we could not accurately quantify viruses through sequencing.
Instead, the power of metatranscriptomic sequencing lies in our ability to detect SNVs across the SARS-CoV-2 genome (46), and, depending on the sample, we often sequenced greater than 50% of the genome at appreciable read depth. For example, we detected the GISAID clade GH (lineage B.1*) markers 241C>T, 3037C>T, 14408C>T, 23403A>G, and 25563G>T at 100% prevalence where those regions of the genome were sequenced (39, 57, 58). Likewise, as we detected over 1,000 SARS-CoV-2 SNVs across our samples, we found many SNVs of putatively unknown function that have been detected in patient samples, such as 6285C>T and 9891C>T (found in, but do not solely define, variants B.1.525 and B.1.1.318, respectively) and 28854C>T and 28887C>T (40, 59, 60). We also detected many SNVs also found in wastewater samples from Northern California (46) and SNVs that, to the best of our knowledge, have yet to be sequenced (41, 46, 48). As our sequencing data are not quantitative, our study suggests that sequencing wastewater is useful for SNV detection across wide catchment areas but is not useful for estimating the true prevalence of SNVs (46). Furthermore, we recognize that wastewater likely represents a collection of different viruses' RNA rather than one intact virion, as SARS-CoV-2 may be relatively fragile in wastewater (55, 61). Lastly, we note that most SNVs came from samples with the highest number of SARS-CoV-2 reads and suggest sequencing samples deeply or using tiled amplicon-based approaches to obtain a useful breadth and depth of viral genomes for variant detection (46, 62). Conclusion. Wastewater-based epidemiology has the potential to aid public health in monitoring the spread and severity of disease outbreaks. Our research contributes to the growing field of WBE by showing that Southern California wastewater harbors a diversity of RNA viruses and that these viral populations vary over time and between WTPs.
Likewise, we are able to detect human viral pathogens without increasing the burden on local health care systems, further supporting the benefit of WBE. Through ddPCR and metatranscriptomic sequencing, we were able to measure the viral load of SARS-CoV-2 in wastewater and identify potentially novel SNVs, which may assist in monitoring viral evolution and the emergence of new variants. We suggest that future researchers use longitudinal metatranscriptomic sequencing on wastewater samples to further understand the spread of RNA viruses and how these viruses change over time. Sample collection. We collected 94 1-liter 24-h composite influent wastewater samples at seven WTPs across Southern California between August 2020 and January 2021 (Table S4 in the supplemental material). The samples were aliquoted into 50-ml tubes and stored at 4°C until sample processing. Note that extractions were performed independently for ddPCR quantification and viromic sequencing. Wastewater sample processing for SARS-CoV-2 ddPCR quantification. We prepared influent samples for ddPCR following the method described in Steele et al. (63). Briefly, we first added bovine coronavirus (BoCoV) vaccine (Bovilis; Merck & Co, Kenilworth, NJ) to 20 ml of wastewater as a sample processing control to assess viral RNA recovery. We then added MgCl2 to a final concentration of 25 mM and adjusted the pH to <3.5 with 20% HCl on a mixed cellulose ester membrane (type HA; Millipore, Bedford, MA) in replicates of six. We then transferred the HA filters to preloaded 2-ml ZR BashingBead lysis tubes (Zymo, Irvine, CA) and bead beat the samples with a BioSpec beadbeater (BioSpec Products, Bartlesville, OK) for 1 min. We then extracted total nucleic acids with a bioMérieux NucliSENS extraction kit with magnetic bead capture (bioMérieux, Durham, NC) following the manufacturer's protocol. SARS-CoV-2 reverse transcription-droplet digital PCR quantification and correlation with countywide COVID-19 cases.
We used one-step reverse transcription-droplet digital PCR (RT-ddPCR) to quantify the N1 region of the SARS-CoV-2 N gene with primer and probe sequences designed by the CDC (63, 64), and we quantified the bovine coronavirus using previously designed primers (63, 65). We set up RT-ddPCR reactions following the manufacturer's instructions on a Bio-Rad QX200 (Bio-Rad, Hercules, CA). For all assays, a minimum of two reactions and a total of ≥20,000 droplets were generated per sample, and at least five no-template control (NTC) reactions and two positive-control reactions were run per 96-well plate as well as extraction-specific NTCs. Each sample was required to have a minimum of three positive droplets (66) to be included in further analyses. We also assessed RNA recovery using the BoCoV exogenous control, and samples with <3% recovery were excluded from further analyses. We used the SARS-CoV-2 ddPCR quantification data and compared viral loads with county-level COVID-19 reported case data from the State of California Health and Human Services Agency (https://data.chhs.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state). We calculated rolling 7-day averages for COVID-19 cases in Los Angeles, Orange, and San Diego counties with the R package "zoo" v1.8-9 (67) and ran Pearson correlations between viral load and reported COVID-19 cases with the R package "Hmisc" v4.5 (68) at each time point where we had both data points. Wastewater sample processing for metatranscriptomic sequencing. We followed a similar protocol as Crits-Christoph et al. (46) and Wu et al. (23) to concentrate viruses and extract RNA. We pasteurized 50 ml of wastewater in a 65°C water bath for 90 min and filtered the samples through a sterile 0.22-μm vacuum filter (VWR, Radnor, PA) to remove solids.
We then concentrated the filtrate through ultracentrifugation at 3,000 × g with 10-kDa Amicon filters (MilliporeSigma, Burlington, MA) by successively centrifuging then discarding the flowthrough until the entire 50-ml sample was processed. This resulted in final volumes of less than 500 μl for each sample, which we then stored at -80°C until RNA extraction. We thawed the wastewater concentrate on ice, then used an Invitrogen PureLink RNA minikit with DNase (Invitrogen, Waltham, MA) to extract RNA by following the manufacturer's protocol, quantified the resulting RNA with an AccuBlue broad range RNA quantification kit (Biotium, Fremont, CA) on a DeNovix QFX fluorometer (DeNovix, Wilmington, DE) spectrophotometer, and stored the RNA at -80°C. Next-generation sequencing library preparation. Sample library preparation and next-generation sequencing was performed by the University of California Irvine Genomics High Throughput Facility (GHTF). The GHTF used two separate library preparation strategies per sample: one library was prepared with the Illumina RNA prep with enrichment kit (Illumina, San Diego, CA), and the other library was prepared using the same preparation kit with the addition of the Illumina respiratory virus oligonucleotide panel to enrich for human respiratory viruses. The GHTF then sequenced the paired-end libraries either 2 × 100 bp or 2 × 150 bp (Table S4) on an Illumina NovaSeq 6000 with an S4 300 cycle kit and sent the data as demultiplexed FASTQ files. Bioinformatics and sequence data processing. We used the University of California Irvine High Performance Community Computing Cluster (HPC3) for all data processing on the provided FASTQ files. We removed primers, adapter sequences, and low-quality bases with "bbduk" in the BBTools software package v38.87 (69) and removed PCR duplicates with BBTools "dedupe."
We downloaded the NCBI Virus RefSeq Genome database (January 2021), used Bowtie2 v2.4.1 (70) to build an index of the viral genomes, and mapped the cleaned FASTQ reads against this index. We used "Samtools" v1.10 (71) to convert the resulting SAM files to sorted BAM files and finally used "inStrain" v1.3.1 (72) to calculate all viral abundances and profile viral variants on read pairs with >90% average nucleotide identity to their respective reference genomes. Lastly, we used "mosdepth" v0.3.1 (73) and "bedtools" v2.30.0 (74) to calculate genome coverage and breadth across all SARS-CoV-2 sequences. For community diversity analyses, we tabulated viral abundances and normalized the reads into within-sample relative abundances in R v4.0.4 (75). We used this table to generate Shannon diversity indices and Bray-Curtis dissimilarity matrices with the R package "vegan" v2.5-7 (76). We also ran Adonis permutational multivariate analysis of variance (PERMANOVA) tests on the distance matrices and performed nonmetric multidimensional scaling on the data, compared the proportional abundances of viruses with ANCOM v2.1 (77), and compared the relative abundances of viruses over time with "lmerTest" v3.1-3 using WTP as a random effect (78). Lastly, we plotted all figures with "ggplot2" v3.3.3 (79), "gggenes" v0.4.1 (80), or "pheatmap" v1.0.12 (81). Data availability. Raw sequencing data have been deposited on the NCBI Sequence Read Archive under accession number PRJNA729801, and representative code can be found at https://github.com/jasonarothman/wastewater_viromics_sarscov2.
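The diversity measures named in the bioinformatics methods can be illustrated with a short Python sketch. This is a stand-in for the R package "vegan" used in the study, not the authors' code. It assumes read counts per virus are held in plain dictionaries, uses the natural logarithm for the Shannon index (the default in vegan), and computes Bray-Curtis dissimilarity on within-sample relative abundances. All function names and the toy counts are hypothetical.

```python
import math


def relative_abundances(counts):
    """Normalize raw read counts to within-sample proportions."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {virus: n / total for virus, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over viruses with p > 0."""
    props = relative_abundances(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b) on proportions."""
    pa = relative_abundances(counts_a)
    pb = relative_abundances(counts_b)
    viruses = set(pa) | set(pb)
    numerator = sum(abs(pa.get(v, 0.0) - pb.get(v, 0.0)) for v in viruses)
    denominator = sum(pa.get(v, 0.0) + pb.get(v, 0.0) for v in viruses)
    return numerator / denominator


# Toy read counts for two samples (illustrative values, not study data).
sample_1 = {"virus_a": 900, "virus_b": 90, "virus_c": 10}
sample_2 = {"virus_a": 500, "virus_b": 400, "virus_d": 100}
print(shannon_index(sample_1), shannon_index(sample_2))
print(bray_curtis(sample_1, sample_2))
```

A sample dominated by one virus yields a Shannon index near zero, which is consistent with the low evenness expected when a few tobamoviruses account for most reads. Bray-Curtis dissimilarity ranges from 0 for identical compositions to 1 for samples sharing no viruses.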
We thank the staff of City of Los Angeles Sanitation and Environment, Los Angeles County Sanitation District, Orange County Sanitation District, and City of San Diego Public Utilities for collecting influent. We also thank E. Macias for assistance with samples. This research was supported by Emergency COVID-19 Research Seed Funding through the University of California Office of the President Research Grants Program Office (award numbers R01RG3732 and R00RG2814) awarded to J.A.R., T.B.L., and K.L.W. and a Hewitt Foundation for Biomedical Research postdoctoral fellowship to J.A.R. This work was made possible, in part, through access to the Genomics High Throughput Facility Shared Resource of the Cancer Center Support Grant (P30CA-062203) at the University of California, Irvine, NIH shared instrumentation grants 1S10RR025496-01, 1S10OD010794-01, and 1S10OD021718-01, and access to computing resources from the UCI High Performance Cloud Computing Center.