key: cord-0958763-oqqda5ji authors: Swift, Candice L.; Isanovic, Mirza; Correa Velez, Karlen E.; Norman, R. Sean title: Community-level SARS-CoV-2 sequence diversity revealed by wastewater sampling date: 2021-08-18 journal: Sci Total Environ DOI: 10.1016/j.scitotenv.2021.149691 sha: 79c83c39dfbb26ef3fcab2350cdec12eeb86cc7d doc_id: 958763 cord_uid: oqqda5ji

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for causing the COVID-19 pandemic, can be detected in untreated wastewater. Wastewater surveillance of SARS-CoV-2 complements clinical data by offering earlier community-level detection, removing the influence of underlying factors such as access to healthcare, capturing asymptomatic patients, and reaching a larger population. Here, we compare 24-hour composite samples from the influents of two different wastewater treatment plants (WWTPs) in South Carolina, USA: Columbia and Rock Hill. The sampling intervals span the months of July 2020 and January 2021, which cover the first and second waves of elevated SARS-CoV-2 transmission and COVID-19 clinical cases in these regions. We identify four signature mutations in the surface glycoprotein (spike) gene that are associated with the following variants of concern, or VOC (listed in parentheses): S477N (B.1.526, Iota), T478K (B.1.617.2, Delta), D614G (present in all VOC as of May 2021), and H655Y (P.1, Gamma). The N501Y mutation, which is associated with three variants of concern, was identified in samples from July 2020, but not detected in January 2021 samples. Comparison of mutations identified in viral sequence databases such as NCBI Virus and GISAID indicated that wastewater sampling detected mutations that were present in South Carolina, but not reflected in the clinical data deposited into databases.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the pandemic declared by the World Health Organization in March 2020 (Tedros, 2020).
The SARS-CoV-2 pandemic has resulted in both loss of life and an economic downturn (US Census Bureau, 2021). Although the virus is predominantly transmitted by bioaerosols (Guzman, 2021), fecal shedding of SARS-CoV-2 has been established from both symptomatic and asymptomatic patients, and there have been multiple reports of the detection of SARS-CoV-2 globally in both treated and untreated sewage (Crits-Christoph et al., 2020; Kumar et al., 2020; La Rosa et al., 2020; Westhaus et al., 2021). Wastewater sampling has emerged as a useful tool to aid policy makers in making decisions about stay-at-home orders and other measures to mitigate viral spread (McClary et al., 2021). Advantages of SARS-CoV-2 wastewater surveillance include the following: (1) asymptomatic cases are captured in the data (Wu et al., 2020), (2) wastewater trends precede clinical data by as much as 10 days (Wu et al., 2020), and (3) wastewater data track infection trends independently from clinical data, without reflecting healthcare access and choices ("National Wastewater Surveillance System (NWSS) - a new public health tool to understand COVID-19 spread in a community | CDC," 2021). However, the information provided by wastewater surveillance goes beyond tracking case counts: recent studies have shown that wastewater sampling can reveal which viral variants are present in a specific location (Fontenele et al., 2021; Martin et al., 2020) as well as signature mutations that match clinical sequences or those not yet present in databases (Crits-Christoph et al., 2020; Jahn et al., 2021; Nemudryi et al., 2020). In particular, three variants, B.1.1.7 (Abbott, 2021), B.1.351 (Tegally et al., 2020), and P.1 (Candido et al., 2021), are considered concerning due to their potential for higher rates of transmission. These correspond to NextStrain clades 20I, 20H and 20J, respectively (Hadfield et al., 2018).
These three variants of concern (Centers for Disease Control and Prevention, 2021), commonly referred to as the UK (World Health Organization label Alpha), South African (Beta), and Brazilian (Gamma) variants, share several mutations, such as the spike protein mutation from asparagine to tyrosine (N501Y). The N501Y mutation is of functional significance because it is located in the receptor-binding domain of the spike protein (Tegally et al., 2020). This domain binds to the human receptor that is thought to mediate viral entry into the cell (angiotensin-converting enzyme 2, or ACE2). The mutation from asparagine to tyrosine has been shown in mouse models to enhance binding affinity to the ACE2 receptor.

In this study, we sequence SARS-CoV-2 amplified from wastewater influent samples from two wastewater treatment plants (WWTPs) in the state of South Carolina, USA: Rock Hill and Columbia. The Rock Hill WWTP is located in York County on the northern border, whereas the Columbia WWTP is located in Richland County in the central area of the state. The sampling intervals span the months of July 2020 and January 2021, which cover the first and second waves of elevated SARS-CoV-2 transmission and COVID-19 clinical cases in these regions (Figure 1). We report the detection of the N501Y mutation in wastewater as early as July 2020, as well as other mutations, some of which are present in variants of interest or concern. Many of these mutations, although reported in clinical data from other states or in South Carolina at other times, were not present in clinical sequences collected from patients in South Carolina during the corresponding intervals and deposited in the Global Initiative on Sharing All Influenza Data, or GISAID (Shu and McCauley, 2017), repository. This study affirms the value of wastewater surveillance, even from a limited number of WWTPs, as part of a statewide infectious disease mitigation program.
The Columbia and Rock Hill WWTPs are both secondary (activated sludge) WWTPs that treat municipal wastewater with 6% and 3% of total flow, respectively, permitted from industry. The Columbia WWTP serves a population of 363,714, whereas the Rock Hill WWTP serves 264,117 people. The monthly average flow of the Columbia WWTP is 45 MGD, and that of the Rock Hill WWTP is 12 MGD.

One-liter 24-hour composite wastewater samples were collected twice a week at the influent site of both the Columbia and Rock Hill WWTPs and transported on ice to the laboratory at the University of South Carolina, where they were immediately processed. One mL of bovine respiratory syncytial virus (BRSV) vaccine (~80 million copies/mL) (INFORCE 3®) was added to one liter of wastewater prior to concentration in order to quantify processing and viral extraction efficiency. The average BRSV viral recovery was 4-5%. The samples were then homogenized for 10 min using laboratory blenders, and 250 mL of homogenized wastewater was decanted into centrifuge bottles. The samples were centrifuged using an Avanti® J-E Centrifuge (Beckman Coulter Lifesciences, Indianapolis, Indiana) with a JS-5.3 rotor for 30 min at 4,577 × g without braking. The pellets were stored at -80 °C and 50 mL of the supernatants were concentrated to 400 µL using Millipore Amicon 30 kDa ultrafilters.

PCR tiling of SARS-CoV-2 with native barcoding was performed based on the protocol developed by the ARTIC network (Quick, 2020). Briefly, total RNA was transcribed into cDNA using the LunaScript® RT SuperMix Kit (New England Biolabs, Ipswich, MA). The resulting products were amplified by 40 cycles of PCR using two different primer pools (V3 design) to create 400 bp amplicons spanning the entire SARS-CoV-2 genome. The PCR products were cleaned with a 1:1 ratio of SPRISelect beads (Beckman Coulter Lifesciences, Indianapolis, IN) to sample rather than a 1:10 dilution of the PCR products as described in the protocol.
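The concentration and spike-in figures in the sample-processing description above lend themselves to a short worked calculation. The sketch below is illustrative arithmetic only: the function names and the `recovered` value are hypothetical, and the way recovery was actually quantified in this study is not stated in the text, so the percentage formula is an assumption.

```python
def concentration_factor(start_volume_ml: float, final_volume_ml: float) -> float:
    """Fold concentration achieved when a volume is reduced by ultrafiltration."""
    return start_volume_ml / final_volume_ml


def percent_recovery(recovered_copies: float, spiked_copies: float) -> float:
    """Recovered spike-in copies expressed as a percentage of the copies added."""
    return 100.0 * recovered_copies / spiked_copies


# 50 mL of supernatant concentrated to 400 µL (0.4 mL), as described in the text.
fold = concentration_factor(50.0, 0.4)

# 1 mL of BRSV vaccine at ~80 million copies/mL was added to 1 L of wastewater.
spiked = 80e6 * 1.0

# Hypothetical measurement: total BRSV copies recovered, scaled to the full sample.
recovered = 3.6e6
recovery = percent_recovery(recovered, spiked)

print(f"{fold:.0f}x concentration, {recovery:.1f}% recovery")
```

With the hypothetical recovered value chosen here, the result falls inside the 4-5% range reported for BRSV, which is the only reason that number was picked.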
The PCR products were then end-prepped using the NEBNext® Ultra™ II End Repair/dA-Tailing Module Table S1.

Sequencing data processing was performed according to the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol (Loman et al., 2020). Basecalling and demultiplexing were performed within MinKNOW using the high-accuracy model of Guppy version 4.2.3 developed by Oxford Nanopore Technologies (ONT). The minimum barcode score was set to 40 and the dual barcoding option was applied. Reads were filtered using a Qscore threshold of 7, and reads outside of the length range of 400-700 bp were omitted to eliminate chimeric reads. Lastly, filtered reads were mapped to the SARS-CoV-2 genome (accession MN908947.3) using minimap2 (Li, 2018) within the artic minion command with the normalization option enabled (--normalize 200). Variant calling was performed with inStrain (Olm et al., 2021) with the options -c 2 (minimum coverage of 2) and --pairing_filter all_reads (in order to accept non-paired reads). Mutations identified within primer-binding regions and in problematic sites identified by De Maio and colleagues (De Maio et al., 2020; Maio ND et al., 2020) were removed and are not presented in Tables 2-3. inStrain results for the Columbia and Rock Hill WWTP samples are presented in Supplementary Datasets S1 and S2.

In addition to performing inStrain analysis for all barcodes, each of which corresponded to a 24-hour composite influent sample collected from either the Columbia or the Rock Hill WWTP, we also performed a combined analysis of Columbia and Rock Hill during each month (July 2020 or January 2021). This method allowed us to increase the sequencing depth by including unclassified reads with incomplete barcode ligation. Since samples from July 2020 were sequenced separately from January 2021, we were still able to compare WWTP samples in time, although geographic information was not separated.
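The read-filtering step described above (Qscore threshold of 7, length window of 400-700 bp) can be sketched in a few lines. This is a minimal illustration rather than the ARTIC or Guppy implementation: the `Read` class and function names are hypothetical, and computing the mean Q score by averaging per-base error probabilities is an assumption about how the score is defined.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    """Hypothetical container for one basecalled read."""
    name: str
    sequence: str
    qualities: List[int]  # per-base Phred scores


def mean_qscore(qualities: List[int]) -> float:
    """Average the per-base error probabilities, then convert back to Phred scale."""
    if not qualities:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


def passes_filters(read: Read, min_q: float = 7.0,
                   min_len: int = 400, max_len: int = 700) -> bool:
    """Keep reads inside the amplicon length window with a mean Q score >= min_q."""
    length_ok = min_len <= len(read.sequence) <= max_len
    return length_ok and mean_qscore(read.qualities) >= min_q


# Toy reads: one acceptable, one too short, one with low quality.
reads = [
    Read("ok", "A" * 500, [12] * 500),
    Read("short", "A" * 300, [12] * 300),
    Read("low_q", "A" * 500, [5] * 500),
]
kept = [r.name for r in reads if passes_filters(r)]
print(kept)
```

The upper length bound is what removes chimeric reads in this scheme: two ~400 bp amplicons joined together exceed 700 bp and are discarded.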
The combined analysis for July 2020 and January 2021 is presented in Supplementary Datasets S3 and S4, respectively. Due to the composite nature of wastewater, in which viral fragments from different sources are combined, it is not possible at this time to determine whether mutations observed from the same sample correspond to the same viral genome. Also, viral fragments of different sizes may have stability differences in wastewater. Therefore, the observation of signature mutations in wastewater is insufficient to determine the presence of a Variant of Concern (VOC) (Centers for Disease Control and Prevention, 2021) within a community. Nevertheless, the observation of signature mutations from wastewater is an early warning of the potential presence of a VOC within a community and thus still valuable. Mutational profiles observed in the surface glycoprotein (spike) gene in Columbia and Rock Hill WWTP influent samples during the month of January 2021 are presented in Figure 2 . Samples with less than 200x average depth and 70% SARS-CoV-2 genome coverage were excluded, hence samples from July 2020 are not presented due to depth and coverage below these thresholds (Supplementary Table S2 ). The average sequencing depth and SARS-CoV-2 genome coverage for samples from the Columbia WWTP in January 2021 was 314x and 92.9%, whereas the average depth and coverage for Rock Hill WWTP samples was 249x and 85.8%. A complete list of the sequencing depth and coverage for each sample is provided in Supplementary Table S2 . Four mutations in the spike gene were detected that are present in one or more variants of interest (VOI) or variants of concern (VOC)(Centers for Disease (Hodcroft, 2021) and in Table 1 . Mutations were only considered with at least 10 reads corresponding to the divergent nucleotide and at least 50x coverage in the nucleotide position. 
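The per-site reporting thresholds stated above (at least 10 reads supporting the divergent nucleotide and at least 50x coverage at the position) can be expressed as a small filter. The sketch below is an assumption-laden illustration: the `DivergentSite` class, the function name, and the example read counts are hypothetical, and the "high" tier at 100x coverage follows the confidence definitions given for Table 3 of this paper.

```python
from dataclasses import dataclass
from typing import Optional

MIN_VARIANT_READS = 10   # reads carrying the divergent nucleotide
MEDIUM_COVERAGE = 50     # minimum coverage for a site to be reported
HIGH_COVERAGE = 100      # coverage for the high-confidence tier (Table 3 definition)


@dataclass
class DivergentSite:
    """Hypothetical record for one position with a non-reference allele."""
    position: int
    coverage: int
    variant_reads: int

    @property
    def variant_frequency(self) -> float:
        """Fraction of reads at this position that carry the divergent nucleotide."""
        return self.variant_reads / self.coverage if self.coverage else 0.0


def confidence_tier(site: DivergentSite) -> Optional[str]:
    """Return 'high', 'medium', or None when the site fails the reporting thresholds."""
    if site.variant_reads < MIN_VARIANT_READS or site.coverage < MEDIUM_COVERAGE:
        return None
    return "high" if site.coverage >= HIGH_COVERAGE else "medium"


# Hypothetical read counts, chosen only to exercise each branch.
sites = [
    DivergentSite(23403, coverage=120, variant_reads=120),
    DivergentSite(23269, coverage=60, variant_reads=12),
    DivergentSite(22747, coverage=60, variant_reads=9),
    DivergentSite(21707, coverage=40, variant_reads=20),
]
for site in sites:
    print(site.position, confidence_tier(site), round(site.variant_frequency, 2))
```

Requiring both a read count and a coverage floor guards against two different failure modes: a handful of erroneous reads at a deeply covered site, and an apparently high frequency at a site with too few reads to trust.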
Including the mutations found in VOI and VOC, 16 mutations in the spike gene were identified from all Rock Hill samples and 34 from Columbia samples. A full list of mutations across the entire SARS-CoV-2 genome is included in Supplementary Datasets S1 and S2 for both Columbia and Rock Hill WWTPs in July 2020 and January 2021 samples. Table 1 illustrates the value of sequencing SARS-CoV-2 from wastewater in addition to clinical samples (represented by sequences deposited in GISAID): mutations detected in VOI/VOC that were already known to be present in South Carolina by clinical sequencing data were corroborated by wastewater data, but more importantly, potential VOI/VOC were flagged by the mutations detected in wastewater that were absent from the clinical sequences in GISAID collected in January 2021. We also aimed to determine similarities in SARS-CoV-2 mutations between Columbia and Rock Hill WWTP influents. Due to the low sequencing depth for barcoded samples in July 2020 (Supplementary Table S2 ), especially for Rock Hill, we avoided comparisons for July 2020 samples between Columbia and Rock Hill. In January 2021, we identified 85 shared locations in the SARS-CoV-2 genome where a mutation occurred. The Orf1ab gene occupies most of the SARS-CoV-2 genome length (71%). Similarly, we found that the largest percentage of shared mutational locations occurred in this gene. However, 26% of the shared locations were in the nucleocapsid phosphoprotein (N) gene, which constitutes only 4% of the genome length. We further investigated the shared mutations in this gene (Supplementary Table S3 ). In this analysis, we included low-frequency mutations with no coverage threshold, but also validated the mutations by comparison to sequence databases (Supplementary Table S3 ). Another study reported that the nucleocapsid is one of the SARS-CoV-2 proteome components with a high rate of mutation (Vilar and Isom, 2021) . 
In particular, R203K and G204R both had a mutation rate of 0.22, indicating that they were present in 22% of all sequences as of December 2020 (Vilar and Isom, 2021). Similarly, S194 and M234 showed high rates of mutation (Vilar and Isom, 2021). Consistent with this study, we identified R203K, G204R, S194L, and M234I in the nucleocapsid protein as shared mutations between Columbia and Rock Hill WWTP influents in January 2021 (Supplementary Table S3). Therefore, mutations observed across multiple WWTP sites can reflect global mutation trends, which can be of value in identifying and investigating the spread of new, potentially more infectious strains of SARS-CoV-2.

There was very little temporal overlap in the detected mutations for either the Columbia WWTP influent samples or the Rock Hill WWTP influent samples (Figure 2). This may reflect the dynamic environment of wastewater, both in terms of the sewershed population served, which depends on travel in and out of the community, and in terms of wastewater conditions such as flow rate and types of contaminants. Detection of viral mutations in wastewater surveillance efforts may be sensitive to wastewater components such as salts, lipids, and urate, which can affect PCR efficiency (Farkas et al., 2020; Schrader et al., 2012). A synonymous guanine to adenine substitution at nucleotide position 25297 (residue 1245) was detected in the spike gene in the January 5 and 17 samples from the Rock Hill WWTP. A consensus adenine to guanine substitution at spike gene nucleotide position 23403, corresponding to D614G, was detected on January 10 and 18 in the Columbia WWTP influent. Only one mutation in the spike gene was shared between Columbia and Rock Hill: a synonymous substitution of adenine for the thymine in the reference sequence at nucleotide position 23269 (corresponding to amino acid 569).
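The nucleotide positions and spike residue numbers quoted above are related by simple codon arithmetic. The sketch below assumes the spike (S) coding sequence spans nucleotides 21563-25384 of the reference genome MN908947.3; the function names are hypothetical.

```python
SPIKE_START = 21563  # first nucleotide of the S coding sequence in MN908947.3
SPIKE_END = 25384    # last nucleotide of the S coding sequence (end of stop codon)


def spike_residue(genome_position: int) -> int:
    """Map a 1-based genome coordinate to the 1-based spike amino acid number."""
    if not SPIKE_START <= genome_position <= SPIKE_END:
        raise ValueError(f"position {genome_position} is outside the spike gene")
    return (genome_position - SPIKE_START) // 3 + 1


def codon_position(genome_position: int) -> int:
    """Return 1, 2, or 3: where the nucleotide falls within its codon."""
    if not SPIKE_START <= genome_position <= SPIKE_END:
        raise ValueError(f"position {genome_position} is outside the spike gene")
    return (genome_position - SPIKE_START) % 3 + 1


# Positions reported in the text.
for pos in (23403, 23269, 25297):
    print(pos, spike_residue(pos), codon_position(pos))
```

Position 23403 maps to residue 614 at the second codon position, which matches the adenine to guanine change underlying D614G; positions 23269 and 25297 map to residues 569 and 1245, as stated in the text.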
Comparison of both high-confidence (>100x coverage) and medium-confidence (>50x coverage) divergent sites found in the Columbia WWTP samples from January 6 and 31 of 2021 indicated no shared mutations in the spike gene, despite comparable coverage. Across the entire SARS-CoV-2 genome, only five shared mutations out of 41 total with at least 100x coverage were found when comparing the Columbia WWTP influent sample from January 31 to the Columbia sample from January 6, 2021. Even when expanding the comparison to include mutations with less than 100x coverage in the January 6 sample, no additional shared mutations were identified. The five shared mutations were located in the region of the genome encoding the Orf1ab gene. Out of the 34 divergent sites identified within the Columbia sample set (Supplementary Dataset S1), 31 were observed with less than 50% variant or consensus frequencies and all 16 divergent sites detected in the Rock Hill sample set were less than 50% frequency. In other words, the majority of the reads at positions with more than one allele were consistent with the reference sequence. A notable exception is the D614G mutation in the spike protein observed in both the January 10 and 18 samples, where 100% of the reads aligned to the D614G variant, which is consistent with the global trend and evidence that the D614G mutation increases infectivity (Korber et al., 2020). The synonymous mutation of cytosine to thymine at position 22747 (residue 395) in the genome was also a majority in the January 31 sample. The mutation H49Y was detected in the January 18 Columbia sample at 50% frequency. This mutation has been reported previously in several clinical samples (Armero et al., 2021; Phan, 2020; Sixto-López et al., 2021). The H49Y mutation has been associated with enhanced cell entry (Ozono et al., 2020), although at present it is not considered a variant of interest or concern.
Peak case counts of SARS-CoV-2 in South Carolina were lower in July 2020 (2,366 new cases on July 18, 2020) compared to January 2021 (7,678 new cases reported on January 7) (CDC Case Surveillance Task Force, 2021), resulting in lower concentrations in wastewater (Supplementary Table S1).

In July 2020, the N501Y mutation of the spike gene was observed at 215x coverage. This mutation was not observed in the January 2021 combined analysis, although the SARS-CoV-2 genome had similar overall coverage (~400x) in this region to the July 2020 combined analysis. The N501Y mutation was not detected in the negative control for the July 2020 sequencing run. Globally, 34 sequences collected from June 28 to July 31, 2020, were deposited into both NCBI Virus and GISAID with the N501Y mutation. However, no sequences with the N501Y mutation were found from South Carolina in GISAID during this time frame, although there was an earlier reported observation of this mutation in May 2020. Twenty-six of the sequences with the N501Y mutation in the July time frame were from the USA (Table 2), and all except one were collected from patients in Texas. The N501Y observation in wastewater in South Carolina during July 2020 suggests a possible transmission event of a variant with the N501Y mutation between these two states, although it is also possible that there were already variants with the N501Y mutation present in South Carolina that were not captured in GISAID.

Both the evolution of SARS-CoV-2 and its geographic distribution are of interest in controlling the pandemic. Only 590 sequences with a collection date between January 1 and January 31, 2021 from the state of South Carolina were deposited in GISAID (Shu and McCauley, 2017) (Hatcher et al., 2017) in the SARS-CoV-2 Data Hub with the same collection period. This disparity illustrates the value of wastewater sequencing, where this study alone includes viral sequences from approximately 600,000 people.
Table 2 reports nonsynonymous mutations that were detected in the combined analysis of Columbia and Rock Hill WWTP influent samples in January 2021 but were absent in July 2020. Nine nonsynonymous mutations were present in January 2021 samples but absent from July 2020, and seven of these mutations were located in the Orf1b (Figure 3 and Table 2 ). Mutations in the spike gene from July 2020 and January 2021 combined analysis of Columbia and Rock Hill are presented in Table 3 . To corroborate our findings, we searched the entirety of NCBI Virus (Hatcher et al., 2017) and GISAID (Shu and McCauley, 2017) , as well as the literature (Table 2 ), for reports of the same mutations in SARS-CoV-2. We also filtered GISAID results by location and collection date to ascertain whether the detected mutations were present in data collected from South Carolina within the appropriate month. Notably, in many instances although the mutation was present in the entire sequence database, the mutation was not detected in sequences collected from South Carolina. This further supports the power of wastewater in capturing the sequence space of SARS-CoV-2 compared to clinical sampling alone. We demonstrate here the value of amplifying and sequencing SARS-CoV-2 from wastewater to capture the sequence-space of mutations in the virus. We detected mutations reported in clinical data, including those in variants of concern, such as the N501Y mutation in the surface glycoprotein. Even during a single month, wastewater samples indicated a high degree of sequence diversity in the SARS-CoV-2 genome, with a total of 77 unique mutations (including both synonymous and nonsynonymous) in positions of at least 100x coverage detected in July 2020 samples and 230 mutations in January 2021. 
Wastewater samples from Columbia and Rock Hill alone captured SARS-CoV-2 sequence diversity that was absent in clinical samples from the entire state of South Carolina deposited into sequence databases like NCBI Virus and GISAID. We validated the observed mutations in SARS-CoV-2 samples from wastewater against the entirety of NCBI Virus and GISAID, as well as the literature, demonstrating that although many of the observed mutations were not detected in clinical samples collected during July 2020 or January 2021 from South Carolina, they have been observed elsewhere. Although we successfully detected signature mutations present in variants of interest or concern in wastewater, we note that detection of SARS-CoV-2 in wastewater has its limitations compared to clinical sampling. The major obstacle is that amplicons are from a mixed pool of individuals and thus it is unlikely that mutations on different amplicons or even the same amplicon can be associated with a single genome or variant. However, as we have shown in this work, signature mutations associated with specific variants can still be detected and thus complement clinical datasets, which will always be limited to the number of tested patients, who are mostly symptomatic. Clinical sequences are more useful in determining transmission events that corroborate contact tracing and further inform disease-mitigation strategies (Walker et al., 2021), but wastewater surveillance can offer easier and faster detection of the presence of variants of interest or concern in a community, in addition to monitoring broader evolutionary trends at the population level.

Supplementary datasets are available through Mendeley Data: Swift, Candice (2021), "Community-level SARS-CoV-2 sequence diversity revealed by wastewater sampling", Mendeley Data, V1, doi: 10.17632/ng8kd9wszx.1.
Sequencing data in BAM format have been submitted to the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) and are available at BioProject accession PRJNA745177.

We acknowledge the following funding sources: Centers for Disease Control and Prevention #75D-301-18C-02903 and South Carolina Department of Health and Environmental Control (SCDHEC) #EQ-0-654. We are grateful to South Carolina utilities directors and operators, as well as SCDHEC, for their contributions to the wastewater sampling and transportation that enabled this work. We gratefully acknowledge the Authors from the Originating laboratories responsible for obtaining the specimens, as well as the Submitting laboratories where the genome data were generated and shared via GISAID, on which this research is based. A full list of Originating and Submitting laboratories referenced in this work is available in Supplementary Datasets S5-S8.

1 First sequence for this variant in GISAID was submitted from a sample collected on 02-24-21.
2 First sequence for this variant in GISAID was submitted from a sample collected on 05-05-21.

Abbott, S., 2021. Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern
Intra-host diversity of sars-cov-2 should not be neglected: Case of the state of Victoria
Genomics and epidemiology of a novel SARS-CoV-2 lineage in Manaus
COVID-19 Case Surveillance Public Use Data | Data | Centers for Disease Control and Prevention
SARS-CoV-2 Variant Classifications and Definitions
Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants
Masking strategies for SARS-CoV-2 alignments - Novel 2019 coronavirus / Software and Tools - Virological
Wastewater and public health: the potential of wastewater surveillance for monitoring COVID-19
High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants
Adaptation of SARS-CoV-2 in BALB
An overview of the effect of bioaerosol size in coronavirus disease 2019 transmission
CoVariants: SARS-CoV-2 Mutations and Variants of Interest
Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
First proof of the capability of wastewater surveillance for COVID-19 in India through detection of genetic material of SARS-CoV-2
First detection of SARS-CoV-2 in untreated wastewaters in Italy
Minimap2: Pairwise alignment for nucleotide sequences
nCoV-2019 novel coronavirus bioinformatics protocol
Issues with SARS-CoV-2 sequencing data - Novel 2019 coronavirus / nCoV-2019 Genomic Epidemiology - Virological
Tracking SARS-CoV-2 in sewage: Evidence of changes in virus variant predominance during COVID-19 pandemic
SARS-CoV-2 Wastewater Surveillance for Public Health Action: Connecting Perspectives from Wastewater Researchers and Public Health Officials During a Global Pandemic
Temporal Detection and Phylogenetic Assessment of SARS-CoV-2 in Municipal Wastewater
inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains
Naturally mutated spike proteins of SARS-CoV-2 variants show differential levels of cell entry
Genetic diversity and evolution of SARS-CoV-2
nCoV-2019 sequencing protocol v3 (LoCost) [WWW Document
PCR inhibitors - occurrence, properties and removal
GISAID: Global initiative on sharing all influenza data - from vision to reality
Structural insights into SARS-CoV-2 spike protein and its natural mutants found in Mexican population
WHO Director-General's opening remarks at the media briefing on COVID-19 - 11
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Initial Impact of COVID-19 on
One year of SARS-CoV-2: How much has the virus changed?
Characterization of SARS-CoV-2 genetic structure and infection clusters in a large German city based on integrated genomic surveillance, outbreak analysis
Detection of SARS-CoV-2 in raw and treated wastewater in Germany - Suitability for COVID-19 surveillance and potential transmission risks
SARS-CoV-2 titers in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases
Molecular and serological investigation of 2019-nCoV infected patients: implication of multiple shedding routes

Table 3. Spike gene mutations identified in this work and associated references. NCBI SRA and GISAID were searched on June 16, 2021. High-confidence mutations are located in positions with at least 100x coverage, whereas medium-confidence mutations are located in positions with at least 50x coverage. Literature references are not comprehensive. Acknowledgement for contributions from GISAID are included in Supplementary Dataset S7.