title: SPATIAL AND TEMPORAL ANALYSIS OF SARS-CoV-2 DIVERSITY CIRCULATING IN WASTEWATER
authors: Pérez-Cataluña, Alba; Chiner-Oms, Álvaro; Cuevas-Ferrando, Enric; Díaz-Reolid, Azahara; Falcó, Irene; Randazzo, Walter; Girón-Guzmán, Inés; Allende, Ana; Bracho, María A.; Comas, Iñaki; Sánchez, Gloria
date: 2021-12-24
journal: Water Res
DOI: 10.1016/j.watres.2021.118007

Wastewater-based epidemiology (WBE) has proven to be an effective tool for the epidemiological surveillance of SARS-CoV-2 during the current COVID-19 pandemic. Furthermore, combining WBE with high-throughput sequencing makes it possible to analyze the SARS-CoV-2 diversity present in a given sample. The present study focuses on the genomic analysis of SARS-CoV-2 in 76 sewage samples collected from 14 wastewater treatment plants distributed throughout Spain during the three epidemiological waves that occurred in the country. The results demonstrate that metagenomic analysis of SARS-CoV-2 in wastewater allows the detection of the mutations that define the B.1.1.7 lineage, and that the technique can detect certain mutations before they are detected in clinical samples. The study shows the usefulness of sewage sequencing for tracking Variants of Concern, which can complement clinical testing to support decision-making and the analysis of the evolution of the pandemic.

Coronaviridae is a family of enveloped RNA viruses generally associated with mild respiratory and gastrointestinal infections (Shang et al., 2020).
Nevertheless, in recent decades new and highly pathogenic zoonotic coronaviruses (CoVs) have emerged, such as Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) (Drosten et al., 2003; Ksiazek et al., 2003), Middle East Respiratory Syndrome Coronavirus (MERS-CoV) (Zaki et al., 2012) and, most recently, SARS-CoV-2, which has caused the coronavirus disease 2019 (COVID-19) pandemic. Transmission of SARS-CoV-2 occurs mainly through aerosols or respiratory secretions, but because the virus can replicate in the gastrointestinal tract (Xiao et al., 2020), it is also excreted in feces and urine, as was previously reported for SARS-CoV and MERS-CoV. For this reason, the genetic material of the virus has been detected in the feces of both symptomatic and asymptomatic people (Polo et al., 2020). These findings have led to the use of wastewater monitoring for SARS-CoV-2. As for other pathogens, Wastewater-Based Epidemiology (WBE) has proven to be a very useful early warning system, allowing trend estimation as well as the establishment of correlations with different epidemiological indicators (Bivins et al., 2020; Medema et al., 2020; Randazzo et al., 2020a, 2020b). One reason for the success of WBE is that wastewater samples are a noninvasive and inexpensive source of information on the spread of SARS-CoV-2 within a community. Moreover, WBE provides real-time information on the circulating SARS-CoV-2 lineages, which is essential for the development of vaccines and drugs. This is particularly relevant now that the world's population is being vaccinated against SARS-CoV-2 and emerging lineages might compromise vaccine effectiveness (Zhou et al., 2021).
Massively parallel sequencing of sewage samples allows the analysis of a large number of SARS-CoV-2 genomes, including those shed by symptomatic and asymptomatic persons. Sequence analysis makes it possible to detect low-frequency variants (LFV) and to infer which lineages are circulating at a given time and place (Bar-Or et al., 2021; Crits-Christoph et al., 2021; Dharmadhikari et al., 2021; Herold et al., 2021; Izquierdo-Lara et al., 2021; La Rosa et al., 2021; Nemudryi et al., 2020; Rios et al., 2021). Additionally, genomic analyses may allow the detection of the entry of described lineages or Variants of Concern (VOCs) into populations and of the appearance of emerging lineages, the characterization of new outbreaks, and the tracking of viral strains (Bar-Or et al., 2021; Crits-Christoph et al., 2021; Izquierdo-Lara et al., 2021; La Rosa et al., 2021; Nemudryi et al., 2020; Rios et al., 2021). These studies also showed that sequencing techniques need to be improved to reduce error rates, as in the case of Nanopore sequencing (Nemudryi et al., 2020). Despite these limitations, the published works showed that genomic analysis of SARS-CoV-2 in wastewater should be used as a complementary tool in epidemiological surveillance. This has grown in significance because, during the spread of SARS-CoV-2, different mutations (i.e. …) (Singh et al., 2021), or immunity. These characteristics, if they occur, can aggravate the epidemiological situation in certain areas, so detecting new lineages and the appearance of VOCs in any specific population is crucial to overcome the current pandemic and control the spread of the virus.

In the framework of SARS-CoV-2 wastewater monitoring in Spain, grab samples were collected from 14 wastewater treatment plants (WWTPs) located in different parts of Spain, with population equivalents ranging from 60,600 to 1,900,800.
The samples, taken between April 2020 and January 2021, encompass the three waves that have affected the country: the first wave occurred between March and April 2020, the second in November 2020, and the third between January and February 2021. For each sample, 200 mL of wastewater was spiked with porcine epidemic diarrhea virus (PEDV) as a process control at a final concentration of 4.5 log PCRU/L and concentrated following an aluminum-based adsorption-precipitation method (AAVV, 2018; Pérez-Cataluña et al., 2021; Randazzo et al., 2020b). Briefly, the wastewater was adjusted to pH 6.0, and Al(OH)3 precipitation was carried out by adding 1 part of 0.9 N AlCl3 per 100 parts of sample. The solution was mixed at 150 rpm for 15 min and centrifuged at 1,700 × g for 20 min; the resulting pellet was resuspended in 10 mL of 3% beef extract (pH 7.4) and stirred at 150 rpm for 10 min at room temperature (RT). Finally, the suspension was centrifuged at 1,900 × g for 30 min and the pellet was resuspended in 1 mL of phosphate-buffered saline (PBS, pH 7.4). Concentrated samples were stored at −80 °C until analysis.

Nucleic acids were extracted from the wastewater concentrates by an automated method using the Maxwell RSC Pure Food GMO and Authentication Kit (Promega) with slight modifications (Pérez-Cataluña et al., 2021). First, 300 μL of concentrate was mixed with 400 μL of cetyltrimethylammonium bromide (CTAB) and 40 μL of proteinase K solution. The mixture was incubated at 60 °C for 10 min and centrifuged for 10 min at 16,000 × g. The supernatant was then transferred to the loading cartridge and 300 μL of lysis buffer was added. The cartridge was loaded in the Maxwell® RSC Instrument (Promega) and nucleic acids were extracted with the "Maxwell RSC Viral total Nucleic Acid" program. The RNA was eluted in 100 μL of nuclease-free water.
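The volumes in the concentration and extraction steps above determine how much original wastewater each microliter of extracted RNA represents. The Python sketch below works through that arithmetic. The function names are hypothetical, the AlCl3 "parts" are assumed to be volume parts, and every step is assumed to have 100% recovery (real recovery is lower, which is what the PEDV process control is meant to estimate).

```python
"""Nominal volume arithmetic for the concentration and extraction steps.

Hypothetical helpers; assumes v/v 'parts' and 100% recovery at every step.
"""

SAMPLE_ML = 200.0            # wastewater concentrated per sample
CONCENTRATE_ML = 1.0         # PBS volume used to resuspend the final pellet
EXTRACTION_INPUT_UL = 300.0  # concentrate loaded into the extraction
ELUTION_UL = 100.0           # nuclease-free water used to elute the RNA
PEDV_LOG_PCRU_PER_L = 4.5    # PEDV process-control spike, log10 PCRU/L


def alcl3_volume_ml(sample_ml: float) -> float:
    """Volume of 0.9 N AlCl3 to add: 1 part per 100 parts of sample."""
    return sample_ml / 100.0


def concentration_factor(sample_ml: float, concentrate_ml: float) -> float:
    """Fold-concentration from raw wastewater to resuspended pellet."""
    return sample_ml / concentrate_ml


def wastewater_ml_per_ul_rna(sample_ml: float, concentrate_ml: float,
                             extraction_input_ul: float, elution_ul: float) -> float:
    """mL of original wastewater represented by 1 uL of eluted RNA."""
    factor = concentration_factor(sample_ml, concentrate_ml)
    # uL of concentrate times fold-concentration, converted from uL to mL
    wastewater_ml_extracted = extraction_input_ul * factor / 1000.0
    return wastewater_ml_extracted / elution_ul


def pedv_spike_pcru(sample_ml: float, log_pcru_per_l: float) -> float:
    """Total PEDV PCRU added to reach the target concentration in the sample."""
    return 10 ** log_pcru_per_l * sample_ml / 1000.0


if __name__ == "__main__":
    print(alcl3_volume_ml(SAMPLE_ML))                       # mL of 0.9 N AlCl3
    print(concentration_factor(SAMPLE_ML, CONCENTRATE_ML))  # fold-concentration
    print(wastewater_ml_per_ul_rna(SAMPLE_ML, CONCENTRATE_ML,
                                   EXTRACTION_INPUT_UL, ELUTION_UL))
    print(round(pedv_spike_pcru(SAMPLE_ML, PEDV_LOG_PCRU_PER_L)))
```

With the volumes given in the text, this yields 2 mL of AlCl3, a 200-fold concentration, about 0.6 mL of wastewater per microliter of RNA, and roughly 6,325 PCRU of PEDV per 200 mL sample.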
Negative controls were included by using nuclease-free water instead of concentrated sample. SARS-CoV-2 RNA was detected by RT-qPCR with the PrimeScript™ RT-PCR Kit (Perfect Real Time) (Takara Bio, USA), targeting a region of the nucleocapsid gene (N1 region) with previously described primers, probe and conditions (CDC, 2020). Complete genomic RNA of SARS-CoV-2 (ATCC VR-1986D) and nuclease-free water were used as positive and negative controls, respectively. Samples with RT-qPCR cycle threshold (Ct) values below 36 were selected for sequencing analysis.

Genomic sequencing of the SARS-CoV-2 present in the selected wastewater samples followed the ARTIC protocol version 3 for reverse transcription and amplification by multiplex PCR (Quick, 2020; https://www.protocols.io/view/ncov-2019-sequencingprotocol-v3-locost-bh42j8ye). Sequencing libraries were built with the Nextera Flex kit (Illumina) and sequenced on an Illumina MiSeq platform as 2 × 200 paired-end reads. Raw reads were trimmed of adaptors with cutadapt (Martin, 2011) and of low-quality nucleotides with reformat.sh from BBMap (sourceforge.net/projects/bbmap/); nucleotides with a Phred score below 30 were discarded. Clean reads were aligned to the genome of SARS-CoV-2 isolate Wuhan-Hu-1 (MN908947.3) using the Burrows-Wheeler Aligner (BWA) v0.7.17-r1188 with default parameters and indexed with samtools. For the analysis of genomic coverage in each sample, only nucleotides with a depth of at least 20X were taken into account. Nucleotide substitutions and deletions relative to the Wuhan-Hu-1 genome (MN908947.3) were detected from the aligned reads using samtools mpileup (Li, 2011) and the variants command of iVar (Grubaugh et al., 2019). A nucleotide polymorphism was called only when the alternative nucleotide had a depth of at least 50X and a quality score higher than 30.
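The depth and quality cutoffs above can be expressed as two small filters. The Python sketch below is not the authors' code; it assumes per-position depths such as those reported by `samtools depth -a`, and a tab-separated table like the one written by `ivar variants`, using its `ALT_DP` and `ALT_QUAL` columns. The thresholds are the ones given in the text; the function names and the toy table are hypothetical.

```python
import csv
import io
from typing import Iterable

WUHAN_HU_1_LENGTH = 29903  # length of MN908947.3 in nucleotides
MIN_COVERAGE_DEPTH = 20    # a position counts as covered at >= 20X
MIN_ALT_DEPTH = 50         # alternative allele must be supported by >= 50 reads
MIN_ALT_QUAL = 30          # alternative-allele quality must exceed 30


def genome_coverage(depths: Iterable[int], genome_length: int = WUHAN_HU_1_LENGTH,
                    min_depth: int = MIN_COVERAGE_DEPTH) -> float:
    """Fraction of reference positions sequenced at >= min_depth."""
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / genome_length


def filter_ivar_variants(tsv_text: str) -> list[dict[str, str]]:
    """Keep variant rows that pass the alt-depth and alt-quality cutoffs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if int(row["ALT_DP"]) >= MIN_ALT_DEPTH
        and float(row["ALT_QUAL"]) > MIN_ALT_QUAL
    ]


if __name__ == "__main__":
    # Toy, hypothetical iVar-style table (only the columns used here).
    toy = (
        "REGION\tPOS\tREF\tALT\tALT_DP\tALT_QUAL\tTOTAL_DP\n"
        "MN908947.3\t23403\tA\tG\t120\t36\t130\n"  # passes both cutoffs
        "MN908947.3\t100\tC\tT\t40\t35\t240\n"     # fails: ALT_DP < 50
    )
    print([row["POS"] for row in filter_ivar_variants(toy)])
    print(genome_coverage([0, 20, 25, 19], genome_length=4))
```

The filter uses the depth of the alternative allele (`ALT_DP`) rather than total depth, matching the text's requirement that the alternative nucleotide itself reach 50X.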
Alignments were manually curated to discard nucleotide substitutions that corresponded to incorrectly trimmed adaptors (Nemudryi et al., 2020). Information about the worldwide distribution of SARS-CoV-2 mutations was obtained from outbreak.info (Mullen et al., 2020).

A total of 76 sewage samples positive for SARS-CoV-2 by RT-qPCR (Ct < 36), collected throughout the three epidemiological waves, were sequenced in this study (Supplementary Figure S1). Samples were grouped into three regions: north (2 WWTPs, n=8), center (7 WWTPs, n=39), and south (5 WWTPs, n=29). Ct values for the SARS-CoV-2 N1 target ranged from 26.59 to 34.75 (Table S1). Sequence analysis revealed a total of 627 nucleotide substitutions and 20 deletions (Table 1). Some of these substitutions and deletions were present together with the corresponding nucleotide of the Wuhan-Hu-1 reference genome (Figure 2), evidencing the presence of multiple genomes in the wastewater. In samples from the first and second waves, the mean frequencies of these nucleotide polymorphisms were 70±35% for synonymous substitutions, 56±38% for non-synonymous substitutions and 27±18% for deletions (Figure 2A). In third-wave samples, these values were 53±34%, 43±34%, and 19±22%, respectively (Figure 2B). Table 2 shows the non-synonymous nucleotide substitutions (n=49) and deletions (n=3) found in the spike glycoprotein gene. Of these polymorphisms, 18 had not previously been described in Spanish genome sequences, according to the database available at https://outbreak.info/ (Mullen et al., 2020). However, two of these substitutions (amino acid substitutions G404V and G648V) have been found at low frequencies among the reads obtained when sequencing Spanish genomes from clinical samples.
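A polymorphism frequency below 100% means that reads carrying the Wuhan-Hu-1 nucleotide were also present at that position, which is why it is read as evidence of mixed genomes. The minimal Python sketch below shows how per-class mean ± SD frequencies like those reported above could be summarized. It assumes that frequency is alternative-allele depth divided by total depth and that the SD is a sample standard deviation (the text does not say which); the function names and input records are hypothetical, not data from this study.

```python
from statistics import mean, stdev
from typing import Iterable


def allele_frequency_pct(alt_depth: int, total_depth: int) -> float:
    """Percentage of reads at a position that carry the alternative allele."""
    return 100.0 * alt_depth / total_depth


def summarize_by_class(
    calls: Iterable[tuple[str, int, int]],
) -> dict[str, tuple[float, float]]:
    """Mean and sample SD of allele frequency (%) per mutation class.

    Each call is (mutation_class, alt_depth, total_depth).
    """
    freqs: dict[str, list[float]] = {}
    for mutation_class, alt_depth, total_depth in calls:
        freqs.setdefault(mutation_class, []).append(
            allele_frequency_pct(alt_depth, total_depth))
    # stdev needs at least two values; report 0.0 for single observations
    return {
        cls: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for cls, values in freqs.items()
    }


if __name__ == "__main__":
    # Hypothetical calls for illustration only.
    calls = [
        ("synonymous", 90, 100),
        ("synonymous", 50, 100),
        ("deletion", 20, 100),
    ]
    for cls, (m, sd) in summarize_by_class(calls).items():
        print(f"{cls}: {m:.0f}±{sd:.0f}%")
```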
These results evidence the ability of this technique to detect mutations that are present at a low percentage in the viral population and that belong to different lineages. Interestingly, some of these spike amino acid substitutions were found in sewage at the same time as, or even weeks or months before, their appearance in genomes from clinical samples. For example, among the substitutions that have been detected in Spanish clinical samples, two spike mutations (G639S and V642G) were found in wastewater around the same time that they appeared in clinical genomes, the spike mutation G648V was found in wastewater 6 weeks before, and mutations S884F, G404V, and A372T were found in wastewater between 4 and 5 months before their detection in clinical genomes. It should be noted that G404V was first detected at very low percentages of sequencing reads in two clinical genomes, and appeared at higher frequency in one clinical genome only 5 months later. Additionally, the number of clinical cases carrying these mutations was very low, ranging from 1 to 6. Moreover, mutations A893T, L1152S, and N1173K had not been detected in Spanish clinical genomes, but were detected in other countries 3, 4, and 8 months, respectively, after their detection in Spanish wastewater. These results, together with those of other authors who detected in wastewater genomes and single nucleotide polymorphisms (SNPs) widely described in clinical samples (Crits-Christoph et al., 2021; Izquierdo-Lara et al., 2021), show that high-throughput sequencing of SARS-CoV-2 in wastewater is a very useful complementary tool for studies and decision-making related to the epidemiology of the virus.
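The lead times quoted above (for example 6 weeks, or 4 to 5 months) are differences between the first detection date in wastewater and the first detection date in clinical genomes. A minimal Python sketch of that calculation follows; the function names, the 7-day "same time" tolerance, and the example dates are all hypothetical and are not taken from this study.

```python
from datetime import date


def lead_time_days(first_in_wastewater: date, first_in_clinical: date) -> int:
    """Positive if wastewater detection came first, negative otherwise."""
    return (first_in_clinical - first_in_wastewater).days


def lead_time_weeks(first_in_wastewater: date, first_in_clinical: date) -> float:
    return lead_time_days(first_in_wastewater, first_in_clinical) / 7


def describe(first_in_wastewater: date, first_in_clinical: date,
             same_time_days: int = 7) -> str:
    """Label a detection pair; same_time_days is a hypothetical tolerance."""
    days = lead_time_days(first_in_wastewater, first_in_clinical)
    if abs(days) <= same_time_days:
        return "about the same time"
    direction = "before" if days > 0 else "after"
    return f"{abs(days) / 7:.1f} weeks {direction} clinical detection"


if __name__ == "__main__":
    # Hypothetical dates for illustration only.
    print(describe(date(2020, 10, 5), date(2020, 11, 16)))
```

Working in days and converting afterwards avoids the ambiguity of calendar months; month-scale lead times can be reported as approximate multiples of about 30 days.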
Identification of B.1.1.7 (VOC 202012/01)

The highly transmissible B.1.1.7 lineage of SARS-CoV-2 carries 16 characteristic non-synonymous nucleotide substitutions and deletions (Rambaut et al., 2020) and was first detected in Spain in week 52 of 2020 in several regions (Madrid, the Basque Country, and the Balearic Islands). Our sequencing data were screened for the characteristic mutations described in the B.1.1.7 genome. These comprised 2 nucleotide substitutions and one deletion in ORF1a, 6 nucleotide substitutions and 2 deletions in the spike gene, 3 nucleotide substitutions in ORF8, and one nucleotide substitution in the N gene (Figure 3). The nucleocapsid amino acid substitution S235F is not shown because it was either absent or not covered. Samples with N1 Ct values below 36 were analyzed, from week 52 of 2020 up to week 7 of 2021 in the case of samples from region C5. Among the analyzed samples, only those from regions S1, S3, C3, C4, and C5 showed characteristic mutations of the B.1.1.7 lineage. None of the samples showed all 18 markers searched for at the same time. The highest presence and frequency of mutations were found in the nucleocapsid gene (detected in 60% of the samples) and in ORF8 (detected in 45% of the samples), and the lowest in ORF1a. Interestingly, the three characteristic ORF8 mutations (Q27stop, R52I, and Y73C) were detected together in 4 of the 9 samples. Only one sample, S1-4-2021, showed 9 of the 15 characteristic mutations. The deletion of spike amino acids 69 and 70 (S:Δ69/70) was found in 2 samples from different geographical regions, in both cases together with the deletion ΔY144. Although some of these mutations can belong to different lineages (Table 2) …

SARS-CoV-2 has created a pandemic scenario unprecedented in modern times. The rapid spread of this virus, together with the appearance of emerging lineages, has also mobilized the scientific community like never before.
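The B.1.1.7 screening above comes down to set operations: counting how many panel markers a sample carries, the fraction of samples with at least one marker in a given gene, and whether all markers of a gene (such as the three ORF8 mutations) co-occur. The Python sketch below illustrates this under stated assumptions. The panel lists the spike, ORF8 and N markers of B.1.1.7, writes Q27stop as "Q27*", and deliberately omits the ORF1a markers, which a real panel should take from Rambaut et al. (2020). The function names and sample data are hypothetical.

```python
from typing import Mapping

# Partial B.1.1.7 panel (gene -> amino acid changes). ORF1a markers are
# omitted in this sketch; a full panel should follow Rambaut et al. (2020).
B117_PANEL: dict[str, frozenset[str]] = {
    "S": frozenset({"del69-70", "del144", "N501Y", "A570D",
                    "P681H", "T716I", "S982A", "D1118H"}),
    "ORF8": frozenset({"Q27*", "R52I", "Y73C"}),
    "N": frozenset({"D3L"}),
}

Sample = Mapping[str, set[str]]  # gene -> amino acid changes detected


def markers_found(sample: Sample, panel=B117_PANEL) -> int:
    """Number of panel markers detected in one sample."""
    return sum(len(markers & sample.get(gene, set()))
               for gene, markers in panel.items())


def gene_detection_rate(samples: Mapping[str, Sample], gene: str,
                        panel=B117_PANEL) -> float:
    """Fraction of samples with at least one panel marker in `gene`."""
    hits = sum(1 for s in samples.values() if panel[gene] & s.get(gene, set()))
    return hits / len(samples)


def all_cooccur(sample: Sample, gene: str, panel=B117_PANEL) -> bool:
    """True if every panel marker of `gene` is detected in the sample."""
    return panel[gene] <= sample.get(gene, set())


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    samples = {
        "sample_A": {"S": {"del69-70", "del144", "N501Y"},
                     "ORF8": {"Q27*", "R52I", "Y73C"}, "N": {"D3L"}},
        "sample_B": {"ORF8": {"R52I"}},
    }
    panel_size = sum(len(m) for m in B117_PANEL.values())
    for name, s in samples.items():
        print(name, markers_found(s), "of", panel_size,
              "| all ORF8 markers:", all_cooccur(s, "ORF8"))
    print("N detection rate:", gene_detection_rate(samples, "N"))
```

Keeping the panel as data rather than code means markers can be added or removed (for example, excluding positions that were not covered) without changing the screening logic.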
Its detection in wastewater has been very helpful for epidemiological studies in large populations and is currently being implemented worldwide. However, few studies using massive sequencing have been published.

• The present study describes the mutations found in SARS-CoV-2 genomes detected in wastewater from 14 different regions of Spain. This is the first study carried out in Spain that analyzes the diversity of SARS-CoV-2 present in wastewater across the three epidemiological waves that occurred between 2020 and 2021.
• These results confirm the potential of sewage sequencing to detect new mutations and lineages of SARS-CoV-2, which is of utmost relevance for monitoring emerging vaccine-escape SARS-CoV-2 mutants in the forthcoming post-vaccination era.
• Genomic sequencing of viruses found in wastewater provides results complementary to those of clinical laboratories, as demonstrated in several ways: mutations initially detected in a low number of reads in genomes from clinical specimens were later confirmed in wastewater samples; amino acid substitutions in the spike protein were detected weeks or months before their discovery in clinical samples; and known amino acid substitutions in the spike protein were detected for the first time in Spain.
• This technique provides complementary information for SARS-CoV-2 surveillance, allowing both the tracking of already described lineages, including VOCs and Variants of Interest (VOIs), and the detection and tracking of new emerging lineages.
• These data support the hypothesis that the study of wastewater using high-throughput sequencing techniques is a useful and effective tool that can be implemented worldwide in support of public health for the epidemiological control of SARS-CoV-2.
Wastewater-Based Epidemiology
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
Genome Sequencing of Sewage Detects Regionally Prevalent SARS-CoV-2 Variants
High throughput sequencing based direct detection of SARS-CoV-2 fragments in wastewater of Pune
Identification of a Novel Coronavirus in Patients with Severe Acute Respiratory Syndrome
An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
Genome Sequencing of SARS-CoV-2 Allows Monitoring of Variants of Concern through Wastewater
Monitoring SARS-CoV-2 Circulation and Diversity through Community Wastewater Sequencing, the Netherlands and Belgium
A Novel Coronavirus Associated with Severe Acute Respiratory Syndrome
Key SARS-CoV-2 Mutations of Alpha, Gamma, and Eta Variants Detected in Urban Wastewaters in Italy by Long-Read Amplicon Sequencing Based on Nanopore Technology
Fast and accurate short read alignment with Burrows-Wheeler transform
The Sequence Alignment/Map format and SAMtools
Cutadapt removes adapter sequences from high-throughput sequencing reads
Presence of SARS-Coronavirus-2 RNA in Sewage and Correlation with Reported COVID-19 Prevalence in the Early Stage of the Epidemic in The Netherlands
Temporal Detection and Phylogenetic Assessment of SARS-CoV-2 in Municipal Wastewater
Comparing analytical methods to detect SARS-CoV-2 in wastewater
Making waves: Wastewater-based epidemiology for SARS-CoV-2 - Developing robust approaches for surveillance and prediction is harder than it looks
nCoV-2019 sequencing protocol v3 (LoCost). protocols.io, 2020
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus / nCoV-2019 Genomic Epidemiology - Virological
Metropolitan wastewater analysis for COVID-19 epidemiological surveillance
SARS-CoV-2 RNA in wastewater anticipated COVID-19 occurrence in a low prevalence area
Monitoring SARS-CoV-2 variants alterations in Nice neighborhoods by wastewater nanopore sequencing
Recent Insights into Emerging Coronavirus: SARS-CoV-2. ACS Infect. Dis. acsinfecdis.0c00646
Structure-Function Analyses of New SARS-CoV 1.351 and B.1.1.28.1: Clinical, Diagnostic, Therapeutic and Public Health Implications
Evidence for Gastrointestinal Infection of SARS-CoV-2
Isolation of a Novel Coronavirus from a Man with Pneumonia in Saudi Arabia
Impact of mutations in SARS-COV-2 spike on viral infectivity and antigenicity
A Novel Coronavirus from Patients with Pneumonia in China

This study was supported by projects "VIRIDIANA" (AGL2017-82909/AEI/FEDER, …).

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Names of specific vendors, manufacturers, or products are included for informational purposes only and do not imply endorsement by the authors or their affiliations.