key: cord-0853184-1qkmc8si authors: Agrawal, S.; Orschler, L.; Schubert, S.; Zachmann, K.; Heijnen, L.; Tavazzi, S.; Gawlik, B. M.; de Graaf, M.; Medema, G.; Lackner, S. title: A pan-European study of SARS-CoV-2 variants in wastewater under the EU Sewage Sentinel System date: 2021-06-14 journal: nan DOI: 10.1101/2021.06.11.21258756 sha: 12819760c8a5fdf64ebb721877b6581ff357d5f2 doc_id: 853184 cord_uid: 1qkmc8si Wastewater based surveillance employing qPCR has already shown its utility for monitoring SARS-CoV-2 at community level, and consequently the European Commission has recommended the implementation of an EU Sewage Sentinel System. However, using sequencing for the determination of genomic variants in wastewater is not fully established yet. Therefore, we focused on the sequencing analysis of SARS-CoV-2 RNA in wastewater samples collected across 20 European countries including 54 municipalities. The results provide insight into the abundance and the profile of the mutations associated with the variants of concerns: B.1.1.7, P.1, B.1.351 and B.1.617.2, which were present in various wastewater samples. This study shows that integrating genomic and wastewater-based epidemiology (WBE) can support the identification of variants circulating in a city at community level. Undeniably, the sudden emergence of SARS-CoV-2, which has caused a global pandemic, is 2 a significant and in many regards unprecedented threat to public health. SARS-CoV-2 rapidly 3 resulted in a high number of people requiring hospitalization, casualties and major socio-4 economic disruptions, with consequences which we still do not fully oversee. Consequently, 5 most countries have been forced to implement severe lockdown measures to ensure the 6 physical distance between people and interrupt virus transmission (1). Overall, the high 7 transmission rate and the rapidly evolving nature of the virus, leading to the emergence of 8 new variants that may transmit more readily and evade the immune response, raise broad 9 concerns about SARS-CoV-2 (2-4). 10 The current phase of the COVD-19 pandemic is shaping into an era of genomic surveillance 11 to track the genomic changes in the SARS-CoV-2 virus, which belongs to the family 12 Coronaviridae, genus Betacoronavirus. According to PANGO lineages as of now, 1503 13 SARS-CoV-2 variants are known since the initial detection of SARS-CoV-2 by sequencing 14 (5) . In the last few months, genomic epidemiology, the analysis of genome sequences, has 15 revealed some fast-spreading and highly virulent SARS-CoV-2 variants (3, 6, 7), making 16 them variants of concern (VOC) (8, 9) . This development underlines the importance of CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint European countries that had provided wastewater samples (Fig. 1) . The increased COVID-19 5 incidence rate corresponded with the increase in sequences of B.1.1.7 among the clinical 6 samples sequenced in these countries. For P.1 and B.1.351, no general pattern appeared and 7 they were only present to a small fraction. During this time, also very few B.1.617.2 8 sequences were reported (Fig. 1) . While looking at the emergence of the mutations during 9 March 2021, we found that among the most abundant mutations across all countries, the 10 D614G (spike protein) and P323L (non-structural protein, Nsp12) mutations were detected in 11 all the countries (S. fig.2 ). CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint Co-occurrences of D614G and P323L have often been reported (24), however, some samples 1 from Germany and China lacked P323L (25). The spike protein N501Y and H69del 2 mutations were also highly abundant in most of the countries, earlier associated with B.1.1.7 3 and later with all VOCs except B.1.617.2 (S. fig.1, S.fig.2 ). In some countries the S106del, 4 G107del, F108del mutations, which are ORF1b signature mutations of the VOCs P.1, B.1.1.7 5 and B.1.351 (26), were also among the abundant mutations (S. fig.2 ). Across the twenty 6 European countries twenty-six ORF1ab mutations, fourteen spike protein, eight nucleocapsid 7 (N) protein, six ORF8, and three ORF3 mutations were among the dominant mutations, 8 exhibiting spatial and temporal variation (S. fig.2 ). these mutations, 619 mutations were observed at >2.5%, 311 mutations at >5% and 23 20 mutations at >50% allele frequency (Fig.2) . Most of the 23 mutations were present in all but 21 a few samples. For example, the W131C mutation was only detected in wastewater samples 22 from Denmark (Fig.2 ). W131C is one of the important mutations in ORF3a, which is found 23 to assist the ion channel formation and thereby supports the virus in its infectivity (27). The 24 A220V mutation was only identified in samples from Lithuania (Fig.2) , which corresponds to 25 the high count of A220V reported in clinical patient samples (S. fig.2 ). 26 27 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Although many low-frequency mutations were observed, the read abundance of low-9 frequency mutations was mostly similar to the abundance of the high-frequency mutations 10 (S. fig.4 ). The highest count (ranging between 16-60) was observed for mutations associated 11 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint with B.1.1.7 in all the samples (Fig.3) , followed by B.1.351. P.1 and B.1.617.2, which had a 1 low count of associated mutations but included signature mutations. For example, signature 2 spike protein mutations (i.e. L452R, T478K, P681R, D950N) (9, 28) of B.1.617.2 were 3 identified in some of the wastewater samples. Abundance of Spike protein AA mutations 6 As the spike protein AA mutations have been associated with changes to characteristics of 7 SARS-CoV-2, leading to an increase in transmissibility and reduced efficacy of treatments, a 8 particular list of spike protein AA mutations is used by the European Centre for Disease 9 Prevention and Control (ECDC) for characterizing the VOCs (9, 29). Therefore, we also 10 assessed the read abundance of these spike protein AA mutations in our wastewater samples 11 (Fig. 3 ). 12 Overall, D614G was most abundant, followed by: P681H, T716I, A570D, S982A, H69del, 13 Y144del, D1118H, N501Y, K417N, E484K and others, in decreasing order. Only six out of 14 the 27 AA mutations (i.e. D1118H, D614G, H69del, N501Y, P681H, S982A, and T716I) 15 were present in all the samples (Fig.3 ). A570D and Y144del were identified in 53 samples. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint (32). Another two ORF8 protein mutations, Y73C and K68stop, were present in most of the 1 samples. The Y73C mutation is known to be unique in B.1.1.7 (33). Although no clear 2 relevance of K68stop is known, it is reported to be present in SARS-CoV-2 genomes with the 3 highest number of spike protein mutations (32). The P681H spike protein mutation was the 4 third most abundant across the samples. In total, 13 spike protein mutations were amongst the 5 dominant mutations (Fig. 4 ). CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint 8 mutations in genomes deposited in GISAID (https://www.gisaid.org) is low, which 1 according to a previous study is due to bias in samples sequenced (32). Across most of the 2 wastewater samples, we detected a high occurrence of ORF8 mutations (i.e. Q27stop, R52I) 3 ( Fig. 4) , which provides evidence for the circulation of SARS-COV-2 variants containing 4 these mutations in the sampled regions. Also, wastewater sequencing data can reveal spatial 5 prevalence of mutations. For example, the Q57H mutation was dominant in all three samples 6 from Finland, which is consistent with clinical genomic data of Finland (40). This We gratefully acknowledge the contribution from the originating laboratories responsible for 14 obtaining the specimens and the submitting laboratories where genetic sequence data were 15 generated and shared via the GISAID Initiative (https://www.gisaid.org). We thank all 16 WWTP operators for providing wastewater samples, We also thank Ray Izquierdo-Lara for 17 critically reading the manuscript. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint Sequencing For this study, 24 h composite wastewater samples were collected between weeks 10 -13 of 2021 (i.e. 10 th to 30 th March 2021) and shipped TU Darmstadt (Darmstadt, Germany), packed with icepacks (approximately at 6°C), for sequencing analysis. In Darmstadt, one litre of the untreated wastewater was filtered through a 0.45 μm electronegative membrane filter to concentrate the SARS-CoV-2 RNA, followed by extraction using the Fast RNA Blue Kit (MP Biomedicals) according to the manufacturer's protocol. Another 500 ml of the untreated wastewater was concentrated by ultrafiltration in 100 kDa Centricon® Plus-70 centrifugal ultrafilters (Merck) and RNA was extracted using the Ultra Microbiome kit (Thermofisher Scientific) according to the manufacturer's protocol. Both RNA extracts were pooled together for downstream analysis. From the pooled RNA, cDNA was synthesized using SuperScript™ VILO™ Master Mix (Thermofisher Scientific), followed by library preparation using the Ion AmpliSeq SARS-CoV-2 Research Panel (Thermofisher Scientific) according to manufacturer's instructions. This panel consists of 237 primer pairs, resulting in an amplicon length range of 125-275 bp, which cover the near-full genome of SARS-CoV-2. We performed multiple sequencing runs to achieve a high number of reads per sample. For each sequencing run, eight libraries were multiplexed and sequenced using an Ion Torrent 530 chip on an Ion S5 sequencer (Thermofisher Scientific) according to manufacturer's instructions. We used the SARS-CoV-2 Research Plug-in Package, which we installed in our Ion Torrent Suite software (v5.12.2) of Ion S5 sequence. We used the SARS_CoV_2_coverageAnalysis (v5.16) plugin, which maps the generated reads to a SARS-CoV-2 reference genome (Wuhan-Hu-1-NC_045512/MN908947.3), using TMAP software included in the Torrent Suite. The summary of mapping of each sample is provided in S. Table 1 . For mutation calls, additional Ion Torrent plugins were used, similar to our previous study (1). First, all single nucleotide variants (SNVs) were called using Variant Caller (v5.12.0.4) with "Generic -S5/S5XL (510/520/530) -Somatic -Low Stringency" default parameters. Then, for annotation and determination of the base substitution effect, COVID19AnnotateSnpEff (v1.3.0.2), a plugin developed explicitly for SARS-CoV-2, was used. The raw metagenomic sequence data were uploaded to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under Submission ID SUB9829162, BioProject number PRJNA736964. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint qPCR Methods Samples were received at the KWR laboratory and processed as previously described (2) . The N2 assay targeting a fragment of the nucleocapsid gene, as published by US CDC (US-CDC 2020), was used to quantify SARS-CoV-2 RNA in the sewage samples. All RT-PCR's were run as technical duplicates on 5 µl extracted nucleic acid. RT-qPCR reactions on serial dilutions containing RT-ddPCR calibrated EURM-019 single stranded RNA (provided by the Joint Research Centre) were used to construct calibration curves that subsequently were used to quantify N2 in RNA extracted from the sewage samples. Reactions were considered positive if the cycle threshold was below 40 cycles. CrAssphage CPQ_064 specific PCR (3) was used to quantify this DNA-virus that is ubiquitously present exclusively occurs in human intestinal tracts in high concentrations. Assays were performed in duplicate on 5 µl 1:10 diluted extracted nucleic acid. Quantification was performed using PCR assays on dilution series of a synthetic quantified gBlock (obtained from IDT, Leuven, Belgium) containing the CPQ_064 gene fragment. We downloaded the variant surveillance data package from GISAID on 31th May 2021. This data package consists of information about the identified variants, the corresponding amino acid (AA) mutations and sample location. We filtered the dataset, limiting it to human samples with complete coverage. This dataset was used to determine associations between the amino acid (AA) mutations detected in wastewater samples and AA mutations, with their corresponding pangolin lineage, reported from clinical samples. From the GISAID data package, we also determined the fraction of clinical samples reporting the current variants of concern (VOC): (1) B.1.1.7, (2) P.1, (3) B.1.351, (4) and B.1.617.2 (4, 5). Data analysis was performed in R(v3.6.2) using the ggplot (v3.3.3) package for data visualization, and pheatmap (v1.0.12) for hierarchy clustering and heatmap construction. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint We determined the abundance spike protein mutations, which are suggested by the European Centre for Disease Prevention and Control (ECDC) for the characterization of the current VOCs (5), reported for clinical human samples. We found that the abundance of the mutations varied for the respective country, though associated with the same VOC (S. fig.1 ). For example, four out of eight spike protein mutations for P.1 for sequences from Slovakia were not reported. qPCR results SARS-CoV-2 N2-gene RNA was detected in all 54 samples in concentrations ranging from 0.4 -735 gene copies/ml (S. fig 3) . The concentration protocol for NGS provided sufficient read depth (S. Table 1) , even with the low SARS-CoV-2 concentration samples. The crassphage concentration (S. fig 3) is an index of the dilution of human fecal input in wastewater. This varied between the cities; in INF_21002_B and INF_21003_BA the Crassphage concentrations were markedly low, indicating high dilution of human fecal input in these wastewaters. Normalizing the SARS-CoV-2 N2 concentration for this dilution would markedly change the ranking of the wastewaters, which implies that normalization of 'raw' SARS-CoV-2 concentrations in wastewater for the level of dilution of human fecal input is necessary when linking these data to COVID-19 prevalence data. Fraction of variants of concern in wastewater samples We determined the relative abundance of the VOCs based on the abundance of reads associated with certain AA mutations. As there is quite an overlap between the AA mutations of different SARS-COV-2 variants, for better determination of the relative abundance of the VOCs, we specifically looked for the abundance of unique and shared AA mutations corresponding to each VOC (S. fig.5 ). Categorization of mutations as unique or shared was based on the percentage of sequences for associated mutations submitted in GISAID. We looked for percentage of sequences for each mutation for every lineage. Then for each VOC, mutations reported in more than 0.5% of total number of sequences for each VOC were selected. Among these selected mutations, mutations that are associated with more than one lineage were categorized as shared mutation otherwise they were associated with the respective VOC. The abundance of mutations associated with B.1.1.7 was highest among the samples ranging from 15 to 40%, followed by abundance of mutations associated with B.1.351, P.1, and B.1.617 (S. fig.5 ), which is similar to the clinical sequencing data. The mutations associated with B.1.351 were detected in 33 samples, whereas for B.1.617 were detected in 21 samples and for P.1 in 15 samples. B.1.351 mutations were detected in all samples from Finland, Germany, and Sweden; however, the total relative abundance of these mutations varied from 2 to 8%. The relative abundance of the shared AA mutations accounted for 55 to 70% across all samples (S. fig.5 ). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. fig. 1 : Count (occurrence of mutation in number of genome sequences submitted in GISAID) of spike protein mutations, which are considered for characterization of VOC by ECDC (5, 6) , found in the variant surveillance data for clinical samples of GISAID dated 31 st May 2021. The counts are presented in log10 scale. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. fig. 2 : Heatmap representing the top 20 abundant mutations in each country during different time period, based on the count (occurrence of mutation in genome sequences submitted in GISAID) of the mutations. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. fig. 3 : qPCR based analysis showing the concentration of SARS-CoV-2 N2 gene copies detected in each sample. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. fig. 4 : Scatter plot showing the distribution of the read abundance of each mutation against the allele frequency of each mutation. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint S. fig. 5 : Relative abundance of the variants of concern (VOC) and other variants, based on the abundance of the reads associated with each SNP, respectively. AA mutations shared among different SARS-CoV-2 VOCs are represented as "B. 1.1.7_B.1.351_B.1.617_P.1". . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 14, 2021. ; https://doi.org/10.1101/2021.06.11.21258756 doi: medRxiv preprint Metatranscriptomic Analysis Reveals SARS-CoV-2 Mutations in Wastewater of the Frankfurt Metropolitan Area in Southern Germany Presence of SARS-Coronavirus-2 RNA in Sewage and Correlation with Reported COVID-19 Prevalence in the Early Stage of the Epidemic in The Netherlands Quantitative CrAssphage PCR Assays for Human Fecal Pollution Measurement European Centre for Disease Prevention and Control, SARS-CoV-2 variants of concern as of 3 Risk realted to the spread of new SARS-CoV-2 variants of concern in the EU/EEA -first update