key: cord-0303942-2zyiyya4 authors: Fuhrmeister, E. R.; Voth-Gaeddert, L. E.; Metilda, A.; Tai, A.; Batorsky, R. E.; Veeraraghavan, B.; Ward, H.; Kang, G.; Pickering, A. title: Surveillance of potential pathogens and antibiotic resistance in wastewater and surface water from Boston, USA and Vellore, India using long-read metagenomic sequencing date: 2021-04-26 journal: nan DOI: 10.1101/2021.04.22.21255864 sha: fb86620cd6a7fa7374e4aa731502f9d0cd3fd7a3 doc_id: 303942 cord_uid: 2zyiyya4 Environmental sampling (wastewater) could be an efficient surveillance strategy to capture global emerging trends in the spread of antibiotic resistance. Long-read DNA sequencing can resolve the genetic context of antibiotic resistance genes (ARGs) and is a promising tool for non- culture-based monitoring of antibiotic-resistant pathogens and ARGs in environmental samples, but has not been rigorously validated against conventional methods. We tested long-read sequencing using the portable Nanopore MinION for surveying pathogens, ARGs, and antibiotic- resistant pathogens in municipal wastewater, hospital wastewater, and surface water collected from Boston, USA and Vellore, India. We compared detection of enteric pathogens by assembly of long reads, with and without short-read polishing, and unassembled raw long reads for ARGs to multiplex real-time PCR. Using real-time PCR as a benchmark, long-read metagenomics was 49% sensitive and 75% specific at pathogen detection in assembled contigs, and 16% sensitive and 100% specific at detecting 28 clinically relevant resistance genes in raw long reads. Short- read polishing did not substantially improve pathogen identification or impact ARG identification in the assembled contigs, demonstrating that short-read polishing is not required, which greatly reduces costs. The high specificity of ARG detection supports portable long-read sequencing as a valuable tool to profile ARGs and antibiotic-resistant pathogens for environmental surveillance programs. Globally, the proliferation of antibiotic resistance remains a critical threat to public health. Antibiotic-resistant infections are more prone to treatment failure, which in turn increases mortality rates, treatment time, and health care costs 1 . Antibiotic-resistant bacteria (ARB) have been documented in the environment (e.g. soils, surface water, sediments), particularly in locations with anthropogenic pressures such as antibiotic residues in wastewater outflows 2,3 . However, human and environmental reservoirs of ARB are not well understood in part because there has been limited standardized monitoring. Existing antibiotic surveillance systems are largely limited to isolates from clinical settings 4,5 . Antibiotic resistance compounds the threat of enteric pathogen infection (e.g. Shigella, Salmonella typhi, Helicobacter pylori) which is common in many low-and middle-income countries 6-8 . Similar to antibiotic resistance, there are few standardized monitoring systems in place for enteric pathogens in high-or low-income countries. The WHO Global Foodborne Infections Network 9 is a promising surveillance system that surveys isolates of antibiotic-resistant enteric pathogens in humans, animals, and the environment but has been limited to selected pathogens. The rapid global expansion of wastewater surveillance for monitoring SARS-CoV-2 10 presents a unique opportunity to implement a system for antibiotic resistance and other pathogens. Surveying wastewater is an efficient way to monitor enteric pathogens and antimicrobial resistance excreted in feces over large populations 11 . A robust surveillance system would also include environmental reservoirs (e.g. soil and water) as well as animals to encompass a "One Health" approach 12 . One barrier to implementing monitoring systems is selecting appropriate methods for detection. Previous studies have used diverse methods to detect antibiotic resistance genes (ARG) in wastewater, sludge, soil, water, and sediment, making it difficult to compare results across locations and sample types 13 . Culture-based methods are able to identify phenotypic resistance, however most bacteria are not easily cultured and the number of isolates examined can vary widely across studies 13 . Polymerase chain reaction (PCR) can detect low abundance genes, but targets must be pre-specified prior to analysis and determining cooccurrence of multiple genes within the same organism (e.g. pathogen gene and antibiotic resistance gene located in the same genome) is not possible. High-throughput sequencing is a promising tool for building improved antibiotic resistance monitoring systems 14 . Gene targets do not need to be pre-specified which allows for broad assessments of ARG diversity. However, for the most common platforms (e.g. Illumina), startup costs remain high, turnout time can be slow, devices are not easily moveable, and short DNA reads are limited in their ability to reveal the genetic context of ARGs 15, 16 . The Oxford Nanopore Technologies (ONT) MinION is a pocket-sized, portable sequencing device that has been used for real-time detection of pathogens during outbreaks (e.g. Zika virus, SARS-Cov-2) 17, 18 and to detect ARGs in cultured isolates 19 . Nanopore sequencing differs from other next-generation technologies in that DNA is run through protein pores, changing the electrical current which is analyzed to determine the sequence 20 . The small size and portability of the device renders it a good candidate for surveillance applications around the world. While there are significant costs associated with flow-cells and reagents, there is no large capital investment needed for the sequencing devices as opposed to other metagenomic sequencing technologies (Illumina). However, there has been limited application of the Nanopore MinION for detection of pathogens or ARGs from environmental water samples and few studies have used metagenomic long-read sequencing (as opposed to amplicon long-read sequencing) [21] [22] [23] [24] . In the study of antibiotic resistance genes in the environment, there is a need to link resistance genes to their host organisms in order to identify pathogens carrying resistance and evaluate their genetic context. While long reads are promising in this area, the read length obtained can vary significantly depending on DNA extraction procedures and library preparation protocols. For example, bead beating, a common procedure used to lyse DNA in hardy environmental samples (e.g. wastewater, activated sludge, soil), can shear DNA fragments. We therefore investigated the utility of assembled long reads for surveillance since assembly can improve taxonomic classification and identification of mobile genetic elements (plasmids) 25 . In this study, we compare long-read sequencing via the Nanopore MinION to real-time PCR for identifying pathogens and ARGs in environmental waters. We also compare identification of enteric pathogens and pathogens from the CDC threat list for ARGs with and without short-read polishing of long-read assembled contigs. Furthermore, we test these methods across different water sample types including wastewater, surface water, and hospital wastewater from Boston, USA and Vellore, India. In India, several recent studies reported high levels of ARGs in child stool samples and in wastewater 26, 27 . Prevalence and diversity of resistance genes varies geographically 14 , and the ARGs and antibiotic-resistant bacteria causing resistant infections in the US may be different than those in India 28 . Environmental samples were collected from the greater Boston area, in Massachusetts, USA (n=6; October 2017) and within the city of Vellore, in Tamil Nadu, India (n=6; April 2018). The sampling locations included three different types of environmental samples hypothesized to have different profiles of ARGs and microbial taxa. Boston sampling included wastewater (influent) entering three wastewater treatment plants (WWTPs), two surface water samples from rivers (downstream of the WWTP discharge points), and one hospital wastewater sample (collected before entering the combined public sewer). Vellore sampling included wastewater (influent) entering two WWTPs, three surface water samples from drainage canals, and one hospital wastewater sample (collected before onsite treatment occurred). Whirl Pak bags (Nasco, Modesto, CA) were used to collect 2.5L of water from each sampling site. Samples were concentrated by 0.22 µm Nanoceram (Argonide, Sanford, FL) disc filters and DNA was extracted from filters using the Qiagen DNeasy PowerWater kit (Germantown, MD). A mock mixed culture of six bacterial species with and without antibiotic resistance was developed by the Christian Medical College (CMC) laboratory for analysis by long-read sequencing (Table S1) . Further details on sample collection, processing, and DNA extraction can be found in the supplementary material. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) (Table S2) . Samples were also analyzed using a real-time PCR antibiotic resistance array card (Qiagen, Cat. no. 330261 BAID-1901ZRA), which has 83 ARG targets (Table S3) . Long-read sequences were obtained using an R9.4.1 flow cell on Oxford Nanopore Technology's (ONT) MinION (Oxford, United Kingdom). Library preparation was conducted using the SQK-LSK109 ligation sequencing kit (ONT) according to the manufacturer's instructions. DNA extracts of all samples, except for the mock community, were sequenced on an Illumina HiSeq2500 (PE150 Rapid V2) by the Tufts Genomic Core Facility. The Nextera DNA XT kit (Illumina, San Diego, CA) was used for library preparation. The datasets generated in this work are available in NCBI's Sequence Read Archive under BioProject PRJNA672704. Long read sequences were basecalled and trimmed using Guppy (ONT, v3.3.0). Any reads with a quality score < 7 were removed 30 . Bbmap (v38.7 ) 31 was used to remove adapters and PhiX sequences from Illumina short reads and sickle 32 was used for quality trimming. Long read sequences were assembled using metaFlye (v2.6) 33 and polished using Racon (v1.4.13) 34 , followed by Medaka (ONT, v0.11.0). For short-read polishing, contigs were polished again with Racon using Illumina reads. In this study, "polished contigs" or "short-read polishing" refers to assembled contigs that underwent the additional polishing step, where short reads are aligned to the long-read assembly in order to improve the sequence accuracy, whereas "unpolished contigs" refers to contigs that did not undergo polishing with short reads. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) was used to identify contigs as plasmids 25 . To calculate abundance of resistance genes and compare ARG identification between long-read sequencing and real-time PCR, long reads, without short-read polishing, were aligned to the ResFinder database as described above. For absolute abundance, ARG counts were normalized by gigabase pairs (Gbp) of data classified as bacterial taxa when aligned to the RefSeq database as described above. All data analysis was conducted in R (v3.5.0). In our analysis of potential pathogens, we included contigs that were classified as A. hydrophila, B. fragilis, C. jejuni, C. coli, C. difficile, H. pylori, Salmonella spp., V. cholerae, A. baumannii, E. feacium, K. pneumoniae, Shigella spp., S. aureus, S. pneumoniae, and P. aeruginosa. This set of pathogens encompasses bacterial pathogen targets included on the pathogen array card and those on the CDC threat list for antibiotic resistance 39 . In our analysis comparing long-read sequencing to real-time PCR for ARG identification, we selected a subset 28 resistance genes from the array card to compare to Resfinder based on known clinical relevance. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04. 22.21255864 doi: medRxiv preprint In order to compare detection of pathogens by real-time PCR and long-read sequencing, we calculated sensitivity and specificity using real-time PCR as the gold standard. Here, sensitivity was defined as the true positive rate and specificity as the true negative rate. To evaluate pathogen classification we used the Centrifuge score, or sum of squared lengths of matched segments 38 . Significant differences between unpolished and polished contigs were assessed using log10 transformed Centrifuge scores and paired t-tests. When comparing scores for individual pathogens, p-values were adjusted to correct for multiple comparisons (15 pathogens) using the Bonferroni method 40 . To compare antibiotic resistance identification between unpolished and polished contigs, we calculated concordance by dividing the number of times the same ARG was identified in the polished and unpolished contigs by the total number of ARG identifications in the contigs (total hits). An average of 6.6 Gbp were obtained per sample from Illumina sequencing (Table S4) . Long-read sequencing yielded an average of 3.4 Gbp of data per sample, excluding the mock community. All real-time PCR NTCs and extraction blanks were negative for pathogen gene targets. The resistance gene ermB amplified in the real-time PCR no template control and was therefore excluded from the analysis. All contigs assembled from long reads in the mock community sample containing ARGs were identified as one of the five expected taxa at the species or subspecies level or as a plasmid ( Figure S1 ). Of the 75 contigs assembled in the mock community sample, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint including contigs without ARGs, 72 were classified as one of the expected six taxa (five with ARGs, one without) at the species or subspecies level. By real-time PCR, Boston samples were positive for, on average, 12 ± 6 out of the 24 possible bacterial pathogen gene targets whereas Vellore samples were positive for 19 ± 1 bacterial pathogen genes ( Salmonella species, H. pylori, C. difficile, and S. aureus that were not identified by PCR. Many of these contigs were relatively short (<2000 bps) and had low Centrifuge scores (sum of squared lengths of matched segments 38 ); therefore, they are likely false positives (Table S7 ). The average log10 Centrifuge score was not significantly different between polished and unpolished contigs identified as pathogens (mean polished= 6.3, mean unpolished= 6.0, p-value =0.08, Table S7 ). However, when evaluating each pathogen individually, Centrifuge scores were significantly higher for polished versus unpolished paired contigs classified as A. baumannii, A. hydrophila, E. coli, K. pneumoniae, and P. aeruginosa (Figure 2 ; paired t-test, Bonferroni corrected p-value < 0.0001 for all). When including unpaired contigs that changed taxa classification after polishing ( Figure S2 ), there was a significant difference between the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) By real-time PCR, Boston samples were positive for, on average, 53 ± 17 out of the 82 possible antibiotic resistance gene targets whereas Vellore samples were positive for 60 ± 6 antibiotic resistance genes (Table S8 ). Using real-time PCR as a benchmark, raw long-read metagenomic . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Figure 1 , and Table S8 ). It is notable that despite < 100% sequencing accuracy, long-read sequencing did not produce any false positives in the subset of 28 clinically-relevant resistance genes. The overall abundance of resistance genes was highest in the hospital effluent samples (Vellore hospital: 4260 ARGs/Gbp; Boston hospital: 2114 ARGs/Gbp (Table S4) ). 8% of ARGs in the Boston hospital effluent mapped to colistin resistance genes ( Figure S3 ). Specifically, reads aligned to the mcr-3 gene and the reads were classified as Aeromonas spp., which is consistent with other studies 41, 42 . Aeromonas spp. carrying mcr-3 genes have been found in humans, retail meat, and the environment in China and in wastewater in Japan 41,42 . In the polished contigs classified as A. baumannii and P. aeruginosa we identified aminoglycoside resistance genes in Boston wastewater, surface, and hospital samples ( Figure 3a ). Contigs were also identified as C. difficile containing tetracycline resistance genes in the Boston hospital effluent and contigs classified as S. aureus with lincosamide resistance genes were found in Boston wastewater. In Vellore, all but one surface sample contained contigs classified as potential pathogens with resistance genes that span numerous drug classes ( Figure 3a ). Clinically relevant resistance genes found in pathogens included armA in A. baumannii and blaCTX-M in E. coli in the hospital wastewater in Vellore. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint Were detected numerous resistance genes, from different classes of antibiotics, on plasmids in both Boston and Vellore (Figure 3b ). Beta-lactamase genes were present on plasmids in the hospital effluents from both cities in addition to wastewater and surface samples in Vellore. It is important to note that beta-lactamase genes vary in clinical significance at the allele level, and therefore only some are extended spectrum beta-lactamases (ESBLs), conferring resistance to antibiotics of last resort. For example, the primary alignment for a blaTEM gene found on a plasmid in the Boston hospital effluent was blaTEM-6 which is an ESBL, but the primary alignment on a plasmid in the Vellore hospital effluent was blaTEM-1, which is not (supplemental data) 43 . The concordance between ARG identification at the allele level in polished and unpolished contigs was 78% for genes conferring resistance to beta-lactamases (Table S10) . Among all drug classes evaluated the average concordance between polished and unpolished contigs was 86 ± 9% at the allele level and 98 ± 2% at the gene level. Nanopore is a relatively new technology and accordingly there are few assembly pipelines available that have been rigorously evaluated 33, [44] [45] [46] [47] . Recently, there has also been emphasis on the potential utility of short reads to improve long-read assemblies through polishing 34 ). While short-read polishing has been shown to improve frame-shift errors 48, 49 and increase accuracy of error-prone long reads 34 , here we found limited impact of short-read polishing on pathogen identification or ARG classification. Previous work has also recommended the use of hybrid . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint assemblers of both long and short reads (e.g. OPERA-MS 50 and HybridSPAdes 51 ), but in a preliminary analysis of our data, OPERA-MS produced contigs on average shorter or similar in length to our unassembled long reads (data not shown). Previous literature evaluating assemblers on mock communities has demonstrated that metaFlye, the assembler used in this work, performed well and was able to produce highly contiguous genomes 33, 45 . It is important to consider that assembler performance varies by coverage and complexity of microbial communities. For example, in an investigation of short, long, and hybrid assemblers for analysis of wastewater metagenomes, metaFlye produced the longest contigs and most co-occurrences of antibiotic resistance genes with mobile genetic elements and pathogen gene markers but was also shown to be vulnerable to misassemblies at low coverages 52 . Canu 44 has also been identified as a promising assembler for long reads but requires a significant amount of computing time and memory, often exceeding that available on university high-performance computer clusters with job time and memory limitations 45, 52 . This study contributes valuable metrics (sensitivity and specificity compared to real-time PCR) for using long-read metagenomic sequencing for identification of pathogen genes in wastewater and surface waters. Previous studies using long-read sequencing of environmental water samples have explored alternative workflows that may be useful for pathogen surveillance applications, but have not validated those methods against real-time PCR for multiple pathogens. A study utilizing long-read sequencing for water quality testing in a Montana River found that long-read sequencing was able to detect pathogenic E. coli genes in coliform enriched cultures but not in the uncultured samples 23 . Another study evaluating long-read sequencing of the 16S rRNA gene . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint (amplicon sequencing) in water from informal settlements in Nepal found that amplicon sequencing correlated with qPCR results for detection of Vibrio cholera but recommended combined use to avoid false positives 24 . Other studies have found nanopore sequencing (metagenomic and amplicon) useful for profiling spatial and temporal variation in the core communities and function of bacteria in river samples 53, 54 . Overall, the samples from Vellore had more pathogen genes detected via real-time PCR and more contigs identified as potential pathogens using metagenomics. In a previous study that used a similar TaqMan array card for detecting enteric pathogen prevalence in young children in Vellore, the most prevalent bacterial pathogens were enteroaggregative E. coli, enteropathogenic E. coli, and Campylobacter species 55 . Similar pathogen profiles were found in our own analysis of surface and wastewater samples in Vellore using real-time PCR. In this work, we detected Campylobacter at the species level in all Vellore samples, and two samples (wastewater and surface water) were positive for C. jejuni or C. coli genes by real-time PCR (Table S5) . We detected few bacterial pathogens in the Boston hospital wastewater but did detect enteroaggregative E. coli (aaiC) and S. flexneri (clade 1-type 3 restriction enzyme) genes. We also found all pathogenic E. coli genes (aaiC, aatA, aggR, eae, bfpA, estA, LT, STh, STp, stx1, stx2, or ipaH) in all Boston wastewater samples, but it is difficult to contextualize these results because there is limited monitoring or reporting of enteric pathogens, outside of outbreak investigations, in the United States. While there are efforts by the Centers for Disease Control and Prevention to monitor . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint waterborne illnesses, it is known that cases of diarrheal illness are significantly underreported in the United States 56-58 . The blaNDM-1 gene, which confers resistance to carbapenems, was found in all Vellore samples but not in Boston samples. These results suggest blaNDM-1 resistance could be widespread in India, where the gene is thought to have originated 59 , but has yet to become prevalent in Boston. It is important to note that our sample size was limited in each country. By long-read sequencing, relative abundance of resistance genes varied by drug class and exhibited location-specific trends. Boston samples were dominated by macrolide, lincosamide, streptogramin (MLS) resistance whereas Vellore samples had a higher relative abundance of aminoglycoside and betalactam resistance ( Figure S3) . A high abundance of macrolide resistance in Boston is consistent with a previous study of resistance genes around the world, that found a high relative proportion of macrolide resistance in Europe and North America 14 . A previous study of the Indian gut microbiome showed that study participants in urban areas carried resistance genes to nearly all classes of antibiotics 60 . In our study in Vellore, an urban area, contigs that were classified as potential pathogens or plasmids also carried resistance genes to most classes of antibiotics including aminoglycosides, beta-lactams, MLS, nitroimidazole, folate pathway antagonists (including sulfonamide resistance), phenicol, quinolone, and tetracycline. In comparing resistance genes found in this study using metagenomics to resistance genes found previously in clinical isolates, we were able to identify clinically relevant resistance genes, particularly on plasmids, in Boston and Vellore. In a recent review of antibiotic resistance in . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; hospitals in India, the most common extended spectrum beta-lactamases were CTX-M-types 61 , which we found (blaCTX-M-15) using metagenomic sequencing in E. coli in the Vellore hospital effluent. In the review, common carbapenem resistance genes in E. coli isolates were NDM and OXA48-like genes, and common beta-lactamase genes found in K. pneumoniae isolates in India were blaTEM, blaCTX-M, and blaOXA48-like. In our study, few beta-lactamase genes were classified on an E. coli contig (blaCTX-M-15) or K. pneumoniae contig (blaOXA), and most were found on contigs classified as plasmids. Many of the identified plasmid types (e.g. pR23, pKP3-A, pYD786-4, pKp_Goe_414-1, pFOS18) can be found in Enterobacteriaceae. Of the primary alignments to OXA genes on plasmids in the Vellore hospital effluent (OXA-1, OXA-164, OXA-181, OXA-392, OXA-58), one (OXA-181) was an OXA48-like allele. E. coli isolates in hospitals in India also contained resistance genes to other classes of antibiotics that we identified using metagenomic sequencing on plasmids in the Vellore hospital effluent, including tetA and tetB, dfrA17, sul1, and catB3 61 . We also identified armA, conferring aminoglycoside resistance, on A. baumanii in Vellore hospital wastewater which is in agreement with the presence of armA in 96.5% of isolates of A. baumanii from hospitals in India 61 . Unlike Vellore clinical isolates, a study of Boston hospital isolates of Enterobacteriaceae found that most contained blaKPC resistance, specifically KPC-2, KPC-3, and KPC-4 62 . We identified KPC-2 resistance genes on plasmids in both the Boston and Vellore hospital effluents, but KPC resistance is thought to be less common in India 61 . It should be noted that our results are likely a conservative estimate of the utility of long-read sequencing, and achieving more sequencing depth through increased flow cell yields (number of base pairs sequenced) could generate even more contigs classified as pathogens or pathogens . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint containing resistance genes. Removing shorter DNA fragments through size selection is one method that can be employed to increase flow-cell yields since shorter fragments are preferentially sequenced. In addition, identification of pathogens with sequencing is limited by the databases and methods employed to identify taxonomy. Since database limitations are difficult to overcome, we improved our ability to identify pathogens by using assembled contigs, instead of raw reads, in our pathogen analysis and used our mock community results to assess classification accuracy. Assembly increased the size of fragments (mean N50 reads=3462 bp, mean N50 contigs=47400 bp, excluding the mock community) although it is notable that the assembly for the mock community produced significantly longer contigs than the environmental samples (mock N50= 2.9 Mbp). Lastly, resistance genes identified using sequencing methods, without culturing and antibiotic susceptibility testing, are putative and do not guarantee phenotypic resistance. Wastewater-based epidemiology has gained major momentum in the past year in response to COVID-19 as it provides the ability to monitor for SARS-CoV-2 RNA at the population level 10 . Our study results are timely for informing how SARS-CoV-2 wastewater monitoring could be expanded to include surveillance of other pathogens and antibiotic resistance genes. In addition, building-level surveillance has recently been implemented for identifying COVID-19 hotspots and similar sampling scales could be useful for monitoring antibiotic resistance, such as at hospitals 63 . Using long-read sequences we identified mobile resistance genes to most drug classes of antibiotics in the hospital wastewater in Vellore. Previous surveillance of hospital effluent in India . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint has led to discovery of novel carbapenemases 26, 64 . Ongoing research during the COVID-19 pandemic to optimize sampling scales (building, sewer system, treatment plant) 65 ,66 and method of sampling (composite, grab, moore swab) 67,68 will also be applicable for wastewater sampling for enteric pathogens and antibiotic resistance. We thank Jie Liu for her assistance with the array card for pathogens. We would also like to acknowledge support from the Tufts Institute for the Environment. E.R.F was supported by the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint For each sample, 2.5L of water was collected with a disposable 100mL Dippas Sampling Dipper to fill five-500mL Whirl Pak bags (Nasco, Modesto, CA). Samples were placed in a cooler with ice packs, transported to the site's laboratory facility, and filtered (Boston samples at Tufts Medical Center and Vellore samples at Christian Medical College). Filtration was conducted using a Millipore filtration unit and 0.22 µm Nanoceram disc filters (Argonide, Sanford, FL). Due to high turbidity, filters were changed when flow rates were negligible. Filters were stored at -80 °C until DNA extraction. The DNeasy PowerWater kit (Qiagen, Germantown, MD) was used for DNA extraction according to the manufacturer's instructions with one modification. To accommodate the large number of filters per sample, all filters for one sample were soaked in PBS solution for 30 minutes followed by perturbation of each filter surface. The supernatant was decanted and retained for bead beating. One extraction blank was included per 6 samples. Following extraction, the concentration and purity of the DNA was assessed by spectrophotometry (NanoDrop ND-200) at each laboratory. Extractions were conducted by the same lab technician across both labs. The extracted DNA was stored at -20 °C until downstream processing. Each real-time PCR TAC Pathogen array card held viral, bacterial, fungal, protozoan, and helminth targets (Table S2 ) and could hold 8 samples. For each sample, 20 µl of sample was combined with 50 µl RT-PCR buffer (AgPathID One-Step RT-PCR Kit, Applied Biosystems), 4 µl of enzyme mix, and 6 µl of nuclease free water for a total volume of 100 µl. Each sample was mixed and deposited into the sample port on the card. The card was centrifuged twice at 1,200 rpm for 1 minute, then sealed, and the sample ports removed. The card was then placed into the thermal cycler (QuantStudio 12K Flex, Applied Biosystems) and run under the following cycling conditions: 45°C for 20 minutes, 95°C for 10 minutes, and 45 cycles of 95°C for 15 seconds followed by 60°C for 1 minute. One extraction blank was run as part of the 8 samples and the card contained two internal positive controls. Each real-time PCR antibiotic resistance (AR) array card held 83 different ARG targets (Table S3) plus positive controls. One sample was used per card plus one extraction blank was also analyzed (n=13). For each sample, 500 ng of DNA was combined with 1275 µl qPCR mastermix (Microbial qPCR Mastermix, Qiagen) and a sufficient amount of nuclease free water to reach a total volume of 2550 µl. 25 µl of sample was then added to each well of the array card. Each card was run in a thermal cycler (StepOne Plus, Applied Biosystems) with the following cycling conditions: 95 °C for 10 minutes followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 2 minutes. Manufacturer's recommendations for baseline and threshold values were used; 20 cycles and 0.2, respectively. A mock sample was developed by the CMC laboratory and included Pseudomonas aeruginosa, Methicillin-resistant Staphylococcus aureus, Klebsiella pneumoniae, Escherichia coli, Stenotrophomonas maltophilia, and Burkholderia pseudomultivorans. Each bacterial strain in the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint mock community had a set of known ARGs (listed in Table S1 ). The set of pathogenic bacterial strains used included a mix of gram positive and gram-negative bacteria, varying levels of GC content, varying genome sizes, and varying abundances of the microorganisms within the mock sample. Five of the six bacterial pathogens contained antimicrobial resistance genes. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint Figure S1 : Taxonomic classification and AMR genes in contigs assembled from long-read sequences in the mock community. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 26, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Figure S2 : Log transformed centrifuge scores, sum of squared lengths of matched segments, of unpaired contigs classified as potential pathogens before and after short-read polishing. Significant differences after correcting for multiple comparisons are indicated by *. The line through each box denotes the median centrifuge score, the lower and upper box boundaries denote the first and third quartiles, respectively, and the whiskers extend to the minimum and maximum scores up to 1.5 times the interquartile range, with outliers plotted as dots. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 26, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint Table S9 : Conditional probabilities comparing raw long-reads (unassembled and unpolished) to real-time PCR for antibiotic resistance gene identification. Figure S3 : Relative abundance of resistance genes by drug class using raw long-reads. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 26, 2021. ; https://doi.org/10.1101/2021.04.22.21255864 doi: medRxiv preprint A European One Health Action Plan against Antimicrobial Resistance (AMR) Tackling Antibiotic Resistance: The Environmental Framework Global Monitoring of Antimicrobial Resistance Based on Metagenomics Analyses of Urban Sewage One Chromosome, One Contig: Complete Microbial Genomes from Long-Read Sequencing and Assembly Determining Hosts of Antibiotic Resistance Genes: A Review of Methodological Advances Multiplex PCR Method for MinION and Illumina Sequencing of Zika and Other Virus Genomes Directly from Clinical Samples Rapid Implementation of SARS-CoV-2 Sequencing to Investigate Cases of Health-Care Associated COVID-19: A Prospective Genomic Surveillance Study Sharing of Carbapenemase-Encoding Plasmids between Enterobacteriaceae in UK Sewage Uncovered by MinION Sequencing. Microb The Oxford Nanopore MinION: Delivery of Nanopore Sequencing to the Genomics Community Urban Wastewater Treatment Plants as Hotspots for Antibiotic Resistant Bacteria and Genes Spread into the Environment: A Review Mobile Antibiotic Resistome in Wastewater Treatment Plants Revealed by Nanopore Metagenomic Sequencing. Microbiome Metagenomic Profiling of Microbial Pathogens in the Little Bighorn River A Comparative Assessment of Conventional and Molecular Methods, Including Minion Nanopore Sequencing, for Surveying Water Quality Software Tools for Clustering, Reconstruction and Typing of Plasmids from Draft Assemblies. Microb. genomics Hospital Wastewater Releases of Carbapenem-Resistance Pathogens and Genes in Urban India Antimicrobial Resistance in India: A Tracking Global Trends in the Effectiveness of Antibiotic Therapy Using the Drug Resistance Index A Laboratory-Developed TaqMan Array Card for Simultaneous Detection of 19 Enteropathogens Evaluation of Oxford Nanopore's MinION Sequencing Device for Microbial Whole Sickle: A Sliding-Window, Adaptive, Quality-Based Trimming Tool for FastQ Files Scalable Long-Read Metagenome Assembly Using Repeat Graphs Fast and Accurate de Novo Genome Assembly from Long Uncorrected Reads ResFinder 4.0 for Predictions of Phenotypes from Genotypes Pairwise Alignment for Nucleotide Sequences Reference Sequence (RefSeq) Database at NCBI: Current Status, Taxonomic Expansion, and Functional Annotation Centrifuge: Rapid and Sensitive Classification of Metagenomic Sequences Antibiotic Resistance Threats in the United States The Bonferonni and Sidak Corrections for Multiple Comparisons Prevalence and Genetic Analysis of Mcr-3-Positive Aeromonas Species from Humans Potential KPC-2 Carbapenemase Reservoir of Environmental Aeromonas Hydrophila and Aeromonas Caviae Isolates from the Effluent of an Urban Wastewater Treatment Plant in Japan Canu: Scalable and Accurate Long-Read Assembly via Adaptive k-Mer Weighting and Repeat Separation Assembly Methods for Nanopore-Based Metagenomic Sequencing: A Comparative Study Fast Mapping and de Novo Assembly for Noisy Long Sequences Fast and Accurate Long-Read Assembly with Wtdbg2 Annotated Bacterial Chromosomes from Frame-Shift-Corrected Long-Read Metagenomic Data. Microbiome Ultra-Deep, Long-Read Nanopore Sequencing of Mock Microbial Community Standards. Gigascience Hybrid Metagenomic Assembly Enables High-Resolution Analysis of Resistance Determinants and Mobile Elements in Human Microbiomes An Algorithm for Hybrid Assembly of Short and Long Reads Critical Evaluation of Short, Long, and Hybrid Assembly for Contextual Analysis of Antibiotic Resistance Genes in Metagenomic Analysis of Planktonic Riverine Microbial Consortia Using Nanopore Sequencing Reveals Insight into River Microbe Taxonomy and Function Freshwater Monitoring by Nanopore Sequencing Enteropathogens and Gut Inflammation in Asymptomatic Infants and Children in Different Environments in Southern India Waterborne Disease Outbreaks Associated With Environmental and Undetermined Exposures to Water -United States Surveillance for Waterborne Disease Outbreaks Associated with Drinking Water -United States Emergence of a New Antibiotic Resistance Mechanism in India, Pakistan, and the UK: A Molecular, Biological, and Epidemiological Study Antimicrobial Susceptibility Profile & Resistance Mechanisms of Global Antimicrobial Resistance Surveillance System (GLASS) Priority Pathogens from India Multi-Institute Analysis of Carbapenem Resistance Reveals Remarkable Diversity, Unexplained Mechanisms, and Limited Clonal Outbreaks Concentrations of Antibiotics Predicted to Select for Resistant Bacteria: Proposed Limits for Environmental Regulation Sewage Effluent from an Indian Hospital Harbors Novel Carbapenemases and Integron-Borne Antibiotic Resistance Genes. Microbiome Designing a Typhoid Environmental Surveillance Study: A Simulation Model for Optimum Sampling Site Allocation