key: cord-0280786-nil1vv6h authors: Alfano, Niccolò; Dayaram, Anisha; Axtner, Jan; Tsangaras, Kyriakos; Kampmann, Marie-Louise; Mohamed, Azlan; Wong, Seth T.; Gilbert, M. Thomas P.; Wilting, Andreas; Greenwood, Alex D. title: Non-invasive surveys of mammalian viruses using environmental DNA date: 2020-11-30 journal: bioRxiv DOI: 10.1101/2020.03.26.009993 sha: 754c7b37dba0f983df04870ff84228cc84bb0221 doc_id: 280786 cord_uid: nil1vv6h Environmental DNA (eDNA) and invertebrate-derived DNA (iDNA) have been used to survey biodiversity non-invasively to mitigate difficulties of obtaining wildlife samples, particularly in remote areas or for rare species. Recently, eDNA/iDNA have been applied to monitor known wildlife pathogens, however, most wildlife pathogens are unknown and often evolutionarily divergent. To detect and identify known and novel mammalian viruses from eDNA/iDNA sources, we used a curated set of RNA oligonucleotides as viral baits in a hybridization capture system coupled with high throughput sequencing. We detected multiple known and novel mammalian RNA and DNA viruses from multiple viral families from both waterhole eDNA and leech derived iDNA. Congruence was found between detected hosts and viruses identified in leeches and waterholes. Our results demonstrate that eDNA/iDNA samples represent an effective non-invasive resource for studying wildlife viral diversity and for detecting novel potentially zoonotic viruses prior to their emergence. The RNA was reverse transcribed using SuperScript III and IV (Thermo Fisher Scientific) with random 114 hexamers prior to second-strand synthesis with Klenow fragment (New England Biolabs). The resulting 115 double-stranded cDNA/DNA mix was sheared to an average fragment size of 200 bp using a M220 focused 116 ultrasonicator (Covaris). Sheared product was purified using the ZR-96 DNA Clean & Concentrator-5 kit 117 (Zymo). Dual-indexed Illumina sequencing libraries were constructed as described by Meyer and Kircher 118 (2010) with the modifications described in Alfano et al. (2015) . Each library was amplified in three replicate 119 reactions to minimize amplification bias in individual PCRs. The three replicate PCR products for each 120 sample were pooled and purified using the MinElute PCR Purification Kit (Qiagen). Negative control libraries 121 were also prepared from different stages of the experimental process (extraction, reverse transcription, 122 library preparation and index PCR) and indexed separately to monitor any contamination introduced during 123 the experiment. Amplified libraries were quantified using the 2200 TapeStation (Agilent Technologies) on 124 D1000 ScreenTapes. 125 The targeted sequence capture panel was designed based on the oligonucleotide probes represented on the 127 Virochip microarray (Wang et al. 2002 cells were also removed. Ninety-two 70-mer oligonucleotides covering (spaced end-to-end) the entire pol 141 and gB genes of Equine herpesvirus 1 (EHV-1) were included as PCR screening of the water samples 142 indicated they were positive for this virus (data not shown). The resulting 13,532 oligonucleotides were 143 examined for repetitive elements, short repeats, and low complexity regions, which are problematic for probe 144 design and capture, using RepeatMasker. Repetitive motifs were identified in 234 oligonucleotides, which 145 were removed. The final targeted sequence capture panel consisted of 13,298 unique sequences which 146 were synthesized (as a panel of biotinylated RNAs) at MYcroarray. 147 Fig. 2) . The other contigs clustered phylogenetically, suggesting they represent two new species of a 243 rhabdovirus related to lyssaviruses (Suppl. Fig. 2) . Although in most cases one contig per sample was 244 observed, in five samples (L4, L12, L23, L58, L68) two different viruses were found. Most of the 245 oligonucleotide baits were specific for the L gene which encodes the RNA-dependent RNA polymerase. All 246 recovered contigs mapped to the L gene (Suppl. Fig. 3A-C) . The viral contig sequences were confirmed by 247 PCR and Sanger sequencing for L55 and L58 (Suppl. Fig. 3D) . 248 All Coronaviridae contigs matched a bat betacoronavirus as determined by BLAST searches with identities 249 between 70-73% (Suppl. Tab. 3). The resulting sequence did not cluster in any of the four clades 250 representing the known Coronaviridae genera, suggesting it may represent a novel coronavirus genus 251 10 (Suppl. Fig. 4) . Each contig overlapped with the coronavirus RNA-dependent RNA polymerase gene 252 (orf1ab), the viral region mainly targeted by the RNA oligonucleotide baits (Suppl. Fig. 5) . 253 Anelloviridae contigs matched either porcine torque teno virus (PTTV) (95-96% identity), a giant panda 254 anellovirus (GpAV) (81-92% identity) or a masked palm civet torque teno virus (Pl-TTV) (83-92% identity) 255 (Suppl. Tab. 3). The PTTV contigs were found in two samples (L8 and L37), while the GpAV and Pl-TTV 256 contigs were detected in six samples. GpAV was the best match in four samples (L7, L17, L36, L39) and Pl-257 TTV in three (L21, L25, L39). In sample L39 both were identified. Every Anelloviridae contig mapped to the 258 non-coding region of the relative reference genome since all Anelloviridae baits targeted the same 259 untranslated region (Suppl. Fig. 6A, C, E) . The non-coding region sequenced is not phylogenetically 260 informative and therefore, phylogenetic analysis could not be performed. Viral contigs were confirmed by 261 PCR and Sanger sequencing for samples L7, L17, L25 and L37 (Suppl. Fig. 6B, D, F) . 262 Three Circoviridae contigs matching a porcine circovirus (PCV) (100% identity) were identified in L7 and L59 263 (Suppl. Tab. 3). Two non-overlapping but adjacent contigs were retrieved from L7. A single contig 264 overlapping with one of the two contigs determined from L7 was recovered from L59 (Suppl. Fig. 7A ). The 265 contigs mapped to the PCV replication protein (Rep), targeted by the Circoviridae baits (Suppl. Fig. 7A ). The 266 two overlapping contigs of L7 and L59 were confirmed by PCR and Sanger sequencing (Suppl. Fig. 7B) . 267 Since the identity of the contigs with known viral sequences in GenBank was 100%, no phylogenetic analysis 268 was performed. 269 Parvoviridae contigs with the highest similarity to porcine parvovirus (PPV) were found in L8 (1 contig with 270 98% identity) and L14 (2 contigs with 74-77% identity) (Suppl. Tab. 3). The contig of L8 clustered within the 271 Tetraparvovirus genus, close to ungulate parvoviruses (porcine, ovine and bovine PV), while the contigs of 272 L14 within the Copiparvovirus genus, close to PPV4 (Suppl. Fig. 8) . Two of the three contigs mapped to the 273 replicase gene, while one from L14 mapped to an intergenic region (Suppl. Fig. 7C ). Whereas the replicase 274 region of PPV was covered by Parvoviridae baits, the intergenic region was not (Suppl. Fig. 7C ). This portion 275 of the virus may have been recovered by other non-Parvoviridae baits targeting that region non-specifically. Fig. 9 ). The contigs mapped to the polymerase gene, which 280 the exogenous retrovirus baits were designed to target (Suppl. Fig. 7D) . 281 Viral enrichment was low in both leech and waterholes samples, ranging from 0.00005% (sample L18) Fig. 11A ). JSRV from this sample was further confirmed by PCR 326 (Suppl. Fig. 11A ). Mongolia sediment sample SM20 was positive for Equine adenovirus (100% identity) with 327 a contig mapping to a region comprised between the pVI and hexon capsid genes (Suppl. Fig. 11B ). Given 328 that multiple equine species are found in the Gobi Desert in Mongolia, it is likely that the water sources 329 sampled may have been frequented by these species [10] . The sediment sample from Tanzania ST38 was 330 positive for a Zetapapillomavirus related to the Equus caballus papillomavirus and Equus asinus 331 papillomavirus (74% identity; E1-E2 genes) (Suppl. Fig. 11C; Suppl. Fig. 12) , consistent with the detection of 332 Plains zebra's (Equus quagga) DNA from this water source (Seeber et al. 2019) . 333 We demonstrate for the first time that both eDNA and iDNA sources can be used to survey known and novel 335 viruses. Indeed, many of the viruses we identified from these two sources were highly divergent from 336 13 available viral reference genomes (homology 45-100%, average 80%). We show that DNA and RNA viruses 337 could be detected in 59% and 23.8% of the iDNA (leech) and eDNA (waterhole) samples, respectively. The 338 congruence of host DNA assignment for leeches and viral families identified suggests that bloodmeals are 339 useful for determining viral diversity. The detection of equine viruses from African and Mongolian waterholes, 340 where intense wild equid visitation rates were directly observed, suggests eDNA derived viruses reflect host 341 utilization of the water rather than other environmental sources such as fomites. While host assignments are 342 difficult to establish for novel viruses from eDNA and to a lesser extent in iDNA, as e/iDNA samples can 343 contain DNA of multiple hosts, the results do narrow the possible number of taxa down to a small portion of 344 the overall faunal diversity within the regions examined. For example, the novel coronavirus was strongly, 345 but not in a statistically significant way, associated with sambar. This suggests targeted sampling and virus 346 specific PCRs could be used to examine prevalence in the species and to establish whether they are a 347 potential viral reservoir. Narrowing down the potential taxa that need to be screened in biodiversity hotspots 348 will be critical, particularly for viruses hosted at low prevalence within hotspot regions. limitations because the short baits can capture divergent and degraded DNA. With our capture system we 357 were able to identify viral sequences with up to 55% divergence from known viral genomes. The 358 comprehensive viral group representation in the bait set allows for the determination of both viral presence 359 and viral diversity. The ability of oligonucleotides with substantial divergence from the target sequence to 360 capture more distantly related sequences is particularly useful in virology since most viruses are 361 uncharacterized in wildlife and many evolve rapidly (Howard & Fletcher 2012). The overall enrichment was 362 low, but this was not unexpected since it is unlikely that any leech sample was strongly viremic or that large 363 amount of virus was shed into water. Nevertheless, our viral capture system generated a two fold order of 364 magnitude higher viral enrichment compared to shotgun sequencing. Furthermore, viral enrichment was 365 concentrated in the regions of the viral genome where baits were designed, leading to high coverage (up to 366 MAFFT multiple sequence alignment software version 7: improvements 483 in performance and usability', Molecular biology and evolution SortMeRNA: fast and accurate filtering of ribosomal RNAs in 486 metatranscriptomic data Fast gapped-read alignment with bowtie 2 Human Ebola outbreak resulting from direct exposure to fruit bats in Luebo Vector-borne and zoonotic diseases Illumina sequencing library preparation for highly multiplexed target capture 494 and sequencing Design-and 496 model-based recommendations for detecting and quantifying an amphibian pathogen in 497 environmental samples Empty forests, empty stomachs? Congo and Amazon Basins', International Forestry Review Broad surveys of DNA viral diversity obtained through viral metagenomics of mosquitoes The long terminal repeat of Jaagsiekte 504 sheep retrovirus is preferentially active in differentiated epithelial cells of the lungs Toward a quantification of risks at the nexus of conservation and health: The case of 508 bushmeat markets in Lao PDR Viral pathogen detection by metagenomics and pan-viral group polymerase chain reaction in 511 children with pneumonia lacking identifiable etiology', The Journal of infectious diseases Debugging diversity-a pan-continental exploration of the potential of terrestrial blood-515 feeding leeches as a vertebrate monitoring tool iDNA from terrestrial haematophagous leeches as a wildlife surveying and monitoring 518 tool-prospects, pitfalls and avenues to be developed African waterholes', Molecular ecology resources Origins of HIV and the AIDS pandemic', Cold Spring Harbor 523 perspectives in medicine African swine fever epidemic Unbiased 528 probabilistic taxonomic classification for DNA barcoding RAxML version 8: a tool for phylogenetic analysis and post-analysis of large 530 phylogenies Wildlife trade and the emergence of infectious 532 diseases Identifying conservation priorities in a defaunated tropical biodiversity hotspot Detective: an automated system for virus identification from high-throughput sequencing data Microarray-based detection and genotyping of viral pathogens 479X) at these positions. This allowed for the assembly of viral genome fragments for which we could 367 reconstruct the phylogenies and enabled further PCR methods to be successfully implemented to confirm 368 some of the viral capture results. 369Using short RNA baits to capture highly conserved sequences from every known vertebrate viral genome is 370 a useful and relatively inexpensive approach for providing an initial viral identification (Figure 1) . However, to 371 fully characterize each virus, the RNA oligonucleotide bait set would need to be customized to retrieve full 372 length viral genomes which has also been done successfully for novel divergent viruses (Alfano et al. 2016 ). 373This is a possible strategy to further investigate only viruses of interest, whereas initial screening with full 374 length genomes for all viruses is likely too costly. 375Several novel viruses were identified from leech bloodmeals with our approach, which is not unexpected as 376 little is known about the virology of wildlife in Southeast Asia, where the leeches were collected. Several viral 377 contigs were phylogenetically distinct from known viruses and may represent new genera. For example, the 378 novel coronavirus identified in leech bloodmeals did not cluster with any of the known Coronaviridae clades. 379We could also tentatively associate the novel corona-and rhabdoviruses with cervids, which are regularly 380 sold as bushmeat in wildlife markets (Nasi et al. 2011 The collection of wild haematophagous invertebrates such as leeches or water and sediments has both 385 advantages and disadvantages compared to invasively collected wildlife samples. Large amounts of DNA 386 can be extracted from bloodmeals, in particular when leeches are processed in bulk. We pooled up to 77 387 leeches and many of our leech bulk samples contained a diverse mix of mammalian DNA. A disadvantage of 388 leeches is that they cannot be found in all environments: for example haematophagous terrestrial species 389 are restricted to tropical rainforests of Asia, Madagascar and Australia (Schnell et al. 2018 ). In addition, for Tanzania, M for Mongolia and L for leeches. The leech host assignment for each leech sample is shown 565 on the right (see Suppl. Tab. 3 for further details). 566 Supporting information is available at MEE online. 568