key: cord-0280801-etv2zgvw
authors: Graham, Ema H.; Adamowicz, Michael S.; Angeletti, Peter C.; Clarke, Jennifer L.; Fernando, Samodha C.; Herr, Joshua R.
title: Diversity, abundance, and host specificity of the human skin associated circular and single stranded DNA virome
date: 2022-05-23
journal: bioRxiv
DOI: 10.1101/2022.05.22.492996
sha: 1c52a2acc6dbc0d53515a6f2a8bc0182c287b3d5
doc_id: 280801
cord_uid: etv2zgvw

The human skin is our point of contact with the microbial world, yet little is known about the diversity of the skin virome. Studies of the human skin virome have focused on bacteriophage and double-stranded DNA viral genomes; however, there have been few efforts to characterize the circular single-stranded DNA viruses that populate human skin. Here, we evaluate the diversity of the circular single-stranded DNA virome collected across three anatomical skin locations from 60 human individuals with five time-point collections spanning six months. Our analyses resulted in the identification of 272 novel and unique Rep-encoded single-stranded DNA viruses associated with human skin. Sequence similarity networks and maximum likelihood estimations of the Rep and Capsid protein amino acid sequences from our sequencing and public database references reveal family level stability of the Cressdnaviricota across the study participants and a larger host range than previously thought for these putative multi-host pathogens.

Understanding viral diversity in the environment and the potential for epidemics driven by host switching is imperative to anticipate and combat the emergence of novel zoonotic viral pathogens. To develop effective pathogen mitigation strategies, we must first be able to identify viral communities and predict those that are capable of causing significant disease in humans after acquisition from non-human hosts, such as wild or domesticated animals.
This presents a challenge since viral diversity in current reference databases is poorly characterized due to the limited number of studies investigating environmental, human, and animal associated viromes. Notably, metagenomic sequencing techniques have contributed to the discovery and identification of novel DNA and RNA viral genomes in the environment [1].

The human skin is the first point of contact with many environmental pathogens, such

Redondoviridae) [20-26]. In addition to infecting eukaryotes, there is evidence to suggest that the Smacoviridae, a family of fecal associated Cressdnaviruses

Human skin associated viral metagenome assemblies. Human skin virome samples were collected from 60 subjects across three anatomical locations with five repeated collections performed over a six-month interval after the initial collection time point. Following DNA sequencing, viral metagenome assemblies from these samples were computationally assessed for the presence of small circular DNA viruses. We assembled a total of 87,106 viral metagenome contigs from the skin samples of the 60 subjects in this study. A total of 62,101 viral metagenome contigs were greater than 1 kb in size. Accounting for genomes found across multiple subjects, we identified a total of 683 unique viral genomes with characteristics of complete small

Smacoviridae (highlighted in blue), the closest instance of host switching was from that of avian, swine, and primate associated Smacoviridae (Fig. 4). More host switching events were observed in the Rep gene phylogeny than in the Capsid gene phylogeny. When evaluating the Capsid gene, the closest instance of host switching was from that of bat, avian, primate, and swine (Fig. 5).

(Fig. 4, 6).
These sequence similarity networks displayed distinct groupings that can be considered hypothetical genus level groupings (Fig. 7). For some novel genus level groupings identified in this study, we observed that host associations were conserved within specific hypothetical genus level designations (Fig. 7D). However, all of these groupings contain only a small number of genomes. In comparison to the networks of the Smacoviridae hypothetical genus level taxonomic groupings (Fig. 7A-B), host specificity recognized at the genus level was more conserved in that of the Smacoviridae than that

The viral genomes we identified in this study contain Rep genes that cluster predominantly with known eukaryote-infecting Genomoviridae (Fig. 1E)

Here we showed that the family Genomoviridae was highly abundant, stable over time, and consistently present across the five time points from multiple skin anatomical locations assessed for the 60 study participants (Fig. 1C-E). This is consistent with the findings of our previous study, in which the viral family Genomoviridae was shown to be stable over time for 42 study participants [9]. We previously showed that these viruses are a part of the core human skin virome. In addition, a phylogenetic analysis of all non-redundant Genomoviridae protein sequences in this study made it clear that the Genomoviridae not only have a broad host range but are also associated with multiple cell types, even within the same host. Due to their stability over time, we hypothesize that these are established viruses that are continually replicating or newly colonizing cells from the environment, though their pathogenesis is currently unknown.
This contrasts with the Smacoviridae, which we observe to be temporally transient and which appear to be environmentally contracted or pathogenic in non-human hosts. Here we confirm the persistent presence of the Genomoviridae on the human skin and conclude that there is a high probability of transfer and that, due to their immense host range, there is potential for host-switching events to occur through contact. We therefore hypothesize that there is potential for zoonotic infections via host-switching from the Genomoviridae.

We found many instances of clustering of unclassified Genomoviridae Rep amino acid sequences, and we hypothesize that these may be novel families and genera of the

The funding agencies had no role in study design, sample collection, data interpretation, or the decision to submit this work for publication.

Consensus statement: Virus taxonomy in the age of metagenomics
The science of the host-virus network
Climate change increases cross-species viral transmission risk
Topographical and temporal diversity of the human skin microbiome
The skin microbiome
The human skin microbiome
Metagenomic analysis of virus diversity and relative abundance in a eutrophic freshwater harbour
Analysis of Different Size Fractions Provides a More Complete Perspective of Viral Diversity in a Freshwater
A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases
Mucosal and cutaneous human papillomavirus infections and cancer biology
Epidemiology and Burden of Human Papillomavirus and Related Diseases, Molecular Pathogenesis, and Vaccine Evaluation
Discovery of several thousand highly diverse circular DNA viruses
Cressdnaviricota: a Virus Phylum Unifying Seven Families of Rep-Encoding Viruses with Single-Stranded
Evolutionary history of ssDNA bacilladnaviruses features horizontal acquisition of the capsid gene from ssRNA nodaviruses
Smacoviridae: a new family of animal associated single-stranded DNA viruses
Redondoviridae, a Family of Small, Circular DNA Viruses of the Human Oro-Respiratory Tract Associated with Periodontitis and Critical
Entamoeba and Giardia parasites implicated as hosts of CRESS viruses
Nanovirus Disease Complexes: An Emerging Threat in the Modern
Family Genomoviridae: 2021 taxonomy update
CRISPR analysis suggests that small circular single-stranded DNA smacoviruses infect Archaea instead of humans
Pervasive chimerism in the replication associated proteins of uncultured single-stranded DNA viruses
Gemykibivirus Genome in Lower Respiratory Tract of Elderly Woman with Unexplained Acute Respiratory Distress Syndrome
Small circular single stranded DNA viral genomes in unexplained cases of human encephalitis, diarrhea, and in untreated sewage
Novel cyclovirus species in dogs with hemorrhagic gastroenteritis
A Preliminary Study of the Virome of the South American Free-Tailed Bats (Tadarida brasiliensis) and Identification of Two Novel
Basic local alignment search tool
Fast and sensitive taxonomic classification for metagenomics with Kaiju
Improved metagenomic analysis with Kraken
Sequence-based taxonomic framework for the classification of uncultured single-stranded DNA viruses of the family
A diverse group of small circular ssDNA viral genomes in human and non-human primate stools
Merkel cell polyomavirus and two previously unknown polyomaviruses are chronically shed from human skin
Metagenomic Discovery of 83 New Human Papillomavirus Types in Patients with Immunodeficiency
Review of psittacine beak and feather disease and its effect on Australian endangered species
A review of porcine circovirus 2 associated syndromes and diseases
Single-Stranded DNA (CRESS DNA) Viruses: Ubiquitous Viruses with Small Genomes and a Diverse Host Range
Discovery and genetic characterization of diverse smacoviruses in Zambian non-human primates
Truly ubiquitous CRESS DNA viruses scattered across the eukaryotic tree of life
Genome-Wide Variation in Betacoronaviruses
The coronavirus proofreading exoribonuclease mediates extensive viral recombination
A quality control tool for high throughput sequence data
Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33)
BBMap: A Fast, Accurate, Splice-Aware Aligner
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
QUAST: Quality assessment tool for genome assemblies
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
Cenote-Taker 2 democratizes virus discovery and sequence annotation
Accelerated for clustering the next-generation sequencing data
The EFI Web Resource for Genomic Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways
Cytoscape: A software Environment for integrated models of biomolecular interaction networks
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
UFBoot2: Improving the ultrafast bootstrap approximation
Multiple sequence alignment with high accuracy and high throughput
FastTree 2 - Approximately maximum-likelihood trees for large alignments
Interactive tree of life (iTOL) v5: An online tool for phylogenetic tree display and annotation
Fast gapped-read alignment with Bowtie 2
The Sequence Alignment/Map format and SAMtools
R: a language and environment for statistical computing. R Foundation for Statistical Computing
Elegant Graphics for Data Analysis
NCBI Taxonomy: A comprehensive update on curation, resources and tools

Log10 total abundance summed across all time points for a subject at an anatomical location for the identified family level groupings of the novel human skin associated

Persistence of the Cressdnavirus family level grouping of the novel human associated Cressdnaviruses across the five time points is represented by the color scale, with red being present in five out of the five time points and blue being present in zero out of the five time points. For (C.) and (D.) heatmaps each row is an anatomical location

Sequence similarity network of Cressdnaviricota reference sequences obtained from NCBI and the human skin associated novel CRESS-like viruses identified in this study. EFI-EST was used to conduct pairwise alignments of the conserved amino acid sequences of the Cressdnaviricota phylum Rep gene [55]. For family level clustering an E-value cutoff of 10^-60 was used. Cytoscape was used for network visualization and color labeling

Human skin associated Genomoviruses identified in this study are highlighted in yellow. Four Rep gene amino acid sequences from differing Geminiviruses were used as an outgroup control for this network clustering.
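
The family level clustering described in the figure legends (edges retained at an E-value cutoff of 10^-60) can be illustrated with a minimal sketch. This is not the EFI-EST or Cytoscape implementation; it only shows the general idea of thresholding pairwise alignment E-values and reading clusters off as connected components. The function name `cluster_by_evalue`, the sequence labels, and the toy E-values are all hypothetical and are not data from this study.

```python
from collections import defaultdict


def cluster_by_evalue(pairs, cutoff=1e-60):
    """Group sequences into connected components of a similarity network.

    pairs:  iterable of (seq_a, seq_b, evalue) from pairwise alignments.
    cutoff: an edge is kept only when evalue <= cutoff (assumed convention).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in pairs:
        root_a, root_b = find(a), find(b)  # registers both nodes
        if evalue <= cutoff and root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Largest clusters first, members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy input: labels and E-values are made up for illustration.
toy_pairs = [
    ("skin_rep_1", "genomo_ref_A", 1e-80),
    ("skin_rep_2", "genomo_ref_A", 1e-65),
    ("skin_rep_3", "smaco_ref_B", 1e-70),
    ("skin_rep_1", "smaco_ref_B", 1e-20),        # too weak, no edge
    ("gemini_outgroup", "genomo_ref_A", 1e-10),  # outgroup stays separate
]

clusters = cluster_by_evalue(toy_pairs)
for members in clusters:
    print(members)
```

In this toy input the outgroup sequence shares no edge that passes the cutoff, so it forms its own singleton cluster, which mirrors the role of an outgroup control in the network. A stricter cutoff splits clusters further, which is the general intuition behind using different thresholds for family and genus level groupings.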