key: cord-257321-l1swyr6g
authors: Chen, Lihong; Liu, Bo; Wu, Zhiqiang; Jin, Qi; Yang, Jian
title: DRodVir: A resource for exploring the virome diversity in rodents
date: 2017-05-20
journal: J Genet Genomics
DOI: 10.1016/j.jgg.2017.04.004
sha:
doc_id: 257321
cord_uid: l1swyr6g

Emerging zoonotic diseases have received tremendous interest in recent years, as they pose a significant threat to human health, animal welfare, and economic stability. A high proportion of zoonoses originate from wildlife reservoirs. Rodents are the most numerous, widespread, and diverse group of mammals on earth and are reservoirs for many zoonotic viruses responsible for significant morbidity and mortality. A better understanding of virome diversity in rodents would be of importance for researchers and professionals in the field. Therefore, we developed the DRodVir database (http://www.mgc.ac.cn/DRodVir/), a comprehensive, up-to-date, and well-curated repository of rodent-associated animal viruses. The database currently covers 7690 sequences from 5491 rodent-associated mammal viruses of 26 viral families detected from 194 rodent species in 93 countries worldwide. In addition to virus sequences, the database provides detailed information on related samples and host rodents, as well as a set of online analytical tools for text query, BLAST search and phylogenetic reconstruction. The DRodVir database will help virologists better understand the virome diversity of rodents. Moreover, it will be a valuable tool for epidemiologists and zoologists for monitoring and tracking current and future zoonotic diseases.
As a data application example, we further compared the current status of rodent-associated viruses with that of bat-associated viruses to highlight the necessity of including additional host species and geographic regions in future investigations, which will help us achieve a better understanding of the virome diversities in the two major reservoirs of emerging zoonotic infectious diseases.

Zoonotic diseases comprise a significant and increasing proportion of all emerging human infectious diseases, and most of them originate from wildlife (Kruse et al., 2004; Jones et al., 2008). Two mammalian orders, Rodentia (rodents) and Chiroptera (bats), represent the most relevant potential sources of new zoonoses (Calisher et al., 2006; Meerburg et al., 2009; Smith and Wang, 2013). Rodentia is the single largest order of mammals. There are more than 2000 living species of rodents, which comprise ~40% of all mammalian species, including mice, rats, hamsters, guinea pigs, voles, chinchillas, chipmunks, gophers, muskrats, gerbils, woodchucks, and many others (Huchon et al., 2002). Despite their great species diversity, all rodents share a common feature: a single pair of continuously growing incisors, which are used to gnaw food, excavate burrows, and defend themselves. Rodents are native to all continents except Antarctica, and they show a wide range of lifestyles, occupying terrestrial, subterranean, arboreal, and aquatic habitats (Wolff and Sherman, 2007). Rodents are common hosts for pathogens that transmit diseases to humans and domestic animals because of their close association with humans, large social group size, intense social interaction, high population density and widespread geographic distribution. Since the Middle Ages, rodents have been known to contribute to human diseases, as black rats were associated with the spread of the plague (Perry and Fetherston, 1997). Rodents also remain a threat to public health in modern times.
They have been implicated as reservoir hosts of zoonotic pathogens, such as viral hemorrhagic fever viruses, including arenaviruses (Junin, Machupo, and Lassa) and hantaviruses (Hantaan, Dobrava, and Sin Nombre) (Enria and Pinheiro, 2000; Goeijenbier et al., 2013). Also, global climate change and continued urbanization have led to increased problems with rodent-associated zoonoses (Mills et al., 2010). Recent metagenomic studies have identified a wide diversity of novel viruses in rodents from families/genera that contain important human pathogens, including new species/variants of picornaviruses, hepaciviruses and sapoviruses (Firth et al., 2014). Therefore, it is necessary to enhance our knowledge of the virome diversity in rodents to facilitate future prevention and control of emerging zoonotic diseases.

Bats are the second most diverse mammalian order on earth, with more than 1200 living species identified worldwide (Altringham, 2011). The importance of bats as natural hosts for several important viral agents, including Ebola virus, Marburg virus, Hendra virus, Nipah virus, Severe Acute Respiratory Syndrome (SARS) coronavirus, and Middle East Respiratory Syndrome (MERS) coronavirus, has been established (Calisher et al., 2006; Wong et al., 2007; Ge et al., 2013; Smith and Wang, 2013; Munster et al., 2016). Owing to the wide-scale microbial surveillance programs conducted and the novel high-throughput detection methods developed in recent years, a plethora of newly identified viruses has been reported in both rodents and bats. Our group previously constructed the first database of bat-associated viruses (DBatVir) in 2014 (Chen et al., 2014). In this study, we present the newly established DRodVir database for a thorough and modern understanding of rodent-associated viral zoonoses (http://www.mgc.ac.cn/DRodVir/).
Moreover, we used the information from the two sister databases to perform a comparative analysis of the virome diversities of the two major reservoirs of emerging zoonotic infectious diseases.

DRodVir is a sequence-centric database, since molecular methods are now commonly used in virus detection and functional analysis. To retrieve all available sequences of rodent-associated viruses from the public domain, we first performed exhaustive searches in both the PubMed and Nucleotide databases of NCBI using a group of keywords pertaining to rodents. Then, the retrieved GenBank records were downloaded to a local system and parsed by an in-house BioPerl script to generate human-readable tables of the metadata, including the sampling time, location, rodent species, specimen type (e.g., feces, blood, or tissues), and viral detection method (e.g., PCR or metagenomics), for further review. For published sequences, additional meta information was extracted from the related literature by manual curation. As the DRodVir database focuses on natural rodent-associated mammal viruses that are most relevant to emerging zoonotic infectious diseases, the following filter criteria are employed to exclude unrelated records: i) samples derived from animals other than rodents; ii) sequences of phages; iii) insect or plant viruses; iv) laboratory-cultured viruses in model rodents. Taxonomic information on all viruses and rodents was derived from the Taxonomy database of NCBI. In addition, we collected information concerning the viruses (e.g., genome architecture, average size and virus graphics) and the rodents (e.g., common names and known geographic ranges) from the ViralZone database (Masson et al., 2013) and the IUCN Red List (www.iucnredlist.org), respectively. Moreover, the established phylogenetic relationships between different rodent families from a previous study were carefully incorporated into the database (Meredith et al., 2011).
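The four exclusion criteria listed above can be illustrated with a minimal Python sketch. This is not the authors' in-house BioPerl script; the `Record` layout, its field names, and the category labels ("phage", "insect", "plant") are hypothetical and chosen only to show how criteria i) to iv) might be applied to already-parsed metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical parsed GenBank metadata; not the real DRodVir schema."""
    accession: str
    host_order: str       # taxonomic order of the sampled animal, e.g. "Rodentia"
    virus_category: str   # assumed labels: "mammal", "phage", "insect", "plant"
    lab_cultured: bool    # True for viruses cultured in laboratory model rodents


def exclusion_reason(rec: Record) -> Optional[str]:
    """Return the first criterion that excludes the record, or None to keep it."""
    if rec.host_order != "Rodentia":
        return "i) non-rodent host"
    if rec.virus_category == "phage":
        return "ii) phage"
    if rec.virus_category in {"insect", "plant"}:
        return "iii) insect or plant virus"
    if rec.lab_cultured:
        return "iv) laboratory-cultured virus"
    return None


def filter_records(records: List[Record]) -> List[Record]:
    """Keep only records that pass all four exclusion criteria."""
    return [r for r in records if exclusion_reason(r) is None]


# Made-up example records (the accessions are placeholders, not real entries).
sample = [
    Record("EXAMPLE_1", "Rodentia", "mammal", False),
    Record("EXAMPLE_2", "Chiroptera", "mammal", False),
    Record("EXAMPLE_3", "Rodentia", "phage", False),
    Record("EXAMPLE_4", "Rodentia", "plant", False),
    Record("EXAMPLE_5", "Rodentia", "mammal", True),
]
kept = filter_records(sample)
```

Returning the reason rather than a bare boolean keeps the excluded records auditable, which suits a workflow where tables are generated "for further review" by curators.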
The DRodVir database reuses the back-end data schema and most of the front-end Perl CGI and JavaScript code of the DBatVir database, because they provide a versatile and user-friendly interface efficiently (Chen et al., 2014). Given that many users of the DRodVir and DBatVir databases may overlap, the similar architecture makes the new database instantly familiar to previous users. In addition, the similar web design and analytical functions of the two sister databases make it easy for all users to perform comparative analyses of bat- and rodent-associated viruses. The NCBI BLAST and MUSCLE programs were integrated into the database to allow users to conduct sequence similarity searches and multiple sequence alignment on the web page (Altschul et al., 1997; Edgar, 2004). The FastTree program and the jsPhyloSVG library were used for online phylogenetic tree construction and visualization, respectively (Price et al., 2010; Smits and Ouverney, 2010). The main page of the DRodVir database provides a highly responsive and intuitive user interface, with a look similar to a desktop application rather than a traditional website. On the left side is a multifunctional menu panel for easy navigation, and on the right side is a tabbed content panel for presenting tables, figures, trees, BLAST outputs, and other results. To maximize the visible section of the content panel for users with limited screen size, the menu panel can be collapsed into a clickable vertical bar automatically (depending on the visitor's screen resolution) or manually (by a single click on the icon in the top right corner) (Fig. 1A). Features that were previously available only in standalone applications, including collapsible menus, expandable trees, sortable grids, tabbed panels and live statistical pie charts, provide high performance and an improved user experience. DRodVir inherits the major features from the DBatVir database (Chen et al., 2014).
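As a rough illustration of how the integrated tools mentioned above (BLAST, MUSCLE and FastTree) could be chained outside the web interface, the following Python sketch only builds the command lines and does not run them. The text does not state which program versions or options DRodVir uses, so the BLAST+ `blastn` options, the MUSCLE v3-style `-in`/`-out` options, the file names, and the helper `build_pipeline_commands` are all assumptions. Collecting the hit sequences into `selected.fasta` is a separate step that is not shown.

```python
from pathlib import Path
from typing import List


def build_pipeline_commands(query_fasta: str, blast_db: str, workdir: str) -> List[List[str]]:
    """Build (but do not execute) commands for a search/align/tree workflow.

    All file names and the choice of options are illustrative assumptions.
    """
    work = Path(workdir)
    hits = work / "hits.tsv"            # tabular BLAST output
    selected = work / "selected.fasta"  # query plus chosen hit sequences (prepared separately)
    aligned = work / "aligned.fasta"    # multiple sequence alignment
    tree = work / "tree.nwk"            # Newick tree for a viewer such as jsPhyloSVG

    return [
        # 1) Nucleotide similarity search (BLAST+), tabular output format 6.
        ["blastn", "-query", query_fasta, "-db", blast_db,
         "-outfmt", "6", "-out", str(hits)],
        # 2) Multiple sequence alignment (MUSCLE v3-style options).
        ["muscle", "-in", str(selected), "-out", str(aligned)],
        # 3) Approximately maximum-likelihood tree from a nucleotide alignment.
        ["FastTree", "-nt", "-out", str(tree), str(aligned)],
    ]


commands = build_pipeline_commands("query.fasta", "rodent_viruses", "work")
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the input files exist; newer MUSCLE releases use different option names, so the second command would need adjusting there.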
Users can either browse or search the database contents. In browse mode, the menu panel offers three accordion-style submenus for browsing the database by virus category, rodent category, or geographic region. Users who are familiar with certain viruses can use the 'browse by virus' submenu to explore their current host range in rodents and their geographic distributions (Fig. 1B), whereas those who are interested in given types of rodents may try the 'browse by rodent' submenu to investigate the known virome diversity in different rodent hosts (Fig. 1C). In addition, the 'browse by region' submenu allows users to easily summarize the current research efforts on rodent-associated viruses for any country or continent (Fig. 1D). All three submenus are organized as hierarchical, expandable trees and can be switched on or off by a single click on the title bar. Each branch of the trees in the submenus provides a direct link to an individual tab in the main content panel, which presents a uniform sortable grid that includes brief information on the viruses, specimens, associated rodents, determined sequences and related literature (Fig. 1A). Each row of the table can be expanded by a double click to show additional information, such as the virus detection method, GenBank accession number(s), sequence submission date, submitters and their affiliations. A clickable linear map is available for each full-length viral sequence to highlight the genome architecture of the virus (Fig. 1A). For brevity, the table shows up to 100 records by default. A toolbar allows users to page up or down if more records exist or, alternatively, to turn off paging and display all records within a single table. In addition, the main table offers various manipulation functions, i.e., sorting/filtering the table by column contents, reordering the columns, adjusting column width or hiding some columns.
Fig. 1 (partial caption). The hierarchical taxonomic tree shows virus families (bolded branches) by default for brevity. C: The second layer of the main menu for browsing the database by rodent category. Several branches are expanded to show lower taxonomic levels (i.e., subfamily, genus and species). D: The third layer of the main menu for browsing the database by geographic regions. The geographic tree is organized by continents and countries.

In search mode, two parallel query forms are provided for text information search and BLAST sequence similarity search, respectively (Fig. 2A). The text search engine allows the extraction of virus/rodent/specimen/sequence information from the database using either a simple query for a quick start or combined search patterns for advanced usage. The search results are organized into an individual high-performance grid in the content panel, as described above. To facilitate online data analysis, two visualization tools are integrated into the result table: i) a statistical pie chart is available with a single click on the column title of virus family, rodent species/family, sample type and sampling country (Fig. 2B); ii) a global map with indicative markers is provided for the column of sampling country to better illustrate the geographic distribution of the rodent-associated viruses (http://www.mgc.ac.cn/cgi-bin/DRodVir/main.cgi?func=map). In addition, the result table and the related sequences can be downloaded as Excel and FASTA files, respectively, for further offline analyses. To bridge the gap between virologists and zoologists, information concerning the rodents, including common names, known distribution and phylogenetic relationships, is also available and searchable in the DRodVir database (Fig. 2C). The BLAST sequence similarity search form enables users to submit their own sequences for homology comparison against all known rodent-associated viruses.
Moreover, an automatic pipeline for multiple sequence alignment and online phylogenetic tree construction is offered to facilitate follow-up sequence analysis. Detailed step-by-step instructions for all the functions of the DRodVir database described above are available on the online help page (http://www.mgc.ac.cn/DRodVir/howto.htm).

Genomic studies on rodent-associated viruses were initiated earlier than those on bat-associated viruses. Both have been boosted by the genomic era during the past two decades. In addition, metagenomic approaches or high-throughput sequencing (HTS)-based methods have played a major role in the discovery of novel viruses in both rodents and bats in recent years (Fig. 3A). However, many GenBank records provide insufficient meta information about the detection methods applied, so the contributions of metagenomics or HTS-based approaches are systematically underestimated in Fig. 3A. Currently, ~25% of the sequences of rodent- or bat-associated viruses are deposited in public domains without related literature. To date (November 2016), 5491 viruses from 26 families and 6268 viruses from 24 families have been identified from rodents and bats, respectively. However, only 21 viral families are shared by rodent- and bat-associated viruses. Viruses from the families Arenaviridae, Arteriviridae, and Picobirnaviridae, carried by rodents, have not yet been detected in bats. Similarly, viruses from Filoviridae, identified in bats, have not been found in rodents so far (Fig. 3B). We avoided the use of lower taxonomic concepts such as genus or species because of the inconsistent criteria used for such definitions, as described previously (Anthony et al., 2013), and because many newly identified viruses are grouped into unclassified genera or species in public data.
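The family-level counts given above imply how many families are specific to each host order, which the following small Python sketch works out. The three input numbers come from the text; the helper name `overlap_summary` is ours, and the sketch uses counts only because the full family lists are not given here.

```python
from typing import Dict


def overlap_summary(n_a: int, n_b: int, n_shared: int) -> Dict[str, int]:
    """Derive group-specific and total counts from two set sizes and their overlap."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either group size")
    return {
        "a_only": n_a - n_shared,
        "b_only": n_b - n_shared,
        "total_distinct": n_a + n_b - n_shared,  # inclusion-exclusion
    }


# 26 rodent-associated families, 24 bat-associated families, 21 shared.
summary = overlap_summary(26, 24, 21)
print(summary)  # {'a_only': 5, 'b_only': 3, 'total_distinct': 29}
```

Under these counts, five families are so far reported only from rodents and three only from bats, so the families named in the text (three for rodents, one for bats) appear to be examples rather than complete lists.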
Our current data show that the virome diversities in rodents and bats are similar at the family level, though a previous study suggests that bats harbor more zoonotic viruses per species than rodents (Luis et al., 2013). Remarkably, the aforementioned viruses were identified from 194 rodent species and 246 bat species. Given that there are nearly twice as many living rodent species as bat species, the current investigation efforts cover only ~9% of the rodent species but ~21% of the bat species. Thus, the virome diversity in rodents has probably been underestimated compared with that of bats because of the limitations of the currently available data. Future studies that screen additional rodent species are required to achieve a better estimate of the virome diversity in rodents. The geographic distribution of the hosts being investigated is also an important factor affecting the estimation of virome diversity in both rodents and bats, since some viruses carried by the hosts are likely regional even within the same host species. For example, the Ebola virus and Marburg virus were exclusively identified from bats in Africa. Therefore, a better understanding of the virome diversity in either host requires investigation efforts that cover as many countries as possible. The current studies already include countries from all six continents; however, the coverage of countries in most continents is still far from comprehensive (Fig. 3C). Although the species diversity of rodents and bats differs between countries and the current efforts in most countries could not cover all resident host species, further investigations dedicated to the unexplored regions of each continent are essential for our understanding of virome diversity in both rodents and bats.

The current data are disproportionately distributed among viral families for both rodent- and bat-associated viruses.
This may reflect particular public concern about some well-known zoonotic viruses carried by rodents or bats. For example, 52% of the rodent-associated viruses currently identified were from the family Bunyaviridae (mostly hantaviruses), whereas ~41% of the bat-associated viruses reported so far were from the family Rhabdoviridae (mostly lyssaviruses). Nevertheless, the biased data may also be a helpful measure of the potential distribution range of each viral family in the hosts. For instance, viruses from the family Bunyaviridae are the most widely distributed among rodent-associated viruses, having been detected in 109 rodent species from 64 countries worldwide (Fig. 3B). However, although the number of records for viruses from the family Coronaviridae (i.e., coronaviruses) in bats is less than half of that for the family Rhabdoviridae, coronaviruses were surprisingly identified in 121 bat species from 39 countries, which makes them the most widely distributed bat-associated viruses thus far (Fig. 3B). Because the causative agents of both SARS and MERS pandemics are coronaviruses, their widespread occurrence in bats further highlights their potential threat to public health. Also, the current data show that viruses from some other families, such as Rhabdoviridae, Paramyxoviridae, Flaviviridae, Herpesviridae, Arenaviridae, and Picornaviridae, are broadly distributed in terms of host species (rodents and/or bats) and geography (Fig. 3B). Therefore, these viral families should be hotspots for future studies dedicated to the prevention and control of zoonotic viral diseases.

As of November 2016, the DRodVir database has collected information on 5491 rodent-associated animal viruses of 26 virus families detected from 194 rodent species in 93 countries worldwide.
These data give an overview and snapshot of the current research on rodent-associated viruses and provide a substantial source for future attempts to assess and predict epidemic risks. Moreover, this work provides the scientific community with a framework and platform for further exploring the virome diversity of rodents. Mammals are estimated to harbor over 320,000 undiscovered viruses (Anthony et al., 2013). Rodents and bats are important reservoirs for an increasing number of zoonotic infectious diseases with a significant impact on public health. These two most diverse and geographically widespread mammalian orders play vital roles in the maintenance, evolution, and spread of many emerging infectious disease agents. Our comparative analysis primarily explored the virome diversities in rodents and bats based on a summary of previous studies and highlighted the necessity of covering additional host species and geographic regions in future investigations. To ease comparative analysis of rodent- and bat-associated viruses in further studies, we plan to combine the two sister databases, DBatVir and DRodVir, into an integrated resource for the two major reservoirs of emerging zoonoses.

Bats: from Evolution to Conservation
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
A strategy to estimate unknown viral diversity in mammals
Bats: important reservoir hosts of emerging viruses
DBatVir: the Database of Bat-associated viruses. Database (Oxford)
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Rodent-borne emerging viral zoonosis: hemorrhagic fevers and hantavirus infections in South America
Detection of zoonotic pathogens and characterization of novel viruses carried by commensal Rattus norvegicus
Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor
Rodent-borne hemorrhagic fevers: under-recognized, widely spread and preventable - epidemiology, diagnostics and treatment
Rodent phylogeny and a timescale for the evolution of Glires: evidence from an extensive taxon sampling using three nuclear genes
Global trends in emerging infectious diseases
Wildlife as source of zoonotic infections
A comparison of bats and rodents as reservoirs of zoonotic viruses: are bats special?
ViralZone: recent updates to the virus knowledge resource
Rodent-borne diseases and their risks for public health
Impacts of the Cretaceous terrestrial revolution and KPg extinction on mammal diversification
Potential influence of climate change on vector-borne and zoonotic diseases: a review and proposed research plan
Replication and shedding of MERS-CoV in Jamaican fruit bats (Artibeus jamaicensis)
Yersinia pestis - etiologic agent of plague
FastTree 2 - approximately maximum-likelihood trees for large alignments
Bats and their virome: an important source of emerging viruses capable of infecting humans
jsPhyloSVG: a javascript library for visualizing interactive and vector-based phylogenetic trees on the web
Rodent Societies: an Ecological and Evolutionary Perspective
Bats as a continuing source of emerging infections in humans