key: cord-0900341-1ybw8crc authors: Henzy, Jamie E.; Gifford, Robert J.; Kenaley, Christopher P.; Johnson, Welkin E. title: An Intact Retroviral Gene Conserved in Spiny-Rayed Fishes for over 100 My date: 2016-12-30 journal: Molecular Biology and Evolution DOI: 10.1093/molbev/msw262 sha: e6cc1aee080a2a9a1b6818106c13a0cc24589b42 doc_id: 900341 cord_uid: 1ybw8crc

We have identified a retroviral envelope gene with a complete, intact open reading frame (ORF) in 20 species of spiny-rayed fishes (Acanthomorpha). The taxonomic distribution of the gene, “percomORF”, indicates insertion into the ancestral lineage >110 Ma, making it the oldest known conserved gene of viral origin in a vertebrate genome. Underscoring its ancient provenance, percomORF exists as an isolated ORF within the intron of a widely conserved host gene, with no discernible proviral sequence nearby. Despite its remarkable age, percomORF retains canonical features of a retroviral glycoprotein, and tests for selection strongly suggest cooption for a host function. Retroviral envelope genes have been coopted for a role in placentogenesis by numerous lineages of mammals, including eutherians and marsupials, representing a variety of placental structures. Therefore percomORF’s presence within the group Percomorpha, which is unique among spiny-finned fishes in having evolved placentation and live birth, is especially intriguing.

Vertebrate genomes include vast numbers of sequences derived from the infection of germ cells by retroviruses, which integrate into host DNA as a normal part of their replication cycle (Jern and Coffin 2008; Johnson 2015). Most of these endogenous retrovirus (ERV) sequences are overwritten by mutations over time, but occasionally an ERV gene is coopted for a function that benefits the host, and becomes fixed in the population.
While investigating the origins of a sequence that had been published as potentially representing a remnant of a filovirus in the genome of the stickleback (Gasterosteus aculeatus) (Belyi et al. 2010), we found that the sequence is actually part of an ORF encoding a complete and intact retroviral envelope glycoprotein. Retroviral glycoproteins stud the surface of the virion and mediate receptor binding and membrane fusion, allowing entry of the viral cargo into the target cell. The UCSC Genome Browser (Speir et al. 2016) indicates orthologs of the stickleback sequence in two species of pufferfishes, Takifugu rubripes and Tetraodon nigroviridis, and BLAST searches of the NCBI databases with the full-length translated sequence recovered the ORF from an additional 17 species of spiny-rayed fishes (supplementary table S1, Supplementary Material online). Although no matches were found within the EST database, a search of the NCBI Sequence Read Archive (SRA) revealed an exact match to the stickleback sequence among RNA transcripts expressed in embryonic tissue at 3 days post-fertilization (supplementary methods S1, Supplementary Material online). All species belong to the crown group known as the Percomorpha; thus, we named the ORF "percomORF".

Analysis of the flanking regions revealed that percomORF is positioned within an intron of a gene encoding an auxilin-like protein, DNAJC6, that is widely and highly conserved among vertebrates. We were able to trace the DNAJC6 coding region across 18 exons in Xiphophorus maculatus, and percomORF lies within the last intron, between exons 17 and 18, in opposite orientation to DNAJC6 (fig. 1). This distinctive location allowed us to establish that percomORF is in a syntenic position in all 20 species and thus arose from a single event in a common ancestor. We also examined the corresponding regions in other teleost lineages to generate a presence/absence profile (fig. 2).
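
To make the notion of a "complete and intact" ORF concrete, the following is a minimal Python sketch of one way such a check could be expressed: the sequence must start with ATG, end with a stop codon, and contain no internal stops. It is not the procedure used in this study (the ORFs were recovered through genome browser and BLAST searches). The function names and toy sequences are hypothetical, and the standard genetic code is assumed. A reverse-complement helper is included because percomORF lies in the opposite orientation to DNAJC6.

```python
# Illustrative sketch only; names and toy sequences are hypothetical.
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def is_intact_orf(dna):
    """True if dna is a whole number of codons, starts with ATG,
    ends with a stop codon, and has no internal stop codons."""
    if len(dna) == 0 or len(dna) % 3 != 0:
        return False
    protein = translate(dna)
    return protein.startswith("M") and protein.endswith("*") and "*" not in protein[:-1]


# Toy example: an ORF annotated on the opposite strand of its host gene.
host_strand = "TTAACAAGCCAT"
print(is_intact_orf(reverse_complement(host_strand)))
```

In practice, a check of this kind would be run on each ortholog after extracting the syntenic region, with the orientation chosen relative to the host gene.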
Its presence in multiple lineages of Percomorphaceae indicates that percomORF originated no more recently than 109-140 Ma, the estimated range for the Percomorphaceae divergence (Betancur-R et al. 2013). Despite its ancient age, percomORF encodes all of the typical features of a functional retroviral envelope glycoprotein, with no stop codons interrupting the predicted reading frame in any of the species, implying maintenance of function for the benefit of the host, or exaptation. For example, the maintenance of receptor-binding function is suggested by 14 cysteines positionally conserved in the region corresponding to the surface (SU) subunit (fig. 1). Cysteines in SU typically help the protein fold into a globular conformation that functions in receptor binding (Chiou et al. 1992; van Anken et al. 2008). Likewise, percomORF retains motifs associated with membrane fusion, such as (1) a run of basic residues typical of a canonical furin cleavage site, by which SU and the fusion subunit (TM) are cleaved to generate a fusion-ready trimer of heterodimers, and (2) a stretch of hydrophobic residues at the N-terminus of TM, consistent with the fusion peptide. The CX6CC motif in the region coding for TM is typical of retroviruses whose subunits are joined by a covalent disulfide bond (Pinter et al. 1997; Henzy and Coffin 2013), and these motifs are usually found in conjunction with an isomerase domain (CXXC) in SU, which has been shown to mediate bond dissociation upon receptor binding (Wallin et al. 2004). Interestingly, percomORF encodes three such motifs (fig. 1).

© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

In addition to the conservation of motifs associated with function, exaptation is further supported by dN/dS analysis (supplementary table S2, Supplementary Material online).
Under various evolutionary models, we found that between 189 and 232 of the 422 codons analyzed have dN/dS values that are consistent with a regime of strong purifying selection (average dN/dS: 0.25), while only a handful of positions (0-5) have dN/dS ratios suggesting positive selection.

One feature of the stickleback sequence that suggested it may be a filoviral glycoprotein is 32% amino acid sequence identity with Reston ebolavirus glycoprotein over a region in the TM subunit encoding 89 residues (fig. 1, blue bar) (Belyi et al. 2010). However, although filovirus glycoproteins are known to share structural features with retroviral glycoproteins (Gallaher 1996; Weissenhorn et al. 1998; Jeffers et al. 2002), several motifs encoded by percomORF are specific to retroviruses. These are (1) isomerase motifs in SU, which are common in retroviruses with covalently bonded subunits (Pinter et al. 1997; Wallin et al. 2004; Henzy and Coffin 2013) but have not been found in filoviruses; (2) a fusion peptide that is N-terminal, whereas filoviral fusion peptides are internal and flanked by cysteines; and (3) a motif (QNRAALD) in the immunosuppressive domain (ISD) that is typical of nonmammalian retroviruses rather than filoviruses (Henzy and Johnson 2013). A consensus sequence generated from an alignment of all 20 orthologs retains these motifs (supplementary fig. S1, Supplementary Material online), consistent with their presence in the ancestral retroviral protein.

Another feature that suggested that the stickleback sequence may be a filoviral gene is its existence in isolation from other viral genes (Belyi et al. 2010). Indeed, none of the percomORF loci we identified contained discernible retroviral genes within a 10-kb flanking window, consistent with previous findings (Belyi et al. 2010).
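
The sequence motifs discussed above lend themselves to a simple pattern scan. The following Python sketch is illustrative only and is not the method used in this study. The regular expressions are assumptions: the furin-like pattern uses the commonly cited R-X-[K/R]-R consensus, and the N-glycan pattern uses the N-X-[S/T] sequon with X not proline. The toy protein is invented for the example and is not a percomORF sequence.

```python
import re

# Assumed, simplified patterns for the motifs named in the text.
MOTIFS = {
    "CXXC": r"C..C",            # isomerase-like motif in SU
    "CX6CC": r"C.{6}CC",        # motif in TM of covalently linked Env proteins
    "furin-like": r"R.[KR]R",   # assumed consensus R-X-[K/R]-R
    "N-glycan": r"N[^P][ST]",   # assumed sequon N-X-[S/T], X not proline
    "ISD": r"QNRAALD",          # immunosuppressive-domain motif from the text
}


def find_motifs(protein):
    """Return 1-based start positions of each motif, including overlapping hits."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports every start position, even when matches overlap.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", protein)]
    return hits


# Hypothetical toy sequence assembled from one copy of each motif.
toy = "MA" + "CWLC" + "GG" + "RTKR" + "S" + "NGT" + "QNRAALD" + "C" + "A" * 6 + "CC" + "K"
for name, positions in find_motifs(toy).items():
    print(name, positions)
```

A scan like this only flags candidate sites; whether a given match is functional (for example, whether a basic run is actually cleaved by furin) cannot be decided from the pattern alone.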
Whereas retroviruses insert a DNA copy of the entire viral genome (known as a "provirus") into host DNA as a normal part of replication, other endogenous virus sequences are thought to arise through the interaction of viral mRNA with a retrotransposon such as a LINE-1 element (Horie et al. 2010; Katzourakis and Gifford 2010; Taylor et al. 2010), and thus exist as isolated genes. However, at a different locus we found a bona fide retroviral provirus in the Japanese eel (GenBank: KI305800.1) with a predicted Env protein that has 50% amino acid identity with the original partial sequence (compared with only 32% identity seen with Reston ebolavirus), serving as additional evidence for the retroviral origin of percomORF. Notably, we did not find any degenerate sequences more closely related to percomORF than that of the Japanese eel, likely reflecting divergence of percomORF's evolutionary trajectory as a host gene from that of the originating retrovirus.

The presence of percomORF as an isolated gene may reflect its age. Notably, the Fv-1 gene of mice, like percomORF, is present in some species as an isolated gene with no discernible proviral sequence flanking it (Schlecht-Louf et al. 2010). An analysis of Fv-1 orthologous sites suggests that the locus has undergone multiple insertions, deletions, and rearrangements in various lineages, which may have caused the loss of proviral sequence in mice while the gene itself was maintained under selection (GR Young and JP Stoye, personal communication). The location of percomORF within the intron of DNAJC6 allowed us to compare the flanking sequence among the 20 orthologs. As with Fv-1, the regions flanking percomORF are very poorly conserved both in sequence and in length (fig. 1), consistent with loss of proviral sequence due to degeneration of neutral sites over a long period of evolution. By the same reasoning, it is likely that any markings of noncanonical integration, such as the inverted repeats that accompany LINE-1-mediated insertion, would be erased by mutations. Notably, the much "younger" ERVWE1 locus (a co-opted env gene originating from a retroviral insertion in the genome of a hominoid ancestor 25-40 Ma) is part of a provirus in which the gag and pol genes have acquired multiple inactivating mutations (Mallet et al. 2004).

FIG. 1. [...] Below is a graph representing the average pairwise percent nucleotide identity of each region, generated using GENEIOUS v. 6.0.4, available from http://www.geneious.com. Dark green indicates 100% identity across all sequences at a given site; green, >30% identity; red, <30% identity. Height of graph at each position conveys the fraction of sequences with an identical nucleotide at that position. Bottom, conserved features of PercomORF. SU and TM subunits are indicated at the bottom of the figure; green vertical lines represent cysteines, with approximate positions of the CXXC motifs in SU, and the CX6CC motif in TM indicated; fusion peptide (fp), immunosuppressive domain (ISD), and transmembrane region (tm) are shown as patterned boxes; SP, signal peptide; Y, predicted N-glycan site, with parentheses at partially conserved site, and asterisk marking the "heptad stutter"; ct, cytoplasmic tail. The blue bar above TM represents the partial sequence from the stickleback that was included in a study on filoviruses in mammalian genomes (Belyi et al. 2010).

Among the features that percomORF does share with filovirus glycoproteins is a rigidly conserved predicted N-glycan site in the N-terminal portion of TM (fig. 1 and [...] which has been associated with viruses that infect via an endocytic pathway (Igonet et al. 2011). However, this feature is typical of nonmammalian retroviruses (Henzy and Johnson 2013), as well as filoviruses (Koellhoffer et al. 2012), snake arenaviruses (Higgins et al. 2014; Koellhoffer et al. 2014), influenza virus, and SARS coronavirus (Supekar et al. 2004). On phylogenetic trees, gammaretrovirus-like env sequences form two large clades that reflect the presence or absence of the predicted heptad stutter. The PercomORF TM forms a well-supported branch within the heptad stutter superclade (fig. 3), which also includes a syncytin from squirrel-related rodents, MAR-1 (Redelsperger et al. 2014). Basal to the PercomORF clade is the TM sequence from the Japanese eel, possibly representing the original viral lineage from which percomORF arose.

FIG. 2. [...] Near et al. (2012) and based on 9 nuclear genes and 36 fossil age constraints. Orders in which percomORF or an empty site was found are indicated by red boxes and black boxes, respectively. Lineages that are part of the Ovalentaria clade are indicated. The S, Pg, and N (top) signify the Silurian, Paleogene, and Neogene geological periods.

Exapted retroviral glycoproteins described in the literature have been associated with several functions (Jern and Coffin 2008; Lavialle et al. 2013; Johnson 2015; Malfavon-Borja and Feschotte 2015). Fv-4, for example, is a partial retroviral glycoprotein sequence in mice that expresses a defective envelope protein (Ikeda and Sugimura 1989) that binds to its cognate receptor, downregulating or blocking its expression (Kai et al. 1986) and thereby protecting the host from infection by related retroviruses (Limjoco et al. 1993). Other examples exploit the fusogenicity of the glycoprotein. Mammalian "syncytins", for example, are maintained in various lineages for their ability to fuse syncytiotrophoblast cells during formation of the placenta (Lavialle et al. 2013), and appear to contribute to myoblast fusion in mice (Redelsperger et al. 2016). Syncytin-mediated cell-cell fusion is mechanistically analogous to membrane fusion during entry of the virus into the target cell.
Thus, syncytins must maintain the fusion-mediating domains of retroviral Env proteins in addition to receptor-binding regions. The maintenance of fusion-related motifs in percomORF suggests that it could provide a fusogenic function. Moreover, the predicted cytoplasmic tail (CT) of percomORF is unusually short (supplementary fig. S1, Supplementary Material online) compared with that of related retroviral glycoproteins. The C-terminal 16 amino acids of the CT of murine leukemia viruses (MLVs) are cleaved by a viral protease to activate fusion (Yang and Compans 1997), and in HIV-1, truncation of the CT has been shown to increase cell-cell fusion in vitro (Kolchinsky et al. 2001). Therefore, a truncated CT in percomORF could represent selection for fusogenicity.

FIG. 3. Relationship of PercomORF to other gammaretroviral-like Env proteins. A neighbor-joining tree was generated (midpoint-rooted) using GENEIOUS v. 6.0.4, based on an amino acid alignment spanning the TM coding region, excluding the cytoplasmic tail. PercomORF sequences (collapsed into a clade represented by the gray triangle) cluster with the group of gammaretroviral-like Env proteins that carry a "heptad stutter" in the N-terminal heptad repeat. The lower clade consists of gammaretroviral sequences that do not include the stutter. All of the sequences besides PercomORF occur in the context of proviruses.

The divergence of the Percomorpha was associated with an expansion in the diversity of reproductive modes. Although the majority of its members are lecithotrophic (deriving nutrients from the yolk), the Percomorpha are the only group within the Acanthomorpha that includes taxa that have evolved placentation and viviparity (Wourms 1981). Specifically, most of the taxa with these traits belong to the clade "Ovalentaria", which means "sticky eggs", referring to the demersal eggs with adhesive filaments that are produced by taxa of this group. Ovalentaria are known to have very complex and highly derived reproductive strategies involving peculiar chorionic morphology (Reznick et al. 2007), and have experienced the evolution and loss of placentas on numerous occasions (Reznick et al. 2002; Pollux et al. 2009). While the majority of percomORF-positive species from this limited study are not viviparous, reflecting the composition of the Percomorpha group as a whole, it is intriguing to speculate on a role for percomORF in the dynamic innovation in reproductive strategies that characterizes this group, possibly contributing to syncytia formation in placenta, "sticky eggs", or another chorionic feature.

In conclusion, percomORF is to our knowledge the oldest intact gene of viral origin, older by as much as 50 My than syncytin-Car1, which entered the germline of carnivores an estimated 60-85 Ma (Cornelis et al. 2012). The maintenance of an intact ORF and canonical functional motifs, together with its expression in stickleback embryos and the dN/dS analyses, strongly suggests that percomORF has been coopted by the host species. The role of similar retroviral env genes in chorionic structures and placenta formation in mammals, combined with percomORF's association with the only group of spiny-rayed fishes in which placentation evolved, leads naturally to speculation on a syncytin-like role for percomORF. However, follow-up studies involving a wider range of taxa and morphological data, expression profiles in a variety of tissue types and developmental stages, and assays for fusogenicity of the protein are needed to assess its role in the evolution of ray-finned fishes.

Supplementary methods S1, tables S1 and S2 and figure S1 are available at Molecular Biology and Evolution online.
Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes
The tree of life and a new classification of bony fishes
Studies on the role of the V3 loop in human immunodeficiency virus type 1 envelope glycoprotein function
Ancestral capture of syncytin-Car1, a fusogenic endogenous retroviral envelope gene involved in placentation and conserved in Carnivora
Similar structural models of the transmembrane proteins of Ebola and avian sarcoma viruses
Betaretroviral envelope subunits are noncovalently associated and restricted to the mammalian class
Pushing the endogenous envelope
Influence of a heptad repeat stutter on the pH-dependent conformational behavior of the central coiled-coil from influenza hemagglutinin HA2
Endogenous non-retroviral RNA virus elements in mammalian genomes
X-ray structure of the arenavirus glycoprotein GP2 in its postfusion hairpin conformation
Fv-4 resistance gene: a truncated endogenous murine leukemia virus with ecotropic interference properties
Covalent modifications of the Ebola virus glycoprotein
Effects of retroviruses on host genome function
Endogenous retroviruses in the genomics era
Relationship between the cellular resistance to Friend murine leukemia virus infection and the expression of murine leukemia virus-gp70-related glycoprotein on cell surface of BALB/c-Fv-4wr mice
Endogenous viral elements in animal genomes
Structural characterization of the glycoprotein GP2 core domain from the CAS virus, a novel arenavirus-like species
Crystal structure of the Marburg virus GP2 core domain in its postfusion conformation
Increased neutralization sensitivity of CD4-independent human immunodeficiency virus variants
Paleovirology of "syncytins", retroviral env genes exapted for a role in placentation
Transgenic Fv-4 mice resistant to Friend virus
Fighting fire with fire: endogenous retrovirus envelopes as restriction factors
The endogenous retroviral locus ERVWE1 is a bona fide gene involved in hominoid placental physiology
Resolution of ray-finned fish phylogeny and timing of diversification
Localization of the labile disulfide bond between SU and TM of the murine leukemia virus envelope protein complex to a highly conserved CWLC motif in SU that resembles the active-site sequence of thiol-disulfide exchange enzymes
Evolution of placentas in the fish family Poeciliidae: an empirical study of macroevolution
Capture of syncytin-Mar1, a fusogenic endogenous retroviral envelope gene involved in placentation in the Rodentia squirrel-related clade
Genetic evidence that captured retroviral envelope syncytins contribute to myoblast fusion and muscle sexual dimorphism in mice
Independent evolution of complex life history adaptations in two families of fishes, livebearing halfbeaks (Zenarchopteridae, Beloniformes) and Poeciliidae (Cyprinodontiformes)
Independent origins and rapid evolution of the placenta in the fish genus Poeciliopsis
Retroviral infection in vivo requires an immune escape virulence factor encrypted in the envelope protein of oncoretroviruses
The UCSC Genome Browser database: 2016 update
Structure of a proteolytically resistant core from the severe acute respiratory syndrome coronavirus S2 fusion protein
Filoviruses are ancient and integrated into mammalian genomes
Only five of 10 strictly conserved disulfide bonds are essential for folding and eight for function of the HIV-1 envelope glycoprotein
The evolution of pharyngognathy: a phylogenetic and functional appraisal of the pharyngeal jaw key innovation in labroid fishes and beyond
Isomerization of the intersubunit disulphide-bond in Env controls retrovirus fusion
Crystal structure of the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain
Viviparity: the maternal-fetal relationship in fishes
Analysis of the murine leukemia virus R peptide: delineation of the molecular determinants which are important for its fusion inhibition activity

This work was supported in part by National Institutes of Health grant AI083118 (W.E.J.) and by start-up funds provided by Boston College.