key: cord-0336565-j2ugoquy authors: Lasso, Gorka; Honig, Barry; Shapira, Sagi D. title: A sweep of earth’s virome reveals host-guided viral protein structural mimicry; with implications for human disease date: 2020-06-18 journal: bioRxiv DOI: 10.1101/2020.06.18.159467 sha: 5685587174cd71c139ec68a9587bcadc30300be8 doc_id: 336565 cord_uid: j2ugoquy Viruses deploy an array of genetically encoded strategies to coopt host machinery and support viral replicative cycles. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions including nucleic acid metabolism and modulation of immune responses. Here, we use protein structure similarity to scan for virally encoded structure mimics across thousands of catalogued viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates and vertebrates. Our survey identified over 6,000,000 instances of structural mimicry, the vast majority of which (>70%) cannot be discerned through protein sequence. The results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. Interrogation of proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may be leveraged to engineer synthetic metabolic circuits or may serve as targets for therapeutics. Moreover, the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type, with ssRNA viruses circumventing limitations of their small genomes by mimicking human proteins to a greater extent than their large dsDNA counterparts. 
Finally, we identified over 140 cellular proteins that are mimicked by CoV, providing clues about cellular processes driving the pathogenesis of the ongoing COVID-19 pandemic.
Viruses deploy an array of genetically encoded strategies to coopt host machinery and support viral replicative cycles. Among these strategies, protein-protein interactions mediated by promiscuous, multifunctional viral proteins are widely documented. Targeted discovery tools, focused largely on viruses of public-health importance, have experimentally mapped thousands of virus-host protein complexes [...] and evolutionary relationships 1,7 as well as uncover virus-encoded structural mimics that cannot be detected by sequence relationships (see Methods). Here, we use protein structure similarity to identify virally encoded mimics of host proteomes. Briefly, we first employ sequence-based methods to identify proteins that have similar structures to queried viral proteins and then use structural alignment to find "structural neighbors" of viral proteins (Figure 1a). We refer to the corresponding viral proteins as mimics of their host-encoded neighbors. We applied the approach to a set of 337,493 viral proteins representing 7,486 viruses across a broad host taxonomic range, including bacteria, plants and fungi, invertebrates and vertebrates. Our survey identified over 6,000,000 instances of structural mimicry, the vast majority of which (>70%) cannot be discerned through protein sequence alone (see below). Our results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated, at least in part, by the host proteome.
We further observe that the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type. For example, while human-infecting and arthropod-infecting viruses occupy a structure space most similar to their host proteome, arboviruses, which are transmitted to humans by insect vectors, encode promiscuous proteins that mimic both human and insect proteins.
In addition, we find that, relative to their proteome size, ssRNA viruses, including coronaviruses (CoV), have circumvented the limitations of their small genomes by mimicking human proteins to a greater extent than their large dsDNA counterparts such as poxviruses and herpesviruses. Interrogation of proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may be leveraged to engineer synthetic metabolic circuits or may serve as targets for therapeutics. Finally, we identified over 140 cellular proteins (including members of the complement activation pathway and critical regulators of innate and adaptive immunity) that are mimicked by CoV, providing clues about the cellular processes underlying the pathogenesis driving the ongoing COVID-19 pandemic.
Mining the virome-wide structure space to identify host proteins mimicked by viruses
In order to identify mimicry relationships, we implemented the strategy summarized in Figure 1a. [...] non-vertebrate-infecting viruses ranged from 31% to 38% (Figure S1a; Table S2). As shown in Figure S1b [...]
To measure similarity between protein structures, we used Ska, a widely used and validated tool for inference of structure-based functional relationships even in the absence of detectable sequence similarity 1,8-13. In addition to Ska, we employed a conservative global structural similarity criterion (Structural Alignment Score 14-16, SAS < 2.5 Å) to infer structural mimics and minimize biases imposed by local structural similarities (see Methods). As shown in Figure S2, [...] Table S3).
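The global similarity filter described above can be illustrated with a minimal Python sketch. It assumes the geometric definition of the Structural Alignment Score commonly used in the structure-alignment literature, SAS = 100 × RMSD / (number of aligned residues); the Methods section, which is not reproduced here, may apply it differently. The `Alignment` record, the function names, and the demo values are hypothetical, and the sketch does not reproduce Ska itself.

```python
from dataclasses import dataclass

SAS_CUTOFF = 2.5  # Å, the threshold quoted in the text


@dataclass(frozen=True)
class Alignment:
    """One viral-vs-host structural alignment (hypothetical record layout)."""
    viral_id: str
    host_id: str
    rmsd: float      # Å, computed over the aligned residue pairs
    n_aligned: int   # number of aligned residue pairs


def sas(rmsd: float, n_aligned: int) -> float:
    """Assumed SAS definition: RMSD normalised to 100 aligned residues."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * 100.0 / n_aligned


def structural_mimics(alignments, cutoff=SAS_CUTOFF):
    """Keep only alignments whose SAS falls below the cutoff."""
    return [a for a in alignments if sas(a.rmsd, a.n_aligned) < cutoff]


# Made-up demo values, not data from the study.
demo = [
    Alignment("viral_A", "host_X", rmsd=2.0, n_aligned=120),  # long, tight alignment
    Alignment("viral_B", "host_Y", rmsd=3.0, n_aligned=60),   # short, loose alignment
]

for hit in structural_mimics(demo):
    print(hit.viral_id, hit.host_id, round(sas(hit.rmsd, hit.n_aligned), 2))
```

Because SAS divides RMSD by alignment length, a short local match with a low RMSD can still score poorly, which is consistent with the stated goal of minimizing biases from local structural similarities.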
Yet, while these viruses harbor a large number of human protein mimics, they do not mimic cellular components of RNA and DNA metabolism, likely reflecting the fact that they encode polymerases (as is the case for Poxviridae) and/or integrate into the cellular genome and are therefore bystander participants in cellular DNA replication. We observe an enrichment of cytokine-related pathways in poxvirus- and herpesvirus-mimicked proteins, and of chemokine-related pathways in proteins mimicked by herpesviruses (Table S3 and [...]
We identified 145 cellular proteins that are mimicked by three or more coronaviruses (data available at [...]
Lastly, as discussed above, we find that structural mimicry of complement components is a feature shared across all coronaviruses that were part of this study (including SARS-CoV-2). The complement system is [...]
Our results demonstrate that, regardless of genome size, replicative cycle, or ecological niche, the evolutionary pressures imposed on viruses are reflected in the structure space they occupy, and illustrate that mimicry may both constrain and enable host range. Of note, while structural similarity between viral and host proteins serves as a scaffold enabling viral proteins to coopt host pathways, amino acid identities at key functional sites will ultimately shape the extent to which viral proteins can intervene in host pathways. Our observations offer a first step toward investigating the role of amino acid variability at mimicked functional sites, which might help explain differences in phenotypic outcome associated with viral infections. Finally, the repertoire of structural mimics we discover opens new opportunities to identify potential [...]
A Structure-Informed Atlas of Human-Virus Interactions
Viral Mimicry to Usurp Ubiquitin and SUMO Host Pathways.
Viral mimicry of cytokines, chemokines and their receptors
Pathogen mimicry of host protein-protein interfaces modulates immunity
The evolutionary conundrum of pathogen mimicry
Mechanisms of immunomodulation by mammalian and viral decoy receptors: insights from structures
What does structure tell us about virus evolution?
A computational interactome and functional annotation for the human proteome
Structure-based prediction of ligand-protein interactions on a genome-wide scale
Structural relationships among proteins with different global topologies and their implications for function annotation strategies
Structure-based prediction of protein-protein interactions on a genome-wide scale
Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling
An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments
FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately
Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures
Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core
Visual evidence of horizontal gene transfer between plants and bacteria in the phytosphere of transplastomic tobacco
Poxvirus protein evolution: family wide assessment of possible horizontal gene transfer events
Regulation of CXCR4 signaling
Molecular mimicry between dengue virus and coagulation factors induces antibodies to inhibit thrombin activity and enhance fibrinolysis
Molecular mimicry between virus and host and its implications for dengue disease pathogenesis
PubMed comprises more than 30 million citations for biomedical literature from MEDLINE, life science journals, and online books
Review: Viral infections and mechanisms of thrombosis and bleeding
Negative Regulation of Cytosolic Sensing of DNA
Sensing of RNA viruses: a review of innate immune receptors involved in recognizing RNA virus invasion
Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase
Determination of host proteins composing the microenvironment of coronavirus replicase complexes by proximity-labeling
Interaction between SARS-CoV helicase and a multifunctional cellular protein (Ddx5) revealed by yeast and mammalian cell two-hybrid systems
PARP9 and PARP14 cross-regulate macrophage activation via STAT1 ADP-ribosylation
Severe acute respiratory syndrome coronavirus open reading frame (ORF) 3b, ORF 6, and nucleocapsid proteins function as interferon antagonists
Identification of Immune complement function as a determinant of adverse SARS-CoV-2 infection outcome. medRxiv
Structural principles within the human-virus protein-protein interaction network
HMI-PRED: A Web Server for Structural Prediction of Host-Microbe Interactions Based on Interface Mimicry
Interface-Based Structural Prediction of Novel Host-Pathogen Interactions
Linking Virus Genomes with Host Taxonomy
The NCBI Taxonomy database
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
CDD/SPARCLE: functional classification of proteins via subfamily domain architectures
The Protein Data Bank
Basic local alignment search tool
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment
EMBOSS: the European Molecular Biology Open Software Suite
Univariate Discrete Distributions
UniProt: the Universal Protein knowledgebase
SIFTS: Structure Integration with Function, Taxonomy and Sequences resource
Extending the accuracy limits of prediction for side-chain conformations
GraphPad Software
Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
gplots: Various R Programming Tools for Plotting Data
Comparison of the structure of vMIP-II with eotaxin-1, RANTES, and MCP-3 suggests a unique mechanism for CCR3 activation
Kaposi's sarcoma-associated herpesvirus encodes a functional cyclin
A structural viral mimic of prosurvival Bcl-2: a pivotal role for sequestering proapoptotic Bax and Bak
Structural basis for chemokine recognition and activation of a viral G protein-coupled receptor