key: cord-0859449-l1kiz2b2 authors: Nikolaidis, Marios; Markoulatos, Panayotis; Van de Peer, Yves; Oliver, Stephen G; Amoutzias, Grigorios D title: The neighborhood of the Spike gene is a hotspot for modular intertypic homologous and non-homologous recombination in Coronavirus genomes date: 2021-10-12 journal: Mol Biol Evol DOI: 10.1093/molbev/msab292 sha: ce20c1713f6d84f2dad03d95608deedda7307304 doc_id: 859449 cord_uid: l1kiz2b2

Coronaviruses (CoVs) have very large RNA viral genomes with a distinct genomic architecture of core and accessory open reading frames (ORFs). It is of utmost importance to understand their patterns and limits of homologous and non-homologous recombination, because such events may affect the emergence of novel CoV strains and alter their host range, infection rate, tissue tropism, pathogenicity, and ability to escape vaccination programs. Intratypic recombination among closely related CoVs of the same subgenus has often been reported; however, the patterns and limits of genomic exchange between more distantly related CoV lineages (intertypic recombination) need further investigation. Here, we report computational/evolutionary analyses that clearly demonstrate a substantial ability for CoVs of different subgenera to recombine. Furthermore, we show that CoVs can obtain, through non-homologous recombination, accessory ORFs from core ORFs, and can exchange accessory ORFs with different CoV genera, with other viruses (i.e., toroviruses, influenza C/D, reoviruses, rotaviruses, astroviruses) and even with hosts. Intriguingly, most of these radical events result from double-crossovers surrounding the Spike ORF, thus highlighting both the instability and mobile nature of this genomic region. While many such events have occurred during the evolution of various CoVs, the genomic architecture of the relatively young SARS-CoV/SARS-CoV-2 lineage so far appears to be stable.
facilitates binding to host receptors and so determines host-range, cell-tropism, and even the transition from a mild towards a highly pathogenic phenotype, via point-mutations and

CoVs obtained a Spike ORF from an α-CoV ancestor. We also observed several cases of phylogenetic incongruence involving entire subgenera (mostly in α-CoVs); they displayed a major shift in their phylogenetic position (for a certain genomic region), as a monophyletic group. We interpret this as a major event that occurred in the common ancestor of the representative sequences of that subgenus. Here, we only report cases well supported by BioNJ, PhyML and Bayesian tree tanglegrams and also statistically supported

In addition, we detected several low-confidence intertypic recombination events for α-CoV subgenera, where the incongruent sequences cluster with other subgenera, but with low bootstrap/aLRT/posterior probability support. Here, either the donor is unknown or the incongruence is due to rapid divergence; they were not considered further in our study. Finally,

2). Although no AOF was present in all four genera, three AOFs were present in some subgenera of both α- and β-CoVs and three AOFs were present in subgenera of both γ- and δ-CoVs. Interestingly, three of these intergenus AOFs are localized in the neighborhood of the Spike ORF. We also identified four more interesting AOFs. p10, situated just after the nucleocapsid

We also observed distinct subgenus-specific accessory ORF genomic architectures. These

The table/matrix below it shows which genomic regions of the various subgenera are involved in intertypic recombination events. "GM" represents events that occurred at the common ancestor of the genus. "SgM" represents events that occurred at the common ancestor of the subgenus. "P" represents more recent events that occurred for one or few members of the subgenus and have resulted in a polyphyletic tree pattern (for that region and subgenus).
All incongruence events in the matrix are supported by the three phylogenetic tree methods (NJ, PhyML, Bayesian) and are also statistically significant, based on the approximately unbiased test of CONSEL. Two phylogenetic trees (of ORF1ab and Spike) for all four genera are also included below the matrix, to visualize the recombination events of the Spike region. In these trees, we use stars to denote subgenera that have been involved in intertypic homologous recombination events, in any genomic region (not only the Spike).

[Figure residue, tree tip labels: JX163926 GU553365 DQ497008 SARS NC_004718 AY654624 AY613950 AY545917 AY545919 KP886808 MK211377 KY417142 KY417149 MK211375 FJ588686 DQ071615 KU973692 KF569996 JX993988 MK211374 JX993987 KJ473811 DQ412043 GQ153542 MT457396 LR824571 SARS CoV2 NC_045512 MT365033 MT709105 MN996532 MT121216 MT040334 MG772933 KY352407 NC_014470 NC_025217 NC_009021 EF065516 EF065515 MG762674 EF065514 MK211379 MT350598 NC_030886 MG693168 MG987421 KX442565 KJ473821 MG021452 MG021451 MG596803 MG596802 MERS NC_019843 MT576585 KT751244 MN507638 MF593268 NC_009020 MH002342 NC_009019 NC_039207 MK907286 FJ425190 FJ425188 FJ425184 FJ425187 DQ915164 EF424623 EF424621 MG518518 MH810163 AF220295 KU558922 KF906249 FJ938067 KX432213 OC43 NC_006213 MG977444 KY419106 EF446615 NC_017083 NC_046954 NC_026011 JF792616 MHV AY700211 MH687968 HKU1 NC_006577]

Each column in the matrix represents a certain AOF. Red color (within the matrix cells) denotes the (TblastN) presence of an AOF that is also verified by a predicted ORF with length ≥30 aa, whereas if the length of the predicted ORF is <30 aa, then it is denoted with orange color. Stars denote AOFs that are present in both α- and β-CoV members, whereas diamonds denote an AOF that resulted from duplication of a core ORF. Downward arrows denote AOFs that have homologs in non-CoV genomes, together with their best PSI-Blast hit e-value. Horizontal orange bars (above the matrices) denote the genomic region where the AOF is located, i.e.
S-E denotes the region between the Spike and Envelope ORFs.

Each column in the matrix represents a certain AOF. Red color (within the matrix cells) denotes the (TblastN) presence of an AOF that is also verified by a predicted ORF with length ≥30 aa, whereas if the length of the predicted ORF is <30 aa, then it is denoted with orange color. Inverted triangles denote AOFs that are present in both γ- and δ-CoV members. Downward arrows denote AOFs that have homologs in non-CoV genomes, together with their best PSI-Blast hit e-value. Horizontal orange bars (above the matrices) denote the genomic region where the AOF is located, i.e. M-N denotes the region between the Membrane and Nucleocapsid ORFs.
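The red/orange coding of the matrix cells described in the captions can be illustrated with a minimal sketch. It is not the pipeline used in the study: the ORF definition (first ATG to the next in-frame stop codon, forward frames only, standard genetic code) and the names `longest_orf_aa` and `cell_color` are assumptions made purely for illustration, and the TblastN result is passed in as a precomputed boolean rather than computed here.

```python
from typing import Optional

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids (stop codon excluded) of the longest
    ATG-to-stop ORF found in the three forward reading frames.
    ORFs that run off the end of the sequence without a stop codon
    are not counted."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def cell_color(has_tblastn_hit: bool, region_seq: str,
               min_aa: int = 30) -> Optional[str]:
    """Hypothetical helper mirroring the caption's rule:
    no TblastN hit -> no color (None);
    hit and predicted ORF >= min_aa -> "red";
    hit and predicted ORF <  min_aa -> "orange"."""
    if not has_tblastn_hit:
        return None
    return "red" if longest_orf_aa(region_seq) >= min_aa else "orange"
```

For example, a region with a TblastN hit whose longest predicted ORF encodes exactly 30 residues would be colored red under this rule, whereas one encoding 29 residues would be colored orange. Only the forward strand is scanned because CoV genomes are positive-sense RNA; that choice, too, is an assumption of the sketch.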