key: cord-0866883-tk7eturq
authors: Berrio, Alejandro; Gartner, Valerie; Wray, Gregory A
title: Positive selection within the genomes of SARS-CoV-2 and other Coronaviruses independent of impact on protein function
date: 2020-09-22
journal: bioRxiv
DOI: 10.1101/2020.09.16.300038
sha: 0d5bc40fcc656609cce33263dc552fda68d5120b
doc_id: 866883
cord_uid: tk7eturq

Background The emergence of a novel coronavirus (SARS-CoV-2) associated with severe acute respiratory disease (COVID-19) has prompted efforts to understand the genetic basis for its unique characteristics and its jump from non-primate hosts to humans. Tests for positive selection can identify apparently nonrandom patterns of mutation accumulation within genomes, highlighting regions where molecular function may have changed during the origin of a species. Several recent studies of the SARS-CoV-2 genome have identified signals of conservation and positive selection within the gene encoding Spike protein based on the ratio of synonymous to nonsynonymous substitution. Such tests cannot, however, detect changes in the function of RNA molecules.

Methods Here we apply a test for branch-specific oversubstitution of mutations within narrow windows of the genome without reference to the genetic code.

Results We recapitulate the finding that the gene encoding Spike protein has been a target of both purifying and positive selection. In addition, we find other likely targets of positive selection within the genome of SARS-CoV-2, specifically within the genes encoding Nsp4 and Nsp16. Homology-directed modeling indicates no change in either Nsp4 or Nsp16 protein structure relative to the most recent common ancestor. Thermodynamic modeling of RNA stability and structure, however, indicates that RNA secondary structure within both genes in the SARS-CoV-2 genome differs from those of RaTG13, the reconstructed common ancestor, and Pan-CoV-GD (Guangdong).
These SARS-CoV-2-specific mutations may affect molecular processes mediated by the positive or negative RNA molecules, including transcription, translation, RNA stability, and evasion of the host innate immune system. Our results highlight the importance of considering mutations in viral genomes not only from the perspective of their impact on protein structure, but also how they may impact other molecular processes critical to the viral life cycle.

An important challenge in understanding zoonotic events is identifying the genetic changes that

To identify branch-specific positive selection, it is necessary to obtain a query and a reference alignment. We downloaded six high-quality reference genomes from the subgenus Sarbecovirus (Table 1). Next, we used MAFFT (Katoh & Standley, 2013) (Kearse et al., 2012) with default settings to build a sequence alignment. Next, we refined the alignment using a gene-by-gene procedure.

foreground branch is evolving at faster rates than the expectation from the background species. We performed a selection analysis on sliding windows of 300 bp with a step of 150 bp along a sequence alignment of 5 reference genome sequences of coronaviruses of the subgenus Sarbecovirus and two sequences of Pangolin Coronavirus recently published (Liu, Chen & Chen, 2019; Lam et al., 2020). This procedure generates partitions where a tree topology can be fitted. To investigate the extent of positive selection or branches with long substitution rates along the SARS-CoV-2 genome, we used a branch-specific method known as adaptiPhy that was initially

Pa_CoV_Guangxi_P4L), (Bat_CoV_LYRa11, SARS_CoV)), Bat_CoV_BM48). This method is highly sensitive and specific and can differentiate between positive selection and relaxation of constraint. AdaptiPhy requires a reference alignment of at least 3 kb for each species, which is used as a putatively neutral proxy for computing substitution rates.
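The sliding-window partitioning described above (300 bp windows, 150 bp step) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `sliding_windows` and `slice_alignment` are hypothetical, the toy sequences are invented, and the choice to drop a trailing window shorter than 300 bp is an assumption that the text does not specify.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(length: int, window: int = 300,
                    step: int = 150) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges along an alignment.

    Assumption: a trailing window shorter than `window` is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, length - window + 1, step):
        yield start, start + window


def slice_alignment(alignment: Dict[str, str], start: int,
                    end: int) -> Dict[str, str]:
    """Extract the same columns from every aligned sequence."""
    return {name: seq[start:end] for name, seq in alignment.items()}


# Toy alignment with invented sequences of 900 columns each; the labels
# reuse two taxon names from the tree above purely for illustration.
toy: Dict[str, str] = {
    "SARS_CoV": "ACGT" * 225,
    "Bat_CoV_BM48": "ACGA" * 225,
}
windows: List[Tuple[int, int]] = list(sliding_windows(900))
queries = [slice_alignment(toy, start, end) for start, end in windows]
```

Each element of `queries` is one partition that could be passed to a tree-fitting step; consecutive windows overlap by half their length because the step is half the window size.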
Viral genomes lack non-functional regions; therefore, the most reasonable proxy for neutral evolution has to be found in the regions outside the query window. To do this, we concatenated twenty regions of 300 bp of the viral genome alignment that were drawn randomly with replacement from the entire genome alignment. Then, for each query alignment, we built a reference alignment of 6 kb as it produces a stable evolutionary standard of recombination rates. To control for the stochasticity of the evolutionary process, we ran each query against ten bootstrapped samples of reference alignments. Finally, we used a custom R script to compute the likelihood ratio, which was used as a test statistic for a chi-squared test with one degree of freedom to calculate a P-value for each query. Then, we corrected the distribution of all P-values per query region using the p.adjust() R function with the fdr method. Next, we classified a query window as being under positive selection if the adjusted P-value was < 0.05. We were unable to successfully run adaptiPhy on two windows because the outgroup species (Bat_CoV_BM48) contained a deletion of 406 bp relative to SARS-CoV-2, which spans the entire ORF8.

Next, we calculated the distribution of substitution rates for each branch and node in each query and reference sequence using phyloFit (Hubisz, Pollard & Siepel, 2011) (Wong & Nielsen, 2004).
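The reference sampling and significance testing described above can be sketched as follows. This is an illustrative Python approximation, not the authors' custom R script. All function names are hypothetical. The statistic is assumed to take the standard form 2(lnL_alt - lnL_null), and region starts are assumed to be any position whose 300 bp span does not overlap the query window. The `bh_adjust` function implements the Benjamini-Hochberg procedure, which is what the fdr method of R's p.adjust() computes.

```python
import math
import random
from typing import List, Optional


def sample_reference_starts(aln_length: int, query_start: int, query_end: int,
                            n_regions: int = 20, region_size: int = 300,
                            rng: Optional[random.Random] = None) -> List[int]:
    """Draw region start columns with replacement, avoiding the query window.

    Assumption: any start whose region lies fully outside the half-open
    interval [query_start, query_end) is an equally likely candidate.
    """
    rng = rng or random.Random()
    candidates = [s for s in range(0, aln_length - region_size + 1)
                  if s + region_size <= query_start or s >= query_end]
    if not candidates:
        raise ValueError("no reference region fits outside the query window")
    return rng.choices(candidates, k=n_regions)


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test against chi-squared with 1 df.

    For 1 df the survival function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = n - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Ten bootstrapped reference samples for one hypothetical query window
# (columns 1500-1800 of a hypothetical 30,000-column alignment).
rng = random.Random(0)
references = [sample_reference_starts(30000, 1500, 1800, rng=rng)
              for _ in range(10)]

# Invented log-likelihoods (null, alternative) for four query windows.
raw = [lrt_pvalue(n, a) for n, a in [(-1000.0, -992.0), (-1000.0, -999.5),
                                     (-1000.0, -996.0), (-1000.0, -1000.0)]]
selected = [p < 0.05 for p in bh_adjust(raw)]
```

Twenty regions of 300 bp give the 6 kb reference described in the text. The closed-form `erfc` expression avoids a dependency on a statistics library, at the cost of supporting only one degree of freedom.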
To test for conservation, we used the phastCons computational method from PHAST (Siepel et

Positive and negative selection are highly localized within coronavirus genomes

We tested for branch-specific selection on nucleotide sequences in coronavirus genomes,

The fourth signal is located in a segment encoding the S2 and S2' subunits that includes the boundary region between the S1 and the S2 subunits (Fig 1), a region that includes the

Genes encoding Nsp4 and Nsp16 contain branch-specific signals of positive selection

We also detected two shorter signals of positive selection within the SARS-CoV-2 genome that are located outside of the S gene, in ORF1a and ORF1b (Fig 1A). Interestingly, both encode small proteins that contribute to viral replication. The first is Nsp4, which encodes a

similarities to, but also notable differences from, that of SARS-CoV-2 (Fig 1). In both species, S

CoV-RaTG13)))). Recombination from a divergent species should produce an incongruent topology in one or more adjacent windows, revealing a recombined region and its approximate breakpoints. We identified 12 regions where the topology differed from the expected (Fig 6). Of

(Fig 1C and 3A). In contrast, signals of positive selection in SARS-CoV-2 and Bat-CoV-RaTG13 are concentrated in the domain that mediates binding to the host receptor ACE2 (Fig 1C and 3A). These distinct distributions suggest that modifications in different aspects of Spike function took place as various coronaviruses adapted to novel hosts. In

Importantly, we also detected signals of positive selection in two additional regions of the SARS-CoV-2 genome, specifically within the genes encoding Nsp4 and Nsp16 (Fig 1A). Of

and RNA structure (Fig 4 and 5). In the case of Nsp4 protein, two nearly adjacent nonsynonymous substitutions at residues 380 and 382 occurred on the branch leading to SARS-CoV-2 (Fig 3B).
These both involve changing side chains with similar biochemical properties, respectively valine to alanine and valine to isoleucine. Homology-directed modeling of protein structure suggests that these two amino acid substitutions have very little impact on either secondary or tertiary structure when comparing the SARS-CoV-2 protein orthologue to those of the other species examined (Fig 4A). In the case of Nsp16 protein, no nonsynonymous substitutions evolved on the branch leading to SARS-CoV-2. Thus, the signal of positive selection within Nsp4 is unlikely to reflect changes in protein structure or function, while the signal within Nsp16 cannot affect either because the encoded polypeptide is identical.

With highly similar and identical protein structures predicted for Nsp4 and Nsp16, respectively,

In silico identification of conserved cis-acting RNA elements in the SARS-
regulate antagonism of IRF3 and NF-kappaB signaling
Ubiquitination-mediated regulation of interferon responses
Coronavirus Spike Proteins in Viral Entry
Functional RNA elements in the dengue virus genome
Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission
The Vienna RNA websuite
Membrane rearrangements mediated by coronavirus nonstructural proteins 3 and 4. Virology
Mobility and Interactions of Coronavirus Nonstructural Protein 4
Visualizing genomic data using Gviz and bioconductor. Molecular Biology
Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution
Temporal dynamics in viral shedding and transmissibility of COVID-19
Structure of replicating SARS-CoV-2 polymerase
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Evidence of the Recombinant Origin of a Bat Severe Acute Respiratory Syndrome (SARS)-Like Coronavirus and Its Implications on the Direct Ancestor of SARS
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus
PHAST and RPHAST: phylogenetic analysis with space/time models
Coronavirus Spike Protein and Tropism Changes. Advances in Virus Research
Comprehensive in-vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms. bioRxiv : the preprint server for biology
MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability
Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data
The Phyre2 web portal for protein modeling, prediction and analysis
The Architecture of SARS-CoV-
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-
RAxML-NG: A fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins
Severe Acute Respiratory Syndrome (SARS) Horseshoe Bats through Recombination
Emergence of SARS-CoV-2 through recombination and strong purifying selection
The divergence between SARS-CoV-2 and RaTG13 might be overestimated due to the extensive RNA modification
Viral Metagenomics Revealed Sendai Virus and Coronavirus Infection of Malayan Pangolins (Manis javanica)
ViennaRNA Package 2.0
Severe acute respiratory syndrome-associated coronavirus 3a protein forms an ion channel and modulates virus release
Cleavage Inhibition of the Murine Coronavirus Spike Protein by a Furin-Like Enzyme Affects Cell-Cell but Not Virus-Cell Fusion
Mechanism and structural diversity of exoribonuclease-resistant RNA structures in flaviviral RNAs
Coronavirus cis-Acting RNA Elements
The SARS coronavirus papain like protease can inhibit IRF3 at a post activation step that requires deubiquitination activity
Coronavirus non-structural protein 16: Evasion, attenuation, and possible treatments
The SARS Coronavirus 3a protein causes endoplasmic reticulum stress and induces ligand-independent downregulation of the type 1 interferon receptor
Viral innate immune evasion and the pathogenesis of emerging RNA virus infections
The ratio of replacement to silent divergence and tests of neutrality
Topology and Membrane Anchoring of the Coronavirus Replication Complex: Not All Hydrophobic Domains of nsp3 and nsp6 Are Membrane Spanning
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
Structure of the human metapneumovirus polymerase phosphoprotein complex
Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: A prospective study
The Coding Region of the HCV Genome Contains a Network of Regulatory RNA Structures
Estimating variability in the transmission of severe acute respiratory syndrome to household contacts in Hong Kong
HyPhy: hypothesis testing using phylogenies
HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies
RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look
Comparative analysis of coronavirus genomic RNA structure reveals conservation in SARS-like coronaviruses. bioRxiv : the preprint server for biology
In search of molecular darwinism
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
The Nonstructural Proteins Directing Coronavirus RNA Synthesis and Processing
The Severe Acute Respiratory Syndrome Coronavirus 3a Protein Up-Regulates Expression of Fibrinogen in Lung Epithelial Cells
On the origin and continuing evolution of SARS-CoV-2
Structural insights into coronavirus entry
SARS-CoV-2 genomic variations associated with mortality rate of COVID-19
Inhibition of IRF3-dependent antiviral responses by cellular and viral proteins
Positive selection of ORF3a and ORF8 genes drives the evolution of SARS-CoV-2 during the 2020
Enhanced receptor binding of SARS-CoV-2 through networks of hydrogen-bonding and hydrophobic interactions
Structural and Functional Basis of SARS-CoV-2 Entry by Using
Exploitation of glycosylation in enveloped virus pathobiology
Detecting selection in noncoding regions of nucleotide sequences
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Cryo-EM analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans
Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage
FATCAT: a web server for flexible structure comparison and structure similarity searching
Coronavirus Open Reading Frame-3a drives multimodal necrotic cell death
The short- and long-range RNA-RNA Interactome of SARS-CoV-2
Ribose 2'-O-methylation provides a molecular signature for the distinction of self and non-self mRNA dependent on the RNA sensor Mda5