key: cord-0891025-vk5uwl0t authors: Emam, Mohamed; Oweda, Mariam; Antunes, Agostinho; El-Hadidi, Mohamed title: Positive Selection as a Key Player for SARS-CoV-2 Pathogenicity: Insights into ORF1ab, S and E genes date: 2021-06-10 journal: Virus Res DOI: 10.1016/j.virusres.2021.198472 sha: 4657d0d986bf7d81e53e28e29f019ac3f58a3cf4 doc_id: 891025 cord_uid: vk5uwl0t The human β-coronavirus SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China. It causes COVID-19, which has become a pandemic. Each of the five known human β-coronaviruses has four major structural proteins (E, M, N and S) and 16 non-structural proteins encoded by ORF1a and ORF1b together (ORF1ab) that are involved in virus pathogenicity and infectivity. Here, we performed detailed positive selection analyses for those six genes among the four previously known human β-coronaviruses and within 38 SARS-CoV-2 genomes to assess signatures of adaptive evolution using maximum likelihood approaches. Our results suggest that three genes (E, S and ORF1ab) show strong signatures of positive selection among human β-coronaviruses, affecting codons located in functionally important protein domains. The E protein-coding gene showed signatures of positive selection at two sites, Asn 66 and Ser 68, located in the C-terminal part of a putative transmembrane α-helical domain, which is preferentially composed of hydrophilic residues. Such substitutions to the hydrophilic residues Asn and Ser may increase the stability of the transmembrane domain in SARS-CoV-2. Moreover, substitutions were found in the N-terminal domain of the spike (S) protein S1 subunit, all of them located on the S protein surface, suggesting their importance in viral transmissibility and survival.
Furthermore, evidence of strong positive selection was detected in three of the SARS-CoV-2 non-structural proteins (NSP1, NSP3 and NSP16), which are encoded by ORF1ab and play vital roles in suppressing the host translation machinery, in viral replication and transcription, and in inhibiting the host immune response. These results help to assess the role of positive selection in the SARS-CoV-2 encoded proteins, which will allow a better understanding of the pathogenicity of the virus and may help identify targets for drug or vaccine design.
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) epidemic emerged in early December 2019 in Wuhan, Hubei Province, China. The disease caused by this virus was termed coronavirus disease 2019 (COVID-19) by the World Health Organization (WHO) on February 19, 2020; the '19' in COVID-19 stands for the year 2019. Taxonomically, SARS-CoV-2 belongs to the existing species Severe acute respiratory syndrome-related coronavirus, as determined by the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV) on February 2, 2020. The species is a member of the genus Betacoronavirus and the family Coronaviridae (Coronaviridae Study Group of the ICTV, 2020). Since December 2019, COVID-19 has spread rapidly across China and subsequently to many countries, causing a pandemic. The major clinical symptoms of the disease are fever, pneumonia, dry cough, headache and dyspnea. Progression of the disease may result in progressive respiratory failure due to alveolar damage and may even lead to death. The virus is highly transmissible among humans, and infected individuals may shed the virus efficiently in the first week of infection, when they are asymptomatic or show mild symptoms (Wölfel et al., 2020). SARS-CoV-2 is also possibly transmissible to pangolins (Choo et al., 2020), ferrets and cats, with cats being highly susceptible to airborne infection.
As of 18 April 2021, the global totals were 140,322,903 confirmed COVID-19 cases and 3,003,794 deaths (https://coronavirus.jhu.edu/map.html). Other members of the genus Betacoronavirus that infect humans include SARS-CoV-1, Middle East respiratory syndrome coronavirus (MERS-CoV) and two other viruses, HCoV-OC43 and HCoV-HKU1. SARS-CoV-1 emerged in 2002 and MERS-CoV emerged in 2012, with limited transmission from human to human (Tang et al., 2015; Song et al., 2019). Both viruses caused severe illness, with fatality rates of approximately 9% and 36%, respectively. HCoV-OC43 and HCoV-HKU1 are considered the second most common cause of the common cold, and their infection may cause respiratory tract illness (Al-Khannaq et al., 2016; Cui et al., 2019). The genomes of these viruses are single-stranded positive-sense RNAs whose size varies from 26,000 to 32,000 nucleotides (nt), with six to eleven open reading frames (ORFs) (Song et al., 2019) that encode accessory proteins, major structural proteins and non-structural proteins (NSPs) (Cui et al., 2019). The RNA genome of SARS-CoV-2 has 29,811 nt and contains 14 ORFs encoding 27 proteins. The 3'-terminus of the genome encodes eight accessory proteins and four structural proteins. The structural proteins are the small envelope protein (E), the matrix protein (M), the nucleocapsid protein (N), which binds to the viral RNA genome, and the spike protein (S), located at the surface of the virus envelope. The S protein binds to a receptor termed angiotensin converting enzyme 2 (ACE2) to enter host cells and determines host tropism (Li, 2016; Zhu, 2018). There are 16 NSPs encoded at the 5'-terminus of the genome. The pp1ab and pp1a proteins are encoded by the orf1ab and orf1a genes, respectively. Together, they comprise 15 NSPs, namely NSP1 to NSP10 and NSP12 to NSP16.
Comparative analysis of genomic data demonstrated that SARS-CoV-2 evolved naturally and is not a man-made biological construct (Anderson et al., 2020). A phylogenetic network analysis of SARS-CoV-2 identified two central variants, termed lineages A and B. Lineage A.1 corresponds to the primary outbreak in Washington State, USA, whereas lineages B.1 and B.2 comprise the large Italian outbreak (Rambaut et al., 2020). Previous studies have shown the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. The nucleotide divergence at synonymous sites between SARS-CoV-2 and other coronaviruses such as SARSr-CoV and RaTG13 was found to be much higher than previously expected (Tang et al., 2020). Selective constraints during the evolution of SARS-CoV-2 and related coronaviruses indicate strong negative selection on the nonsynonymous sites. Therefore, although the coding sequences of these coronaviruses were generally under very strong negative selection, positive selection was also responsible for the evolutionary shaping of the protein sequences (Angeletti et al., 2020; Tang et al., 2020). Genes that are involved in functional innovation often show the footprints of positive selection through high ratios of nonsynonymous to synonymous nucleotide substitutions (Yang and Bielawski, 2000; Nielsen et al., 2005; Philip et al., 2012). Hence, it is essential to perform an in-depth, comprehensive positive selection analysis of the functional sites. In this study, we focused on positive selection analysis of SARS-CoV-2 structural genes among human β-coronavirus (HBC) species and within 36 genomes of SARS-CoV-2, on both coding and non-coding regions. This work provides insights into the key role of positive selection in the recent pathogenicity of the virus and its transmission pattern among humans, as well as into the E, S and ORF1ab proteins, which may help identify potential drug targets or vaccine strategies.
All coding sequences (CDS) and the non-coding regions (3'-UTR and 5'-UTR) were downloaded from the NCBI virus portal (https://www.ncbi.nlm.nih.gov/genome/virus). Information about genes and accession numbers of the 36 SARS-CoV-2 genomes used in this study can be found in Supplementary Table S1. The reference sequences of the coding regions of five HBC species were retrieved from the NCBI RefSeq database (O'Leary et al., 2015), each species represented by three strains. The non-coding regions (3'-UTR and 5'-UTR) were extracted from 36 SARS-CoV-2 genomes, 50 SARS-CoV genomes, 35 HCoV-HKU1 genomes, 50 HCoV-OC43 genomes and 50 MERS-CoV genomes. Accession numbers of these genomes are listed in Supplementary Table S1 (Supplementary Material).
Multiple sequence alignments (MSAs) for the estimation of positively selected sites were built using SEAVIEW v4 (Gouy et al., 2009). The coding sequences were translated to amino acids, aligned using MUSCLE (Edgar et al., 2004) and back-translated to nucleotides; the MSAs were then filtered with GBLOCKS (Castresana et al., 2000) using the relaxed parameters (Talavera et al., 2007) to remove misaligned positions and reduce false-positive hits. JMODELTEST v2.1.10 (Darriba et al., 2012) was used to select the best-fit model by likelihood ratio testing, and the corrected Akaike information criterion (AICc) was then used for model ranking. Gene-based phylogenetic trees were built using PhyML v3.0 under the best-fit model (Tables S2 and S3). The data set contained six refined MSAs between HBC CDS (E gene, M gene, N gene, ORF1a, ORF1ab and S gene) with an average length of 40,296 bp and 10 refined MSAs within SARS-CoV-2 strains (E gene, M gene, N gene, ORF1ab, S gene, ORF3a, ORF6, ORF7, ORF8 and ORF10) with an average length of 2,915 bp.
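The back-translation step in the alignment workflow above (threading each original coding sequence onto its aligned protein sequence so that the codon alignment keeps the reading frame) can be illustrated with a minimal Python sketch. This is not the SEAVIEW implementation; the function name `back_translate` and the toy sequences are hypothetical and serve only to show the idea.

```python
def back_translate(aligned_protein, cds, gap="-"):
    """Thread the codons of an unaligned CDS onto its aligned protein sequence.

    Each residue in the aligned protein consumes one codon (3 nt) from the CDS;
    each gap column becomes a three-character gap, so the reading frame is kept.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Allow a trailing stop codon that is absent from the protein sequence.
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the number of residues")
    codons = (cds[i:i + 3] for i in range(0, 3 * n_residues, 3))
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


# Toy example (hypothetical sequences, not taken from the study).
protein_alignment = {"seqA": "MYS-F", "seqB": "MYSLF"}
coding_sequences = {"seqA": "ATGTACTCATTT", "seqB": "ATGTACTCACTGTTC"}
codon_alignment = {
    name: back_translate(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
```

Aligning at the protein level and then restoring codons keeps gaps in multiples of three, which is what codon-based models such as those in CODEML require.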
The ratio between nonsynonymous (dN) and synonymous (dS) substitution rates, known as omega (ω), was estimated using the maximum-likelihood method CODEML in PAML v4.6 (Yang et al., 2007). Genes were compared to a neutrally evolving model, in which ω equals one: ω > 1 is taken as evidence of positive selection and ω < 1 as evidence of purifying selection. The dN/dS ratio for each amino acid site was estimated using three different models (M7, M8 and M8a). Equilibrium codon frequencies were treated as free parameters (CodonFreq = 2). Model 7 (M7, beta) is the null model, in which ω varies over sites according to a beta distribution and all site classes are at or below neutrality. Model 8 (M8, beta + ω > 1) is the alternative model, which keeps the beta distribution and adds a site class above neutrality. Because M8 allows positive selection along the alignment, we compared it against the stricter M7 using likelihood ratio tests (LRT). Each LRT statistic corresponds to 2×[lnL (alternative model) - lnL (null model)] (or LRT = 2×(ΔlnL)). We also compared M8 with M8a, in which the additional site class is fixed at ω = 1, to identify deviations from neutrality, testing whether sites belonging to a site class with dN/dS > 1 are evolving differently from near neutrality (dN/dS ≈ 1). The LRT statistics from the M7 versus M8 and M8 versus M8a comparisons were used to obtain P-values from the chi-square distribution, with two degrees of freedom for M7 versus M8 and one degree of freedom for M8 versus M8a. P-values were adjusted using the FDR correction method (Benjamini and Hochberg, 1995). Genes were considered to be under positive selection when both model comparisons were significant, with adjusted P-values lower than 0.05.
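The decision rule above (LRT statistic, chi-square P-value with two or one degrees of freedom, Benjamini-Hochberg adjustment, significance in both comparisons) can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the authors' pipeline; the function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """P-value of LRT = 2 * (lnL_alt - lnL_null) under a chi-square with df 1 or 2."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    if df == 2:
        return math.exp(-stat / 2.0)             # chi-square survival function, 2 df
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-likelihoods per gene and model (not values from the study).
lnl = {
    "geneA": {"M7": -1050.0, "M8": -1040.0, "M8a": -1044.0},
    "geneB": {"M7": -2000.0, "M8": -1999.5, "M8a": -1999.8},
}
genes = list(lnl)
p_m7_m8 = benjamini_hochberg([lrt_pvalue(lnl[g]["M7"], lnl[g]["M8"], df=2) for g in genes])
p_m8a_m8 = benjamini_hochberg([lrt_pvalue(lnl[g]["M8a"], lnl[g]["M8"], df=1) for g in genes])
# A gene is retained only if both adjusted comparisons are below 0.05.
selected = [g for g, p1, p2 in zip(genes, p_m7_m8, p_m8a_m8) if p1 < 0.05 and p2 < 0.05]
```

The closed-form survival functions are used only because the two comparisons need one and two degrees of freedom; a statistics library would be the natural choice for other cases.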
Multiple sequence alignments (MSAs) were built using SEAVIEW v4 (Gouy et al., 2009). Both 3'-UTR and 5'-UTR alignments were built using MUSCLE (Edgar et al., 2004). JMODELTEST v2.1.10 (Darriba et al., 2012) was used to select the best-fit model by likelihood ratio testing, and the corrected Akaike information criterion (AICc) was then used for model ranking. Gene-based phylogenetic trees were built using PhyML v3.0 under the best-fit model. The data set contained ten refined MSAs of the five HBC (five 3'-UTR and five 5'-UTR). PhyloP wig-score analysis was performed using PHAST (Hubisz et al., 2010) to measure evolutionary conservation and acceleration at individual alignment sites (positive scores indicate conserved sites and negative scores indicate accelerated sites). The Mann-Whitney U tests and the empirical cumulative distribution functions (ECDF) of the 5'-UTR and 3'-UTR PhyloP wig-scores were computed using R studio vR1.1.2.5. Multiple random samples of the 3'-UTR and 5'-UTR wig-score values were drawn for each analyzed nucleotide position to compare the five HBC, and the comparisons between the 3'-UTR and 5'-UTR of the five viruses were tested using the Mann-Whitney U test.
The genomic evidence reveals signatures of strong positive selection for the E, S and ORF1ab genes among HBC species. When both MSAs and gene-based trees were used as input for the CODEML analysis, the M7 versus M8 comparison was significant in five genes, whereas the M8 versus M8a comparison (the stricter model comparison) identified four genes in which the site class was significantly above neutrality. The LRT comparisons were significant for the E, S, ORF1a and ORF1ab genes: the M7 versus M8 chi-square tests gave FDR-adjusted P-values of P < 0.01 (E gene), P < 3.364e-07 (S gene), P < 1.182e-11 (ORF1a) and P < 2.595e-17 (ORF1ab).
The adjusted chi-square P-values for M8 versus M8a were P < 5.633e-15 (E gene), P < 0.004 (S gene), P < 0.00 (ORF1a) and P < 0.039 (ORF1ab) (Table 1). According to the Bayes Empirical Bayes (BEB) analysis, only three genes (E, S and ORF1ab) have sites with posterior probability above 80%, and above 90% in the Naïve Empirical Bayes (NEB) analysis. For the E gene, we found two codons under positive selection, each with a posterior probability equal to or above 95% (residue positions and posterior probabilities are given in Table 1). For the S gene we found three codons under positive selection, and for ORF1ab four codons (residue positions and amino acid substitutions are given in Table 1). By mapping the E protein against the domain database using the NCBI domain blast (Marchler et al., 2014), we found that both residues (66 Asparagine and 68 Serine) are in the SARS-CoV-2_E domain with E-value 2.02e-24 (Figure 1). The SARS-CoV-2_E domain is involved in virus morphogenesis and assembly; it acts as a viroporin and self-assembles in host membranes, forming homopentameric protein-lipid pores that play a central role in ion transport with poor selectivity. The domains of the spike protein were identified using the protein families database (Pfam); all three positively selected sites (Pro 26, Asn 148 and Met 153) were located in the S1 N-terminal domain with E-value 5e-71 (Figure 2). However, we did not find significant differences in the M7 versus M8 and M8 versus M8a comparisons (Table 2) for the coding sequences within the 36 SARS-CoV-2 strains included in this study, whereas the non-coding sequences of SARS-CoV-2 showed a high evolutionary rate.
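The non-coding comparison (per-site PhyloP scores compared between viruses with the Mann-Whitney U test and the ECDF) can be illustrated with a short Python sketch. The study reports using R for this step; the code below is only a standard-library approximation for illustration, with hypothetical function names and made-up scores, and it uses the normal approximation without tie or continuity correction.

```python
import math


def ecdf(scores):
    """Return the empirical cumulative distribution function of a sample."""
    ordered = sorted(scores)
    n = len(ordered)

    def cumulative(x):
        # Fraction of observations less than or equal to x.
        return sum(1 for s in ordered if s <= x) / n

    return cumulative


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied,
    so this is only a rough sketch for reasonably large samples.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p_value


# Made-up PhyloP-like scores: negative = accelerated, positive = conserved.
virus_a_scores = [-2.1, -1.8, -1.5, -1.2, -0.9]
virus_b_scores = [0.1, 0.3, 0.4, 0.6, 0.8]
u_stat, p_val = mann_whitney_u(virus_a_scores, virus_b_scores)
ecdf_a = ecdf(virus_a_scores)
```

An ECDF that rises earlier (at more negative scores) than the others indicates a larger share of accelerated sites, which is the pattern the comparison is designed to detect.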
The ECDF comparison (Figure 3) between the five HBC showed an acceleration in the 3'-UTR and 5'-UTR of SARS-CoV-2, with significant differences at the lower ranks, which correspond to higher acceleration (Mann-Whitney U test, P < 0.01). As the non-coding regions (3'-UTR and 5'-UTR) accumulate mutations, the high acceleration in SARS-CoV-2 can be considered evidence of a higher evolutionary rate (Machado et al., 2016); the pairwise Mann-Whitney U tests for both 3'-UTR and 5'-UTR are presented in Tables S4 and S5.
Previous studies confirmed that coronavirus proteins vary in size, and this can be described as pleomorphic. Interestingly, even within the conserved set of homologous structural proteins, less than 30% amino acid identity is observed. Hence, we performed a detailed positive-selection analysis for functional sites of six genes among five HBC and ten genes within 36 SARS-CoV-2 strains to understand the effect of natural selection on the high infectivity of SARS-CoV-2. Our findings reveal signatures of strong positive selection in three genes among HBC: the E gene, the S gene and ORF1ab. The E gene is translated into a small protein that forms a pentameric structure delimiting an ion-conductive pore, which plays a crucial role in virus-host interaction (Torres et al., 2006; Parthasarathy et al., 2008). In previous studies, recombinant CoVs lacking the E protein showed significantly decreased virus titres, reduced maturation, or yielded propagation-incompetent progeny (Dewald Schoeman et al., 2019). The E protein of SARS-CoV-2 is highly similar to the SARS-CoV E protein, which has one putative transmembrane α-helical hydrophobic domain, 20-30 amino acids long, flanked by a short N-terminus (<10 amino acids) and a longer C-terminal tail, both more hydrophilic (Torres et al., 2006). According to the NCBI domain blast, both sites, 66 Asn and 68 Ser, of the E protein are within the C-terminal part of an α-helical transmembrane domain.
We found substitutions at site 66 from Ser, Val and Lys to Asn in SARS-CoV-2, and at site 68 from Glu and Pro to Ser in SARS-CoV-2 (Supplementary Figure S1), which either increase or maintain the polarity of the C-terminal part of the domain. Such substitutions to highly hydrophilic amino acids in the C-terminal part may enhance the stability of the E protein, which could increase SARS-CoV-2 production, maturation and pathogenicity.
The SARS-CoV-2 spike glycoprotein (S) is the largest structural protein of the virus (Pillay, 2020). It plays a vital role in viral infection by binding to the human ACE2 receptor to initiate viral entry (Lan et al., 2020). The binding affinity of the spike protein for ACE2 is correlated with the replication rate in different species and also with viral contagiousness and severity (Li et al., 2005; Wan et al., 2020). The spike protein is composed of two main subunits: S1, which is responsible for ACE2 receptor binding via its receptor-binding domain, and S2, which mediates fusion of the viral and cellular membranes (Walls et al., 2020). In our study we found three positively selected sites in the extracellular N-terminal domain (NTD) of the S1 subunit: Pro 26, Asn 148 and Met 153. Pro 26 is located in a loop structure of the S1 NTD (Figure 2). This site lies within the P25PA sequon, which corresponds to the N29YT sequon in SARS-CoV. The N29YT sequon was found to be glycosylated in SARS-CoV, whereas the corresponding site in SARS-CoV-2 is no longer glycosylated (Walls et al., 2020), which could suggest a differentiating mechanism between SARS-CoV-2 and SARS-CoV. Asn 148 resides in the β-turns on the surface of the S1 subunit; owing to its polarity, Asn is more favorable on protein surfaces (Kyte and Doolittle, 1982) than the proline found in both SARS and MERS (Supplementary Figures S2 and S3).
The last site, Met 153, lies in the β-sheets of the S1 subunit; methionine is favored inside β-sheet structures (Bhattacharjee and Biswas, 2010). Moreover, it can act as a ligand for metal ions (Betts and Russell, 2007).
ORF1ab represents two-thirds of the viral genome and encodes the polyprotein 1ab (pp1ab), which is cleaved into 16 non-structural proteins (NSPs) that are involved in viral transcription and replication (Brian and Baric, 2005). Our analysis revealed that three of these NSPs contain strongly positively selected sites: NSP1, NSP3 and NSP16. NSP1 is one of the first proteins to be expressed after viral infection. It inhibits the host translation machinery through multiple steps: binding to the 40S and 80S ribosomal complexes, blocking the mRNA entry site and suppressing the host antiviral mechanisms, which rely on the expression of host immune factors such as interferons (Lokugamage et al., 2012; Thoms et al., 2020). Moreover, the NSP1-40S ribosomal complex initiates endonucleolytic activity that degrades host mRNAs, whereas the viral genes continue to be efficiently translated owing to the interaction between NSP1 and the 5' untranslated region (UTR) of the viral genes (Huang et al., 2011; Schubert et al., 2020). NSP1 is composed of an N-terminal domain, followed by a flexible unstructured linker, and a C-terminal domain that binds the 40S mRNA entry site. Owing to the linker flexibility, the N-terminal domain could sample a space of ~60 Å from its point of attachment; however, the linker structure is still unresolved (Schubert et al., 2020; Thoms et al., 2020). The Ala 138 substitution is located in the flexible linker of NSP1. Ala is more flexible than the amino acids found at the same position in other CoVs (see the alignment in Supplementary Figure S1) (Huang and Nau, 2003; Koča et al., 1994); thus, this substitution may increase the flexibility of the linker.
NSP3 is the largest non-structural protein in the coronavirus genome, containing multiple functional domains that are required for coronavirus replication and for blocking the host innate immune response (Lei et al., 2018). Here we found two sites under positive selection within two different domains: Met 196 and Val 1229 (Figure 4), in the Glu-rich acidic region and the betacoronavirus-specific marker (βSM) domain, respectively. The Glu-rich acidic region comprises more than 35% Glu and 10% Asp residues; it is also known as the hypervariable region (HVR) because of its non-conserved amino acid sequence (Neuman, 2016). The function of this region is still unknown. In general, Glu/Asp-rich proteins are mainly involved in DNA/RNA mimicry, protein-protein interactions and metal-ion binding (Chou and Wang, 2015). Met 196 is an amphipathic amino acid; the corresponding position is occupied by the non-polar amino acids Leu and Val in HCoV-HKU1 and HCoV-OC43, respectively, and by the polar amino acid Glu in SARS and MERS. Glu, Leu and Val are more abundant than Met in the Glu-rich acidic region (Chou and Wang, 2015). However, the ability of Met to donate a methyl group (PubChem, 2020) could suggest a relevance of this position. The second substitution, Val 1229, lies within the βSM domain, an intrinsically disordered region with low conservation (Lei et al., 2018). The role of the βSM in viral pathogenesis is still unknown. The gene that codes for the SARS-CoV βSM domain could not be expressed in E. coli, suggesting that βSM is a non-enzymatic domain (Neuman et al., 2008). Val 1229 is located in the βSM α-helix of the NSP3 I-TASSER model. Although Val weakly destabilizes the α-helix structure, it was found to be more favored than Gly and Thr in HCoV-OC43 and SARS, respectively, but less favored than Glu in HCoV-HKU1 (Supplementary Figure S4) (Nick Pace and Martin Scholtz, 1998).
NSP16 plays a critical role in viral transcription and replication. During RNA synthesis, NSP16 adds a cap structure to the newly synthesized viral mRNAs, ensuring their efficient translation (Bouvet et al., 2010). NSP16 negatively regulates innate immunity to promote viral proliferation through interferon inhibition (Shi et al., 2019). In SARS-CoV, MERS-CoV and HCoV-OC43, the Arg 216 residue replaced Lys at the same position of NSP16 (Figure 5, Supplementary Figure S5). Both amino acids have very similar characteristics. However, arginine can form more hydrogen bonds than lysine with negatively charged groups such as the phosphates in RNA.
Recent studies that analyzed SARS-CoV-2 mutations found that C to T exchanges predominate, accounting for more than 50% of all mutations, and that C > T hypermutation most likely results from APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) deamination in RNA editing (Di Giorgio et al., 2020). This finding is consistent with the exchange preferences in our results, as we found five positively selected sites with C to T mutations (namely position 68 Ser in the E gene, position 148 Asn in the S gene, position 138 Ala in NSP1, position 1229 Val in NSP3 and position 216 Arg in NSP16). This large proportion of C > T mutations in a host APOBEC-like context provides evidence for a potent host-driven antiviral editing mechanism against the pathogenicity of SARS-CoV-2 to improve cellular defense functions (Wang et al., 2020; Simmonds et al., 2020).
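The proportion of C > T exchanges discussed above can be obtained by tallying substitution types between aligned sequences. The following Python sketch shows one simple way to do this for a pair of aligned nucleotide sequences; it is an illustration only, the function names are hypothetical, and the sequences are made up rather than taken from the study.

```python
from collections import Counter


def substitution_spectrum(reference, query):
    """Count nucleotide substitution types (e.g. 'C>T') between two aligned sequences."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    spectrum = Counter()
    for ref_nt, qry_nt in zip(reference.upper(), query.upper()):
        # Skip gaps, ambiguity codes and identical positions.
        if ref_nt in valid and qry_nt in valid and ref_nt != qry_nt:
            spectrum[f"{ref_nt}>{qry_nt}"] += 1
    return spectrum


def fraction(spectrum, change):
    """Share of all observed substitutions that are of the given type."""
    total = sum(spectrum.values())
    return spectrum[change] / total if total else 0.0


# Made-up aligned sequences (cDNA alphabet, so U is written as T).
reference_seq = "ACGTCCGA-CG"
query_seq = "ATGTTCGAATA"
spectrum = substitution_spectrum(reference_seq, query_seq)
c_to_t_share = fraction(spectrum, "C>T")
```

The direction of each change depends on which sequence is treated as the reference, so in practice an ancestral or outgroup sequence would be needed to polarize the substitutions.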
We did not find evidence of positive selection within SARS-CoV-2 genomes with our method. This result supports the findings of another recent study that evaluated SARS-CoV-2 recombination: it did not find genes under positive selection within SARS-CoV-2, but found patterns of purifying selection pressure in some parts of the genome, including the E and M genes as well as parts of the ORF1a and ORF1b genes, which plays an important role in cross-species transmission (Xiaojun et al., 2020). In addition to the evidence of positive selection between HBC in our results, we evaluated the non-coding regions (3'-UTR and 5'-UTR) among five HBC through the PhyloP score, which showed a higher acceleration rate in both the 3'-UTR and 5'-UTR of SARS-CoV-2, providing further evidence of a consistently higher evolutionary rate, concordant with the presence of positive selection in coding regions (Tables S4 and S5).
Our results suggest that the S, E and ORF1ab genes show strong signatures of positive selection among human β-coronaviruses, affecting codons that reside in functionally important protein domains. Overall, most of the substitutions increase protein structure stability. The positively selected sites in these proteins could explain some clinical features of SARS-CoV-2 compared with other human β-coronaviruses. Sites undergoing an amino acid change help to highlight functionally important proteins of SARS-CoV-2 that are essential for the mechanisms of viral replication, transcription and evasion of the host's antiviral immunity. While the current literature contains a large volume of data about SARS-CoV-2 mutagenesis and variants, limited insights have been obtained regarding the impact of those mutations on biological processes and viral pathogenicity. Here we shed light on the role of these proteins and their associated mutations in viral pathogenicity and host biological processes.
Furthermore, our findings could reveal valuable information useful for potential drug and vaccine development.
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Benjamini, Y., and Hochberg, Y. (1995
Cui, J., Li, F., and Shi, Z.-L. (2018). Origin and evolution of pathogenic coronaviruses. Nature Reviews Microbiology 17, 181-192. doi:10.1038/s41579-018-0118-9.
Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9, 772-772. doi:10.1038/nmeth.2109
Martignano, F.; Torcia, M.G.; Mattiuz, G.; Conticello, S.G., et al., (2020
SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building
Isolation and Characterization of Viruses Related to the SARS Coronavirus from Animals in Southern China
Estimating Maximum Likelihood Phylogenies with PhyML
Identification of Severe Acute Respiratory Syndrome Coronavirus Replicase Products and Characterization of Papain-Like Protease Activity
SARS Coronavirus nsp1 Protein Induces Template-Dependent Endonucleolytic Cleavage of mRNAs: Viral mRNAs Are Resistant to nsp1-Induced RNA Cleavage
A Conformational Flexibility Scale for Amino Acids in Peptides
PHAST and RPHAST: phylogenetic analysis with space/time models
"Positive-inside rule" is complemented by the "negative inside depletion/outside enrichment rule"
SARS Coronavirus Unique Domain: Three-Domain Molecular Architecture in Solution and RNA Binding
Computer study of conformational flexibility of 20 common amino acids
A G-quadruplex-binding macrodomain within the "SARS-unique domain" is essential for the activity of the SARS-coronavirus replication-transcription complex
A simple method for displaying the hydropathic character of a protein
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein
Structure, Function, and Evolution of Coronavirus Spike Proteins
Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2
Bat origin of a new human coronavirus: there and back again
Severe Acute Respiratory Syndrome Coronavirus Protein nsp1 Is a Novel Eukaryotic Translation Inhibitor That Represses Multiple Steps of Translation Initiation
Positive Selection Linked with Generation of Novel Mammalian Dentition Patterns
CDD: NCBI's conserved domain database
PubChem Compound Summary for CID 6137
Bioinformatics and functional analyses of coronavirus nonstructural proteins involved in the formation of replicative organelles
Proteomics Analysis Unravels the Functional Repertoire of Coronavirus Nonstructural Protein 3
A Helix Propensity Scale Based on Experimental Studies of Peptides and Proteins
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
COVID-19 Coronavirus Vaccine Design Using Reverse Vaccinology and Machine Learning
Fish Lateral Line Innovation: Insights into the Evolutionary Genomic Dynamics of a Unique Mechanosensory Organ
Gene of the month: the 2019-nCoV/SARS-CoV-2 novel coronavirus spike protein
Characterization of the coronavirus mouse hepatitis virus strain A59 small membrane protein E
A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology. bioRxiv preprint
Coronavirus envelope protein: current knowledge
SARS-CoV-2 Nsp1 binds ribosomal mRNA channel to inhibit translation
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2
PEDV nsp16 negatively regulates innate immunity to promote viral proliferation
From SARS to MERS, Thrusting Coronaviruses into the Spotlight
Rampant C > U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: Causes and consequences for their short- and long-term evolutionary trajectories
Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments
Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2 (2020)
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2
Model of a putative pore: the pentameric alpha-helical bundle of SARS coronavirus E protein in lipid bilayers
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
A novel coronavirus outbreak of global health concern
Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China
Host Immune Response Driving SARS-CoV-2 Evolution
Emergence of SARS-CoV-2 through recombination and strong purifying selection
PAML 4: Phylogenetic Analysis by Maximum Likelihood
Predicting the receptor-binding domain usage of the coronavirus based on kmer frequency on spike protein