key: cord-0704285-xdq0z3o1 authors: Arévalo, Santiago Justo; Sifuentes, Daniela Zapata; Robles, César Huallpa; Bianchi, Gianfranco Landa; Chávez, Adriana Castillo; Casas, Romina Garavito-Salini; Chavarría, Roberto Pineda; Uceda-Campos, Guillermo title: Analysis of the Dynamics and Distribution of SARS-CoV-2 Mutations and its Possible Structural and Functional Implications date: 2020-11-14 journal: bioRxiv DOI: 10.1101/2020.11.13.381228 sha: de9c55a2c231648a3bdf3e34ccae6b2d6fea9814 doc_id: 704285 cord_uid: xdq0z3o1

Eight months after the pandemic declaration, COVID-19 has not been globally controlled. Several efforts to control SARS-CoV-2 dissemination are still running, including vaccines and drug treatments. The effectiveness of these procedures depends, in part, on the regions targeted by these treatments not varying considerably. Although it is known that the mutation rate of SARS-CoV-2 is relatively low, it is necessary to monitor the adaptation and evolution of the virus in the different stages of the pandemic. Thus, the identification of mutations, the analysis of their dynamics, and their possible functional and structural implications are relevant. Here, we first estimate the number of COVID-19 cases with a virus carrying a specific mutation and then calculate its global relative frequency (NRFp). Using this approach on a dataset of 100 924 genomes from GISAID, we identified 41 mutations present in viruses in an estimated 750 000 global COVID-19 cases (0.03 NRFp). We classified these mutations into three groups: high-frequent, low-frequent non-synonymous, and low-frequent synonymous. Analysis of the dynamics of these mutations by month and continent showed that high-frequent mutations appeared early in the pandemic, all are present in all continents, and some of them are almost fixed in the global population.
On the other hand, low-frequent mutations (non-synonymous and synonymous) appear late in the pandemic and seem to be at least partially continent-specific. This could be because high-frequent mutations appeared early, when lockdown policies had not yet been applied, whereas low-frequent mutations appeared after lockdown policies were in place, which prevented their global dissemination. Finally, we present a brief structural and functional review of the analyzed ORFs and the possible implications of the 25 identified non-synonymous mutations.

COVID-19 cases have reached approximately 52 million, with 1.3 million deaths (as of November 12th) (WHO. 2020). Several countries have reactivated their lockdown policies to control a second wave of infections (Looi. 2020), showing that the COVID-19 pandemic is not yet controlled. Efforts to develop and license treatments are still running, with few of them in the final stages (Krammer. 2020).

In this context, it is still important to track the adaptation and evolution of SARS-CoV-2 all around the world. Efforts to sequence genomes differ between regions, causing regions with few sequenced genomes to be left out of global analyses. For this reason, we use an approach that takes into account the difference in the number of sequenced genomes in each continent and each month and normalizes them by the number of COVID-19 cases, resulting in a less biased [...]

[...] America, and why the T85I_nsp2 and Q57H_orf3a mutations are at lower frequencies in the other continents are still open questions.

Mutation L37F_nsp6 showed a global peak in February (Fig. 1.A) that corresponds with a peak in all the continents (Fig. 1.B-G). However, since February the frequency of this mutation was below 0.2 in all months and continents except in Asia-April (Fig. 1.B-G).
Furthermore, in Africa, Europe, North America, Oceania, and South America this mutation seems to be near extinction in the last analyzed months (Fig. 1.B, D-G).

With respect to the LF mutations, we identified 15 LF_nS mutations (T428I_nsp3, A994D_nsp3, S1285F_nsp3, G15S_nsp5, L89F_nsp5, N129D_nsp14, R216C_nsp16, A222V_S, L46F_orf3a, G172V_orf3a, S24L_orf8, S194L_N, P199L_N, A220V_N, V30L_orf10) that reach a global NRFp greater than 0.1 in at least one month (Fig. 1.H).

Analyses by continent showed that most of those 15 LF_nS have a continent-specific distribution. For example, in Asia, A994D_nsp3, S1285F_nsp3, and L46F_orf3a are maintained at ~0.05 relative frequency from May to July and showed a high peak of ~0.3 in August (Fig. 1.J); in other continents, those mutations have frequencies near 0 (Fig. 1.I, K-N). In Europe, three LF_nS seem to be specific to this continent: A222V_S, A220V_N, and V30L_orf10, which showed a rapidly increasing relative frequency in August (~0.18) and September (~0.42) (Fig. 1.K). Those mutations appear in genomes that do not present the HF nucleocapsid mutations (G28881A_N, G28882A_N, G204R_N), explaining the decrease in the relative frequency of these HF mutations in August and September (Fig. 1.D). Similarly, only in North America, L89F_nsp5, N129D_nsp14, R216C_nsp16, G172V_orf3a, S24L_orf8, and P199L_N showed increasing frequencies since August (Fig. 1.L). One of them (S24L_orf8) had already been reported at relatively high frequencies in previous months (April, May, and June) (Fig. 1.L).

Interestingly, T428I_nsp3 and G15S_nsp5 appear with frequencies up to ~0.12 in Europe (Fig. 1.K).
In Africa and South America, a constant increase of relative frequency (up to ~0.14 in Africa and ~0.33 in South America) is reported from June to August in Africa and from March to June in South America (July is the last month analyzed in South America and, with 89 analyzed genomes, the frequencies of those mutations are similar to those in June) (Fig. 1.I, N). Mutation S194L_N showed a variable relative frequency in Asia, Europe, North America, and Oceania (Fig. 1.J-M). In Africa and South America its frequency is near 0 in all the months analyzed (Fig. 1.I, N).

Thirteen LF but synonymous mutations (LF_S) were identified in our analysis, 6 of them reaching 0.1 global NRFp in at least one month. Those mutations follow a pattern similar to that described for the LF_nS mutations. Thus, C313T_nsp1(S) and G4354A_nsp3(S) accompany the three LF_nS mutations specific to Asia (Fig. 1.J, Q). In the same way, C6286T_nsp3(S) and G21255C_nsp16(S) show the same pattern as the LF_nS mutations A222V_S, A220V_N, and V30L_orf10 in Europe (Fig. 1 [...]

The nsp3 protein is the longest encoded in coronaviruses, and different functions are attributed to it due to its multiple domains. Nsp3 is best known for its protease activity and for being essential in the replication/transcription complex (RTC). Structurally, Nsp3 is a 1945 amino acid protein; there is no consensus on the number of domains of nsp3 in SARS-CoV and SARS-CoV-2. Lei 2018 mentions 14 domains in SARS-CoV (Ubl-1, Ac, X, SUD, Ubl-2, PL2pro, NAB, βSM, TM1, 3Ecto, TM2, AH1, Y1, and CoV-Y). In nsp3 we identified three relevant non-synonymous mutations.

The SARS Unique Domain (SUD) is composed of 3 subdomains: Mac2, Mac3, and DPUP. Their function is binding to RNA to form G-quadruplexes (Lei 2018). Besides, Mac3 has been described as an essential region for the RdRp complex (Kusov 2015).
According to Tan 2009, the positive interface of Mac2-Mac3 is the possible region for nucleic acid binding (Fig. 2B). It is unlikely that the LF_nS mutation T428I (NRFp 0.056) alters the electrostatic potential in this region because it is located far from this interface (Fig. 2A). In the T428 variant, there are no hydrogen bonds between this residue and the nearby residues. Instead, we observed several hydrophobic atoms from residues L431, T423, Y536, and E427 (Fig. 2C). An in silico model of T428I showed that isoleucine could be accommodated at this site (Fig. 2D).

The PL2pro domain processes the amino-terminal end of polyprotein 1a/ab to generate 2 or 3 products (nsp1, nsp2, and nsp3) through its protease activity, mediated by the catalytic triad Cys-His-Asp (Barretto 2005) (Fig. S4). Also, PL2pro cleaves Interferon Stimulated Gene 15 (ISG-15), causing the loss of ISGylation from interferon responsive factor 3 (IRF3), an important component of the Interferon I pathway (Shin 2020). Structurally, it is made up of 3 subdomains: thumb, palm, and fingers. It also has a Ubl-2 domain linked by an alpha helix (Clasman 2017) (Fig. S4) that determines the substrate specificity for ISG (Shin 2020). The fingers subdomain contains a zinc-binding region (Fig. S4) and has been described as essential for maintaining the structure of PL2pro (Baez-Santos 2014). The A994D LF_nS mutation (NRFp 0.044) is located on a beta-sheet of the palm subdomain, far away from the catalytic site (Fig. S4). This residue is exposed to the surface, where D could favor solvent interactions.

The last mutation observed in the nsp3 protein corresponds to S1285F, located in the βSM domain, for which, to our knowledge, there is no structural or functional information.

[...] for dimerization since it forms a hydrogen bond with G11 of the opposite monomer (Fig. 3B).
Due to the S15 orientation, it would not favor the formation of inter- or intra-monomer hydrogen bonds. Instead, it could favor solubility through solvent interactions. The L89F mutation is located in a beta-sheet of DII with the side chain pointing towards the center of the protein. The bigger hydrophobic chain of phenylalanine could favor the packing between the beta-sheets of this domain (Fig. 3C).

Nsp6

The [...]

This mutation is near the interface with nsp8 (Fig. 4C). Our structural in silico analysis shows that this mutation could increase the hydrophobic packing in the region (Fig. 4D).

Nsp8

Nsp8 is a 198 amino acid protein shown to have a key role in forming cytoplasmic complexes (nsp7/nsp8/nsp12) for viral RNA synthesis (Kumar et al. 2007 [...]

SARS-CoV-2 Nsp9 is a 113 amino acid RNA-binding homodimeric protein (Littler, 2020). The monomer is composed of seven β-strands (β1-β7), an N-terminal β7 extension, and a C-terminal α-helix with a conserved "GxxxG" motif ( [...] SARS-CoV-2 nsp9 homotetramer structure with an interface composed mainly of β5 and three connection loops. It is proposed that the nsp9 function is critical to the replication and transcription machinery since mutations in the SARS-CoV Nsp9 gene prevent viral replication (Miknis, 2009). Nearly 97% sequence identity is found between SARS-CoV and SARS-CoV-2 Nsp9 protomers, suggesting a related function (Littler, 2020). We did not find relevant mutations in nsp9.

The SARS-CoV-2 Nsp10 is divided into two subdomains: the alpha helix subdomain (α1-α4 and α6) and a beta subdomain (two antiparallel sheets, a short alpha-helix, and coiled-coil regions) (Fig. S6A) (Viswanathan, 2020). Nsp10 has two zinc-binding sites: i) C74, C77, C90, and H83 (between α2 and α3) (Fig. S6B), ii) C117, C120, C128, and C130; both with stabilizing effects (Krafcikova, 2020) (Fig. S6C).
Nsp10 interacts with the Nsp14 and Nsp16 methyltransferases, activating them (Krafcikova 2020). The SARS-CoV-2 Nsp10 helices α2, α3, α4, and a coiled-coil region between α1 and β1 (N40 to T49) interact with Nsp16 through hydrophobic interactions or water molecules. Two important Nsp10 residues are immersed in hydrophobic pockets of Nsp16: V42 (pocket 1: M41, V44, V78, A79, and P80 from Nsp16) and L45 (pocket 2: P37, I40, V44, T48, L244, and M247 from Nsp16) (Fig. S7A) (Krafcikova, 2020). In SARS-CoV, the Nsp10-Nsp14 interaction is mediated by the N-terminal loop, the α1 helix (1-20), the loop following α2, and residues around zinc finger 1 (Fig. S7B) (Ma, 2015). We did not find relevant mutations in nsp10.

Nsp12

Nsp12, in complex with nsp7-nsp8, forms the RdRp complex (Pachetti M et al. 2020 [...] interface between the polymerase domain and interface domain (Fig. 5D). A97 in the NiRAN domain falls in a loop exposed to the solvent (Fig. 5E). In the A97V variant, the additional methyl groups of valine gain interactions with V96, increasing hydrophobic contacts (Fig. 5F).

[...] end of growing RNAs from RdRp RNA synthesis (Ferron, 2018). Additionally, nsp14 shows (guanine-N7) methyltransferase activity on the cap using S-adenosyl-L-methionine (SAM) as the methyl donor (Robson, 2020). Our analysis showed that the region comprising residues 420 to 503 within the N7-MTase domain seems to be difficult to sequence, as shown by the relatively high frequency of Ns in this region (Fig. S9). Also, we identified the LF_nS N129D mutation in the ExoN domain (NRFp: 0.03). In SARS-CoV (PDB ID: 5C8U), the N129 side chain is exposed to the protein surface and far from the interaction surfaces with nsp10, indicating little probability of an impact on the structure or function of nsp14.

[...] regions: a nucleoside pocket and a methionine pocket (Fig. 6A, B). The nucleoside forms
hydrogen bonds with the D99 and D114 side chains, the L100, C115, and Y132 main chains, and through water molecules with N101 (Fig. 6A) [...] (Fig. 6C); however, in another structure (PDB: 6W4H) these two residues are further away from each other (4.930 Å) (Fig. 6C), indicating that this salt bridge is probably not strong enough to contribute significantly to the stability. In any case, the R216C mutation removes the possibility of forming this salt bridge. No cysteines were observed near C216 that would suggest the formation of a disulfide bridge.

[...] immunogenicity is the main protein used as a target of vaccines (Krammer 2020). Structurally, the spike is divided into two domains: S1 (1-681) and S2 (686-1213). Between these two domains lies the S1/S2 cleavage region (682-685) (Fig. 7A). The S1 domain has two subdomains: [...] of S1, showed 0.032 NRFp. The side chain of alanine is immersed between the side chains of V36, Y38, F220, and I285 (Fig. 7D). The presence of a bigger side chain could better fill the spaces, probably affecting the stability of this region.

In our analysis, we found three relevant mutations (L46F (0.04), Q57H (0.29), and G172V (0.03)), two of them in the ion channel pore. The ion channel shows six constrictions along the pore, the fifth corresponding to the side chain of Q57 (Fig. 8B). Mutational studies do not show differences in the expression, stability, conductance, selectivity, or gating behavior of the Q57H variant (Kern et al. 2020). Our in silico analysis of Q57H did not show a reduction of pore size at this constriction (Fig. 8C). Due to the difference in the pKa of glutamine and histidine, it is interesting to speculate that functional differences could be observed at different pH values.
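As a rough illustration of the pH argument for Q57H, the Henderson-Hasselbalch relation gives the fraction of an ionizable side chain that is protonated at a given pH. The sketch below assumes a side-chain pKa of 6.0 for histidine, a common textbook value for the free amino acid; the actual pKa of H57 inside the orf3a pore is not reported in the text and may differ. The function name is hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a basic group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Assumption: textbook pKa of the histidine imidazole side chain (free amino acid).
ASSUMED_HIS_PKA = 6.0

for ph in (5.0, 6.0, 7.0, 7.4):
    frac = protonated_fraction(ph, ASSUMED_HIS_PKA)
    print(f"pH {ph:.1f}: protonated fraction = {frac:.3f}")
```

Under this assumption the charge state of a histidine changes markedly between mildly acidic and neutral pH, whereas the glutamine side chain stays uncharged, which is the kind of difference the speculation above refers to.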
The second constriction of the pore corresponds to the L46 and I47 side chains ( [...] ribosomal S6 Kinase (RSK), cyclin-dependent kinase 1 (cdk1/CDC-2), casein kinase 1 (CKI), cyclin-dependent kinase 5 (CDK-5), p38 mitogen-activated protein kinase (p38MAPK), and glycogen synthase kinase 3 (GSK-3)) (Fig. 10) and evaluate the effects of ten identified genomic variants (Table 2).

In the PS S193, PKA kinase showed differences between variants (Fig. 10A). PKA recognizes sequences where arginine is preferred at positions -3 and -2, whereas hydrophobic residues are preferred C-terminal to the phosphorylatable serine/threonine (Fig. S14) (Songyang et al. 1994). Thus, the S194L and P199L mutations confer a hydrophobic amino acid at positions +1 and +6, respectively (Table 2), increasing the PP in those variants (Fig. 10A). The distance of the hydrophobic residue from the PS seems to affect the PP (Fig. 10A).

At position S194 we observed PKA, PKC, and RSK with a different PP for the variant SLRGA (Fig. 10B). Similar to site S193, PKA has a slightly higher PP in the SLRGA variant (due to a leucine in position +5). PKC shows the opposite pattern, indicating that proline is preferred in position +5 for this kinase (Fig. 10B). RSK is a highly selective kinase that preferentially phosphorylates serine with basic amino acid residues at positions -5 and -3 ( [...] (Table 2), which matches the consensus phosphorylation sequence (Fig. S14). Despite this, the SLRGA variant had a slightly lower phosphorylation potential in comparison with variants with proline in position +5. S194L excluded the LPKRA, LPRGA, and LPRRA variants as possible targets due to the loss of the PS.

At residue S197 NetPhos identified cdc2, CKI, and PKC as possible kinases (Fig. 10C).
For CKI, the consensus sequence S(P)-Xn-S has been proposed, where prior phosphorylation (S(P)) is a critical determinant of kinase action and a spacing of two residues (Xn = 2) gives the best recognition site (Fig. S14) (Flotow et al. 1990). LPKRA, LPRGA, and LPRRA do not have the PS S194, reducing the PP for these variants (Fig. 10C). Also, the absence of K or the presence of R in position 203 seems to impact the PP of S197. For PKC, variants with S194 and P199 show the greatest PP. For cdc-2, subtle differences are predicted between the variants.

At PS T198, three kinases were identified: cdk-5, GSK-3, and p38MAPK (Fig. 10D). The CDK family and MAPKs require a proline just after the phospho-acceptor residue (S/T-P motif) (Fig. S14) (Sheridan et al. 2008, Songyang et al. 1996). Thus, in the SLRGA variant, the loss of proline significantly decreases its PP (Fig. 10D). The greatest difference between the other variants may be due to differences in position 194 (-4) for cdk-5 and position 204 (+6) for p38MAPK. GSK-3 recognizes the sequence S-X-X-X-S(P), where prior phosphorylation is required for the recognition site (Fig. S14) (Fiol et al. 1987). All the variants present this sequence and show small differences among their PPs (Fig. 10D).

PKC and cdc2 were predicted for PS S201 (Fig. 10E). Cdc2 showed slight differences between the variants. Woodgett et al. (1986) reported that the sequence S/T-X-K/R, where X is usually an uncharged residue, is a target sequence for PKC (Fig. S14). Additionally, basic residues at both the N and C termini of the PS enhance the PP. We observed that all the variants have a high PP due to the presence of the uncharged serine at position +1 and the basic lysine/arginine at position +2. The differences between the variants are due to the difference in the basic residue (lysine or arginine) and proline or leucine in position -2 (Fig. 10E).
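The sequence-level part of these arguments, namely whether a consensus motif is present around a candidate site, can be sketched with regular expressions. The snippet below encodes only two motifs stated in the text (S/T-P for the CDK family and MAPKs, S/T-X-K/R for PKC). The peptides are hypothetical toy strings, not the N protein sequences analyzed here, the function name is an assumption, and a motif match is not a substitute for the NetPhos phosphorylation potential.

```python
import re

# Consensus motifs as stated in the text; the lookahead keeps overlapping hits.
MOTIFS = {
    "CDK/MAPK (S/T-P)": re.compile(r"(?=[ST]P)"),
    "PKC (S/T-X-K/R)": re.compile(r"(?=[ST].[KR])"),
}

def motif_sites(peptide: str) -> dict[str, list[int]]:
    """Return 1-based positions of the phospho-acceptor residue for each motif."""
    return {
        name: [m.start() + 1 for m in pattern.finditer(peptide)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy peptides (not the real N protein sequence).
reference = "ASTPGSSRG"
variant = "ASTLGSSRG"  # the proline following T is replaced by leucine

print(motif_sites(reference))
print(motif_sites(variant))
```

In the toy variant the S/T-P site disappears while the S/T-X-K/R site is unaffected, mirroring the reasoning applied to the loss of P199 at T198.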
Similar results for PKC are observed at PS S202, except for the variant SLRGA, which, in contrast to S201, showed a lower PP than SPKGA (Fig. 10F). The PS T205 shows a higher PP for PKC in variants where residue G204 is present (Fig. 10G). G204 creates the PKC consensus sequence described by Nishikawa et al. (1997) (Fig. S14).

NetPhos predicted cdk5, PKA, and RSK as potential kinases for S206, with variable PP for the variants (Fig. 10H). All variants show a proline at position +1, necessary for cdk5 recognition (Fig. S14). Thus, subtle differences appear due to differences on the N-terminal side of S206, with variants carrying RR (-2, -3) showing the highest PP (Fig. 10H). For PKA and RSK, the LPRRA and SPRRA variants showed the highest PP. Again, this can be explained by the presence of RR. When just one R is present the PP goes down, and when there is none it goes down even more. Also, we noted that NetPhos predicts a higher PP when R is present at position 204 than when it is present at position 203 for PKA, and the opposite for RSK.

The P13L and A220V mutations do not generate differences in the kinase PPs predicted by NetPhos due to their distance from the predicted phosphorylation sites.

To perform the mutation frequency analysis in each ORF, we first downloaded a total of 100 924 complete and high-coverage genomes from the GISAID database (as of October 5th, 2020). After that, we randomly divided the genomes into 4 groups of 25 231 genomes each. Each set of genomes was divided into genomes with at least one ambiguity and genomes without ambiguities (any nucleotide different from A, C, T, or G was identified as ambiguous). The 8 sets of genomes were aligned using MAFFT with the FFT-NS-2 strategy and default parameter settings (Katoh et al. 2002).
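The partition into 8 sets described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the fixed random seed, and the toy sequences are assumptions, and genomes are represented as a plain dictionary from ID to nucleotide string rather than parsed from FASTA files.

```python
import random

UNAMBIGUOUS = set("ACGT")

def has_ambiguity(sequence: str) -> bool:
    """True if any nucleotide is not A, C, G or T (the criterion in the text)."""
    return any(base not in UNAMBIGUOUS for base in sequence.upper())

def split_genomes(genomes: dict, n_groups: int = 4, seed: int = 0) -> list:
    """Randomly split genome IDs into n_groups, then split each group into
    [without ambiguities, with at least one ambiguity]."""
    ids = sorted(genomes)
    random.Random(seed).shuffle(ids)  # seed is an assumption, for reproducibility
    sets = []
    for i in range(n_groups):
        group = ids[i::n_groups]  # near-equal group sizes
        clean = [g for g in group if not has_ambiguity(genomes[g])]
        ambiguous = [g for g in group if has_ambiguity(genomes[g])]
        sets += [clean, ambiguous]
    return sets

# Hypothetical toy input: genome ID -> nucleotide sequence
toy = {
    "g1": "ACGTAC", "g2": "ACNTAC", "g3": "ACGTTT", "g4": "ACGYAC",
    "g5": "TTGTAC", "g6": "ACGTAA", "g7": "NNGTAC", "g8": "ACGTCC",
}
sets = split_genomes(toy)
print(len(sets))
```

With 100 924 genomes and 4 groups, the strided slicing yields groups of 25 231 genomes each, matching the group size reported in the text; each resulting list would then be written out and aligned separately.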
In each alignment, we removed columns that do not correspond to the region from nt 203 to nt 29674, as well as insertions with respect to the genome EPI_ISL_402125 (thus, our analysis does not cover the frequency of insertions with respect to the reference genome EPI_ISL_402125). After that, we repeated the alignment with MAFFT (Katoh et al. 2002) (same settings as the previous one) and the column removal iteratively until the number of columns no longer changed after the alignment (indicating that no other insertion was detected in the region from nt 203 to nt 29674 of the reference genome). Finally, we concatenated the 8 alignments using the cat command in Linux and used the result to extract the regions corresponding to each of the ORFs and nsp regions of SARS-CoV-2 (regions as annotated in the NCBI database for the Wuhan-Hu-1 reference genome). These regions were translated to obtain the AA sequences. After that, sequences were divided by continent-month combinations and aligned using MAFFT with the FFT-NS-2 strategy and default parameter settings (Katoh et al. 2002); columns with more than 98 % gaps were removed, and the relative frequency of each base or gap in each position was calculated. To obtain Normalized Relative Frequencies by COVID-19 cases (NRFp) we follow similar steps as described in (Justo et al. 2020 [...]

[...] showing residues 2 Å close to the T428 side chain in spheres (above) or sticks (below) representation; no hydrogen bonds were observed involving this threonine. D) Model of the T428I variant in the same view as C; after a potential energy minimization we noted that internal spaces are better filled.

[...] II is present just after E14 that forms a hydrogen bond with G11 in the dimer interface. S15 could confer solubility and less flexibility due to its location.
C) Packing of the side chains near L89 in domain I, showing that the L89F mutation slightly changes the packing in this region (a potential energy minimization was done after the mutation was modelled).

[...] 6M5I). B) The connection between α1 and α2 is formed by three serines (S24, S25, and S26). C) S26 forms a hydrogen bond with D163 from nsp8, and the hydroxyl of S25 points to the solvent. D) Computational modelling of S25L shows a gain in hydrophobicity of the region with the three serines.

Mutation P323L could affect the rigidity of the loop, but another proline is present in position P322. Thus, P323L increases the internal hydrophobicity, probably modifying the stability of nsp12. E-F) The presence of the A97V mutation allows a methyl group from valine to be accommodated near V96, increasing the contacts between methyl groups in that region.

A-B) nsp16 binds SAM through several residues; one group of residues stabilizes the binding of the nucleoside (A) and another group stabilizes the methionine moiety (B). C) PDB: 7JHE shows a well-established salt bridge between D217 and R216. D) However, in PDB: 6W4H the salt bridge does not seem well established due to the greater distance between the residues. These differences are probably due to competition for interactions with the solvent.

A) The monomer of the spike protein (PDB 6VYB, chain A) is formed by two domains named S1 and S2. B) The S1 domain is subdivided into an N-terminal domain (NTD) and a Receptor Binding Domain (RBD); inside the RBD lies the region responsible for ACE2 receptor recognition, called the Receptor Binding Motif (RBM). C) S2 is formed by a small region called the Fusion Peptide (FP), two helix subdomains involved in intermonomer interactions (HR1 and CH), and a C-terminal domain. D) Mutation A222V in the NTD of spike interacts better with Y38, F220, and I285.

Mutation S24L would prevent the formation of a hydrogen bond at an intermonomer interface of orf8.
A) orf8 presents two dimerization interfaces, shown in cyan and magenta. B) Residue S24 is capable of forming a hydrogen bond with K53; however, just one of the possible interactions is visualized in the structure, indicating that it is not sufficiently stable. Thus, the mutation S24L would not significantly affect dimer stability.

Severe acute respiratory syndrome coronavirus nonstructural protein 2 interacts with a host protein complex involved in mitochondrial biogenesis and intracellular signaling
Coronavirus NSP6 restricts autophagosome expansion
Coronavirus nonstructural protein 15 mediates evasion of dsRNA sensors and limits apoptosis in macrophages
The Pfam protein families database in 2019
Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA
Formation of protein kinase recognition sites by covalent modification of the substrate. Molecular mechanism for the synergistic action of casein kinase II and glycogen synthase kinase 3
Phosphate groups as substrate determinants for casein kinase I action
Structure of SARS-CoV ORF8, a rapidly evolving coronavirus protein implicated in immune evasion
Prohibitin induces the transcriptional activity of p53 and is exported from the nucleus upon apoptotic signaling
Phosphoproteomic analysis identifies the tumor suppressor PDCD4 as a RSK substrate negatively regulated by 14-3-3
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
The nsp2 replicase protein of murine hepatitis virus and severe acute respiratory syndrome coronavirus is dispensable for viral replication
Severe Acute Respiratory Syndrome Coronavirus 7a Accessory Protein Is a Viral Structural Protein
Activities Associated with Severe Acute Respiratory Syndrome Coronavirus Helicase
Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis
Global geographic and temporal analysis of SARS-CoV-2 haplotypes normalized by COVID-19 cases during the first seven months of the pandemic
Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
Crystal structure of Nsp15 endoribonuclease NendoU from SARS-
Respiratory Syndrome Coronavirus Inhibits Cellular Protein Synthesis and Activates p38
Structural analysis of the SARS-CoV-2 methyltransferase complex involved in RNA cap creation bound to sinefungin
SARS-CoV-2 vaccines in development
The nonstructural protein 8 (nsp8) of the SARS coronavirus interacts with its ORF6 accessory protein
A G-quadruplex-binding macrodomain within the "SARS-unique domain" is essential for the activity of the SARS-coronavirus replication-transcription complex
Comparison of the specificities of p70 S6 kinase and MAPKAP kinase-1 identifies a relatively specific substrate for p70 S6 kinase: the N-terminal kinase domain of MAPKAP kinase-1 is essential for peptide phosphorylation
Nsp3 of coronaviruses: Structures and functions of a large multi-domain protein
Crystal Structure of the SARS-CoV-2 Non-structural Protein 9
Covid-19: Is a second wave hitting Europe
Genomic characterization and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Structural basis and functional analysis of the SARS coronavirus nsp14-nsp10 complex
Prohibitin function within mitochondria: essential roles for cell proliferation and cristae morphogenesis
Acute Respiratory Syndrome Coronavirus nsp9 Dimerization Is Essential for Efficient Viral Growth
Structure and intracellular targeting of the SARS-coronavirus orf7a accessory protein
Determination of the specific substrate sequence motifs of protein kinase
Localization and Membrane Topology of Coronavirus Nonstructural
Expression and cleavage of middle east respiratory syndrome coronavirus nsp3-4 polyprotein induce the formation of double-membrane vesicles that mimic those associated with coronaviral RNA replication
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
Structure, expression, and intracellular localization of the SARS-CoV accessory proteins 7a and 7b
UCSF ChimeraX: structure visualization for researchers, educators, and developers
Chimera-A visualization system for exploratory research and analysis
Cryo-EM Structures of the SARS-CoV-2 Endoribonuclease Nsp15
Phosphorylation of the arginine/serine dipeptide-rich motif of the severe acute respiratory syndrome coronavirus nucleocapsid protein modulates its multimerization
R: A language and environment for statistical computing. R Foundation for statistical computing
Coronavirus RNA Proofreading: Molecular Basis and Therapeutic Targeting
Two-amino acids change in the nsp4 of SARS coronavirus abolishes viral replication
The Transmembrane Domain of the Severe
The ORF7b Protein of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) Is Expressed in Virus-Infected Cells and Incorporated into SARS-CoV Particles
Structure-based virtual screening and molecular dynamics simulation of SARS-CoV-2 Guanine-N7 methyltransferase (nsp14) for identifying antiviral inhibitors against COVID-19
Papain-like protease regulates SARS-CoV-2 viral spread and innate immunity
GISAID: Global initiative on sharing all influenza data - from vision to reality
Use of an oriented peptide library to determine the optimal substrates of protein kinases
A structural basis for substrate specificities of protein Ser/Thr kinases: primary sequence preference of casein kinases I and II, NIMA, phosphorylase kinase, calmodulin-dependent kinase II, CDK5, and Erk1
One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities
The severe acute respiratory syndrome coronavirus nucleocapsid protein is phosphorylated and localizes in the cytoplasm by 14-3-3-mediated translocation
The nsp9 Replicase Protein of SARS-Coronavirus
The SARS-Unique Domain (SUD) of SARS Coronavirus Contains Two
Overexpression of 7a, a Protein Specifically Encoded by the Severe Acute Respiratory Syndrome Coronavirus Induces Apoptosis via a Caspase-Dependent Pathway
The SARS-coronavirus nsp7+nsp8 complex is a unique multimeric RNA polymerase capable of both de novo initiation and primer extension
The SARS-CoV-2 main protease as drug target
Structural basis of RNA cap modification by SARS-CoV-2
Prohibitin, a potential tumor suppressor, interacts with RB and regulates E2F function
SWISS-MODEL: homology modelling of protein structures and complexes
Ggplot2: Elegant graphics for data analysis
Substrate specificity of protein kinase C. Use of synthetic peptides corresponding to physiological sites as probes for substrate recognition requirements
Nucleocapsid phosphorylation and RNA helicase DDX1 recruitment enables coronavirus transition from discontinuous to continuous transcription
Glycogen synthase kinase-3 regulates the phosphorylation of severe acute respiratory syndrome coronavirus nucleocapsid protein and viral replication
Activation and maturation of SARS-CoV main protease
SARS coronavirus 7a protein blocks cell cycle progression at G0/G1 phase via the cyclin D3/pRb pathway
Biochemical characterization of SARS-CoV-2 nucleocapsid protein
Structural basis for the multimerization of nonstructural protein nsp9 from SARS-CoV-2
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
The ORF8 Protein of SARS-CoV-2 Mediates Immune Evasion through Potently Downregulating MHC-I

The authors declare no competing interests

We are very grateful to the GISAID Initiative and all its data contributors, i.e. the Authors from