key: cord-0735728-19i9lknr
authors: Wu, Aiping; Wang, Lulan; Zhou, Hang-Yu; Ji, Cheng-Yang; Xia, Shang Zhou; Cao, Yang; Meng, Jing; Ding, Xiao; Gold, Sarah; Jiang, Taijiao; Cheng, Genhong
title: One Year of SARS-CoV-2 Evolution
date: 2021-02-24
journal: Cell Host Microbe
DOI: 10.1016/j.chom.2021.02.017
sha: 8c80f1bea228873bd097aeff9c546b1c17ab534a
doc_id: 735728
cord_uid: 19i9lknr

Since the outbreak of SARS-CoV-2, the etiologic agent of the COVID-19 pandemic, the viral genome has acquired numerous mutations with the potential to increase transmission. One year after its emergence, we now further analyze emergent SARS-CoV-2 genome sequences in an effort to understand the evolution of this virus.

... emerge independently in a parallel, convergent pattern of viral antigenic evolution that may confer resistance to neutralizing antibodies (McCarthy et al., 2020). Here, we identify the mutations and deletions accumulated throughout the past year within representative genome sequences of SARS-CoV-2, explore the possible epidemiological patterns of potentially parallel mutations, and evaluate the impact of existing and potential mutations on the efficacy of monoclonal antibodies and vaccines.

Heritable amino acid mutations in the major strain clades identified by Nextstrain (Hadfield et al., 2018) are shown hierarchically in the phylogenetic tree (Figure 1B).

Collectively, we detected a total of 130 nucleotide mutations acquired by SARS-CoV-2 genomes in the past year; of these, 75 are heritable, non-synonymous mutations (Figure S1). Viral evolution studies have indicated that parallel mutations and independently recurrent mutations are more strongly associated with viral adaptation (van Dorp et al., 2020). Thus, from the 75 non-synonymous mutations, we further identified 24 heritable mutations, including the two well-known mutations D614G and N501Y, that potentially arose in parallel (Table S1A).
Note that the potentially parallel mutations detected here are based on representative sequences, so their number could be substantially underestimated. To investigate the occurrence and transmission of these potentially parallel mutations, we plotted the spatiotemporal distribution of GISAID sequences carrying these mutations over the past year as an indicator of their possible epidemiological distribution (Figures 1C and S2). Significant epidemiological patterns were observed for these mutations in the SARS-CoV-2 genomes (Figures 1C and S2). Of them, the D614G mutation is notable for having raised global concern over its rapid transmission and dominance. The L37F mutation in nonstructural protein 6 (NSP6) has appeared frequently in different clades and across continents. The N501Y mutation in the S protein ... Brazil. The S477N mutation was likely responsible for the epidemic from July to September of 2020 in Oceania. Strikingly, two potentially parallel mutations observed in the Ser/Arg (SR)-rich linker region of the N protein (R203K and G204R) co-occurred across 6 continents (Figure 1C).

To evaluate the current status of therapeutic antibodies against mutations in the spike protein of SARS-CoV-2, we first identified the most prevalent mutations in the S protein, as illustrated by the heatmap (Figure 2A). There are ten key mutations located in the RBD of the S protein. We further analyzed the phylogenetic pattern of these mutations (Figure S3) and evaluated their positional relationship to the distribution of the genetic diversity score across RBD residues (Figure 2B).
According to the epitope mapping of COVID-19 patients reported by Shrock et al. (Figure 2C), most COVID-19 patients produce antibodies against SARS-CoV-2 proteins, but the levels of neutralizing antibodies (nAbs) vary among individual patients and typically correspond to the severity of the infection (Shrock et al., 2020). Therefore, it is not clear whether these patients, especially those with mild disease, will produce sufficient amounts of nAbs against SARS-CoV-2 to prevent reinfection (Bosnjak et al., 2020). Recent studies have already indicated that mutations or indels in the RBD region may impact the neutralization efficacy of nAbs (Wang et al., 2021). To evaluate the current state of therapeutic antibodies, we determined the frequency of epitopes corresponding to 20 reported nAbs (listed in Figure 2D and Table S1B) and visualized epitope frequency within the RBD structure ( ...

Methods

The multiple sequence alignments (MSAs) of >320,000 quality-checked genome sequences as of 11 January 2021 were downloaded from GISAID after access was granted. We identified the mutations and deletions in these genomes by comparing them with the reference genome EPI_ISL_402125 (Wu et al., 2020). Insertions and ambiguous nucleotides were ignored when counting the mutations and deletions. We used the same site-numbering scheme as the reference genome. The ORF and protein annotation of the genome were inferred from NCBI RefSeq NC_045512 (Wu et al., 2020).

The residues in the SARS-CoV-2 RBD monomer structure (PDB: 7BZ5) were colored according to the frequency of epitopes in the nAbs listed in Figure 2D. The frequency was calculated using Python v3.7. The structural illustration was generated using PyMOL (https://pymol.org/2/).
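The epitope-frequency coloring described above comes down to a per-residue count: for each RBD residue, how many of the listed nAbs include it in their epitope. The following is a minimal Python sketch of that idea, not the authors' script. It assumes each antibody's epitope is available as a set of residue numbers and that frequency is normalized by the number of antibodies (the source does not say whether raw counts or fractions were used). The function name, antibody labels, and residue sets are hypothetical placeholders, not the data in Figure 2D or Table S1B.

```python
from collections import Counter
from typing import Dict, Set


def epitope_frequency(epitopes: Dict[str, Set[int]]) -> Dict[int, float]:
    """Return, for each residue, the fraction of antibodies whose epitope contains it.

    Assumption: normalizing by the number of antibodies; the source does not
    specify whether counts or fractions were used.
    """
    if not epitopes:
        return {}
    counts: Counter = Counter()
    for residues in epitopes.values():
        counts.update(set(residues))  # each antibody contributes a residue once
    n_antibodies = len(epitopes)
    return {pos: counts[pos] / n_antibodies for pos in sorted(counts)}


# Hypothetical toy input: placeholder antibody labels and residue numbers.
toy_epitopes = {
    "nAb_A": {417, 484, 501},
    "nAb_B": {484, 501},
    "nAb_C": {501},
    "nAb_D": {346},
}
freq = epitope_frequency(toy_epitopes)
```

A per-residue table of this kind could then be mapped onto a structure for coloring, for example by storing the values in a per-atom field that the viewer can color by; how the authors transferred the values into PyMOL is not stated in the source.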
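The reference comparison described in Methods can be pictured as a column-by-column scan of each aligned genome against the aligned reference. The sketch below illustrates this under stated assumptions and is not the authors' pipeline: sequences are plain strings of equal length taken from one alignment, `-` marks a gap, a gap in the reference marks an insertion (skipped), any query character outside A/C/G/T is treated as ambiguous (skipped), and positions are numbered on the ungapped reference. The function name and the toy sequences are hypothetical.

```python
from typing import List, Tuple

UNAMBIGUOUS = set("ACGT")


def call_variants(ref_aln: str, qry_aln: str) -> Tuple[List[str], List[int]]:
    """Compare one aligned genome with the aligned reference.

    Returns (substitutions, deleted_positions), both in 1-based coordinates
    of the ungapped reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions: List[str] = []
    deletions: List[int] = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            continue  # insertion relative to the reference: ignored
        ref_pos += 1
        if q == "-":
            deletions.append(ref_pos)  # reference base missing in the query
        elif q not in UNAMBIGUOUS:
            continue  # ambiguous nucleotide: ignored
        elif q != r:
            substitutions.append(f"{r}{ref_pos}{q}")
    return substitutions, deletions


# Hypothetical toy alignment, not real SARS-CoV-2 sequence.
ref = "ATG-CCGTA"
qry = "ATGACTG-N"
subs, dels = call_variants(ref, qry)
```

In this toy case the inserted `A` and the trailing `N` are skipped, leaving one substitution and one deleted position. With real data, leading and trailing gaps caused by incomplete sequencing would also be reported as deletions by this sketch and would need separate handling.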
The data analyzed in Figure ...

... with fixed amino acid mutations for representative SARS-CoV-2 strains selected by ... Fixed mutations detected in each cluster were displayed in boxes. Viral strains were divided into hierarchical clusters as those in ...

(C) The spatiotemporal distribution of SARS-... In the histogram for each continent/mutation pairing, the X-axis represents the collection date of the sequenced viruses and the Y-axis represents the number of sequences with the indicated mutation. The number of mutated sequences is shown in red