key: cord-133453-23rfdkuw authors: Chen, Jiahui; Gao, Kaifu; Wang, Rui; Wei, Guowei title: Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies date: 2020-10-13 journal: nan DOI: nan sha: doc_id: 133453 cord_uid: 23rfdkuw

Antibody therapeutics and vaccines are among our last resorts to end the raging COVID-19 pandemic. They, however, are prone to over 1,800 mutations uncovered by a Mutation Tracker. It is urgent to understand how vaccines and antibodies in development would be impacted by mutations. In this work, we first study the mechanism, frequency, and ratio of mutations on the spike (S) protein, which is the common target of most COVID-19 vaccines and antibody therapies. Additionally, we build a library of antibody structures and analyze their 2D and 3D characteristics. Moreover, we predict the mutation-induced binding free energy (BFE) changes for the complexes of S protein and antibodies or ACE2. By integrating genetics, biophysics, deep learning, and algebraic topology, we deduce that some of the mutations such as M153I, S254F, and S255F may weaken the binding of S protein and antibodies, and potentially disrupt the efficacy and reliability of antibody therapies and vaccines in development. We provide a strategy to prioritize the selection of mutations for designing vaccines or antibody cocktails.

The expeditious spread of the coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to 34,667,658 confirmed cases and 1,030,040 fatalities as of September 30, 2020. In the 21st century, three major outbreaks of deadly pneumonia have been caused by β-coronaviruses: SARS-CoV (2002), Middle East respiratory syndrome coronavirus (MERS-CoV) (2012), and SARS-CoV-2 (2019) [1]. Similar to SARS-CoV and MERS-CoV, SARS-CoV-2 causes respiratory infections, and the transmission of viruses occurs among family members or in healthcare settings at the early stages of the outbreak.
However, SARS-CoV-2 has an unprecedentedly high infection rate compared to SARS-CoV and MERS-CoV [2]. Considering the high infection rate, high prevalence rate, long incubation period [2], asymptomatic transmission [3, 4], and potential seasonal pattern [5] of COVID-19, the development of specific antiviral drugs, antibody therapies, and effective vaccines is of paramount importance. Traditional drug discovery takes more than ten years, on average, to bring a new drug to the market [6]. By contrast, developing potent SARS-CoV-2-specific antibodies and vaccines is a relatively more efficient and less time-consuming strategy to combat COVID-19 during the ongoing pandemic [7]. Antibody therapies and vaccines depend on the host immune system. Recent studies have examined host-pathogen interactions, host immune responses, and pathogen immune evasion strategies [8] [9] [10] [11] [12] [13], which provide insight into the mechanisms of antibody therapies and vaccine development. The immune system is a host defense system that protects the host from pathogenic microbes, eliminates toxic or allergenic substances, and responds to an invading pathogen [14]. It comprises two major subsystems: the innate immune system and the adaptive immune system. The innate system provides an immediate but non-specific response, whereas the adaptive immune system provides a highly specific and effective immune response. Once the pathogen breaches the first physical barriers, such as epithelial cell layers, the secreted mucus layer, and mucous membranes, the innate system is triggered to identify pathogens by pattern recognition receptors (PRRs), which are expressed on dendritic cells, macrophages, or neutrophils [15].
Specifically, PRRs identify pathogen-associated molecular patterns (PAMPs) located on pathogens and then activate complex signaling pathways that induce inflammatory responses mediated by various cytokines and chemokines, which promote the eradication of the pathogen [16, 17]. Notably, the transmission of SARS-CoV-2 even occurs in asymptomatic infected individuals, which may delay the early response of the innate immune system [8]. Another important line of host defense is the adaptive immune system. B lymphocytes (B cells) and T lymphocytes (T cells) are special types of leukocytes that are the acknowledged cellular pillars of the adaptive immune system [18]. Two major subtypes of T cells are involved in the cell-mediated immune response: killer T cells (CD8+ T cells) and helper T cells (CD4+ T cells). The killer T cells eradicate cells invaded by pathogens with the help of major histocompatibility complex (MHC) class I. MHC class I molecules are expressed on the surface of all nucleated cells [19]. When viruses infect nucleated cells, the cells first degrade foreign proteins via antigen processing. Then, the peptide fragments are presented by MHC class I, which activates killer T cells to eliminate these infected cells by releasing cytotoxins [20]. Similarly, helper T cells cooperate with MHC class II, a type of MHC molecule that is constitutively expressed on antigen-presenting cells, such as macrophages, dendritic cells, monocytes, and B cells [21]. Helper T cells express T cell receptors (TCRs) to recognize antigens bound to MHC class II molecules. However, helper T cells do not have cytotoxic activity and therefore cannot kill infected cells directly. Instead, the activated helper T cells release cytokines to enhance the microbicidal function of macrophages and the activity of killer T cells [22].
Notably, an unbalanced response can result in a "cytokine storm," which is the main cause of the fatality of COVID-19 patients [23]. Correspondingly, B cells are involved in the humoral immune response and identify pathogens by binding to foreign antigens with the B cell receptors (BCRs) located on their surface. The antigens that are recognized by antibodies are degraded to peptides in B cells and displayed by MHC class II molecules. As mentioned above, helper T cells can recognize the signal provided by MHC class II and upregulate the expression of CD40 ligand, which provides extra stimulation signals to activate antibody-producing B cells [24], yielding millions of copies of antibodies (Ab) that recognize the specific antigen. Additionally, when the antigen first enters the body, T cells and B cells are activated, and some of them differentiate into long-lived memory cells, such as memory T cells and memory B cells. These long-lived memory cells quickly and specifically recognize an antigen that the host has encountered before and initiate a corresponding immune response upon future exposure [25]. The vaccination mechanism is to stimulate the primary immune response of the human body, which activates T cells and B cells to generate the antibodies and long-lived memory cells that prevent infectious diseases. Vaccination is one of the most effective and economical means of combating COVID-19 at this stage. As mentioned above, antibodies, which are secreted by B cells of the adaptive immune system, can recognize and bind to specific antigens. Conventional antibodies (immunoglobulins) are Y-shaped molecules that have two light chains and two heavy chains [26]. Each light chain is connected to a heavy chain via a disulfide bond, and the heavy chains are connected through two disulfide bonds in the mid-region known as the hinge region.
Each light and heavy chain contains two distinct regions: constant regions (the "stem" of the Y) and variable regions (the "arms" of the Y) [27]. An antibody binds the antigenic determinant (also called the epitope) through the variable regions at the tips of the heavy and light chains. There is an enormous amount of diversity in the variable regions. Therefore, different antibodies can recognize many different types of antigenic epitopes. To be specific, there are three complementarity determining regions (CDRs) that are arranged non-consecutively at the tips of each variable region. CDRs generate most of the diversity between antibodies and thus determine the specificity of individual antibodies. In addition to conventional antibodies, camelids also produce heavy-chain-only antibodies (HCAbs). HCAbs, also referred to as nanobodies or VHHs, contain a single variable domain (VHH) that makes up the equivalent of the antigen-binding fragment (Fab) of conventional immunoglobulin G (IgG) antibodies [28]. This single variable domain can typically acquire affinity and specificity for antigens comparable to conventional antibodies. Nanobodies can easily be constructed into multivalent formats and have higher thermal stability and chemostability than most antibodies do [29]. Another advantage of nanobodies is that they are less susceptible to steric hindrances than large conventional antibodies [30]. Considering the broad specificity of antibodies, seeking potential antibody therapies has become one of the most feasible strategies to fight against SARS-CoV-2. In general, antibody therapy is a form of immunotherapy that uses monoclonal antibodies (mAb) to target pathogenic proteins. The binding of antibody and pathogenic antigen can facilitate either immune response, direct neutralization, radioactive treatment, the release of toxic agents, or cytokine storm inhibition (aka immune checkpoint therapy).
SARS-CoV-2 entry into a human cell is facilitated by a series of interactions between its spike (S) protein and the host receptor angiotensin-converting enzyme 2 (ACE2), primed by the host transmembrane protease, serine 2 (TMPRSS2) [31]. As such, most COVID-19 antibody therapeutic developments focus on SARS-CoV-2 spike protein antibodies that were initially generated from patient immune responses and on T-cell pathway inhibitors that block T-cell responses. A large number of antibody therapeutic drugs are in clinical trials. 28 Currently, most antibody therapy developments focus on the use of antibodies isolated from patient convalescent plasma to directly neutralize SARS-CoV-2 [32] [33] [34], although there are efforts to alleviate the cytokine storm. A more effective and economical means to fight against SARS-CoV-2 is a vaccine [35], which is the most anticipated approach for preventing COVID-19. A vaccine is designed to stimulate effective host immune responses, including the production of antibodies, and to provide active acquired immunity by exploiting the body's immune system. It is made of an antigenic agent that resembles a disease-causing microorganism, its surface protein, or the genetic material that is needed to generate the surface protein. For SARS-CoV-2, the first choice of surface proteins is the spike protein. There are four types of COVID-19 vaccines, as shown in Figure 1. 1) Virus vaccines use the virus itself, in a weakened or inactivated form. 2) Viral-vector vaccines are designed to genetically engineer a weakened virus, such as measles or adenovirus, to produce coronavirus S proteins in the body. Both replicating and non-replicating viral-vector vaccines are being studied now. 3) Nucleic-acid vaccines use DNA or mRNA to produce SARS-CoV-2 S proteins inside host cells to stimulate the immune response.
4) Protein-based vaccines are designed to directly inject coronavirus proteins, such as the S protein or membrane (M) protein, or their fragments, into the body. Both protein subunits and viral-like particles (VLPs) are under development for COVID-19 [36]. Among these technologies, nucleic-acid vaccines are safe and relatively easy to develop [36]. However, they have not been approved for any human use before. Moreover, the general population's safety concerns are a major factor hindering the rapid approval of vaccines and antibody therapies. A major potential challenge is antibody-dependent enhancement, in which the binding of a virus to suboptimal antibodies enhances its entry into host cells. All vaccine and antibody therapeutic developments are currently based on the reference viral genome reported on January 5, 2020 [37]. SARS-CoV-2 belongs to the Coronaviridae family and the Nidovirales order, which has been shown to have a genetic proofreading mechanism regulated by non-structural protein 14 (NSP14) in synergy with NSP12, i.e., RNA-dependent RNA polymerase (RdRp) [38, 39]. Therefore, SARS-CoV-2 has a higher fidelity in its transcription and replication process than other single-stranded RNA viruses, such as the flu virus and HIV. Even so, the S protein of SARS-CoV-2 has been undergoing many mutations, as reported in [40, 41]. As of September 30, a total of 1,811 mutations on the S protein have been detected in 63,556 complete SARS-CoV-2 genome sequences. Therefore, it is of paramount importance to establish a reliable computational paradigm to predict and mitigate the impact of SARS-CoV-2 mutations on vaccines and antibody therapies. Moreover, the efficacy of a given COVID-19 vaccine depends on many factors, including SARS-CoV-2 biological properties associated with the vaccine, mutation impacts, vaccination schedule (dose and frequency), idiosyncratic response, and assorted factors such as ethnicity, age, gender, or genetic predisposition.
The effect of COVID-19 vaccination also depends on the fraction of the population who accept vaccines. It is essentially unknown at this moment how these factors will unfold for COVID-19 vaccines. There is no doubt that any preparation that leads to an improvement in the COVID-19 vaccination effect will be of tremendous significance to human health and the world economy. Therefore, in this work, we integrate genetic analysis and computational biophysics, including artificial intelligence (AI), as well as additional enhancement from advanced mathematics, to predict and mitigate mutation threats to COVID-19 vaccines and antibody therapies. We perform single nucleotide polymorphism (SNP) calling [41, 42] to identify SARS-CoV-2 mutations. For mutations on the S protein, we analyze their mechanism [43], frequency, ratio, and secondary structural traits. We construct a library of all existing antibody structures from the Protein Data Bank (PDB) and analyze their two-dimensional (2D) and three-dimensional (3D) characteristics. We further predict the mutation-induced binding affinity changes of antibody and S protein complexes using a topology-based network tree (TopNetTree) [44], which is a state-of-the-art model that integrates deep learning and algebraic topology [45] [46] [47]. After identifying mutations that are potentially disruptive to antibody and S protein interactions, we further infer their threats to vaccines based on antibody binding site analysis, mutation-induced disruptive free energy, and mutation occurrence frequency. We combine frequency and free energy change to prioritize mutation threats and guide the development of future vaccines and antibody therapies.

As a fundamental biological process, mutagenesis changes the organism's genetic information. It serves as a primary source of many kinds of cancer and heritable diseases and is also a driving force for evolution [48, 49].
Generally speaking, virus mutations are introduced by natural selection, replication mechanism, cellular environment, polymerase fidelity, gene editing, random genetic drift, recent epidemiology features, host immune responses, etc. [50, 51]. Notably, understanding how mutations have changed the SARS-CoV-2 structure, function, infectivity, activity, and virulence is of great importance for coming up with life-saving strategies in virus control, containment, prevention, and medication, especially in the development of antibodies and vaccines. Genome sequencing, SNP calling, and phenotyping provide an efficient means to parse mutations from a large number of viral samples [40, 42] (see the Supporting material (S1)). In this work, we retrieved over 60,000 complete SARS-CoV-2 genome sequences from the GISAID database [52] and created a real-time interactive SARS-CoV-2 Mutation Tracker ( https://users.math.msu.edu/users/weig/SARS-CoV-2 Mutation Tracker.html) to report over 18,000 single mutations on SARS-CoV-2, along with their mutation frequencies, as of September 30. Figure 2 is a screenshot of our online Mutation Tracker. It describes the distribution of mutations on the complete coding region of SARS-CoV-2. The y-axis shows the natural log frequency for each mutation at a specific position. A reader can download the detailed mutation SNP information from our Mutation Tracker website. As mentioned before, the S protein has become the first choice for antibody and vaccine development. Among 63,556 complete genome sequences, 1,811 unique single mutations are detected on the S protein, and the h-index of the S protein is 52 [40]. The number of unique mutations (N_U) is determined by counting the same type of mutation in different genome isolates only once, whereas the number of non-unique mutations (N_NU, i.e., frequency) is calculated by counting the same type of mutation in different genome isolates repeatedly.
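The distinction between unique and non-unique mutation counts can be illustrated with a minimal Python sketch. The function name `count_mutations`, the input layout (one list of mutation labels per genome isolate), and the toy isolates are assumptions made for illustration only; they are not part of the Mutation Tracker or of the SNP-calling pipeline.

```python
from collections import Counter


def count_mutations(isolates):
    """Return (N_U, N_NU, per-mutation frequency) for a list of isolates.

    Each isolate is an iterable of mutation labels such as "23403A>G".
    The input layout is a hypothetical one chosen for this sketch.
    """
    freq = Counter()
    for mutations in isolates:
        # A given mutation is counted at most once per isolate.
        freq.update(set(mutations))
    n_unique = len(freq)                # N_U: each mutation type counted once
    n_non_unique = sum(freq.values())   # N_NU: counted once per isolate carrying it
    return n_unique, n_non_unique, freq


# Toy isolates invented for illustration; not GISAID data.
isolates = [
    ["23403A>G", "mutation_x"],
    ["23403A>G"],
    ["23403A>G", "mutation_y"],
]
n_u, n_nu, freq = count_mutations(isolates)
print(n_u, n_nu, freq["23403A>G"])
```

In this toy input there are three distinct mutation types, so N_U is 3, while the five isolate-level occurrences give N_NU = 5.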
Table 1 lists the distribution of 12 SNP types among unique and non-unique mutations on the S protein of SARS-CoV-2 worldwide. C>T and A>G are the two dominant SNP types, which may be due to the innate host immune response via APOBEC and ADAR gene editing [43]. Moreover, 133 non-degenerate mutations occurred on the S protein receptor-binding domain (RBD), which are relevant to the binding of the SARS-CoV-2 S protein and most antibodies as well as ACE2. Additionally, 59 mutations that occurred on the S protein domain spanning residues 14 to 226 are relevant to the binding of another antibody (4A8) and the SARS-CoV-2 S protein. Furthermore, since antibody CDRs are random coils, the complementary antigen-binding domains must involve random coils as well. Table 2 lists the statistics of non-degenerate mutations on the secondary structures of the SARS-CoV-2 S protein. Here, the secondary structures are mostly extracted from the crystal structure of 7C2L [53], and the missing residues are predicted by RaptorX-Property [54]. For both unique and non-unique cases, the average mutation rates on the random coils of the S protein have the highest values. In particular, the 23403A>G-(D614G) mutation on the random coils has the highest frequency of 39,967. Even if we do not consider the 23403A>G-(D614G) mutation, the unique and non-unique average rates on the random coils of the S protein still have the highest values (0.98 and 10.83), indicating that mutations are more likely to occur on the random coils. Consequently, the natural selection of mutations may tend to disrupt antibodies.

Table 2: The statistics of non-degenerate mutations on the secondary structure of SARS-CoV-2 S protein. Both unique and non-unique mutations are considered in the calculation. N_U, N_NU, AR_U, and AR_NU represent the number of unique mutations, the number of non-unique mutations, the average rate of unique mutations, and the average rate of non-unique mutations on the secondary structure of the S protein, respectively. Here, the secondary structure is mostly extracted from the crystal structure of 7C2L; the missing residues are predicted by RaptorX-Property.

We construct a SARS-CoV-2 antibody library of 28 3D antibody structures deposited in the PDB. Among them, the binding sites of 27 antibodies are on the RBD of the S protein, whereas the remaining antibody, 4A8 [53], has a distinct binding domain. Additionally, MR17-K99Y is a mutant of antibody MR17 [55]. We align 26 antibody structures, excluding MR17-K99Y, with the SARS-CoV-2 S protein in Figure 3. ACE2 is included as a reference. Clearly, except for antibody 4A8, all other 26 structures bind to the S protein RBD. It is interesting to note that 4A8 is located on a different domain. The PDB IDs of these complexes can be found in Figure 4. Except for S309 [56], CR3022 [61], EY6A [62], and 4A8, all the other 23 antibodies have their binding sites spatially clashing with that of ACE2. Notably, the paratope of H014 [67] does not overlap with that of ACE2 directly, but in terms of 3D structures, their binding sites still overlap. This suggests that the binding of these 23 antibodies is in direct competition with that of ACE2. Theoretically, this direct competition reduces the viral infection rate. An antibody of this kind with strong binding ability will directly neutralize SARS-CoV-2 without the need for antibody-dependent cell cytotoxicity (ADCC), antibody-dependent cellular phagocytosis (ADCP), or other immune mechanisms. The paratopes of S309, CR3022, and EY6A on the RBD are away from that of ACE2, leading to the absence of binding competition [62, 69, 70]. One study shows that the ADCC and ADCP mechanisms contribute to the viral control conducted by S309 in infected individuals [69].
For CR3022, one study indicates that it neutralizes the virus in a synergistic fashion [71]. For EY6A, the hypothesis is that the binding of EY6A could inhibit the glycosylation of ACE2 [62]. A more radical example is 4A8 [53], which binds to the N-terminal domain (NTD) of the S protein (Figure 3 (h)), quite far from the RBD. It is speculated that 4A8 may neutralize SARS-CoV-2 by restraining the conformational changes of the S protein, which are very important for SARS-CoV-2 cell entry [53]. Any antibody or drug that can inhibit the serine protease TMPRSS2 priming of the S protein can effectively stop viral cell entry [31].

[Figure caption fragment] (i) The 3D structure of S protein RBD. Red, green, and blue represent helix, sheet, and random coils of the RBD, respectively. A darker color represents a higher mutation frequency on a specific residue. The antibodies are S309 (6M0J) [56], CC12.1 (6XC2) [57], CC12.1 and CR3022 (6XC3) [57], CC12.3 (6XC4) [57], CC12.3 and CR3022 (6XC7), C105 (6XCM) [58], REGN10933 and REGN10987 (6XDG) [59], CV30 (6XE1) [60], Fab 2-4 (6XEY) [55], CR3022 (6YLA) [61], H11-D4 (6YZ5), CR3022 and H11-D4 (6Z2M) [61], H11-H4 (6ZBP), EY6Z and nanobody (6ZCZ) [62], EY6Z (6ZER) [62], P2B-2F6 (7BWJ) [63], BD23 (7BYR) [64], B38 (7BZ5) [65], CB6 (7C01) [66], 4A8 (7C2L) [53], SR4 (7C8V) [55], B38 (7C8W), H014 (7CAH) [67], MR17-K99Y (7CAN) [55], BD-604 (7CH4), BD-629 (7CH5), BD-236 (7CHB), BD-236 and BD-368-2 (7CHE), BD-604 and BD-368-2 (7CHF), BD-368-2 (7CHH), COVA2-04 (7JMO) [68], and COVA2-39 (7JMP) [68].

Figure 3 provides a visual illustration of antibody and ACE2 competition. It remains to be seen, at the residue level, what happens in these competitions. To better understand the antibody and S protein interactions, we study the residue contacts between antibodies and the S protein. We include ACE2 as a reference but exclude antibodies 4A8 and MR17-K99Y.
In Figure 4 , the paratopes of 26 antibodies and ACE2 were aligned on the S protein RBD 2D sequence, and their contact regions are highlighted. From the figure, one can see that, except for H014, S309, CR3022, and EY6A, all the other 22 antibodies have their antigenic epitopes overlapping with the ACE2 RBD, especially on the residues from 486 to 505 of the SARS-CoV-2 RBD. Therefore, these 22 antibodies competitively bind against ACE2 as revealed in Figure 3. The next question is whether there is any connection or similarity between the antibody paratopes in our library, particularly for those antibodies that share the same binding sites. To better understand this perspective, we carry out multiple sequence alignment (MSA) to further study the similarity and difference among existing antibodies. Many antibodies are very similar to each other and can be described in a few groups. The first group includes BD-629, CC12.3 [57] , COVA2-04 [68] , CV30 [60] , CC12.1 [57] , B38 [65] , BD-236, BD-604, EY6A, and REGN10933 [59] , as well as CB6 [66] . Their identity scores to CB6 are 87. 39 Therefore, multiple sequence alignment suggests that the paratopes of the antibodies BD-629, CB6, COVA2-04, CV30, CC12.1, CC12.3, C105, BD-604, BD-236, and B38 are almost identical. Similarly, the paratopes of the antibodies H11-H4, H11-D4, Nb are highly consistent. So are the antibodies REGN10987, COVA2-39, and P2B-2F6. The above similarity indicates that the adaptive immune systems of individuals have a common way to generate antibodies. On the other hand, the existence of three distinct groups, as well as antibody 4A8 suggests the diversity in the immune response. Note that we have also included ACE2 in our MSA as a reference but none of the existing antibodies is similar to ACE2, because they were created from entirely different mechanisms. 
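The identity scores quoted from the multiple sequence alignment can be understood through a small Python sketch of pairwise percent identity between two rows of an alignment. This is only one common convention (matches divided by the number of columns that are not gaps in both rows); the source does not state which definition or alignment tool was used, and the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two equal-length rows of an alignment.

    Columns where both rows contain a gap are skipped; a gap facing a
    residue counts as a mismatch. This convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Invented aligned fragments, not real antibody sequences from the library.
row_a = "QVQLVESGG-"
row_b = "EVQLVESGGG"
print(percent_identity(row_a, row_b))
```

Scores computed this way depend on the gap convention, so values from different tools are not directly comparable.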
To investigate the influence of existing S protein mutations on the binding free energy (BFE) of the S protein and antibodies, we consider 133 mutations that occurred on the S protein RBD and are relevant to the binding of the SARS-CoV-2 S protein and antibodies as well as ACE2. Additionally, we consider 59 mutations that occurred on the NTD of the S protein (residue id: 14 to 226) and are relevant to the binding of the SARS-CoV-2 S protein and antibody 4A8 (PDB: 7C2L). We predict the free energy changes following existing mutations using our TopNetTree model [44]. Only RBD mutations within a distance of 10 Å of an antibody are computed. Our predictions are built from the X-ray crystal structure of SARS-CoV-2 S protein and ACE2 (PDB 6M0J) [56], and various antibodies (PDBs 6WPS [69], 6XC2 [57], 6XC3 [57], 6XC4 [57], 6XC7, 6XCM [58], 6XDG [59], 6XE1 [60], 6XEY [72], 6YLA [61], 6YZ5, 6Z2M, 6ZBP, 6ZCZ [62], 6ZER [62], 7BWJ [63], 7BYR [64], 7BZ5 [65], 7C01 [66], 7C2L [53], 7C8V [55], 7C8W, 7CAH [67], 7CAN [55], 7CH4, 7CH5, 7CHB, 7CHE, 7CHF, 7CHH, 7JMO [68], and 7JMP [68]). The BFE change following mutation (∆∆G) is defined as the BFE of the wild type minus the BFE of the mutant type, ∆∆G = ∆G_W − ∆G_M, where ∆G_W is the BFE of the wild type and ∆G_M is the BFE of the mutant type. Therefore, a negative BFE change means that the mutation decreases affinity, making the protein-protein interaction less stable. We first present the BFE changes ∆∆G of the SARS-CoV-2 S protein binding domain with antibody 4A8 in Figure 5; this is the only complex in our collection of S protein and antibody complexes whose binding site is not on the RBD. Most mutations induce small changes in binding free energy, while some induce large changes. Notably, 25 out of 59 mutations on the binding domain have positive BFE changes, which means that these mutations increase affinity and would make the protein-protein interaction more stable.
However, the majority (58%) of mutations have negative BFE changes, including the high-frequency mutations M153I, S254F, and S255F. It is also noted that many mutations on the binding domain, such as G142D and K147N, have significant negative free energy changes. The negative BFE changes of mutations on the binding domain reveal that the binding of antibody 4A8 and the S protein will potentially be disrupted. Next, we study the BFE changes ∆∆G induced by 39 mutations on the SARS-CoV-2 S protein RBD for the antibody Fab 2-4 (PDB: 6XEY) in Figure 6. Most mutations induce small changes in the binding free energies, while mutations G485R and S494L have large negative BFE changes. Overall, 27 out of 39 mutations on the RBD lead to negative BFE changes, which means 69% of mutations will potentially weaken the binding between antibody Fab 2-4 and the S protein. In particular, mutation S477N on the RBD, which has a high frequency of 3,269, induces a negative BFE change. While some mutations lead to positive BFE changes, more mutations induce negative BFE changes with large magnitude. Antibody Fab 2-4 shares a similar binding domain with ACE2 and thus is a potential candidate for the direct neutralization of SARS-CoV-2. However, BFE change predictions indicate that the mutations on the S protein weaken the binding of Fab 2-4 with the S protein and make it less competitive with ACE2. In Figure 7, we illustrate antibody B38 (PDB: 7C8W), which shares its binding domain with ACE2 as well. Only four mutations, R403S, F490S, L455F, and S494L, have BFE changes with a magnitude larger than 1 kcal/mol, and all four are negative. The remaining mutations induce changes of small magnitude. Mutation V483A has a frequency of 31 and a small positive BFE change. Interestingly, mutation S494L induces large BFE changes for both antibodies B38 and Fab 2-4. Antibody B38 will become less competitive with ACE2 if mutations R403S, F490S, L455F, and S494L become dominant.
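The sign convention for ∆∆G and the percentages quoted for 4A8 and Fab 2-4 can be checked with a short Python sketch. The helper names `bfe_change` and `fraction_negative` and the example free energies are hypothetical; the counts 25 of 59 and 27 of 39 come from the text, and the 4A8 arithmetic assumes that none of the 59 predicted changes is exactly zero.

```python
def bfe_change(dg_wild, dg_mutant):
    """BFE change in kcal/mol: wild-type BFE minus mutant BFE, as defined in the text."""
    return dg_wild - dg_mutant


def fraction_negative(ddg_values):
    """Fraction of mutations whose BFE change is negative (binding weakened)."""
    negatives = sum(1 for value in ddg_values if value < 0)
    return negatives / len(ddg_values)


# Hypothetical free energies: the mutant binds less tightly than the wild type,
# so the BFE change comes out negative under this convention.
ddg = bfe_change(dg_wild=-10.0, dg_mutant=-8.5)
print(ddg)

# Percentages quoted in the text, recomputed from the stated counts.
pct_4a8 = round(100 * (59 - 25) / 59)   # 4A8: 25 of 59 changes are positive
pct_fab24 = round(100 * 27 / 39)        # Fab 2-4: 27 of 39 changes are negative
print(pct_4a8, pct_fab24)
```

Under these assumptions the recomputed values agree with the 58% and 69% reported above.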
Finally, we consider the BFE change predictions for the complex of antibody S309 and the S protein, whose receptor binding motif (RBM) does not overlap with the RBM of ACE2. The BFE changes induced by 30 mutations are predicted. Among them, 11 changes are positive. Similar to the aforementioned antibodies, most of the mutations lead to small changes in binding affinity, but three mutations, T345S, V395I, and K444N, induce large negative changes. The binding of antibody S309 might be disrupted, considering that a majority of mutations induce negative BFE changes with large magnitude. While antibodies perform a variety of functions in the human immune system, such as neutralization of infection, phagocytosis, and antibody-dependent cellular cytotoxicity, their binding with antigens is crucial for these functions. Our analysis of BFE changes following mutations on the S protein suggests that some antibodies will be less affected by mutations, which is important for developing vaccines and antibody therapies. The BFE change analysis of other antibodies is described in the Supporting material (S3).

In this section, we build a library of mutation-induced BFE changes for all mutations and all antibodies. In principle, we could create a library of all possible mutations for all antibodies, as we did for ACE2 [73]. Here, we limit our effort to all existing mutations. Antibody 4A8 on the NTD has been discussed above. We now consider antibodies on the RBD. Based on our earlier analysis, the three types of SARS-CoV-2 S protein secondary structural residues have different mutation rates. Among them, the random coils are major components of the RBD and the NTD, as shown in Fig. 3. Therefore, mutations on the RBD are split into three categories based on their locations in the secondary structures: helix, sheet, and coil.
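The split of RBD mutations into helix, sheet, and coil categories, followed by a tally of negative and positive BFE changes per category, can be sketched in Python as follows. The record layout, the function name `tally_by_structure`, and every value in `records` are invented for illustration; they are not predictions from TopNetTree.

```python
from collections import defaultdict


def tally_by_structure(records):
    """Count negative and positive BFE changes per secondary-structure class.

    records: iterable of (mutation, structure, ddg) tuples, where structure
    is "helix", "sheet", or "coil" and ddg is in kcal/mol.
    """
    tally = defaultdict(lambda: {"negative": 0, "positive": 0})
    for _mutation, structure, ddg in records:
        if ddg < 0:
            tally[structure]["negative"] += 1
        elif ddg > 0:
            tally[structure]["positive"] += 1
        # A change of exactly zero is counted in neither bin.
    return dict(tally)


# Placeholder records with made-up labels and values.
records = [
    ("mut_a", "helix", 0.42),
    ("mut_b", "helix", -0.10),
    ("mut_c", "sheet", -1.30),
    ("mut_d", "coil", -0.75),
    ("mut_e", "coil", -2.10),
    ("mut_f", "coil", 0.05),
]
counts = tally_by_structure(records)
for structure in ("helix", "sheet", "coil"):
    print(structure, counts[structure])
```

Keeping the tally keyed by structure class makes it straightforward to compare, for each class, how many mutation and antibody pairs weaken binding versus strengthen it.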
In Figure 9, we present the BFE changes for ACE2 and antibodies induced by mutations on helix residues of the S protein RBD. The frequency of each mutation is also presented. Most mutations on helix residues lead to positive BFE changes (green squares), whereas some mutations induce negative BFE changes (pink squares). The N439K mutation, which has the largest frequency, 106, shows mild BFE changes on ACE2 and antibodies. Mutations K417N and Y505C induce positive BFE changes on ACE2 and most antibodies. In particular, antibodies C105 and BD-604 have larger BFE changes than ACE2, which indicates that they are more competitive than ACE2. Antibody CB6 may potentially be a good therapeutic candidate, as its BFE changes are positive following all mutations, but this needs to be confirmed by other mutations on the coil and sheet residues. In Figure 10, we present the BFE changes for ACE2 and antibodies, along with frequencies, for mutations on sheet residues of the S protein RBD. The mutation R403S has a large variance in its BFE changes, such that both positive and negative changes occur across antibodies and ACE2. Clearly, antibodies BD23, BD38, CB6, MR17, and MR17-K99Y have negative BFE changes for mutations on RBD sheet residues, which reduces their ability to compete with ACE2 for binding after mutations. As for mutations with high frequencies, the mutation R403K has negative changes on most antibodies, which poses a danger of disrupting the binding of antibodies and the S protein. Figure 11 presents the BFE changes for ACE2 and antibodies, along with the log of frequencies, for each mutation on coil residues of the S protein RBD. Overall, most mutations on coil residues lead to negative BFE changes. Interestingly, CV30 has the most positive BFE changes following mutations and can therefore be a good candidate for potential therapy. For the high-frequency mutation S477N, the BFE changes are mild on ACE2 and antibodies.
However, mutation L455F induces negative BFE changes for all antibodies except COVA2-39 and is therefore considered a potentially dangerous mutation for antibody therapies.

Statistically, most mutations (94 of 133) occur on residues whose secondary structure is coil, while 20 of 133 mutations are on helices and 19 of 133 are on sheets. Here, 12 mutations on random coils and 2 mutations on helices are not calculated because they are too far from the antibodies. Moreover, residues on coils have more negative BFE changes (548 negative vs. 534 positive), while residues on helices and sheets have more positive BFE changes (74 and 95 negative vs. 94 and 123 positive, respectively). Lastly, the maximum BFE changes for helix, sheet, and coil residues are 4.47 kcal/mol, 4.63 kcal/mol, and 4.52 kcal/mol, while the minimum BFE changes are -2.91 kcal/mol, -2.95 kcal/mol, and -3.33 kcal/mol, respectively.

Figure 12: Illustration of SARS-CoV-2 mutation-induced maximal and minimal binding free energy changes (binding affinity changes, in kcal/mol) for the complexes of S protein and 28 antibodies or ACE2. Here, the maximal change strengthens the binding while the minimal change weakens the binding for each complex.

Figure 12 shows the extreme values of the BFE changes (maximal in blue and minimal in red) for the complexes of the S protein with ACE2 or antibodies following mutations. Many antibodies, such as CR3022 and CR3022 H11-D4, are not very sensitive to the current S protein mutations. However, some other antibodies, such as CV30, Fab 2-4, and EY6Z Nb, can be dramatically affected by SARS-CoV-2 mutations.

The increasing number of infections and deaths, the global spread of the virus, and the lack of prophylactics and therapeutics create an urgent demand for the prevention of COVID-19.
Vaccination is the most effective and economical means to prevent and control pandemics [35]. Currently, 213 vaccines are in various clinical trial stages, as reported in an online COVID-19 Treatment And Vaccine Tracker ( https://covid-19tracker.milkeninstitute.org/#vaccines intro). Broadly speaking, there are four types of coronavirus vaccines in progress: virus vaccines, viral-vector vaccines, nucleic-acid vaccines, and protein-based vaccines, as shown in Figure 1.

The first type is the virus vaccine, which injects weakened or inactivated viruses into the human body. A virus is conventionally weakened by altering its genetic code to reduce its virulence and elicit a stronger immune response. The biotechnology company Codagenix is currently working on a "codon optimization" technology to weaken viruses, and its weakened-virus vaccine is in progress [74]. Unlike a weakened virus, an inactivated virus cannot replicate in the host cell. A virus is inactivated by heat or chemicals; the inactivated virus induces neutralizing antibody titers and has been shown to be safe [75]. At this stage, both Sinopharm, which works with the Beijing Institute of Biological Products and the Wuhan Institute of Biological Products, and Sinovac, which works with Institute Butantan and Bio Farma, are developing inactivated SARS-CoV-2 vaccines that are in Phase III clinical trials.

The second type is the viral-vector vaccine, in which a virus is genetically engineered so that it can produce coronavirus surface proteins in the human body without causing disease. There are two subtypes of viral-vector vaccines: the non-replicating viral vector and the replicating viral vector. There are 4 non-replicating viral-vector vaccines in Phase III trials. One example comes from AstraZeneca and the University of Oxford, whose vaccine is in Phase III trials in many countries. It works by taking a chimpanzee virus and coating it with the S proteins of SARS-CoV-2.
The chimp virus causes a harmless infection in humans, but the spike proteins will activate the immune system to recognize signs of a future SARS-CoV-2 invasion. Notably, booster shots may be needed to maintain long-lasting immunity. Moreover, at this stage, only one replicating viral-vector vaccine is in Phase I. Institut Pasteur and Themis, in cooperation with the University of Pittsburgh CVR and Merck Sharp & Dohme, are developing such a replicating viral-vector vaccine, which tends to be safe and to provoke a strong immune response [36].

The third type is the nucleic-acid vaccine, which includes two subtypes: DNA-based vaccines and RNA-based vaccines. At least 40 teams are currently working on nucleic-acid vaccines since they are safe and easy to develop. The DNA-based vaccine works by inserting genetically engineered blueprints of the viral gene into small DNA molecules, such as plasmids, for injection. Moreover, electroporation is employed to create pores in cell membranes to increase DNA uptake into cells. The injected DNA is transcribed into mRNA in the nucleus of human cells. This mRNA is then translated into viral proteins (mostly spike proteins), which alarm the immune system and should produce immunity. Currently, there are four DNA-based vaccines in Phase II. Similar to DNA-based vaccines, RNA-based vaccines provide immunity through the introduction of RNA, which is encased in a lipid coat to ensure its entry into cells. Two RNA-based vaccines are in Phase III, and companies such as Moderna, BioNTech, and Pfizer are working on the advanced development of RNA-based vaccines.

The fourth type is the protein-based vaccine, which aims to inject viral proteins directly into the human body to trigger immune readiness. The protein-subunit vaccine is one subtype of the protein-based vaccine.
More than 60 teams are working on vaccines with viral protein subunits, such as spike proteins and membrane (M) proteins. Another subtype of the protein-based vaccine is the virus-like particle (VLP) vaccine. VLP vaccines closely resemble viruses. However, they are not infectious since they do not contain viral genetic material. This non-replicating property provides a safer alternative to weakened-virus vaccines; the HPV vaccine and newer flu vaccines are VLP vaccines. Currently, 16 teams are working on VLP vaccines for the future prevention of COVID-19.

Since the structural basis of antibody CDRs, or paratopes, is random coils, we hypothesize that CDRs favor antigenic random coils as complementary epitopes, i.e., antigenic determinants [76, 77]. Figure 13 depicts the 3D structure of the S protein, where the random coils are drawn as green strings and the other secondary structures are shown as a purple surface. It shows that the RBD and the NTD mostly consist of random coils. The RBD is the antigenic determinant of 27 structurally known SARS-CoV-2 antibodies, while the NTD is the binding domain of antibody 4A8, which supports our hypothesis. Figure 14 marks the secondary structure of the S protein, with red, blue, and green representing helix, sheet, and random coil, respectively. It can be seen that the S protein mostly consists of random coils, which means there are many other potential antigenic epitopes on the S protein for antibody CDRs. We believe that the past emphasis on direct binding competition with ACE2 [62, 69, 70] has led to the neglect of many important antibodies that do not bind to the RBD. Therefore, we suggest that researchers pay more attention to antibodies that do not bind to the RBD.

Vaccine efficacy is an essential issue for the control of the COVID-19 pandemic. The S protein is one of the most popular surface proteins for vaccine development.
However, mutations have accumulated on the S protein of SARS-CoV-2, which may reduce vaccine efficacy. As we found in Section 2, mutations are more likely to happen on the random coils of the S protein, which may have a devastating effect on vaccines in development. As shown in Figure 12, mutations could considerably weaken the binding between the S protein and antibodies and thus pose a direct threat to the efficacy of vaccines.

However, there are a few obstacles to determining the exact impact of mutations on COVID-19 vaccines. First, the four types of vaccine platforms can produce very different virus peptides, which will result in different immune responses and different antibodies. Second, even for a given vaccine platform, different peptides may be produced because of different immune responses caused by differences in gender, age, race, etc. Therefore, in this work, we propose to understand the impact of SARS-CoV-2 mutations on COVID-19 vaccines through statistical analysis. By evaluating the mutation-induced binding affinity changes for 28 existing SARS-CoV-2 antibodies, as shown in Figures 9 to 11, we notice that the K417N, Y505C, F456L, and F486L mutations enhance the binding of almost all of the 28 antibodies. In contrast, the R403K, L455F, and P491R mutations weaken the binding of almost all of the existing antibodies. Moreover, mutation K378N enhances the binding of antibody EY6A, whereas mutation V395I weakens the binding of antibody S309. Furthermore, many mutations, such as K417N, Q414R, and G585R, considerably disrupt many antibodies and thus may pose a threat to future vaccines.

Figure 12 depicts the maximal and minimal binding free energy changes for the complexes of the S protein with 28 antibodies or ACE2. It can be seen that antibodies CR3022, CR3022 H11-D4, BD-629, BD-604, and BD-362-2 are not very sensitive to the current mutations on the S protein.
However, other antibodies, such as CV30 and EY6A, are very sensitive to current mutations. In a nutshell, by setting up a SARS-CoV-2 antibody library and statistically analyzing the mutation-induced binding free energy changes, we can estimate the impact of SARS-CoV-2 mutations on COVID-19 vaccines, which provides a way to infer how a specific mutation will pose a threat to vaccines. This approach will work better as more antibody structures become available. Another important factor in prioritization is mutation frequency. Figures 9, 10, and 11 provide frequency information from our SNP calling. Once a mutation is identified as a potential threat, it can be incorporated into the next generation of vaccines in a cocktail approach. In principle, all four types of vaccine platforms allow the accommodation of new viral strains.

The coronavirus disease 2019 (COVID-19) pandemic has gone out of control globally. There is no specific medicine or effective treatment for this viral infection at this point. Vaccination is widely anticipated to be the endgame for taming the rampant virus. Another promising treatment that is relatively easy to develop is antibody therapy. However, both vaccines and antibody therapies are prone to more than 18,000 unique mutations recorded in the Mutation Tracker. We present a prediction of mutation threats to vaccines and antibody therapies. First, we identify existing mutations on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike (S) protein, which is the main target for both vaccines and antibody therapies. We analyze the mechanism, frequency, and ratio of mutations along with the secondary structures of the S protein. Additionally, we build a library of antibodies with structures available from the Protein Data Bank (PDB) and analyze their two-dimensional (2D) and three-dimensional (3D) characteristics by employing computational biophysics.
We further predict the mutation-induced binding free energy (BFE) changes of S protein and antibody complexes using TopNetTree, a model based on deep learning and algebraic topology. From these studies, we infer that some S protein mutations may disrupt the binding between antibodies and the S protein, which will in turn affect the efficacy and reliability of vaccines. To prioritize mutation threats, we also take mutation occurrence frequency into consideration. The resulting algorithm indicates that some high-frequency mutations with negative BFE changes, such as M153I, S254F, and S255F, may disrupt the efficacy and reliability of vaccines and antibody therapies currently in development. Our method can provide efficient prioritization of mutations to guide the design of the next generation of vaccines and antibody therapies.

Supporting material is available for: S1 Method; S2 Multiple sequence alignments of the antibodies and pairwise identity scores; and S3 Mutation-induced changes of binding free energies of antibody-SARS-CoV-2 spike protein complexes.
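The prioritization idea summarized above, which combines how often a mutation occurs with how it shifts binding, could be sketched as follows. This is a hypothetical scoring rule written for illustration, not the algorithm used in this work: the function names `threat_score` and `rank_mutations`, the choice of a logarithm of the frequency, and all labels and numeric values are assumptions.

```python
import math


def threat_score(frequency, bfe_changes):
    """Hypothetical score: log-frequency times total binding loss.

    bfe_changes holds the BFE changes (kcal/mol) of one mutation across
    antibodies; negative values mean weakened binding, as in the text.
    """
    binding_loss = sum(-x for x in bfe_changes if x < 0)
    return math.log1p(frequency) * binding_loss


def rank_mutations(table):
    """Sort mutation labels from highest to lowest hypothetical score."""
    return sorted(
        table,
        key=lambda name: threat_score(table[name]["frequency"], table[name]["bfe"]),
        reverse=True,
    )


# Made-up placeholder entries, not values from the paper.
table = {
    "mut_a": {"frequency": 100, "bfe": [0.2, -0.1, 0.3]},
    "mut_b": {"frequency": 10, "bfe": [-1.5, -0.8, -0.4]},
    "mut_c": {"frequency": 50, "bfe": [0.5, 0.1, 0.2]},
}

ranking = rank_mutations(table)
print(ranking)
```

Under this assumed rule, a frequent mutation with only positive BFE changes scores zero, while a rarer mutation that weakens binding to many antibodies ranks higher. Other weightings of frequency against binding loss are equally plausible.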
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
COVID-19 vaccine development and a potential nanomaterial path forward
Covid-19: four fifths of cases are asymptomatic, China figures indicate
Clinical and immunological assessment of asymptomatic SARS-CoV-2 infections
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Deployment of convalescent plasma for the prevention and treatment of COVID-19
Immune responses in COVID-19 and potential vaccines: Lessons learned from SARS and MERS epidemic
Neutralizing antibody responses to SARS-CoV-2 in a COVID-19 recovered patient cohort and their implications
The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway
COVID-19, immune system response, hyperinflammation and repurposing antirheumatic drugs
Highlight of immune pathogenic response and hematopathologic effect in SARS-CoV, MERS-CoV, and SARS-CoV-2 infection
Immune response in COVID-19: addressing a pharmacological challenge by targeting pathways triggered by SARS-CoV-2
Overview of the immune response
Pathogen recognition by the innate immune system
Pattern recognition receptors and inflammation
Pathogen recognition in the innate immune response
The evolution of adaptive immunity
The MHC class I antigen presentation pathway: strategies for viral immune evasion
CD8+ T cell effector mechanisms in resistance to infection
Genetic control of MHC class II expression
The cytokine storm and COVID-19
CD40 and CD154 in cell-mediated immunity
Immunological memory in humans
Primary structure of a human IgA1 immunoglobulin. IV. streptococcal IgA1 protease, digestion, Fab and Fc fragments, and the complete amino acid sequence of the alpha 1 heavy chain
Antibody structure, instability, and formulation
Naturally occurring antibodies devoid of light chains
Comparison of physical chemical properties of llama VHH antibody fragments and mouse monoclonal antibodies
Llama antibody fragments with cross-subtype human immunodeficiency virus type 1 (HIV-1)-neutralizing properties and high affinity for HIV-1 gp120
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
COVID-19: immunopathology and its implications for therapy
Convalescent plasma as a potential therapy for COVID-19
Treatment of 5 critically ill patients with COVID-19 with convalescent plasma
Progress and prospects on vaccine development against SARS-CoV-2. Vaccines
The race for coronavirus vaccines: a graphical guide
A new coronavirus associated with human respiratory disease in China
Insights into RNA synthesis, capping, and proofreading mechanisms of SARS-coronavirus
Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA
Decoding SARS-CoV-2 transmission, evolution and ramification on COVID-19 diagnosis, vaccine, and medicine
Decoding SARS-CoV-2 Transmission and Evolution and Ramifications for COVID-19 Diagnosis, Vaccine, and Medicine
Genotyping coronavirus SARS-CoV-2: methods and implications
Host immune response driving SARS-CoV-2 evolution
A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation
Topological persistence and simplification
Persistent homology analysis of protein structure, flexibility, and folding. International journal for numerical methods in biomedical engineering
Structural and physico-chemical effects of disease and non-disease nsSNPs on proteins
Loss of protein structure stability as a major causative factor in monogenic disease
Mechanisms of viral mutation
Making sense of mutation: what D614G means for the COVID-19 pandemic remains unclear
GISAID: Global initiative on sharing all influenza data-from vision to reality
A neutralizing human antibody binds to the N-terminal domain of the Spike protein of SARS-CoV-2
Raptorx-property: a web server for protein structure property prediction
Potent synthetic nanobodies against SARS-CoV-2 and molecular basis for neutralization
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Structural basis of a shared antibody response to SARS-CoV-2
Structures of human antibodies bound to SARS-CoV-2 spike reveal common epitopes and recurrent features of antibodies
Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail
Structural basis for potent neutralization of SARS-CoV-2 and role of antibody affinity maturation. bioRxiv
Structural characterisation of a nanobody derived from a naïve library
Structural basis for the neutralization of SARS-CoV-2 by an antibody from a convalescent patient
Human neutralizing antibodies elicited by SARS-CoV-2 infection
Potent neutralizing antibodies against SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients' B cells
A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2
A human neutralizing antibody targets the receptor binding site of SARS-CoV-2
Structural basis for neutralization of SARS-CoV-2 and SARS-CoV by a potent therapeutic antibody
An alternative binding mode of IGHV3-53 antibodies to the SARS-CoV-2 receptor binding domain
Structural and functional analysis of a potent sarbecovirus neutralizing antibody
Potent binding of 2019 novel coronavirus spike protein by a SARS coronavirus-specific human monoclonal antibody. Emerging microbes & infections
Human monoclonal antibody combination against SARS coronavirus: synergy and coverage of escape mutants
Potent neutralizing antibodies against multiple epitopes on SARS-CoV-2 spike
Mutations strengthened SARS-CoV-2 infectivity
The SARS-CoV-2 vaccine pipeline: an overview. Current tropical medicine reports
Safety and immunogenicity from a phase I trial of inactivated severe acute respiratory syndrome coronavirus vaccine
Bioinformatic prediction of epitopes in the Emy162 antigen of Echinococcus multilocularis. Experimental and therapeutic medicine
Structural analysis of B-cell epitopes in antibody: protein complexes

This work was supported in part by NIH grant GM126189, NSF Grants DMS-1721024, DMS-1761320, and IIS1900473, Michigan Economic Development Corporation, George Mason University award PD45722, Bristol-Myers Squibb, and Pfizer. The authors thank The IBM TJ Watson Research Center, The COVID-19 High Performance Computing Consortium, NVIDIA, and MSU HPCC for computational assistance. RW thanks Dr. Changchuan Yin for useful discussion. The authors declare no competing interests.