key: cord-0907234-ms6aamm6
authors: Wong, Samuel W. K.; Liu, Zongjun
title: Conformational variability of loops in the SARS‐CoV‐2 spike protein
date: 2021-10-23
journal: Proteins
DOI: 10.1002/prot.26266
sha: b995fdf04b836d958c0c090da92155f3d1c834e1
doc_id: 907234
cord_uid: ms6aamm6

The SARS‐CoV‐2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This article identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank structures. While most loops had essentially one stable conformation, 17 of 44 loop regions were observed to be structurally variable with multiple substantively distinct conformations based on a cluster analysis. Loop modeling methods were then applied to the S protein loop targets, and the prediction accuracies discussed in relation to the characteristics of the conformational clusters identified. Loops with multiple conformations were found to be challenging to model based on a single structural template.

The COVID-19 disease is caused by the SARS-CoV-2 strain of coronavirus and its continued spread remains a concern since the first reported infections in late 2019. 1 The SARS-CoV-2 viral genome encodes four main structural proteins: spike, envelope, membrane, and nucleocapsid. 2 The spike (S) protein is of particular importance as it facilitates viral entry into host cells via its receptor binding domain (RBD), which recognizes human angiotensin-converting enzyme 2 (ACE2). 3 Current vaccines being administered 4 achieve efficacy against SARS-CoV-2 by enabling the human body to produce a modified version of its S protein; this in turn induces the production of neutralizing antibodies against the disease.
5 Toward the development of such therapeutic interventions, many structure determination efforts have focused on the S protein, with the first standalone experimental structure of the full-length S protein obtained via cryo-electron microscopy in mid-February 2020. 6 Soon thereafter, the structure of the S protein RBD bound in a complex with ACE2 was also determined. 7 As of January 13, 2021, there were 203 structures deposited in the Protein Data Bank (PDB 8 ) associated with the SARS-CoV-2 S protein. These include studies of the standalone S protein, 9 the S protein interacting with potential antibodies, 10, 11 and the S protein interacting with various forms of ACE2. 12 Finally, with the emergence of S protein sequence variants, structures corresponding to mutations are also being studied, with D614G being a common example. 13 While individual PDB structures generally provide static snapshots of protein conformations, it is well-known that proteins exhibit dynamic movement. 14, 15 The local dynamics of atoms and residues are partially depicted via crystallographic B-factors. 16 Larger motions are also possible: for the SARS-CoV-2 S protein, a well-documented example is the ability of its RBD to adopt "up" (or open) and "down" (or closed) states, where the "up" state is the conformation capable of binding to ACE2. 6 Overall, the PDB is a rich source of data for examining the conformational variability of the S protein, given the number of times its structure has been solved experimentally.

This article focuses on the loop conformations of the S protein. Protein loops are the flexible connecting regions between regular secondary structures, and are where protein disorder is most likely to occur. 17 This greater disorder of loops may manifest in a PDB structure as missing atomic coordinates or atoms with high B-factors.
18 Accurate structure prediction for loops is both challenging and necessary, to construct useful models for downstream therapeutic applications. 19 Loops are of particular importance as they are often associated with protein function, such as providing binding recognition sites and facilitating protein-protein interactions. 20 For example, an extended loop of the SARS-CoV-2 S protein RBD interacts directly with loops of ACE2, as evidenced by the PDB structure of the RBD-ACE2 complex. 21 Dynamic structural changes can occur both in larger regions of a protein (e.g., the SARS-CoV-2 RBD), as well as in individual loops adopting conformational rearrangements to carry out protein function in accordance with their environment. 22 Thus, when a protein has been solved many times in the PDB, we may be able to observe distinct conformations among some of its loops, given their potential for disorder and structural variability. In particular for the SARS-CoV-2 S protein, the PDB also documents sequence variants arising from mutations to some of its loop regions, 23 and the possible structural impacts of mutations can also be studied more broadly via computational methods. [24] [25] [26] Mutations to the S protein are especially of concern as they can lead to more infectious variants of SARS-CoV-2. 27 The task of structure prediction for flexible loops with multiple distinct conformations has been found to be more challenging than for rigid or inflexible ones. 28 Most loop prediction methods are designed to identify the most likely conformation, for example, with the lowest potential energy. [29] [30] [31] [32] [33] [34] Such methods are typically trained on loop sets where a single conformation for each loop is taken from the PDB and assumed to represent the ground truth, 35 and thus tend to be more successful at accurately predicting inflexible loops with one "correct" solution. 
Accuracy is typically measured by computing the root-mean-squared deviation (RMSD) of the backbone atoms from the predicted loop conformation to the corresponding one in the PDB. In order to study loops that can adopt multiple conformations, prediction methods might instead be applied to generate an ensemble of decoys, which often involves a combination of sampling and scoring steps. 36 Then, the success of different methods could be assessed on the basis of whether their generated ensembles include decoys that are close to each of the known conformations. 28 For the SARS-CoV-2 S protein, this kind of assessment is a good test on the ability of current methods to explore a range of likely conformations, especially if further mutations were to occur in the flexible loop regions. These considerations motivate the main contributions of this article. First, we identify the loop regions and sequence variants from the known PDB structures of the SARS-CoV-2 S protein, and use cluster analysis to classify each loop according to whether it has been observed to adopt multiple distinct conformations or a single conformation only. Second, we apply four current loop prediction methods on the identified loop regions, to generate ensembles of decoys for each one. Third, we discuss the results of these methods and the effectiveness of their application to modeling the loops of the S protein, along with the insights gained via our analyses. The 3-D structures of the SARS-CoV-2 S protein were downloaded from the PDB at the RCSB website (https://rcsb.org) on January 13, 2021, by navigating to the page in the "COVID-19 coronavirus resources" section entitled "Spike proteins and receptor binding domains." We extracted the S protein structures that are not bound to other molecules and have sequence length greater than 1000. 
This facilitates study of the S protein loop conformations within the context of a (mostly) full-length S protein structure, without explicit interaction with other proteins. A total of 63 S protein PDB structures satisfied these criteria, most of which are provided as S protein trimers. We treated each chain as an individual sample and thus extracted a total of 193 S protein chains. Some realignments of the corresponding amino acid sequences were required in order to keep the residue numbers consistent across all chains; this was accomplished with the ClustalO service in Jalview. 37

For each S protein chain, we first used DSSP 38 to determine the secondary structure classification of each residue. The eight-state DSSP classification was reduced to the traditional three types of helix (H), sheet (E), and coil (C) following the conventions in the SPIDER3 39 secondary structure prediction method: we map DSSP's "G," "H," and "I" to H; "E" and "B" to E; and the remaining three states to C. Due to structural variability, the classified type (H, E, or C) for a given residue position may not always agree among the 193 S protein chains. Thus, we define a loop region for our study as follows: a segment of five or more consecutive residues where over 50% of the protein chains at each position are classified as type C. Further, if two such segments are separated by only one E or H type residue (i.e., where less than 50% of the chains are type C at that position), we treat the two combined segments (including that connecting residue) as a single loop region.

With the starting and ending positions of loops defined in this manner, we check for the presence of sequence variants in each loop region among the S protein chains. If multiple distinct residue sequences are observed for a loop region, we treat each unique sequence separately for further analysis. This allows us to document the possible impact of mutations on the loop conformations.
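The loop definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `to_three_state` and `find_loop_regions` are hypothetical, and the sketch assumes that every chain supplies one DSSP state at every aligned position (real chains may have missing residues, which would need separate handling).

```python
def to_three_state(dssp):
    """Reduce an eight-state DSSP string to H/E/C (G, H, I -> H; E, B -> E; rest -> C)."""
    table = {"G": "H", "H": "H", "I": "H", "E": "E", "B": "E"}
    return "".join(table.get(s, "C") for s in dssp)


def find_loop_regions(chains, min_len=5, threshold=0.5):
    """Return 0-based inclusive (start, end) pairs of loop regions.

    chains: aligned, equal-length DSSP strings, one per chain (an assumption).
    """
    three = [to_three_state(c) for c in chains]
    n = len(three[0])
    # A position counts as coil when more than `threshold` of the chains are C there.
    is_coil = [sum(c[i] == "C" for c in three) / len(three) > threshold
               for i in range(n)]
    # Collect maximal runs of coil positions that are at least min_len long.
    segments, start = [], None
    for i, flag in enumerate(is_coil + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Merge qualifying segments separated by exactly one non-coil position.
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] == 2:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged
```

For instance, two coil runs of lengths six and five separated by a single E residue are reported as one region, while an isolated coil run of three residues is not reported.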
Thus, we shall say that a loop instance consists of its starting and ending positions together with its unique residue sequence. We then consider the structural variability of each loop instance. To account for the potential disordered nature and structural uncertainties of loops, we extract both the atomic coordinates and B-factors from the PDB chains. Taking all chains that have no missing coordinates or B-factors within the loop residues, we compute their pairwise RMSD matrix based on the loop's backbone (N, Cα, C, and O) atoms. The RMSD calculation is applied after the backbone atoms of the loop residues for each pair are optimally superimposed using the Kabsch algorithm. 40 This is the "local RMSD" 41,42 that compares the loop region only, and so is not sensitive to orientation differences in the rest of the structure. Based on that distance matrix, we apply hierarchical clustering with average linkage (UPGMA 43 ) and a distance cutoff of 1.5 Å 28 to form initial clusters of loop conformations. Next, we incorporate B-factors to ensure that the clusters formed are statistically distinct. Recall that the B-factor can be expressed in terms of the mean-square amplitude of atomic oscillations $\langle u^2 \rangle$ around their measured positions: $B = 8\pi^2 \langle u^2 \rangle$. Using an isotropic Gaussian approximation for the corresponding coordinate uncertainties, we can determine whether the difference in backbone coordinates between a loop pair is significantly different with 95% confidence (see Appendix A for details). If none of the chains in one cluster are significantly different from any chains in another cluster, we merge them into a single cluster. Clusters composed entirely of chains with poor structure resolution (>3 Å) after this step are removed from further analysis as the atomic coordinates are unlikely to be sufficiently reliable for making detailed structural comparisons.
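The two steps just described, average-linkage clustering at a fixed cutoff and the B-factor-based comparison of a loop pair, might be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the pairwise local RMSDs are taken as precomputed, the two coordinate sets are assumed to be already superimposed, and the per-coordinate variance of a coordinate difference is assumed to be the sum of the two structures' variances, $(B_1 + B_2)/(3 \cdot 8\pi^2)$. The critical value is passed in by the caller (for example, the 0.95 quantile of a chi-squared distribution with 3N degrees of freedom from a statistical table or library).

```python
import math
from itertools import combinations


def average_linkage_clusters(dist, cutoff=1.5):
    """Agglomerative clustering with average linkage (UPGMA).

    dist: symmetric matrix (list of lists) of pairwise local RMSDs in angstroms.
    Merging stops once the closest pair of clusters is farther apart than cutoff.
    Returns a sorted list of clusters, each a sorted list of chain indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Average-linkage distance: mean over all cross-cluster member pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


def chi2_statistic(xyz1, b1, xyz2, b2):
    """Sum over atoms of squared coordinate difference divided by its variance.

    Assumption: the variance of each coordinate difference is
    (B1 + B2) / (3 * 8 * pi^2), i.e. the two uncertainties are independent.
    """
    t = 0.0
    for p, bp, q, bq in zip(xyz1, b1, xyz2, b2):
        var = (bp + bq) / (3 * 8 * math.pi ** 2)
        t += sum((u - v) ** 2 for u, v in zip(p, q)) / var
    return t


def significantly_different(xyz1, b1, xyz2, b2, critical_value):
    """True when the statistic exceeds the caller-supplied chi-squared quantile."""
    return chi2_statistic(xyz1, b1, xyz2, b2) > critical_value
```

With N backbone atoms the statistic is compared against a quantile with 3N degrees of freedom; for a single atom (3 degrees of freedom) the 0.95 quantile is approximately 7.815.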
Each remaining cluster then represents a distinct group of S protein chains which have a similar conformation for that loop instance. We consider a loop instance to have multiple distinct conformations if this analysis results in two or more such clusters of conformations; otherwise, we say that loop instance essentially adopts only a single conformation. We select a representative from each cluster by taking the chain with resolution ≤3 Å that is closest to the geometric centroid of the cluster. Our full list of S protein loop targets for study thus consists of all the cluster representatives obtained from the above steps.

To study the conformational variability of the identified S protein loop targets, we make use of several loop modeling methods. We focus on methods that incorporate sampling-based techniques for loop construction, which are suitable for stochastically generating an ensemble of decoys that represent plausible conformations for a loop. We include Rosetta's next-generation KIC (NGK) algorithm, 32 the DiSGro algorithm, 33 and the PETALS algorithm, 34 which are ab initio methods that explore the conformational space with the guidance of an energy or scoring function; these do not directly make use of any structure templates of known loop conformations. We also include the Sphinx algorithm, 30 which is a hybrid method that begins with loop structure fragments obtained from sequence alignment and then completes the loop construction by ab initio sampling. Using each of the methods, we generate an ensemble of 500 decoys for each loop target. The input (or template) structure is the loop target's representative PDB chain, prepared by removing the coordinates of the loop residues: following loop modeling conventions, we treat the backbone atoms from the starting residue's C atom to the ending residue's Cα atom as unknown. The generated decoys are compared with the loop structures from each known conformation for that loop region.
The backbone RMSD is used to assess the accuracy of the decoys. Two types of RMSDs are calculated, as in Choi and Deane 41 : local RMSD (which superimposes the backbone of the loop residues, as in Section 2.1) and global RMSD, which superimposes the backbone atoms of the two residues on either side of the loop (rather than the backbone of the loop residues themselves) prior to the calculation. Global RMSD, as often reported in loop modeling studies, also considers the decoy's orientation to the rest of the structure. For loop regions with multiple conformations or mutations, decoy generation is carried out multiple times, once using each representative PDB as input; taken together, we may thus assess whether decoys generated from different PDB inputs have good coverage of the conformational space for that loop region. The scoring function associated with each method provides a ranking of its 500 generated decoys for a loop target. Thus, it is of interest to assess how well each method's top-ranking decoys can predict the possible conformations of the loop region. We use three RMSD statistics for this purpose: (a) lowest RMSD among the 500 decoys, (b) RMSD of the top-ranked decoy, and (c) lowest RMSD among the top-five ranked decoys. The first RMSD statistic evaluates the method according to its ability to construct native-like conformations, without regard to whether its scoring function can select the best prediction. The second RMSD statistic corresponds to typical loop modeling assessment, where the top-ranked decoy is selected as the prediction. However, this approach of selecting a single prediction would be less informative if the loop region has multiple conformations. Thus, we also use the third RMSD statistic: by selecting multiple (i.e., the top five) decoys, we can examine whether these top-ranking decoys are structurally distinct and accurately represent the different known conformations. We briefly describe how each of the loop modeling methods is run. 
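As a small illustration of the three RMSD statistics defined above, the following Python sketch computes them from a list of decoy RMSDs and the corresponding scores. The function name `rmsd_statistics` is hypothetical, and the sketch assumes that a lower score means a better rank, as with an energy function.

```python
def rmsd_statistics(rmsds, scores, top_k=5):
    """Compute the Min., Top, and Top-k RMSD statistics for one loop target.

    rmsds[j]: RMSD of decoy j to the reference conformation.
    scores[j]: score of decoy j; lower is assumed to be better ranked.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    ranked = [rmsds[j] for j in order]
    return {
        "min": min(rmsds),            # (a) lowest RMSD among all decoys
        "top": ranked[0],             # (b) RMSD of the top-ranked decoy
        "top5": min(ranked[:top_k]),  # (c) lowest RMSD among the top-k decoys
    }
```

A large gap between "min" and "top" for a target indicates that a native-like decoy was sampled but not selected by the scoring function.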
The NGK algorithm 32 is included in the Rosetta protein modeling suite (available at https://www.rosettacommons.org/), and we used the version provided in Rosetta release 2020.50 on December 18, 2020. NGK improves on a previous kinematic closure method, which consists of local conformational sampling and Monte Carlo minimization steps performed over two (coarse and full-atom) stages. The program outputs the lowest energy loop structure found in each run, and so to obtain the desired ensemble of decoys we ran the program 500 times, following the recommended settings in the online guide (https://guybrush.ucsf.edu/benchmarks/benchmarks/loop_modeling). The DiSGro algorithm 33 uses a distance-guided sequential chaingrowth method to stochastically sample loop structures. We ran the authors' program to generate 100 000 conformations for the best possible coverage of the conformational space, then used their scoring function to select the 500 decoys with the lowest energy. The PETALS algorithm 34 uses a sequence of propagation and filtering steps to explore the conformational space and locate low-energy structures. We ran the authors' program with 60 000 seeds and outputted 30 000 decoys, then used an updated scoring function to select the 500 top-ranked decoys, see Appendix B for details. The is used to obtain the final ranking of decoys. Sphinx is hosted on the SAbPred server, 45 for which we automated the loop target submissions and used the "general protein" option; no PDB blacklist was necessary as the fragment database had not yet been updated to contain any COVID-19 S protein structures. Table 1 shows the number of PDB chains that contain a complete backbone Table 1 . It should be noted that the exact number and composition of clusters will depend on the algorithm (i.e., cutoff and criterion) chosen. Here, using a cutoff of 1.5 Å with UPGMA, the average RMSD between members of different clusters will be at least 1.5 Å. 
For example, if we used a cutoff of 1.5 Å with WPGMA 43 Table 1 to provide a fairly stable characterization of the structural variability present in these loops. The final 75 clusters in Table 1 differ in their size and within-cluster variation. There were 4 singleton clusters (defined by a single chain only), and 61 clusters were defined by at least four chains and two distinct PDB codes (and often significantly more). These high chain counts per cluster enable more cluster statistics to be examined, compared to related studies, for example, Marks et al. 28 where clusters were defined by at most five chains (except in one case). Here, loop instances with multiple conformations tend to have a dominant cluster that is defined by at least two-thirds of the available chains; the one exception is 841-848, which is also the most structurally variable loop with five distinct clusters. For each of the 61 well-represented clusters, we computed the average within-cluster RMSD (i.e., between all pairs of members in that cluster) as a measure of its breadth of movement, and a histogram is shown in Figure 2. The average breadth over all 61 clusters is 0.72 Å. The list of clusters grouped according to their breadth d is shown in Table 2, where 16 clusters are fairly tight with d ≤ 0.5 Å, 36 clusters have 0.5 < d ≤ 1.0 Å, and the 10 loosest clusters have d > 1.0 Å. It might be expected that shorter loops tend to form tighter clusters as they have a smaller conformational space; indeed, this pattern can be seen as the average loop lengths of clusters in these three groups are 6.5, 12.1, and 13.0, respectively. The larger clusters also tend to be tighter: the average cluster sizes in these three groups are 127, 108, and 49, respectively. However, we note that these are overall patterns only; for example, the cluster for the longest loop 783-816 is defined by 142 chains and has only a moderate d = 0.81 Å.
It is well-known that the SARS-CoV-2 RBD as a whole can adopt an "up" or "down" conformational state. 6 Here, 7 of the 17 loop instances with multiple conformations were located within the RBD. Notably, both 475-487 and 495-506, which interact with ACE2, are among these. Thus, we examined whether this higher propensity for multiple conformation loops within the RBD might be associated with the chains having an "up" or "down" RBD state, even when the S protein chain is considered in isolation. We took PDB 6zge, 49 where it is known that chain A has a "down" RBD and chain B has an "up" RBD. Then, each of the 193 S protein chains was classified as "up" or "down" according to whether its backbone RMSD to 6zgeB or 6zgeA was smaller. Based on this criterion, the loop at 370-375 has both

TABLE 1 SARS-CoV-2 S protein loops. The first column shows the starting and ending positions of each identified loop region. The second column shows the loop sequences; if there are sequence variants in the PDB, the most common variant is listed first, and other variants have their mutated residues marked in bold. The number of PDB chains containing that loop instance is shown in the third column. The rightmost column lists the representative PDB chains for each loop instance; if a loop instance has multiple conformations, each chain listed corresponds to one distinct conformation (cluster). The number of PDB chains represented by each cluster is shown in parentheses; these may not sum up to the third column since clusters with poor structure resolution (all chains >3 Å) are omitted.

First, we assess the ability of methods to predict a correct loop structure. We define this loop prediction accuracy by calculating the To visualize these results, the global RMSD of the top decoy is plotted against loop length for each method in Figure 3.
It is clear that the prediction difficulty and the variance of prediction RMSDs tend to increase with loop length, with methods consistently achieving <2 Å RMSD accuracy only for the shortest loops (≤6 residues). This is sensible since the size of the conformational space increases with loop length, with long loops (>12 residues) often posing a challenge for methods to sample adequately. 50 The plots also indicate that the hardest targets for a given loop length tend to be those from multiple conformations, especially for the two most accurate methods (NGK and PETALS). The average lengths of loop targets in the "Single conf." and "Multiple conf." categories are similar (9.7 vs. 10.0 residues). The detailed results for each target individually are given in Table S1.

In addition to loop length, we also examine whether the cluster characteristics, namely their size (as measured by the number of chains) and breadth (as measured by the average within-cluster RMSD in Figure 2), are associated with prediction difficulty. For each

TABLE 4 RMSD metrics for assessing the loop prediction accuracy of the four methods. The loop backbone RMSDs shown are averaged over single conformation targets (n = 26), multiple conformation targets (n = 40), and all targets (n = 66). The columns "Min.," "Top," and "Top-5" refer, respectively, to the lowest RMSD among the 500 decoys, RMSD of the top-ranked decoy, and lowest RMSD among the top-five ranked decoys. Prediction accuracy is defined as the RMSD to the closest loop structure among all chains containing that loop instance.

Table 5 for the four methods.
The sign of the t-statistic indicates whether successes (positive t-statistic) or failures (negative t-statistic) are associated with larger values of that variable; for example, the t-statistics for loop length are all negative, so successes are associated with shorter loop lengths as expected from

Next, we focus on the loop instances with multiple distinct conformations, to assess how well the decoys generated from a specific PDB input can represent all the known conformations for that loop instance. Taking the loop 130-140, for example: the decoys generated using 6xluB are compared to the loop structures in the clusters represented by 6xluB, 7kdkC, and 7kdlA, and the RMSD to the closest structure in each cluster is recorded; the average of the RMSDs to these three clusters then provides an overall result for 6xluB; the same is done using the decoys from 7kdkC and 7kdlA. The results are summarized in Table 6 using the same RMSD metrics, averaged over the targets in the multiple conformation categories. This task is noticeably more challenging than the prior prediction task, as evidenced by RMSDs in Table 6 which are all larger than the corresponding values in the "Multiple conf." rows of Table 4 Detailed results for these five targets are provided in Table S4 of the Supporting Information.

In this article, we studied the conformations of loops in the SARS- Table S2 ). Exploring the conformational variability of "Loop 3" thus provides a fuller range of structural states that the development of therapeutics might target before the S protein binds to ACE2. 51 More generally, high-quality loop models are a crucial part of protein structures used in the computational drug discovery process. 19 We found that the structurally flexible loops with multiple con-

Both the authors declare no potential conflict of interest.

The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26266.
The data that support the findings of this study are openly available in the RCSB Protein Data Bank.

Since the B-factor is defined as $B = 8\pi^2 \langle u^2 \rangle$, a Gaussian approximation gives the variance in each measured x, y, and z coordinate as $B/(3 \cdot 8\pi^2)$. For the ith atom, the coordinate difference between the pair is a random vector $(H_{xi}, H_{yi}, H_{zi})$ with a multivariate Gaussian distribution with mean vector $(x_{1i} - x_{2i}, y_{1i} - y_{2i}, z_{1i} - z_{2i})$ and a diagonal covariance matrix with the value $\sigma_i^2 = (B_{1i} + B_{2i})/(3 \cdot 8\pi^2)$ on the diagonal, where $B_{1i}$ and $B_{2i}$ are the B-factors of the ith atom in the two structures (the variance of a difference of two independent Gaussian coordinates being the sum of their variances). By the properties of the multivariate Gaussian,

$$\frac{[H_{xi} - (x_{1i} - x_{2i})]^2 + [H_{yi} - (y_{1i} - y_{2i})]^2 + [H_{zi} - (z_{1i} - z_{2i})]^2}{\sigma_i^2}$$

has a chi-squared distribution with 3 degrees of freedom, denoted $\chi^2_3$. Similarly, considering all the atoms together, a $\chi^2_{3N}$ random variable is defined by

$$\sum_{i=1}^{N} \frac{[H_{xi} - (x_{1i} - x_{2i})]^2 + [H_{yi} - (y_{1i} - y_{2i})]^2 + [H_{zi} - (z_{1i} - z_{2i})]^2}{\sigma_i^2}.$$

The pair of loop backbones are not different if it is plausible that $(H_{xi}, H_{yi}, H_{zi}) = (0, 0, 0)$ for all N atoms, that is, all the coordinate differences are zero. This corresponds to computing the statistic

$$T = \sum_{i=1}^{N} \frac{(x_{1i} - x_{2i})^2 + (y_{1i} - y_{2i})^2 + (z_{1i} - z_{2i})^2}{\sigma_i^2}$$

and comparing T to the quantiles of the chi-squared distribution with 3N degrees of freedom. Taking a significance level of α = 0.05, let c denote the 0.95 quantile of the $\chi^2_{3N}$ distribution. Then, the pair is considered significantly different if T > c.

In this work, we also tested a strategy for improving the energy function accuracy of the PETALS algorithm, in its ability to rank generated loop decoys. The set of structures used for training is the same as that described in Wong et al., 34 namely, the CulledPDB list by PISCES 52 on March 14, 2015 with maximum 20% sequence identity, resolution 2.0 Å, and R-factor cutoff 0.25, thus ensuring no SARS-CoV-2 S protein structures were present. Loop regions were extracted via DSSP, from which we compiled 10 786 loops with lengths ranging from 5 to 10 residues.
The PETALS algorithm was first used to generate 200 decoys for each loop, and for each decoy, we computed: RMSD to the native conformation, 210 distance-based energy terms corresponding to each pair of atom types defined in DiSGro's energy function, 33 and a backbone torsion term. 34 We then define $\hat{y}_{ij}$ as the predicted energy of the ith loop's jth decoy, where the $\beta_k$'s are coefficients associated with each energy term $E_{ijk}$ to be trained, and $T_{ij}$ is the torsion term. Then, define the square-error loss function (B1), where $\mathrm{RMSD}_{ij}$ is the RMSD to native and $w_{ij}$ is the weight associated with the ith loop's jth decoy, N is the number of training loops, and f is a mapping function associated with the rank of that decoy. The decoys with the lowest RMSDs are the ones that best resemble the true conformation; thus the goal is to train the $\beta_k$'s to minimize this loss function so that the rankings of the predicted energies and the rankings of the RMSD values match as closely as possible. We chose $f(\cdot)$ to be a function that maps values into quantile bins. Specifically, we ranked the 200 predicted energies $\{\hat{y}_{ij}\}_{j=1}^{200}$ from smallest to largest, then assigned f = 1 to the best 10%, f = 2 to the next 10%, and so on until f = 10 for the last 10%. We ranked the 200 RMSD values $\{\mathrm{RMSD}_{ij}\}_{j=1}^{200}$ and assigned values of f the same way. Positive weights $w_{ij}$ were assigned to the top five quantile bins, with higher weights for the better ranked predicted energies: 1.0 for the best 10%, 0.9 for the next 10%, and so on down to 0.6 for the fifth quantile bin, and zero for the rest. We used 80% of the loops as training data and 20% as validation data. As gradient information was unavailable due to the discrete nature of the model, the PySwarms 53 implementation of Particle Swarm Optimization was used to minimize the square error loss function in Equation (B1).
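The quantile-bin mapping and weighting scheme described above can be made concrete with a short Python sketch for a single training loop. This is only one plausible reading, not the authors' code: the function names are hypothetical, and both the linear form of the predicted energy (weighted sum of energy terms plus an unweighted torsion term) and the loss form (weighted squared difference between the bin of the predicted energy and the bin of the RMSD) are assumptions inferred from the symbol definitions in the text.

```python
def predicted_energy(betas, terms, torsion):
    """Assumed form: weighted sum of the energy terms plus the torsion term."""
    return sum(b * e for b, e in zip(betas, terms)) + torsion


def decile_bins(values):
    """Map each value to a bin 1..10 by rank: the smallest 10% get 1, and so on."""
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])
    bins = [0] * n
    for rank, j in enumerate(order):
        bins[j] = rank * 10 // n + 1
    return bins


# Weights by bin of the predicted energy: 1.0, 0.9, ..., 0.6 for bins 1 to 5, else 0.
WEIGHTS = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7, 5: 0.6}


def loop_loss(pred_energies, rmsds):
    """Assumed per-loop loss: sum over decoys of w * (f(energy) - f(RMSD))^2."""
    f_pred = decile_bins(pred_energies)
    f_rmsd = decile_bins(rmsds)
    return sum(WEIGHTS.get(fp, 0.0) * (fp - fr) ** 2
               for fp, fr in zip(f_pred, f_rmsd))
```

Because the bins are piecewise constant in the coefficients, this loss has zero gradient almost everywhere, which is consistent with the use of a gradient-free optimizer such as Particle Swarm Optimization.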
1. A novel coronavirus from patients with pneumonia in China
2. Neutralizing antibodies against SARS-CoV-2 and other human coronaviruses
3. Structural basis of receptor recognition by SARS-CoV-2
4. Safety and efficacy of the BNT162b2 mRNA COVID-19 vaccine
5. COVID-19 vaccines: delivering protective immunity
6. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
7. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
8. The Protein Data Bank
9. Distinct conformational states of SARS-CoV-2 spike protein
10. An ultrapotent synthetic nanobody neutralizes SARS-CoV-2 by stabilizing inactive spike. Science
11. A human neutralizing antibody targets the receptor-binding site of SARS-CoV-2
12. Engineered trimeric ACE2 binds viral spike protein and locks it in "three-up" conformation to potently inhibit SARS-CoV-2 infection
13. Structural and functional analysis of the D614G SARS-CoV-2 spike protein variant
14. Dynamic personalities of proteins
15. New tools provide new insights in NMR studies of protein dynamics
16. Local dynamics of proteins and DNA evaluated from crystallographic B factors
17. Protein disorder prediction: implications for structural proteomics
18. Modeling protein conformational ensembles: from missing loops to equilibrium fluctuations
19. Homology modeling in drug discovery: overview, current applications, and future perspectives
20. Identification of function-associated loop motifs and application to protein function prediction
21. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
22. The role of protein loops and linkers in conformational dynamics and allostery
23. Structural impact on SARS-CoV-2 spike protein by D614G substitution
24. Mutations strengthened SARS-CoV-2 infectivity
25. Coronavirus3D: 3D structural visualization of COVID-19 genomic divergence. Bioinformatics
26. Assessing the impacts of mutations to the structure of COVID-19 spike protein via sequential Monte Carlo
27. The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
28. Predicting loop conformational ensembles
29. LEAP: highly accurate prediction of protein loop conformations by integrating coarse-grained sampling and optimized energy scores with all-atom refinement of backbone and side chains
30. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction
31. Loop modeling: sampling, filtering, and scoring
32. Improvements to robotics-inspired conformational sampling in Rosetta
33. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method
34. Fast de novo discovery of low-energy protein loop conformations
35. Modeling of loops in protein structures
36. Protein loops with multiple meta-stable conformations: a challenge for sampling and scoring methods
37. Jalview version 2: a multiple sequence alignment editor and analysis workbench
38. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers
39. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics
40. A solution for the best rotation to relate two sets of vectors
41. FREAD revisited: accurate loop structure prediction using a database search algorithm
42. DaReUS-Loop: a web server to model multiple loops in homology models
43. A statistical method for evaluating systematic relationships
44. Optimized atomic statistical potentials: assessment of protein interfaces and loops
45. SAbPred: a structure-based antibody prediction server
46. Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms
47. Making sense of mutation: what D614G means for the COVID-19 pandemic remains unclear
48. SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
49. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects
50. The VSGB 2.0 model: a next generation energy model for high resolution protein structure modeling
51. Molecular dynamics analysis of a flexible loop at the binding interface of the SARS-CoV-2 spike protein receptor-binding domain
52. PISCES: a protein sequence culling server

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.

How to cite this article: Wong SWK, Liu Z. Conformational variability of loops in the SARS-CoV-2 spike protein