key: cord-0871377-l1r1h8pu
authors: Croll, Tristan I.; Williams, Christopher J.; Chen, Vincent B.; Richardson, David C.; Richardson, Jane S.
title: Improving SARS-CoV-2 Structures: Peer Review by Early Coordinate Release
date: 2021-01-16
journal: Biophys J
DOI: 10.1016/j.bpj.2020.12.029
sha: 14691d805e263d0acec078097d528e3b3dd16789
doc_id: 871377
cord_uid: l1r1h8pu

This work builds upon the record-breaking speed and generous immediate release of new experimental 3D structures of the SARS-CoV-2 proteins and complexes, which are crucial to downstream vaccine and drug development. We have surveyed those structures to catch the occasional errors that could be significant for those important uses and for which we were able to provide demonstrably higher-accuracy corrections. This process relied on new validation and correction methods such as CaBLAM and ISOLDE, not yet in routine use. We found such important and correctable problems in seven early SARS-CoV-2 structures. Two of the structures were soon superseded by new higher-resolution data, confirming our proposed changes. For the other five, we emailed the depositors a documented and illustrated report, and encouraged them to make the model corrections themselves and to use the new option at the worldwide Protein Data Bank that lets depositors re-version their coordinates without changing the PDB code. This quickly and easily makes the more accurate coordinates available to anyone who examines or downloads their structure, even before formal publication. The changes have involved sequence misalignments, incorrect RNA conformations near a bound inhibitor, incorrect metal ligands, and errors requiring cis-trans or peptide flips, which had prevented good contact at interaction sites. These improvements have propagated into nearly all related structures determined afterward. This process constitutes a new form of highly rigorous peer review, which is actually faster and stricter than standard publication review because it has access to coordinates and maps; journal peer review would also be strengthened by such access.

In this truly urgent crisis of the COVID-19 pandemic, the worldwide research community has mobilized to provide remarkably rapid understanding of the biology of the SARS-CoV-2 virus and many new paths toward its possible control. As early as February 2020, often in broad collaborations, structural biologists had begun to deposit structures of the proteins and their complexes from the new virus. In a break with tradition, these structures are being released to the public immediately, which in turn greatly speeds downstream research and development.

Early-release structures have not yet gone through all the cross-checks involved in writing and reviewing a formal paper, so it is understandable that they will contain somewhat more errors. However, most parts of these structures are up to the standards expected for their resolution and local degree of order, and the overall molecular arrangement can often provide quite unexpected and valuable new insights. For instance, in 6w41 an antibody that blocks spike binding to the ACE2 receptor interacts with a non-overlapping part of the spike protein's surface (1), and in 6zm7 the viral Nsp1 protein inhibits a cell's antiviral defenses by stuffing itself into the messenger RNA channel of human ribosomes to prevent synthesis of the defense proteins (2).

In contrast, a more detailed and nuanced use of a structure, such as creating or modifying high-specificity binding molecules to produce an effective drug or vaccine, can be chancy with early-release structures and benefits greatly from the best feasible accuracy of conformation and atom placement in the relevant contact area. Therefore, a number of groups that specialize in validating and correcting 3D macromolecular structures have been concentrating on the new SARS-CoV-2 depositions. Andrea Thorn has gathered an extremely broad set of experts to form the Coronavirus Structural Task Force (CSTF). Their website (http://github.com/thornlab/coronavirus_structural_task_force/) brings together a variety of information on the hundreds of SARS-CoV-2 and related structures, validation reports from several programs (now including MolProbity), rebuilt models from several sources, and information about the virus biology as outreach to the public (3). The http://covid-19.bioreproducibility.org/ website hosts rebuilt structures with a focus on the important aspect of bound ligands (4). The PDB-Redo site http://www.cmbi.ru.nl/pdb_redo/ has for a number of years routinely performed re-refinement and automated local corrections for all PDB entries, and that of course continues for the new SARS-CoV-2 structures (5). These efforts, and ours, are probably not the only ones.

The authors of this paper have worked more behind the scenes, to get the clearest and most important corrections to SARS-CoV-2 models updated directly in the PDB by the depositors themselves without changing the PDB code, which has been possible since the new versioning system announced in the PDB News item of 7/31/2019. Now that most formal publications based on those re-versioned structures are out, we describe here our strategies and the available but not yet mainstream methods that made this possible, many of them visual and interactive.

Structures of SARS-CoV-2 macromolecules were identified by searches at the RCSB or PDBe sites of the worldwide Protein Data Bank (wwPDB; 6), and from entries on the CSTF website (3). Because other groups are concentrating on the bound ligands, and because our expertise is in conformational analysis of protein and RNA 3D structures, we prioritized cases where those conformations and binding surfaces are likely to matter for understanding the virus biology and host interactions or for drug and vaccine design. Coordinates and density maps were downloaded from the PDB or the Electron Microscopy Data Bank (EMDB; 7). For crystal structures we used $2mF_\mathrm{obs} - DF_\mathrm{calc}$ and $F_\mathrm{obs} - F_\mathrm{calc}$ difference maps, and for cryoEM structures we used the primary map and only occasionally a focused map. In KiNG, usually two interactively adjusted contour levels were visualized, the lower in gray and the higher in black. In ISOLDE the original map is shown as a transparent surface, sometimes with a wireframe overlay of a map de-sharpened by B = +50 Å².
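For readers less familiar with map blurring: a positive B de-sharpens a map by damping each Fourier term according to its resolution. The short Python sketch below assumes the standard isotropic B-factor convention, F′ = F·exp(−B/(4d²)) for a term at spacing d; it only illustrates that arithmetic, is not code from ISOLDE or ChimeraX, and `desharpen_factor` is a hypothetical helper name.

```python
import math


def desharpen_factor(d_spacing: float, b_factor: float) -> float:
    """Amplitude multiplier for a Fourier term at resolution d_spacing (Å).

    Assumes the usual isotropic convention F' = F * exp(-B * s**2 / 4) with
    s = 1/d. Positive B blurs (de-sharpens); negative B sharpens.
    """
    s = 1.0 / d_spacing
    return math.exp(-b_factor * s * s / 4.0)


if __name__ == "__main__":
    # Attenuation from de-sharpening by B = +50 Å² at a few resolutions.
    for d in (10.0, 5.0, 3.0, 2.5):
        print(f"d = {d:4.1f} Å  factor = {desharpen_factor(d, 50.0):.3f}")
```

With B = +50 Å², terms near 3 Å keep only about a quarter of their amplitude while 10 Å terms are almost untouched, which is why the blurred map emphasizes overall shape over fine detail.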

The SARS-CoV-2 structures were surveyed for possible problems in two complementary ways: by running MolProbity validation (8), and by observing their initial behavior when starting up molecular dynamics in ISOLDE (9). Both of those validations are automated, as is generally true for validation; the only exceptions here, first noticed by visual inspection of model and map, were the Zn++ modeled in place of Cl− in 6vyo and the half-occupancy remdesivir in 7bv2. A central, and still unique, aspect of MolProbity is all-atom contact analysis, which uses the Reduce program (10) to add and optimize H atoms and then the Probe program (11) to measure the non-pairwise surface contacts between all atoms in the model. It outputs an overall "clashscore" evaluation and, most importantly for this application, it provides quantitative data and visual markup for local H-bonds, van der Waals contacts and serious clashes (defined as overlaps ≥0.4 Å). For RNA, MolProbity provides criteria for ribose pucker and backbone conformers (12). MolProbity's traditional validations include outliers in bond lengths and angles, Ramachandran values, and sidechain rotamers. These are still extremely effective at resolutions better than about 2.5 Å and do flag problems whenever they occur, but at lower resolutions the outliers are very often not seen because those criteria have been tightly restrained in order to achieve stable, convergent refinement, usually without fixing the underlying problems (13).
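The "serious clash" definition and the clashscore normalization can be sketched in a few lines. This Python sketch assumes clashscore is reported as serious clashes per 1000 atoms, treats atoms as simple spheres with caller-supplied radii, and leaves bonded-pair exclusion to the caller. Probe itself works from dot surfaces and chemical knowledge, so `serious_clashes` and `clashscore` here are hypothetical helpers, not MolProbity code.

```python
import math
from itertools import combinations


def serious_clashes(coords, radii, excluded=frozenset(), cutoff=0.4):
    """List (i, j, overlap) for atom pairs whose spheres overlap by >= cutoff Å.

    coords: list of (x, y, z) in Å; radii: matching van der Waals radii in Å.
    excluded: set of (i, j) pairs with i < j to skip (e.g. bonded neighbours);
    a real tool derives these from the chemistry rather than from the caller.
    """
    hits = []
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in excluded:
            continue
        overlap = radii[i] + radii[j] - math.dist(coords[i], coords[j])
        if overlap >= cutoff:
            hits.append((i, j, overlap))
    return hits


def clashscore(n_clashes, n_atoms):
    """Serious clashes per 1000 atoms (the normalization assumed here)."""
    return 1000.0 * n_clashes / n_atoms


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.5, 0.0, 0.0)]
    radii = [1.5, 1.5, 1.5]
    clashes = serious_clashes(coords, radii)
    print(clashes, clashscore(len(clashes), len(coords)))
```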

Because tight restraints to traditional validation criteria have destroyed their usefulness at resolutions poorer than 2.5 Å, new validation criteria are badly needed that can still provide meaningful assessment in that regime. Since model building does not yet use Bayesian likelihood to trade off conformational probability against density fit, very rare conformations can be greatly overused at low resolutions or in regions of poor density. For instance, cis-nonPro peptides occur genuinely in only 1 out of 3000 residues, yet their overuse was of epidemic proportions for about 10 years (14); they are now strongly flagged in MolProbity (15) and elsewhere, and unjustifiable cis-nonPro peptides are back to much lower levels.

So far, the most generally applicable MolProbity tool for 2.5-4 Å is CaBLAM (8, 16), which uses Cα virtual angles to determine a robust backbone trace and then a virtual angle between successive backbone CO bond directions to find where peptide orientations are not compatible with the local Cα trace. CaBLAM flags incorrect peptide orientations even when Ramachandran outliers have been refined away, and in the recent CryoEM Model Challenge the CaBLAM score was found to have a higher correlation with match-to-target than any other criterion (13). In development is RNAprecis, a criterion to improve both modeling and validation of full-detail RNA conformations, using features visible even at 3.5 Å.
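The virtual angles CaBLAM relies on are ordinary dihedrals measured on virtual bonds. The Python sketch below computes a Cα virtual dihedral for each run of four consecutive Cα atoms and, as one assumed way of relating successive CO directions, a dihedral through O(i−1), C(i−1), C(i), O(i). It does not reproduce CaBLAM's reference distributions or scoring contours, and all function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1->p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Components of the outer vectors perpendicular to the central axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def ca_virtual_dihedrals(ca_coords):
    """Virtual dihedral for every window of four consecutive Cα positions."""
    return [dihedral(*ca_coords[k:k + 4]) for k in range(len(ca_coords) - 3)]


def co_virtual_dihedral(o_prev, c_prev, c_cur, o_cur):
    """Assumed CO-to-CO angle: dihedral O(i-1), C(i-1), C(i), O(i)."""
    return dihedral(o_prev, c_prev, c_cur, o_cur)
```

A real validator would then compare these angles with distributions from high-quality reference structures, which is where CaBLAM's actual power lies.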

Corrections to outliers are almost always made visually and interactively, with the user driving and controlling computational procedures. We examined as many outliers as feasible, prioritizing them in two ways. First: outliers in important areas such as active sites, bound ions or ligands, interfaces between chains or molecules, or where known conformational changes occur. Second: outliers where prior probability plus local map quality, fit, and contacts are sufficient to distinguish clearly between specific proposed alternative interpretations. At 3-4 Å resolution, especially in large structures, we have found three tiers of certainty vs. uncertainty: 1) In the best parts, usually the central core, the map is usually clear enough to determine an unambiguous model, with only occasional definable errors; 2) There are always some mobile regions with such low local resolution that they show density but do not determine a single model, and in most cases no alternative there can be reliably judged as best; 3) In between those extremes, we concentrate on identifying and correcting problems where we can clearly document by multiple criteria that the suggested corrections are genuine improvements.

Much of the examination was done in KiNG interactive graphics (17), which shows model, map, and all MolProbity markup, is very good at sidechain correction coupled with subtle "backrub" backbone shifts (18), and can make limited further backbone changes. Coot (19) could quite often correct CaBLAM outliers, although it does not yet display their markup. The most general system for interactive correction was ISOLDE, described below. After a model had been corrected, it was briefly re-refined in PHENIX (20). Efforts are underway to automate correction of at least some classes of lower-resolution outliers, but so far they have not succeeded nearly as well as manual correction.

Rebuilding in ISOLDE is accomplished via repeated local interactive molecular dynamics simulations biased by the experimental density map. ISOLDE runs as a plug-in to ChimeraX (21), using a molecular-dynamics flexible fitting (22) approach. Each simulation typically covers a few dozen to a few hundred amino acid residues: large enough to remodel a problem region, but small enough to run fast enough for interactive use. Remodeling is accomplished via the combination of direct user tugging with scripted tools for common tasks such as cis-trans change of peptide geometry, flipping of peptide orientation, adjusting of rotamers, or shifting a selected stretch in sequence register. The Ramachandran and rotamer quality of each residue is marked up in real time as the model evolves; restraints are not used for Ramachandran or rotamers, with the rare exception of individual rotamer restraints applied by the user case by case. As is typical of molecular dynamics simulations, for a model settled in ISOLDE the clashscore within an individual asymmetric unit is always close to zero (although severe clashes with symmetry neighbors remain possible) because the van der Waals potential is modeled explicitly. Clashing atoms are instead pushed out of density, usually making the underlying problem easier to diagnose and correct. Each model was inspected and remodeled residue by residue at least once, end to end in overlapping simulations in ISOLDE. Crystal structures were then refined in phenix.refine to obtain the benefits of phase optimization, with the model acting as its own reference for torsion restraints. For cryoEM structures, a second map smoothed with B = +50 Å² was sometimes overlaid in wireframe; the simulations feel contributions from both maps, which can improve convergence. For lower-resolution (>3 Å) datasets, the resulting models were rebuilt and re-refined 1-2 more times.
The most significant changes were noted and prioritized by at least two different people.

When important local errors were identified and convincingly corrected, we emailed the depositor with explanations and illustrations of those changes. We included a revised coordinate file, but encouraged them to make and confirm the changes themselves, and to use the new wwPDB versioning system to update their deposited structure quickly and easily, usually in these cases before formal publication (see the last section in Results).

As well as the revised coordinate files from ISOLDE that are posted on the CSTF website, our team has worked with the CSTF to provide simplified but complete MolProbity validation output for all the SARS-related structures. This was enabled by revising MolProbity's Ramachandran and Cβ-deviation PDF outputs to work better with very large structures, and by fitting the overall information within the file-size limits of the GitHub site. Figure 8 was made in ISOLDE by Tristan Croll, to represent a close-up of how problems were seen and corrected in that highly complex, real-time interactive environment. The other figures were made in KiNG by the Richardsons; KiNG can show and make CaBLAM outlier corrections and has more facilities for 2D static presentation graphics at a variety of scales.

We use a convention for PDB codes that prevents 1, l, I or O, 0 ambiguity in any font: letters in lower case except for L, as in 6yLa.

Our first SARS-CoV-2 example was file 6vyo (23), deposited on February 27, 2020 and first released on March 11. It is a 1.7 Å x-ray structure of the tetrameric RNA-binding phosphoprotein of the internal nucleocapsid that holds the viral genome. Overall, it is an excellent structure, with a highly interpretable map and very few validation outliers. But visual inspection of model and map in KiNG, starting at the important zinc site in each subunit, showed a chemically implausible, partially occupied second Zn++ only 2.2 Å from the primary Zn++ and positioned as one of its 4 tetrahedral ligands, all with clean electron density (see stereo Figure 1a). This second ion has the wrong charge for its position, since a cation cannot serve as a ligand to another cation. Fortunately, crystallization conditions were reported in the PDB file to include ZnCl2, so almost certainly these secondary sites are full-occupancy Cl−. This incorrect atom identity in the 6vyo model was presumably an accidental oversight, and is very straightforward to correct.

The Richardsons emailed Andrzej Joachimiak, the depositor of record, on March 24, describing the problem and the easy route for him to re-version the coordinates at the wwPDB. He replied the next day, agreeing and saying that he would change it, and the new version 2.0 was released on April 8, less than 2 weeks later. Since then, anyone who downloads 6vyo from any wwPDB site automatically gets the improved coordinates. With the PDB's just-in-time, depositor-initiated re-versioning system instituted last fall, the complete version history is also clearly displayed, as seen for instance in Figure 1b at the bottom right of the 6vyo RCSB-PDB web page (http://www.rcsb.org/pdb).

This inadvertent error was the structural equivalent of a "typo", but one that changed the meaning in an important location. It is a rare and unexpected type of error not tested by automated structure validation or fixed by refinement or by PDB-Redo (5).

Independently of our visual inspection, we later learned that the interactive MD of Tristan Croll's ISOLDE program had also strongly flagged this problem, and Croll had posted his revised structure on the CSTF website. He then joined our collaborative group, which has so far resulted in the work described here.

Our second SARS-CoV-2 example was 6w41 at 3.08 Å, an antibody bound to the spike protein's RBD (1) . Surprisingly, although the antibody is a nanomolar binder and prevents ACE2 binding in vitro, it interacts with a non-overlapping part of the RBD.

The core of that interface is a 6-residue edge β-strand, its center disulfide-linked to the neighbor strand in the sheet (Figure 2a). Such a link, called an SS staple, has only one possible conformation, with −, −, +, −, − dihedral angles (24). The 6w41 model has a highly strained SS conformation with clashing Hα atoms and t, 120°, −, 0°, − dihedrals, two of which are eclipsed. That SS seems to distort the edge β-strand, resulting in two flipped peptides, 6 bad clashes, and very poor contact between the RBD and Fab, with only one H-bond and sparse van der Waals contact. In addition, several of the NAG carbohydrates were fit backward at the bond to the protein.
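Checking a disulfide against the staple pattern amounts to binning its five dihedrals. The sketch below assumes they are listed in the order χ1, χ2, χ3, χ2′, χ1′, uses conventional +60°/−60°/180° bins, and flags near-eclipsed values (0°, ±120°) on all but the central S–S dihedral, whose preferred values are near ±90°. The 20° tolerance and both sample angle sets are illustrative assumptions, not values taken from the 6w41 file.

```python
STAPLE_PATTERN = ("-", "-", "+", "-", "-")  # assumed order: chi1, chi2, chi3, chi2', chi1'


def classify_torsion(angle_deg, check_eclipse=True, tol=20.0):
    """Bin a dihedral as '+', '-', 't', or 'ecl' (near-eclipsed)."""
    a = (angle_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if check_eclipse and any(abs(a - e) <= tol for e in (-120.0, 0.0, 120.0)):
        return "ecl"
    if 0.0 < a < 120.0:
        return "+"
    if -120.0 < a < 0.0:
        return "-"
    return "t"


def staple_check(chis):
    """Return per-dihedral labels and whether they match the SS staple pattern.

    The central S-S dihedral (index 2) prefers about +/-90 degrees, so it is
    not tested for eclipsing.
    """
    labels = tuple(classify_torsion(c, check_eclipse=(k != 2))
                   for k, c in enumerate(chis))
    return labels, labels == STAPLE_PATTERN


if __name__ == "__main__":
    print(staple_check((-60.0, -80.0, 95.0, -80.0, -60.0)))  # invented staple-like values
    print(staple_check((180.0, 120.0, -90.0, 0.0, -60.0)))   # invented, following the 6w41 pattern
```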

Our rebuilt version (Figure 2b) has the standard SS staple conformation and corrects the peptides to give H-bonds instead of clashes. H-chain Trp 33, seen edge-on at right, is buried in the contact but too far from the RBD to satisfy its ring N7 (Nε1) atom. We modeled a water (small orange ball), not seen at this resolution but in position to bridge the gap with 4 tetrahedral H-bonds, made some rotamer changes, and added several ions at the surface. We contacted the authors, but they were already close to finishing a new structure at 2.42 Å: 6yLa (1). As shown in Figure 2c, it confirmed all the major corrections (the SS staple conformation and the correctly oriented, H-bonding peptides), and it even showed a clear positive difference peak at the proposed water position.

In another case as well, better new data soon became available that corrected the major problems, which is always the preferred outcome. 6w9c (25), at 2.7 Å, is the papain-like protease of SARS-CoV-2. The long arms of the trimer end in zinc-finger domains important to one phase of the activity, but their cryoEM density is very poorly resolved. The 3 Zn sites were modeled independently, with ligands missing, misoriented, or even SS-linked. We could improve the model somewhat, and another team of the CSTF reprocessed the data and also improved the model somewhat (Croll 2020), but the new mutant structure at 1.6 Å (6wrh; 26) really solved the problem satisfactorily.

RNA-dependent RNA polymerase: Nsp12/Nsp8/Nsp7 complex

The Nsp12 RNA polymerase, with its helper proteins Nsp7 and Nsp8, is essential for replication of the SARS-CoV-2 viral genome. The first structures of this complex were 6m71 at 2.9 Å and 7btf at 2.95 Å, both by cryoEM (27). We chose to work most intensively on 7btf because the inclusion of DTT prevented SS formation and preserved the biological Zn sites; however, most of the same problems also occur in the 6m71 and 7bv1 structures of this complex.

The first problem, a sequence misalignment at the N-terminus of Nsp8 chain B, was discovered and rebuilt in ISOLDE, and was confirmed using MolProbity's CaBLAM and all-atom-contact functions (8) along with visual examination of the model-to-map fit. The chain B N-terminal dozen visible residues are misaligned by +1 until joining correctly at Lys 79 in the first helix. Two CaBLAM outliers (magenta) and a Cα-geometry outlier (red) flag the unlikely backbone conformation caused by squeezing in the extra residue, as shown in Figure 3a. CaBLAM flags peptide CO orientations that are not compatible with the local Cα trace (16). Figure 3b shows the rebuilt section in correct sequence register, with no CaBLAM outliers, a normal helix N-cap, and much better H-bonding and contact with Nsp12. Backbone fit is a bit better in the corrected version, especially at the helix start, but local map density is rather low and patchy. The sequence in the misaligned section (MTQMYKQARSED K79) has no Trp or Gly and is nearly all midsize mobile polar residues, so sidechain fit is not very diagnostic. The chain B N-terminus is now known to fold into a long helical extension of each Nsp8 copy when there is a long RNA transcript for them to stabilize, as happens in the later 6yyt (28).

7btf also has the 9-residue sequence shift in Nsp12 described below for 7bv2, presumably inherited from the earlier 6nur/6nus SARS-CoV structures at 3.1 Å (29).

ISOLDE does not yet look at CaBLAM outliers explicitly, but it made a number of peptide "flips" (rotations of 90° to 180°), which are usually associated with CaBLAM outliers. Figure 3c shows an especially clear CaBLAM diagnosis in the Nsp12 chain, at the end of the A 717-734 helix. The problem is flagged by two CaBLAM outliers and a CaBLAM Cα-geometry outlier. A peptide flip of the 733 CO (red ball on the O) corrects all 3 outliers, makes an additional H-bond at the helix C-cap, and fits the density somewhat better (Fig. 3d). Most peptide-flip CaBLAM corrections in the rebuilt structure are clear improvements. However, flip corrections attempted in broad, low, or patchy density are often ambiguous as to which version (or both, or neither) is preferable, so such potential changes were seldom made in the new version.
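Geometrically, a peptide flip rotates the roughly rigid peptide unit (C, O, N, H) about the virtual bond joining the two flanking Cα atoms. The Python sketch below applies that rotation with Rodrigues' formula. It is an idealized illustration only (ISOLDE performs flips inside its density-biased simulation rather than by a bare coordinate edit), the function names are hypothetical, and the coordinates in the example are made up.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` about the line through axis_start and axis_end (Rodrigues)."""
    k = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    v = [p - s for p, s in zip(point, axis_start)]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
               for i in range(3)]
    return tuple(r + s for r, s in zip(rotated, axis_start))


def flip_peptide(ca_i, ca_next, peptide_atoms, angle_deg=180.0):
    """Rigidly rotate peptide atoms (e.g. C, O, N, H) about the Cα(i)-Cα(i+1) axis."""
    return {name: rotate_about_axis(xyz, ca_i, ca_next, angle_deg)
            for name, xyz in peptide_atoms.items()}


if __name__ == "__main__":
    flipped = flip_peptide((0.0, 0.0, 0.0), (3.8, 0.0, 0.0),
                           {"C": (1.4, 0.6, 0.0), "O": (1.5, 1.8, 0.0)})
    print(flipped)  # the O ends up on the opposite side of the Cα-Cα axis
```

In practice the flipped peptide then needs the neighboring φ/ψ and sidechains to relax, which is why an interactive, density-aware environment does this far better than a single rigid rotation.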

A thought for the future

It seems from the 7btf Ramachandran plot that φ,ψ values were restrained in refinement (diagnosed by too many points along the cyan contour separating favored from allowed values, and a near-complete vertical cutoff at φ = −60°). That helps keep refinement from diverging and, for instance, from progressively distorting good secondary structures. It gives artificially good traditional Ramachandran scores, but actually makes many of the conformations worse rather than better by pulling them into the wrong local minimum. This problem happens because the bumps for peptide CO oxygens disappear into the tube of backbone density somewhere between 2.5 and 3 Å resolution, so that badly incorrect peptide orientations are the most common type of misfitting at 3 Å or worse (16). For each backward peptide, the preceding ψ and the following φ are very incorrect, so each of those Ramachandran points is usually close to the wrong local minimum. Figure 4 shows this for two cases of 7btf CaBLAM outliers: at the B 76 awkward return to correct sequence register (Fig. 3a) and at the A 733 helix terminus (Fig. 3c). The points for 7btf are in red, always very close to the favored contour (cyan), with a green arrow pointing to the better rebuilt answer, always in a quite different part of the Ramachandran plot. The preferred strategy (not always possible in rushed circumstances like the present) is to model regular secondary structures initially, to fix as many CaBLAM outliers as feasible before refinement, and then to restrain H-bonds rather than Ramachandran values.

A re-implemented tool called the Rama-Z score (30) is a very useful diagnostic, sensitive (especially for very large structures) not only to models with many Ramachandran outliers but also to models refined with simplistic Ramachandran restraints. However, refinement programs are now beginning to apply the same criteria as Rama-Z as restraints; that lets models score well on Rama-Z even with restraints applied, but unfortunately still makes the underlying problems worse rather than better unless they have been fixed beforehand.
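Why size matters for a whole-model score can be seen from a generic z-score of a mean per-residue quality value: the standard error of a mean shrinks as 1/√N, so the same small average shift becomes highly significant in a very large structure. The sketch below is that generic statistic only, not the published Rama-Z calculation; `mean_score_z` is a hypothetical name and the reference mean and standard deviation are placeholders.

```python
import math


def mean_score_z(per_residue_scores, ref_mean, ref_sd):
    """Generic z-score of a model's mean per-residue score.

    ref_mean and ref_sd describe single residues in some reference set
    (placeholders here). The standard error of the mean is ref_sd / sqrt(N),
    so the same average shift gives a larger |z| for a larger model.
    """
    n = len(per_residue_scores)
    mean = sum(per_residue_scores) / n
    return (mean - ref_mean) / (ref_sd / math.sqrt(n))


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, round(mean_score_z([0.55] * n, ref_mean=0.5, ref_sd=0.5), 2))
```

With a reference standard deviation of 0.5, a mean shift of 0.05 gives z ≈ 1 for 100 residues but z ≈ 10 for 10,000 residues.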

The cryoEM structure 7bv2 has 50 nucleotides of primer-template RNA helix bound, along with the potentially therapeutic remdesivir inhibitor (31). The resolution is unusually good at 2.5 Å, and the accompanying 7bv1 apo structure at 2.8 Å provides a close comparison. We, and many others, greatly appreciate the rapid deposition of these important structures and their maps. The 2.5 Å resolution provides quite clear density for both backbone and sidechains, especially in the central core of the particle. At the other extreme, as is typical, for some of the outer regions and chain termini the map density is so weak, patchy, confusing, or missing altogether that it does not effectively determine a most-probable conformation. In between, however, are regions where local mistakes can happen that are reliably correctable on close analysis.

Most of the RNA in 7bv2 forms a regular A-form double helix with very strong basepair density. However, as shown in Figure 5a, template-strand (T) nucleotides 17-19 are modeled with outlier ("!!") backbone conformers (12), the T A18 base is in the unusual syn orientation with clashes, and all three fit quite poorly to the clear density. This problem was corrected during the rebuild in ISOLDE, and was confirmed using MolProbity's ribose-pucker, RNA-suite-conformer, and all-atom-contact functions (8) along with visual examination of the model-to-map fit. In the rebuild (Figure 5b), the density fit is excellent, with backbone conformers A-form 1a (and a close 1c), no clashes, and better basepair H-bonding. At T 20 and above, the RNA helix makes no protein contacts, the density rapidly deteriorates, and neither model is convincing. Probably that part is mobile and no single conformation fits the fragmented map. However, the primer and template strands are entirely complementary, with a G•C pair at the far end, and they will not adopt unfavorable conformations where there are no contacts to force strain. Ideally, they would be modeled as 2 or 3 copies of A-form, gently bending or twisting. The later 6yyt structure (28) has now shown that longer RNA product adopts very regular A-form, stabilized by long α-helical extensions folded from the two Nsp8 N-termini.
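The ribose-pucker check can be made from just three atoms. As we understand MolProbity's "Pperp" test, the perpendicular distance from the 3′ phosphorus to the line extended along the C1′–N9/N1 glycosidic bond separates C3′-endo puckers (distance ≥2.9 Å) from C2′-endo puckers (<2.9 Å). The Python sketch below assumes that 2.9 Å threshold; the function names and test coordinates are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    ab = [y - x for x, y in zip(a, b)]
    ap = [y - x for x, y in zip(a, p)]
    cross = (ab[1] * ap[2] - ab[2] * ap[1],
             ab[2] * ap[0] - ab[0] * ap[2],
             ab[0] * ap[1] - ab[1] * ap[0])
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(c * c for c in ab))


def pucker_from_pperp(p_3prime, c1_prime, n_glyco, threshold=2.9):
    """Classify ribose pucker from the 3' P distance to the extended C1'-N bond line.

    n_glyco is N9 for purines or N1 for pyrimidines; the 2.9 Å threshold is the
    assumed cutoff between C2'-endo (shorter) and C3'-endo (longer).
    """
    d = point_line_distance(p_3prime, c1_prime, n_glyco)
    return ("C3'-endo" if d >= threshold else "C2'-endo"), d
```

Because it needs only the P, C1′, and glycosidic N positions, this kind of test stays usable at resolutions where the ribose ring itself is poorly defined.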

From the T 16 -P 15 basepair down to the active site in 7bv2, both protein and RNA look very good, until the single-stranded end of the template, where the U9 to U10 suite is clearly quite extended but the T 9 base and all of T 8 are largely disordered.

The remdesivir is well stacked and base-paired in the ligated product monophosphate form, as modeled (Figure 6). It fits the density well, but that density is only strong enough to account for about half occupancy, which implies that only about half of the cryoEM particles have remdesivir covalently bound. It is therefore not surprising that the adjacent active-site space has very low, patchy density that presumably represents some mixture of ligated and unligated states. The modeled Mg ions, pyrophosphate, and waters may be part of that mixture, but not at high occupancy in any one position. We made one clear Mg-to-water correction at Mg A1006, but could not produce a clean model in the active site.

The 9-residue sequence shift in Nsp12 lies in an isolated stretch of model between the C-terminus and an unmodeled gap following A 895. The misalignment has been inherited from 6nur through 7btf and now to 7bv1 and 7bv2; in each case, it extends across however much of the fragment was modeled. The sequence that belongs here contains 4 large aromatics (Tyr 915, Trp 916, Phe 920, and Tyr 921) whose fit to their sidechain density is highly diagnostic. In the misaligned models, Met 906, Leu 907, Asn 911, and Thr 912 are much too small for their clear, connected sidechain densities, which can be beautifully filled by the 4 aromatics. Figure 7 shows both the proximity of the RNA (green) and the badly filled aromatic sidechain densities. In 7bv2 at left, the line of 3 sidechains up the center is Tyr 915, which is a bit too big and the wrong shape for that density; Asn 911, which is too small; and Met 906, which does not get into the density at all because of a sidechain-backbone switch around its Cα (blue ball on the backbone N) that prevents fitting of a few earlier residues with density. In the rebuilt model at right, Met 924, Phe 920, and Tyr 915 fit perfectly. In this view, the end of Trp 916 fits the density in the top right corner.
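Part of what makes the aromatics diagnostic can be captured by a crude register scan: estimate how many sidechain heavy atoms each modeled position's density could hold, slide the sequence along, and score the size mismatch. The Python sketch below is a hypothetical screening aid for illustration, not a tool used in this work. The toy sequence and density estimates are invented, and the score ignores sidechain shape, which (as with Tyr 915 above) can matter as much as size.

```python
# Heavy atoms in each sidechain, counting Cβ (Gly has none).
SIDECHAIN_HEAVY_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "V": 3, "T": 3, "P": 3, "I": 4, "L": 4,
    "D": 4, "N": 4, "M": 4, "E": 5, "Q": 5, "K": 5, "H": 6, "F": 7, "R": 7,
    "Y": 8, "W": 10,
}


def register_scores(sequence, density_sizes, start, offsets):
    """Size-mismatch score for each candidate register offset (lower is better).

    density_sizes[k] is an estimate (hypothetical input) of how many sidechain
    heavy atoms the density at modeled position k could hold. Under offset
    `off`, modeled position k is assigned sequence[start + off + k].
    """
    scores = {}
    for off in offsets:
        first = start + off
        if first < 0 or first + len(density_sizes) > len(sequence):
            continue  # this register would run off the end of the sequence
        window = sequence[first:first + len(density_sizes)]
        scores[off] = sum(abs(SIDECHAIN_HEAVY_ATOMS[aa] - size)
                          for aa, size in zip(window, density_sizes))
    return scores


if __name__ == "__main__":
    toy_sequence = "MLAGNTSAGYWAGFYAGSK"   # invented, not the Nsp12 sequence
    density = [8, 10, 1, 0, 7, 8]         # density sized for Y W A G F Y
    scores = register_scores(toy_sequence, density, start=0, offsets=range(-3, 14))
    best = min(scores, key=scores.get)
    print(best, scores[best], scores[0])  # offset 0 is the "as modeled" register
```

The toy case loosely mirrors the situation above: at the modeled register, small residues (Met, Leu, Asn, Thr) sit in density sized for aromatics, and the scan prefers the register shifted by 9.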

Unfortunately, current real-space correlation measures are sensitive only to atoms with no density, not to density with no atoms, so they do not detect sequence misalignments well. Since the sequence does not go back into register at either end of the offset, there are no awkward backbone compensations for CaBLAM to detect, either. However, by visual examination, the -9 shift is unambiguous once considered as a possibility. This is worth correcting because it moves residues by extremely large distances, and also because 3 of its residues are at the interface with RNA.
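The one-sidedness of real-space correlation can be shown with a toy one-dimensional "map". In the Python sketch below (all values invented), the experimental density has two peaks but the model has an atom under only one. A correlation computed within a mask around the model's atoms stays essentially perfect, while the same correlation over the whole grid drops. Real real-space correlation tools differ in their masks and map calculations, so this only illustrates the principle.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def gaussian(x, centre, sigma=1.0):
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


grid = [float(i) for i in range(20)]
atoms = [5.0]          # the model has one atom ...
peaks = [5.0, 14.0]    # ... but the "experimental" map also has density at 14
exp_map = [sum(gaussian(x, c) for c in peaks) for x in grid]
model_map = [sum(gaussian(x, a) for a in atoms) for x in grid]

# Mask: grid points within 2 units of any model atom.
mask = [i for i, x in enumerate(grid) if min(abs(x - a) for a in atoms) <= 2.0]
cc_mask = pearson([exp_map[i] for i in mask], [model_map[i] for i in mask])
cc_all = pearson(exp_map, model_map)
print(f"CC inside atom mask: {cc_mask:.3f}  CC over whole grid: {cc_all:.3f}")
```

The unmodeled peak simply never enters the masked calculation, which is the same blind spot that lets a register shift with well-placed backbone look acceptable.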

Only about 1 in 3000 non-proline peptides are cis (15), and indeed they should probably never be fitted that way at 2.5 Å resolution, and never at 3 Å, unless known from other data. But about 5% of prolines are cis, so cis-Pro peptides are relatively common, and the Pro ring makes the distinction much more evident in the map density. It seems that the modeling process used for 7bv2 went overboard and forced all peptides to be trans, not just all non-Pro peptides. That was the wrong answer in two cases. Pro B 183 (Figure 8a) is especially bad as trans. The Pro itself fits the density poorly (top center) and has both a CaBLAM outlier, which indicates a peptide orientation incompatible with the local Cα trace (8), and a Cα-geometry outlier. Most tellingly, the preceding residue is distorted so much that its Trp sidechain cannot get anywhere near its gorgeous, unoccupied density (bottom center), and it was fit as just a Cβ stub. When the Pro is changed to cis in ISOLDE and the conformation relaxed, the Trp slides easily into that classic sidechain density (Fig. 8b).
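Classifying peptides from the ω dihedral is simple; the point of the paragraph is that the result should be treated differently for Pro and non-Pro. The sketch assumes commonly used bins (cis within 30° of 0°, trans within 30° of 180°, otherwise twisted) and that each ω given is for the peptide bond preceding the named residue. The function names and the residue list in the example are hypothetical.

```python
def classify_peptide(omega_deg):
    """Label a peptide bond from its omega dihedral (assumed 30-degree bins)."""
    a = (omega_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if abs(a) <= 30.0:
        return "cis"
    if abs(a) >= 150.0:
        return "trans"
    return "twisted"


def review_peptides(residues):
    """residues: (residue name, omega of the peptide bond preceding it) pairs."""
    notes = []
    for name, omega in residues:
        kind = classify_peptide(omega)
        if kind == "cis" and name == "PRO":
            notes.append((name, "cis-Pro: plausible (about 5% of Pro)"))
        elif kind == "cis":
            notes.append((name, "cis-nonPro: rare (about 1 in 3000); needs strong evidence"))
        elif kind == "twisted":
            notes.append((name, "twisted peptide: check geometry"))
    return notes


if __name__ == "__main__":
    for note in review_peptides([("TRP", 178.0), ("PRO", -4.0), ("ALA", 8.0), ("GLY", 95.0)]):
        print(note)
```

Note that ω alone cannot reveal the 7bv2 problem, a Pro wrongly kept trans: that needed the map, the CaBLAM outliers, and the displaced Trp sidechain.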

Our rebuild made a number of peptide flips in 7bv2 (rotations of more than 90°), almost all associated with CaBLAM outliers. Figure 9a shows a clear example in chain B (Nsp8), in a β-hairpin loop. The problem is flagged by 2 successive CaBLAM outliers. In Fig. 9b a peptide flip of the central 161 CO (red ball on the O) corrects both outliers, fits the density somewhat better, forms a tight turn, and makes 4 more H-bonds, 2 across the turn and 2 that bridge to chain C (Nsp7, at top).

Peptide flips and rotamer changes matter most if they are in important places such as near the active site or in a chain-chain interface, but should always be corrected if the new version is unambiguously better. However, one should remember that CaBLAM outliers are declared at a score contour level that excludes 1% of the quality-filtered reference data, so as many as 1% of the outliers may in fact be correct. A possible example is the Gly A 678 CaBLAM Cα-geometry outlier near the active site; it is in an unusual Pro-Gly-Gly sequence, seems to be fit correctly, and has tight local contacts that prevent building it differently. Since it immediately precedes Thr 680 in the active-site area, it might be one of the cases that conserves a less favorable but genuine outlier conformation because it better supports biological function.

Depositor re-versioning as an efficient route to improved structures

Before the PDB News announcements of 7/31/2019 and 2/18/2020, if the "depositor of record" (the PI) for a structure later found a need to change its atomic coordinates, sequence, or chemical description (but still from the same data), they had to obsolete the entry and replace it with an updated model under a new PDB code. Understandably, that process was only invoked for really serious reasons. Now there is an archival versioning system for PDB codes that jumps by 1.0 for major changes as above (depositor-initiated) and by 0.1 for formatting or other minor changes (usually done by the wwPDB itself). Figure 1b shows this in the Revision History for 6vyo, where version 2.0 was the Zn-to-Cl ligand identity change. Previous versions of a structure can be accessed from a separate, versioned FTP archive at the PDB. This new process has been invaluable for the immediate release of new SARS-CoV-2 structures, allowing further checks and changes to proceed easily and to propagate immediately into new database downloads even before publication.
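The version arithmetic described above is easy to model, and storing the major and minor numbers as integers avoids floating-point drift from repeatedly adding 0.1. The sketch assumes that a major (depositor-initiated) change resets the minor number to 0, consistent with 6vyo's new version being numbered 2.0; the `Version` class is hypothetical and not part of any PDB software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical model of a major.minor entry version."""
    major: int = 1
    minor: int = 0

    def bump(self, depositor_change: bool) -> "Version":
        # Depositor-initiated change to coordinates, sequence, or chemistry:
        # new major version (assumed here to reset the minor number).
        if depositor_change:
            return Version(self.major + 1, 0)
        # Formatting or other minor archive-side change.
        return Version(self.major, self.minor + 1)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


if __name__ == "__main__":
    v = Version()
    history = [str(v)]
    for depositor_change in (False, False, True):
        v = v.bump(depositor_change)
        history.append(str(v))
    print(" -> ".join(history))
```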

As well as providing our own rebuilt models on the CSTF site (similar to what is done by PDB-REDO, the covid-19.bioreproducibility.org site, and probably others), we have taken advantage of the wwPDB re-versioning system to alert depositors to the few most urgent and clear changes in their SARS-CoV-2 structures, encouraging them to make and confirm those changes themselves in their own model, for rapid deposit of an improved major version with the same PDB code. This process has typically taken just over a month between initial and re-versioned releases (about 2 weeks for us to find and convincingly document problems and 2 weeks for the depositors to make the changes and release the new version). Besides direct responses to our emails, as for 6vyo, 7bv1, and 7bv2, information about major changes such as sequence misalignments sometimes propagates informally, as for 6m71 and 7btf. Once an early structure has been re-versioned, later structures of the same molecule will usually start from the improved model, whether solved by the same or by different groups (Table 1). For the RNA polymerase complex, that was true for 5 of 6 newer structures: it is explicitly stated in the papers for 6yyt (28) and for 7bzf and 7c2k (32), and it is also true for 7ctt (33) and 7aap (34), but not for 6xqb (35).

In response to the pandemic, structural biologists worldwide have solved the structures of SARS-CoV-2 proteins and complexes with unprecedented speed, and have broken precedent to deposit and release those structures immediately for the benefit of further COVID-19 research and development. There has been some criticism of these early releases, and of the posting of COVID-19-related research on preprint servers such as bioRxiv, as an undesirable shortcut around the peer review system. We believe these complaints have missed the very positive aspects of what is actually going on. Most of these initial releases and preprints will eventually go through the standard peer-review process to achieve formal publication, and many have already done so. In the meantime, they have undergone much stricter scrutiny and review than is normally possible, by the entire community. That is exemplified by the work described in this paper, which has also rippled into improving related later structures.

Immediate coordinate release of an initial, preliminary model seems desirable only in urgent circumstances like the present; normally, the depositors themselves should thoroughly check their own structures first. Release after such checking is certainly possible and has advantages, but would probably only seldom attract community validation and correction.

Perhaps the most important take-away message from our work is that coordinates and density maps should be provided to reviewers in the structured environment of standard peer review, where they would be enormously helpful to the review, and where any misuse of that information could be documented and censured. In current peer review for journals, only a validation report is supplied to the reviewer, not the coordinates or the map. Therefore reviewers cannot judge, as we have often been able to do here, whether an outlier is actually wrong or genuine and how much it matters to the reported conclusions. In standard peer review, the validation report sometimes prompts a request for more qualified wording of specific conclusions, but coordinates are almost never changed (we know of only one case, in which model and data were actually available to the reviewer). The most effective way to initiate this paradigm change of providing reviewers with coordinates and maps is for structural biologists, in our role as reviewers, to routinely request those data as a condition for doing the review.

The results presented here also demonstrate the great value of the wwPDB's new archival versioning system, which enables the depositor of record to update to an improved model without changing the PDB code. This just-in-time facility has been invaluable for the SARS-CoV-2 structures, and going forward it will encourage a general improvement in the accuracy of the database for the benefit of all users and uses, both for urgent early releases and for more leisurely retroactive versioning.

A few of the early-release SARS-CoV-2 structures were accompanied by a short initial writeup as a bioRxiv preprint, such as for 6m71 and 7btf (36); that preprint explained their strategy on the issue of disulfides vs Zn sites, which would otherwise have seemed like an error. Such preprints were very useful, and were missed where absent; we strongly encourage providing them for early releases, as well as for depositions where formal publication is not planned. Conversely, preprint posting of a structure report should always be accompanied by deposition and release of coordinates and map(s), so that potential users of the information can fully check validity.

At a more specific and detailed level, we hope this work provides convincing evidence on two points. On the negative side, at the current state of the art there are local model errors, even in generally excellent X-ray or cryoEM structures, that cannot be fixed by downhill refinement. On the positive side, most of those errors can be located, even in very large structures, by tools such as CaBLAM, and are very often tractable to user-guided correction in rebuilding systems such as KiNG, Coot, Chimera, or ISOLDE. The cases treated in detail here can serve as guidance on strategies for diagnosis, for identifying alternative fittings, and for testing whether an alternative is a clear improvement.
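Why downhill refinement cannot repair such errors can be illustrated with a deliberately simplified toy. The sketch below is an analogy only, not how any refinement program, ISOLDE, or Coot works: a tilted 1-D double-well "energy" stands in for a local fit, with its two minima playing the role of the two peptide orientations. The function names and the parameters (`tilt`, `step`, `n_steps`) are hypothetical choices for the illustration.

```python
def energy(x: float, tilt: float = 0.3) -> float:
    # Two wells near x = +1 and x = -1; the tilt makes the x < 0 well lower.
    return (x * x - 1.0) ** 2 + tilt * x

def gradient(x: float, tilt: float = 0.3) -> float:
    # Analytic derivative of energy() with respect to x.
    return 4.0 * x * (x * x - 1.0) + tilt

def downhill(x: float, step: float = 0.01, n_steps: int = 5000) -> float:
    # Plain gradient descent: it can only move downhill, never over a barrier.
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

# A model built in the wrong (higher) well stays there under downhill refinement...
stuck = downhill(1.0)
# ...whereas a user-guided jump to the alternative (like a peptide flip) then
# refines into the lower-energy minimum.
alternative = downhill(-1.0)
print(energy(alternative) < energy(stuck))  # True
```

The point of the toy is that the barrier between the wells, not a lack of refinement cycles, keeps the first model wrong; an explicit, tested alternative is needed to cross it.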

TIC performed the model optimizations in ISOLDE, and JSR made and evaluated corrections of CaBLAM, RNA, and other MolProbity outliers. CJW and VBC interacted closely with the CSTF and modified MolProbity and KiNG output to meet their website needs. JSR, DCR, and TIC wrote and illustrated the emails to depositors and the initial draft of the paper. All authors reviewed, edited, and approved the paper in its submitted and revised forms.

This work was supported by National Institutes of Health grants GM R35-131883 to DCR, P01-063210 Project IV to JSR, and Wellcome Trust Principal Research Fellowship 209407/Z/17/Z to Randy Read supporting TIC. We greatly appreciate Andrea Thorn and Gianluca Santoni for the Coronavirus Structural Task Force website's hosting of the condensed MolProbity results and the ISOLDE-rebuilt coordinates (https://github.com/thorn-lab/coronavirus_structural_task_force), the worldwide Protein Data Bank's recent establishment of depositor-initiated re-versioning of PDB codes, and most especially the depositors for solving and releasing these important structures and then taking advantage of PDB re-versioning to make even better models available to everyone.

Potent antibody binding to an unexpected highly conserved cryptic epitope of the SARS-CoV-2 spike. To be published.

Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2

Making the invisible enemy visible

Ligand-guided assessment of SARS-CoV-2 drug target models in the Protein Data Bank

The PDB_REDO server for macromolecular structure model optimization

Announcing the worldwide Protein Data Bank

EMDataBank.org: unified data resource for CryoEM

MolProbity: More and better reference data for improved all-atom structure validation

ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps

Visualizing and quantitating molecular goodness-of-fit: Small-probe contact dots with explicit hydrogen atoms

Asparagine and Glutamine: Using Hydrogen Atom Contacts in the Choice of Side-chain Amide Orientation

RNA Backbone: Consensus all-angle conformers and modular string nomenclature

Outcomes of the 2019 EMDataResource model challenge: validation of cryo-EM models at near-atomic resolution

The rate of cis-trans conformation errors is increasing in low-resolution crystal structures

Cis-nonPro peptides: Genuine occurrences and their functional roles. bioRxiv

New Tools in MolProbity Validation: CaBLAM for cryoEM backbone, UnDowser to rethink "waters", and NGL Viewer to recapture online 3D graphics

KiNG (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program

The backrub motion: How protein backbone shrugs when a sidechain dances

Features and development of Coot

Macromolecular structure determination using X-rays, neutrons, and electrons: Recent developments in Phenix

UCSF ChimeraX: Meeting modern challenges in visualization and analysis

Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics

Crystal structure of RNA binding domain of nucleocapsid phosphoprotein from SARS coronavirus 2. To be published.

The disulphide β-cross: from cystine geometry and clustering to classification of small disulphide-rich protein folds

The crystal structure of papain-like protease of SARS CoV-2. To be published.

The crystal structure of papain-like protease of SARS CoV-2, C111S mutant. To be published.

Structure of the RNA-dependent RNA polymerase from COVID-19 virus

Structure of replicating SARS-CoV-2 polymerase

Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors

A global Ramachandran score identifies protein structures with unlikely stereochemistry

Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir

Structural basis for RNA replication by the SARS-CoV-2 polymerase

Cryo-EM structure of favipiravir bound to replicating polymerase complex of SARS-CoV-2 in the pre-catalytic state. To be published.

Nsp7-Nsp8-Nsp12 SARS-CoV-2 RNA-dependent RNA polymerase in complex with template:primer dRNA and favipiravir-RTP. To be published.

Structure of SARS-CoV-2 RdRp/RNA complex at 3.4 Ångstrom. To be published.

Structure of RNA-dependent RNA polymerase from 2019-nCoV, a major antiviral drug target