key: cord-0998234-656bia83 authors: Zhang, Kaiming; Pintilie, Grigore D.; Li, Shanshan; Schmid, Michael F.; Chiu, Wah title: Resolving individual atoms of protein complex by cryo-electron microscopy date: 2020-11-02 journal: Cell Res DOI: 10.1038/s41422-020-00432-2 sha: 55b0357f9f86a9952458197d5c6a65a595612667 doc_id: 998234 cord_uid: 656bia83

range from -0.35 to -1.3 μm, and "rlnCtfMaxResolution < 4.5". For the Falcon 4 dataset, 5,427 micrographs were selected with a defocus range from -0.3 to -1.3 μm, and "rlnCtfMaxResolution < 4". All particles were autopicked using the NeuralNet option in EMAN2 [2]. The particle coordinates were then imported into Relion, where poor 2D class averages were removed by two rounds of 2D classification. The initial models for both datasets were built in cryoSPARC [3] using the ab-initio reconstruction option with octahedral symmetry applied. For the K3 dataset, 1,176,336 particles were picked and 902,455 were selected after 2D classification. For the Falcon 4 dataset, 707,350 particles were picked and 500,643 were selected after 2D classification. 3D refinement was performed using the particle images selected from 2D classification, followed by "CTF refinement and Bayesian polishing" in Relion. A 1.34 Å resolution map from the K3 dataset and a 1.36 Å resolution map from the Falcon 4 dataset were obtained (Supplementary information,

Segger (v.2.5) was used to first segment the 1.34 Å resolution cryo-EM map into regions corresponding to each of the 24 protein subunits (using Group by Connectivity, 30 steps, at a threshold of 0.01) [5]. The X-ray structure of human apoferritin (PDB:3ajo [6]) was fitted to one of the segments using the Fit to Segments dialog. The structure was then refined using phenix.real_space_refine [7]. It was visually inspected in Chimera [4], residue by residue, to ensure a proper fit. No problems were seen in the backbone.
Some side chains, however, did not fit the observed density and were re-modeled using the Rotamers dialog in Chimera. For several residues the map showed more than one possible rotamer, and alternate conformations were added using the same dialog. The resulting structure was then refined once more with phenix.real_space_refine to allow the re-modeled side chains to adjust further into the density. The same process was performed for the 1.36 Å resolution map, segmenting at a threshold of 0.03 (also with 30 steps of grouping by connectivity) and fitting the model that had been refined into the 1.34 Å resolution map. On visual inspection, all residues appeared to fit the map very well and no manual adjustments were needed. The phenix.real_space_refine procedure was nevertheless applied, and the model changed slightly to give the 1.36 Å resolution structure.

Q-score adjustment

Q-scores are calculated by correlating map values around each atom to a "reference Gaussian". In our previous paper describing the Q-score [8], the width (sigma) of the reference Gaussian was set to 0.6 Å, which resulted in a maximum Q-score of ~1.0 at a resolution of 1.5 Å. Using this definition, Q-scores start to drop at resolutions higher than 1.5 Å, as atom peaks become sharper than the reference Gaussian. Hence, we adjusted sigma to 0.4 Å, so that Q-scores are now highest at a resolution of ~1.1 Å, and a linear correlation can again be seen between Q-scores and resolutions (Supplementary information, Fig. S6). Smaller sigma values will again be required if resolutions continue to increase past 1.1 Å.

We calculate B' factors from atom Q-scores using the following empirically derived formula:

This formulation establishes the relationship that atoms with higher Q-scores produce lower B' factors, as they are better resolved. We determined the best scaling factor, f, by trying several values (0, 50, 100, 200, 300, 400) and observing which value caused the largest increase in the FSC.
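The effect of the reference width described in the Q-score adjustment can be illustrated with a minimal one-dimensional sketch. It is not the Q-score implementation: the actual calculation samples map values at points around each atom in the 3D map, whereas here a hypothetical atom peak is modeled as a Gaussian radial profile and correlated (about the mean) with reference Gaussians of sigma 0.6 Å and 0.4 Å. The function names `radial_profile` and `q_like_score`, the 0 to 2 Å sampling range and the peak widths are assumptions made for illustration only.

```python
import numpy as np

def radial_profile(sigma, radii):
    """Gaussian profile of width sigma (Å) sampled at the given radii (Å)."""
    return np.exp(-0.5 * (radii / sigma) ** 2)

def q_like_score(map_values, radii, ref_sigma):
    """Correlation about the mean between map values and a reference Gaussian.

    This is a simplified stand-in for the Q-score, not the published formula.
    """
    ref = radial_profile(ref_sigma, radii)
    u = map_values - map_values.mean()
    v = ref - ref.mean()
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Radial distances from the atom centre (assumed sampling for this sketch).
radii = np.arange(0.0, 2.01, 0.1)

# Hypothetical atom peaks: one as wide as the old reference, one sharper.
broad_atom = radial_profile(0.6, radii)
sharp_atom = radial_profile(0.4, radii)

q_broad_old_ref = q_like_score(broad_atom, radii, ref_sigma=0.6)
q_sharp_old_ref = q_like_score(sharp_atom, radii, ref_sigma=0.6)
q_sharp_new_ref = q_like_score(sharp_atom, radii, ref_sigma=0.4)

print(q_broad_old_ref, q_sharp_old_ref, q_sharp_new_ref)
```

In this toy setting the sharper peak scores below the maximum against the 0.6 Å reference and reaches the maximum against the 0.4 Å reference, which mirrors the reasoning given for narrowing the reference Gaussian.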
Since Q-scores correlate with resolution, whereas B-factors do not, as shown previously [8], we expect that a different factor f will be required at different resolutions.

To search for water molecules and ions, we used the conceptual criteria described in reference 9, summarized below:
• A "placed water" that clashes (i.e. is closer than the typical hydrogen bond distance) with two or more atoms of the same polarity, and with no non-polars (C) or opposite polars (O and N), is almost certainly an ion.
• If the "placed water" clashes with (is too close to) negative atoms, it is a positive ion.
• If the "placed water" clashes with positive atoms, it is a negative ion.
• A doubly charged ion (e.g. Mg²⁺, Fe²⁺ or Zn²⁺) almost always interacts with at least one fully charged atom (e.g. phosphate or carboxyl O).
• A singly charged ion (e.g. Na⁺) often interacts with just partial charges (e.g. OH and backbone CO).

The above criteria [9] did not describe exact distances. We chose the distance ranges used in our procedure based on observed distances between waters/ions and nearby atoms in a high-resolution apoferritin crystal structure (PDB:3ajo [6]). They were found to be as follows:
• Water atom to nearby polar atoms: 2.8 ± ~0.3 Å.
• Ion to nearby charged/polar atoms: 2.2 ± ~0.3 Å.

Based on these criteria and observations, we implemented the following procedure. The cryo-EM map is first segmented using the watershed method [10], which produces regions corresponding to peaks in the map. The boundaries between these regions lie along the lowest map values between the peaks, so this is essentially a peak-finding algorithm. A threshold of 2 sigma above the mean density value in the map is used here, so that the detected peaks are more likely to correspond to signal than to noise. The resulting regions are then sorted by volume (number of voxels in the region) and considered in decreasing order. For each region, the point in it with the highest map density value is taken as its position (P).
Then, for each atom near P:
a. If the atom is non-polar and non-charged (e.g. a carbon atom) and is within 2.6 Å of P, P is ignored and the search continues with the next region.
b. If the atom is charged, and:
   i. If the atom is within 1.9 Å to 2.5 Å of P, it is added to the ChargedAtoms list.
   ii. If the atom is within 2.5 Å to 3.1 Å of P, it is added to the WaterAtoms list.
c. If the atom is polar, e.g. O in the backbone, O or N in side chains that are not typically charged at the experimental pH, and S in cysteine (Cys), and:
   i. If the atom is within 1.9 Å to 2.5 Å of P, it is added to the PolarIonAtoms list (this distance range is characteristic of an ion).
   ii. If the atom is within 2.5 Å to 3.1 Å of P, it is added to the PolarWaterAtoms list (this distance range is characteristic of a water molecule).
d. If the ChargedAtoms list is not empty, P is added as a 2+ ion (e.g. Fe²⁺, Mg²⁺, Zn²⁺).
e. Otherwise, if the PolarIonAtoms list is not empty, P is added as a singly charged ion (e.g. Na⁺ if close to an O atom, or Cl⁻ if close to an N atom).
f. Otherwise, if the PolarWaterAtoms list is not empty, P is added as a water.

Determining which doubly charged ion (such as Mg²⁺, Zn²⁺, Ca²⁺, Fe²⁺ or Cu²⁺) to place may not be directly possible from the density, although some efforts have been made in this direction with X-ray data [11]. The type of ion can instead be chosen based on other information, such as the buffer conditions and knowledge of the protein's metal-binding biochemistry, if available.

The procedure above has been integrated into the Segger plugin for UCSF Chimera (v2.5 and later). The plugin, its code, and installation and running instructions are detailed on the GitHub page [12].
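The decision steps listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Segger code: the `Atom` class, the `kind` labels ("nonpolar", "charged", "polar"), the function name `classify_peak`, the returned strings and the half-open treatment of the distance ranges are all assumptions introduced here. The text does not say how the WaterAtoms list enters the final decision, so the sketch collects it but leaves it unused.

```python
from dataclasses import dataclass
from math import dist

ION_RANGE = (1.9, 2.5)     # Å, distance range characteristic of an ion
WATER_RANGE = (2.5, 3.1)   # Å, distance range characteristic of a water
NONPOLAR_CUTOFF = 2.6      # Å, peaks this close to a non-polar atom are ignored

@dataclass
class Atom:
    xyz: tuple  # (x, y, z) in Å
    kind: str   # "nonpolar", "charged" or "polar" (labels assumed for this sketch)

def in_range(d, bounds):
    """True if lo <= d < hi (half-open bounds are an assumption of this sketch)."""
    lo, hi = bounds
    return lo <= d < hi

def classify_peak(p, atoms):
    """Classify a density peak at position p from the atoms around it."""
    charged_atoms, water_atoms = [], []
    polar_ion_atoms, polar_water_atoms = [], []
    for atom in atoms:
        d = dist(p, atom.xyz)
        if atom.kind == "nonpolar":
            if d <= NONPOLAR_CUTOFF:
                return None  # step a: ignore this peak
        elif atom.kind == "charged":
            if in_range(d, ION_RANGE):
                charged_atoms.append(atom)
            elif in_range(d, WATER_RANGE):
                water_atoms.append(atom)  # collected, but unused in steps d to f
        elif atom.kind == "polar":
            if in_range(d, ION_RANGE):
                polar_ion_atoms.append(atom)
            elif in_range(d, WATER_RANGE):
                polar_water_atoms.append(atom)
    if charged_atoms:
        return "doubly charged ion"   # step d
    if polar_ion_atoms:
        return "singly charged ion"   # step e
    if polar_water_atoms:
        return "water"                # step f
    return None

# Hypothetical peaks at the origin with a single neighbouring atom each.
peak = (0.0, 0.0, 0.0)
near_charged = classify_peak(peak, [Atom((2.2, 0.0, 0.0), "charged")])
near_polar = classify_peak(peak, [Atom((2.2, 0.0, 0.0), "polar")])
water_like = classify_peak(peak, [Atom((2.8, 0.0, 0.0), "polar")])
rejected = classify_peak(peak, [Atom((2.8, 0.0, 0.0), "polar"),
                                Atom((0.0, 2.0, 0.0), "nonpolar")])
print(near_charged, near_polar, water_like, rejected)
```

Returning as soon as a close non-polar atom is found is equivalent to step a, since such an atom causes the peak to be ignored regardless of the other neighbours.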
1. CTFFIND4: Fast and accurate defocus estimation from electron micrographs
2. High resolution single particle refinement in EMAN2
3. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination
4. UCSF Chimera--a visualization system for exploratory research and analysis
5. Quantitative analysis of cryo-EM density map segmentation by watershed and scale-space filtering, and fitting of structures by alignment to regions
6. The universal mechanism for iron translocation to the ferroxidase site in ferritin, which is mediated by the well conserved transit site
7. New tools for the analysis and validation of cryo-EM maps and atomic models
8. Measurement of atom resolvability in cryo-EM maps with Q-scores
9. New tools in MolProbity validation: CaBLAM for CryoEM backbone, UnDowser to rethink 'waters,' and NGL Viewer to recapture online 3D graphics
10. Use of watersheds in contour detection
11. Automated identification of elemental ions in macromolecular crystal structures
12. Segger v2