key: cord-0925503-n5wpoxe3 authors: Slavin, Moriya; Zamel, Joanna; Zohar, Keren; Eliyahu, Siona; Braitbard, Merav; Brielle, Esther; Baraz, Leah; Stolovich-Rain, Miri; Friedman, Ahuva; Wolf, Dana G; Rouvinski, Alexander; Linial, Michal; Schneidman-Duhovny, Dina; Kalisman, Nir title: Targeted in situ cross-linking mass spectrometry and integrative modeling reveal the architectures of Nsp1, Nsp2, and Nucleocapsid proteins from SARS-CoV-2 date: 2021-02-04 journal: bioRxiv DOI: 10.1101/2021.02.04.429751 sha: fab19ae5f303b8b1b833ac2b473b98c2d9e7680d doc_id: 925503 cord_uid: n5wpoxe3 Atomic structures of several proteins from the coronavirus family are still partial or unavailable. A possible reason for this gap is the instability of these proteins outside of the cellular context, thereby prompting the use of in-cell approaches. In situ cross-linking and mass spectrometry (in situ CLMS) can provide information on the structures of such proteins as they occur in the intact cell. Here, we applied targeted in situ CLMS to structurally probe Nsp1, Nsp2, and Nucleocapsid (N) proteins from SARS-CoV-2, and obtained cross-link sets with an average density of one cross-link per twenty residues. We then employed integrative modeling that computationally combined the cross-linking data with domain structures to determine full-length atomic models. For the Nsp2, the cross-links report on a complex topology with long-range interactions. Integrative modeling with structural prediction of individual domains by the AlphaFold2 system allowed us to generate a single consistent all-atom model of the full-length Nsp2. The model reveals three putative metal binding sites, and suggests a role for Nsp2 in zinc regulation within the replication-transcription complex. For the N protein, we identified multiple intra- and inter-domain cross-links. 
Our integrative model of the N dimer demonstrates that it can accommodate three single RNA strands simultaneously, both stereochemically and electrostatically. For the Nsp1, cross-links with the 40S ribosome were highly consistent with recent cryo-EM structures. These results highlight the importance of cellular context for the structural probing of recalcitrant proteins and demonstrate the effectiveness of targeted in situ CLMS and integrative modeling.

The genome of SARS-CoV-2 encodes 29 major proteins: sixteen non-structural proteins (Nsp1-16), four structural proteins (S, E, M, N), nine major Orfs, and several additional non-canonical gene products 1 . Once the human cell is infected, the viral proteins are engaged in a network of protein-protein interactions (PPI) that lead to alterations in signalling pathways and to a global shift towards viral protein production 2,3 . Despite significant progress in viral protein structure determination 4 , there are still gaps in the structural knowledge of several proteins from the coronavirus family 5 . For example, no structure is yet available for Nsp2, despite bioinformatics evidence that predicts most of its sequence to be structured. Another example is the Nucleocapsid protein, which is known to form multi-subunit assemblies that have not yet been resolved structurally. A possible reason for these difficulties may be the instability of certain viral proteins in the in vitro state. In such cases, purification procedures that are an integral part of mainstream structural approaches (x-ray crystallography, NMR, and cryo-EM) may cause the purified proteins to disassemble, denature, or aggregate. To avoid such artefacts, in situ techniques for structural studies are required. In situ cross-linking and mass spectrometry (in situ CLMS) allows probing protein structure inside intact cells 6, 7 . 
In this approach, cells are incubated with a membrane-permeable cross-linking reagent, which reacts with the cellular proteins in their native environment. Following the chemical cross-linking, the cells are lysed and their protein content is analyzed by mass spectrometry (MS). Computational search can then identify from the mass spectrometry data the pairs of residues that were covalently linked. Because a link between two residues reports on their structural proximity, the list of identified links is a rich resource for modeling protein structures and interactions [8] [9] [10] [11] [12] . In situ CLMS had progressed significantly in recent years, with successful applications on isolated organelles [13] [14] [15] [16] [17] , bacteria 18, 19 , human cells [20] [21] [22] , and heart tissue 23 . An inherent difficulty of in situ CLMS is the high complexity of the initial samples that contain the entire cell proteome. Two general strategies have been employed to reduce the complexity prior to the mass spectrometry analysis. One strategy enriches the cross-linked peptides out of the total tryptic digest by either tagging the cross-linker itself 23 or by extensive chromatography 18, 19 . The other strategy is based on the ability to purify a specific protein-of-interest out of the cell lysate prior to digestion. Wang et al . 20, 21 effectively used the second strategy to study the human proteasome by expressing several of its subunits with a biotin tag. We propose the term "targeted in situ CLMS" to describe the latter approach, which allows the user to focus the mass spectrometry resources on a small set of predetermined proteins (targets). In this work we used targeted in situ CLMS and integrative modeling 24, 25 to probe the structures of three SARS-CoV-2 proteins: Nsp1, Nsp2, and the Nucleocapsid protein (N). Our motivation for choosing these proteins is the incomplete available knowledge on their structures and functions. 
Following cell transfection with the tagged protein, we were able to identify considerable cross-link sets of in situ origin. Computational integration of the cross-links with additional structural information allowed us to build almost complete models for Nsp2 and the N protein.

Targeted in situ CLMS to study viral proteins

We employed a targeted strategy for in situ CLMS of viral proteins inside intact human cells. To that end, HEK293 cells were transfected with a plasmid encoding a selected viral protein fused to a Strep tag ( Figure 1 ). We then cross-linked the intact cells with a membrane-permeable cross-linker (either DSS or formaldehyde), washed away the excess cross-linker, lysed the cells, and purified the viral protein via the Strep tag. The purification step greatly enriches and simplifies the sample for mass spectrometry, and increases the subsequent identification rate of cross-links on that protein. We focused on three SARS-CoV-2 proteins for which the structural information is either missing or incomplete: Nsp1 (180 aa), Nsp2 (639 aa), and the Nucleocapsid protein (N protein, 419 aa). The expression levels of all three proteins peaked 40 hours after transfection. Standard proteomics analyses at peak expression (repeated experiments) showed that the N protein was the most abundant protein in the cell, Nsp2 was among the 15-30 most abundant proteins, and Nsp1 was among the 250-400 most abundant proteins. The cell morphologies appeared normal, but the adherence of the Nsp1-expressing cells to the plate was considerably weaker. These observations are in accordance with the known toxic role of Nsp1, which is mediated through a global translation inhibition 26 . Following the purifications, proteomics analyses of the elutions detected the tagged proteins to be the most abundant by a large margin for all three proteins and for both the DSS and formaldehyde cross-linking reagents. 
We conclude that the Strep tag activity is largely unaffected by amine-reactive cross-linking, and mark it as a tag of choice for these pursuits. While this methodology is similar to the one introduced by Wang et al. 20, 21 , the current protocol was modified in three major aspects: (i) The incubation time of the cells with the cross-linker was shortened to 20 minutes, rather than 60 minutes in the original protocol. (ii) We used a Strep tag rather than a biotin tag, which allowed us to remove biotinylated proteins from the purification. (iii) We established a transient transfection protocol rather than producing a stable cell line. The transient transfection provides the flexibility of expressing proteins that might be toxic to the cells, as in the case of Nsp1.

Figure 2 caption: Blue and red arcs represent DSS and formaldehyde cross-links, respectively. (B) A secondary set of in situ cross-links that were not part of the primary set. The set comprises cross-links in which one of the peptides is short and has poor fragmentation. The secondary set is only used for final selection of models that are otherwise built by restraints from the primary set. The false detection rate for both sets is 3%. (C) Comparison of cross-links identified within Nsp2 from in situ CLMS and in vitro CLMS experiments. Cross-link color corresponds to whether it was identified only in the in situ set (blue), only in the in vitro set (grey), or in both sets (cyan). Note several lysine residues (red arrows) that promiscuously link to multiple sites along the sequence only in the in vitro set. (D) The overlap at the residue-pair level between in situ and in vitro cross-link sets.

The role of Nsp2 in the viral pathogenicity is poorly understood. Nsp2 is dispensable for viral replication of SARS-CoV in cell culture, although its deletion attenuates viral growth 27 . 
In infected cells, Nsp2 translocates to the double-membrane vesicles (DMVs) in which the replication-transcription complexes (RTCs) are anchored 28 . It is yet unclear what is the function of Nsp2 in the context of the DMVs. Secondary structure prediction tools 29 predict that nearly all the sequence of Nsp2 is structured. Yet, this structure remains unknown, thereby making Nsp2 an attractive objective for targeted in situ CLMS. We targeted Nsp2 for in situ CLMS and analyzed the resulting mass spectrometry data for proteomics and cross-links. Proteomics analysis revealed ten proteins that co-purify with Nsp2 in significant amounts ( Figure S1A ). Most of these proteins are part of the Prohibitin complex, which was previously shown to interact with the Nsp2 of SARS-CoV 30 . For identification of DSS cross-links, we ran an exhaustive-mode search of the mass spectrometry data against a sequence database comprising the co-purifying proteins and Nsp2. We compiled two non-overlapping cross-link sets from the search results. The primary set (high quality) comprised 43 internal cross-links within Nsp2 and 10 cross-links within the Prohibitin complex ( Figure 2A , Table S1 ). The secondary set (medium quality) comprised 38 cross-links, which were mostly within Nsp2 ( Figure 2B , Table S2 ). The false detection rate (FDR) for both sets was estimated to be 3% according to a decoy analysis (Methods, Figures S2,S3 ). The secondary set only contained cross-links in which one of the peptides was short (4-6 residues). Such short peptides perform poorly in MS/MS fragmentation, and a common practice in most CLMS studies is to ignore them. However, we found that once a suitable filtration of the search results is applied, many of these cross-links have low FDR values. The primary set is used for structure modeling, while the secondary set is used for structure validation. 
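The decoy-based FDR estimate mentioned above, and detailed in the Methods, can be illustrated with a short Python sketch. It follows the description in the Methods: a fragmentation score per cross-link, 20 decoy searches with erroneous cross-linker masses, and an FDR taken as the median decoy count above the threshold divided by the true count. The function names, the toy score lists, and the choice to count scores strictly above the threshold are assumptions made for illustration only; this is not the authors' search software.

```python
from statistics import median

def fragmentation_score(n_matched_fragments, len_peptide_a, len_peptide_b):
    """Matching MS/MS fragments divided by the combined peptide length."""
    return n_matched_fragments / (len_peptide_a + len_peptide_b)

def decoy_masses(true_mass=138.0681, n_values=range(160, 180)):
    """Erroneous cross-linker masses, true_mass * N / 138 for N = 160..179."""
    return [true_mass * n / 138 for n in n_values]

def estimate_fdr(true_scores, decoy_runs, threshold):
    """Median number of decoy hits above threshold over the true hit count."""
    n_true = sum(score > threshold for score in true_scores)
    if n_true == 0:
        return 0.0
    decoy_counts = [sum(score > threshold for score in run) for run in decoy_runs]
    return median(decoy_counts) / n_true

# Toy data (invented): 53 true hits above 0.65, one decoy hit per decoy run.
toy_true_scores = [0.70] * 53 + [0.40] * 30
toy_decoy_runs = [[0.70] + [0.30] * 10 for _ in decoy_masses()]

fdr = estimate_fdr(toy_true_scores, toy_decoy_runs, threshold=0.65)
print(f"{fdr:.3f}")  # prints 0.019, i.e. about 1 in 53
```

Using the median over decoy runs, rather than the mean, keeps a single unusually noisy decoy search from inflating the estimate.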
In addition to the two sets of DSS cross-links, we also identified 5 cross-links within Nsp2 from formaldehyde cross-linking ( Figure 2A , Table S1 bottom ) at an FDR of less than 5% according to a decoy analysis with reverse sequences 22 . We attempted to augment the cross-link set by in vitro cross-linking of purified Nsp2. To that end, we first purified Nsp2 in HEPES buffer (50 mM HEPES pH 8.0, 150 mM NaCl), and then cross-linked the purified protein with 1 mM of the soluble BS3 reagent. Because BS3 and DSS share the same cross-linking chemistry, one can directly compare the in situ and in vitro cross-linked sets ( Figures 2C,2D , Table S3 ). Surprisingly, the two sets are considerably different, and a significant number of the in situ cross-links do not form in vitro and vice versa. Notably, several lysine residues link to multiple sites over the entire sequence only in the in vitro set. In vitro cross-linking with the DMTMM reagent also showed the same promiscuous linking pattern ( Figure S4 ) that is not seen in the in situ set. These differences are best explained by the tendency of the purified Nsp2 protein to denature and aggregate outside of the cellular context. The inconsistency between the in situ and in vitro experiments highlights the crucial information that can be extracted by CLMS from intact cells. It may also underlie the lack of an atomic full-length structure of Nsp2 to date.

There are no close homologs with solved structure available for Nsp2. We therefore relied on the Nsp2 model generated by AlphaFold2 from DeepMind 31 . AlphaFold2 was highly successful in the recent CASP14 round, submitting highly accurate models. The initial AlphaFold2 model violated 17 out of 44 (39%) and 9 out of 29 (31%) cross-links in the primary and secondary sets, respectively. A cross-link is considered violated if the Cɑ-Cɑ distance is greater than 25 Å. 
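The violation criterion can be made concrete with a minimal Python sketch that flags cross-links whose Cɑ-Cɑ distance exceeds the 25 Å cutoff. The function name, the dictionary layout (residue number mapped to Cɑ coordinates in Å), and the three-residue toy coordinates are all hypothetical; a real analysis would read the coordinates from the model's structure file.

```python
import math

def find_violations(ca_coords, crosslinks, cutoff=25.0):
    """Return (res_i, res_j, distance) for cross-links longer than cutoff (Å).

    ca_coords  -- dict: residue number -> (x, y, z) of its C-alpha atom
    crosslinks -- iterable of (res_i, res_j) residue-number pairs
    """
    violated = []
    for res_i, res_j in crosslinks:
        distance = math.dist(ca_coords[res_i], ca_coords[res_j])
        if distance > cutoff:
            violated.append((res_i, res_j, distance))
    return violated

# Toy example (invented coordinates on a straight line, in Å).
toy_coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (30.0, 0.0, 0.0)}
toy_links = [(1, 2), (1, 3), (2, 3)]

violations = find_violations(toy_coords, toy_links)
print(violations)  # prints [(1, 3, 30.0)]
print(f"{len(violations)} of {len(toy_links)} cross-links violated")
```

The fraction of violated cross-links reported in the text corresponds to `len(violations) / len(crosslinks)` computed separately for each cross-link set.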
The violated cross-links were mostly inter-domain ones, while almost all the intra-domain cross-links were satisfied ( Figure 3A,B ) . To obtain a model consistent with the cross-link set, we divided the AlphaFold2 model into domains (residues 1-104, 105-132, 133-275, 276-345, 512-638). One domain that was not covered by the initial AlphaFold2 model (residues 359-511), was modeled by homology to partial Nsp2 structure of the infectious bronchitis virus (PDB 3ld1 32 , sequence identity 13%). With the availability of the structures for the individual domains, the modeling task is converted into a domain assembly problem. To this end, we applied the CombDock algorithm for multi-molecular assembly based on pairwise docking [33] [34] [35] . The six domains served as an input ( Figure 3C ) along with the primary set of cross-links and domain connectivity constraints. We have obtained 62 models that satisfied all the primary set inter-domain cross-links ( Figure 3D ,E ) with precision of 8 Å. We validated these models with the secondary cross-link set and found nine models that satisfied all but three secondary set cross-links that were in the 25-30 Å range. These models converged into a single cluster with a higher precision of 1 Å ( Figure 3F ). The left view is the same as in Panel B. The large acidic patch in the right view is conserved across human coronaviruses ( Figure S6 ). To infer possible functions of Nsp2, we mapped residues that are fully conserved in all human coronaviruses onto the integrative model. A cluster of conserved cysteine residues around Cys146 prompted a search for possible metal binding sites by studying the distribution of cysteine and histidine residues in the model. We identified three putative metal binding site regions ( Figure 4 ). Site 2 is conserved in all coronaviruses, while the other sites occur only in the SARS subfamily. All three sites are solvent accessible within the context of the full integrative model. 
A search for structures in the Protein Data Bank (PDB) with structurally similar domains (DALI server 36 , Z>3.0) identified zinc binding proteins. Therefore, zinc is the likely ion substrate of Nsp2 as well. An additional histidine residue is located next to the quad of metal-binding residues in all three sites. A possible role for this histidine residue is to modulate the metal binding in a pH-dependent manner. We also note that around the third site there is a clustering of additional cysteine and histidine residues that comprise an incomplete adjacent site. This clustering occurs in SARS-CoV-2, but not in SARS-CoV. The overall evolutionary picture is that of accelerated accumulation of metal binding sites within the SARS subfamily. In coronaviruses, Nsp2 is the least conserved member of the non-structural proteins. Presumably, the reduced evolutionary pressure makes it more amenable to such rapid gain of function. Based on these observations we suggest that Nsp2 plays a role in regulation of zinc levels at the RTCs. Zinc is essential for RNA replication and may be depleted at the RTCs, especially when they are enveloped in the DMVs. The recruitment of Nsp2 to the RTCs therefore establishes a reservoir of zinc that can be exchanged with its surrounding. Finally, we note that zinc regulation is almost certainly not the only function of Nsp2. Analysis of sequence conservation on the surface of the integrative model ( Table S5 , Figure S5 ) reveals other prominent features that we cannot at present annotate functionally. These include clusters of conserved residues around Pro245 and Tyr619, and a large conserved acidic patch ( Figure 4C, Figure S6 ). Future in situ CLMS studies in the context of full viral infection will shed more light on these yet uncharacterized functions. In order to obtain a model of the full N dimer, we performed computational docking 34 of the dimerization domain (in the dimer form) and two RNA binding domains. 
All three docked components contained short single RNA strands to ensure that the final model is consistent with the paths of bound RNA. The RNA binding domain and its bound RNA 10-mer were taken from a recent NMR study (PDB 7act 37 ). The RNA 6-mer on the dimerization domain (PDB 6zco 38 ) was initially docked into the basic groove between the monomers and then refined by a Molecular Dynamics simulation (Methods). The docking was guided by distance restraints derived from the 14 identified inter-domain cross-links 35 . We obtained a single large cluster of models satisfying all the cross-links within 25 Å, except one (residues 100 to 102) which was within 28 Å ( Figure 5B,C ). This cross-link is between two lysines located on a flexible loop, so the distance can vary. The RNA binding domain binds to the dimerization domain at a well-defined region that was largely shared by all the models. This region comprises residues 247-261 from one chain and residues 296-307 and 343-352 from the other chain. The integrative model demonstrates that the N dimer can accommodate three RNA single strands simultaneously. Stereochemically, the two RNA binding domains are located far enough from each other to allow a middle RNA single strand to stretch on the basic surface of the dimerization domain without hindrance. We note however that a rearrangement of residues 247-252 in the dimerization domain is required (compared to current crystal structures) in order to allow the entry and exit of that middle strand. Electrostatically, the closest approach of the phosphate backbones between the middle and the side strands is ~10 Å, which is comparable to proximities observed in the eukaryotic nucleosome 44 . Moreover, the cross-link data suggest that the electrostatic repulsion at these closest-approach regions may be further mitigated by positively charged inter-domain linker regions (see below). 
Overall, the model points to an efficient utilization of the RNA binding capacity of the N dimer, which is required for the packing of the relatively large viral genome. Of special interest are cross-links that contain an overlapping peptide pair. Such cross-links necessarily report on a direct interaction between two chains of N. We identified four sequence regions that form such cross-links (residues 100-102, 237-237, 248-249, and 266-266). These identifications are supported by well-annotated MS/MS fragmentation spectra ( Figure S8 ). In several cases, these identifications are also supported by more than one peptide pair. The Mapping of the Nsp1-RS3 crosslinks (blue) onto the cryo-EM structure of the C-terminal of Nsp1 (pink) bound to the 40S ribosomal subunit (gray) (PDB 6lzw 46 ). RS3 is marked in green. The crystal structure of the N-terminal is docked into the unassigned density of the structure (red arrow). The unstructured linker between the N-and C-domains of Nsp1 is depicted by a dashed line. One of the cross-links involves a lysine within the unstructured linker. Nsp1 inhibits host protein translation, thereby interfering with the cellular antiviral response 26 . This short protein comprises a structured N-terminal domain (residues 1-125) and a disordered C-terminal tail (residues 126-180). Two recent cryo-EM studies 46,47 revealed the C-terminal (residues 148-180) to bind strongly to the mRNA entry tunnel of the ribosome, thus obstructing the tunnel and inhibiting protein synthesis. The results we obtained from targeted in situ CLMS of Nsp1 fully support these findings. The main proteins that co-purified with Nsp1 were components of the 40S ribosomal subunit, in particular the ribosomal S3 protein and the eukaryotic translation initiation factor 3 (eIF3) ( Figure S1C ). A search for cross-links among these sequences identified 12 intra-protein cross-links in Nsp1 and two cross-links between Nsp1 and RS3 ( Figure 6A, Table S7 ). 
Another three cross-links were identified within the eIF3 consistent with Nsp1 binding also to the 43S pre-initiation complex 47 . The estimated FDR of this set is less than 5% ( Figures S2D,S3D ). The intra-protein cross-links within Nsp1 fit well to the available crystal structure of the N-terminal domain ( Figure 6B ). The two cross-links between Nsp1 and RS3 are in accord with the likely path of the C-terminal along the mRNA entry tunnel ( Figure 6C ). Three in situ cross-links are not compatible with the structure of the C-terminal lodged deep within the mRNA entry tunnel (120 to 141; 125 to 141; 120 to 164). Because the bulky N-terminal cannot enter the mRNA entry tunnel, the occurrence of these cross-links implies an additional conformation of Nsp1 in which the C-terminal is interacting with the N-terminal. It is suggestive of an auto-inhibition mechanism for Nsp1 in which the N-terminal modulates the availability of the C-terminal towards the interaction with the ribosome. The results establish the effectiveness of targeted in situ CLMS and integrative structure modeling to study a variety of proteins. The large number of identified cross-links allowed us to apply integrative structure modeling for domain assembly (Nsp2), oligomerization and domain assembly (N), and complex assembly (Nsp1-ribosome). Integration of domain level models generated by deep learning methods (AlphaFold2) with in situ CLMS data enabled ab initio structure modeling of the relatively long Nsp2 (638 amino acids). Two factors contributed significantly to the identification yield of the in situ CLMS. The first is the use of the highly hydrophobic DSS reagent for cross-linking. The hydrophobicity improves the membrane permeability, thereby allowing to shorten the incubation time considerably. In our opinion, shortening the incubation times is crucial for maintaining the native cell state while minimizing the toxic effects of the cross-linker. 
The second factor is the utilization of the Strep tag technology that proved to be both highly effective for purification and highly compatible with CLMS. In contrast to the success of in situ CLMS in identifying dozens of intra-protein cross-links within the three SARS-CoV-2 proteins, the number of inter-protein cross-links was significantly lower. Cross-links between Nsp2 and Prohibitin were not identified, nor were cross-links between Nsp1 and other ribosomal proteins besides RS3. Three reasons may underlie these missing identifications. First, these proteins may be intended to interact strongly with other viral proteins that are not present without the context of a full viral infection. Second, 'cross-linkable' residues may be lacking at the interacting interfaces of these particular systems. If this is the case, then the use of membrane-permeable reagents that employ other cross-linking chemistries (e.g. UV-induced) may be beneficial in future studies. Third, the sub-stoichiometric nature of the virus-host interactions reduces the signal of inter-protein cross-links. We note that in the opposite case, Wang et al. [20] [21] [22] targeted a fully stoichiometric complex (the proteasome) and identified inter-protein cross-links with high yield. A general observation from this study is the large variation between in situ and in vitro experiments. For both Nsp2 ( Figure 2D ) and N ( Figure S7 ) the overlap between the cross-link sets is partial, even though the underlying chemical reactivity is identical. One would expect the in vitro sets to fully contain the in situ sets, because the former are not encumbered by issues of membrane permeability. Yet, notably, a considerable number of cross-links are found only in the in situ sets. We interpret these results to indicate that in situ cross-linking probes protein states that do not occur in vitro. 
These states may require certain cellular factors that are depleted upon cell lysis, thereby leading to denaturation, aggregation, or oligomer disassembly.

The fragmentation score of the cross-link (defined as the number of all matching MS/MS fragments divided by the combined length of the two peptides) is above a threshold determined by the FDR analysis (see below); 4) The fragmentation score is at least 15% better than the score of the next best peptide pair or linear peptide.

Estimation of the false detection rate (FDR). The FDR was estimated from decoy-based analysis. To that end, the cross-link identification analysis was repeated 20 times with an erroneous cross-linker mass of 138.0681*N/138 Da, where N=160, 161, 162, … 179. This led to bogus identifications with fragmentation scores that were generally much lower than the scores obtained with the correct cross-linker mass (see histograms in Figure S3 ). For the identification of true cross-links, we set the threshold on the fragmentation score according to the desired FDR value. For example, a threshold of 0.65 on the fragmentation score of the Nsp2 data set gave 53 cross-links above the threshold in the true analysis, and a median of 1 cross-link in a typical decoy run ( Figure S2C ). We therefore estimate the corresponding FDR to be about 1 in 53, or ~2%. The final thresholds used were: Nsp2 primary set, 0.65; Nsp2 secondary set, 0.9; Nsp2 in vitro set, 0.65; N, 0.7; Nsp1, 0.65.

Domain assembly via pairwise docking. The input to the domain assembly problem consists of a set of structural models and a list of cross-links. The goal is to predict an assembly with good complementarity between the domains and consistency with the input cross-links. We use CombDock 33 , a combinatorial docking algorithm, which was modified to support cross-linking data 35 . First, pairwise docking is applied on each pair of input structures to generate a set of docked configurations ( Step 1 ). 
Second, combinatorial optimization is used to combine different subsets of the configurations from pairwise docking to generate clash-free complex models consistent with the cross-links and chain connectivity ( Step 2 ). A cross-link is considered satisfied if the distance between the Cɑ atoms of the cross-linked residues is below a specified threshold. Here we used a threshold of 25 Å and 20 Å for DSS and formaldehyde cross-links, respectively. Step 1. All-pairs docking. We used PatchDock to generate pairs of docked configurations 34 . PatchDock employs an efficient rigid docking algorithm that maximizes geometric shape complementarity. Protein flexibility is accounted for by a geometric shape complementarity scoring function, which allows a small amount of steric clashes at the interface. Only domain pairs with at least one cross-link between them were structurally docked at this stage. The PatchDock scoring function was augmented by restraints derived from the cross-linking data. Each docking configuration is represented by a transformation (three rotational and three translational parameters) and we keep the K=1,000 best scoring transformations for each pair of domains. Two basic principles are used by the algorithm: a hierarchical construction of the assembly and a greedy selection of subcomplexes. The input comprises the pair-wise docking of Step 1 (subcomplexes of size 2). At each step, the algorithm generates subcomplexes with n subunits by connecting two subcomplexes of smaller size. Only valid subcomplexes are retained at each step. Valid subcomplexes do not contain steric clashes and satisfy distance constraints (chain connectivity) and restraints (cross-links). Searching the entire space is impractical, even for relatively small K (number of models per pair) and N (number of domains), due to computer speed and memory limitations. 
Therefore, the algorithm performs a greedy selection of subcomplexes by keeping only the D=1,000 best-scoring models at each step. The final models are clustered using RMSD clustering with a cutoff of 4 Å. The final best-scoring models are selected based on cross-link satisfaction and cluster size. The model precision is calculated as the average Cα RMSD between the best-scoring models. Pairwise docking was applied to dock the RNA binding domain of N (PDB 7act) to the dimerization domain (PDB 6zco). Combinatorial optimization was performed for assembly of Nsp2 domains.

Molecular dynamics (MD) simulations. MD simulations were performed on the dimerization domain of the N protein model with docked RNA using GROMACS 2020 software 55 and the PARMBSC1 force field 56 . Any steric clashes between the docked RNA and protein were resolved by removing nucleotide bases, leaving a poly-U hexameric RNA fragment. Then the protein-RNA complex was solvated with a simple point charge water model with a self-energy polarization correction term (SPC/E) 57 , and the system charge was neutralized with the addition of Cl⁻ ions. In order to ensure appropriate initial geometry, a steepest-descent energy minimization was run until convergence at Fmax < 1000 kJ/(mol · nm). Position restraints with k = 1000 kJ/(mol · nm²) were then applied to the protein and RNA heavy atoms to allow the water and ions to equilibrate around the protein in two steps. The first equilibration step was conducted at a constant number of atoms, volume, and temperature (NVT) of 300 K. The second equilibration step was conducted at a constant number of atoms, pressure of 1 bar, and temperature of 300 K (NPT). These equilibrations had a time step of 2 fs and lasted for 100 ps. Once the system was equilibrated at 300 K and 1 bar, a production simulation of 100 nanoseconds with a time step of 2 fs provided 10,000 MD simulation frames at intervals of 10 ps. 
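The simulation lengths quoted above can be cross-checked with simple bookkeeping. The Python sketch below converts the stated durations, time step, and frame interval into step and frame counts; the helper names are hypothetical, and the reading that each equilibration stage lasts 100 ps is an assumption about the text rather than something it states explicitly.

```python
FS_PER_PS = 1000  # femtoseconds per picosecond
PS_PER_NS = 1000  # picoseconds per nanosecond

def n_steps(duration_ps, timestep_fs):
    """Number of integration steps needed to cover duration_ps."""
    return duration_ps * FS_PER_PS // timestep_fs

def n_frames(duration_ps, interval_ps):
    """Number of saved frames when writing one frame every interval_ps."""
    return duration_ps // interval_ps

timestep_fs = 2
equilibration_ps = 100           # assumed to apply to each of NVT and NPT
production_ps = 100 * PS_PER_NS  # 100 ns production run
frame_interval_ps = 10

print(n_steps(equilibration_ps, timestep_fs))      # prints 50000
print(n_steps(production_ps, timestep_fs))         # prints 50000000
print(n_frames(production_ps, frame_interval_ps))  # prints 10000
print(n_steps(frame_interval_ps, timestep_fs))     # prints 5000 (steps between frames)
```

The 10,000 frames stated in the text follow directly from dividing the 100 ns production run by the 10 ps output interval.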
Proteomics of SARS-CoV-2-infected host cells reveals therapy targets The coding capacity of SARS-CoV-2 The Architecture of SARS-CoV-2 Transcriptome Coronavirus3D: 3D structural visualization of COVID-19 genomic divergence The sprint to solve coronavirus protein structures -and disarm them with drugs In vivo protein complex topologies: sights through a cross-linking lens A new in vivo cross-linking mass spectrometry platform to define protein-protein interactions in living cells An Integrated Technology to Understand the Structure and Function of Molecular Machines Cross-Linking/Mass Spectrometry for Studying Protein Structures and Protein-Protein Interactions: Where Are We Now and Where Should We Go from Here? Protein Tertiary Structure by Crosslinking/Mass Spectrometry Cross-Linking Mass Spectrometry: An Emerging Technology for Interactomics and Structural Biology A strategy for dissecting the architectures of native macromolecular assemblies Mitochondrial protein interactome elucidated by chemical cross-linking mass spectrometry Histone Interaction Landscapes Visualized by Crosslinking Mass Spectrometry in Intact Cell Nuclei The interactome of intact mitochondria by cross-linking mass spectrometry provides evidence for coexisting respiratory supercomplexes Situ Structural Restraints from Cross-Linking Mass Spectrometry in Human Mitochondria Stitching the synapse: Cross-linking mass spectrometry into resolving synaptic protein interactions Probing the protein interaction network of Pseudomonas aeruginosa cells by chemical cross-linking mass spectrometry In-cell architecture of an actively transcribing-translating expressome Molecular Details Underlying Dynamic Structures and Regulation of the Human 26S Proteasome The proteasome-interacting Ecm29 protein disassembles the 26S proteasome in response to oxidative stress Mass spectrometry reveals the chemistry of formaldehyde cross-linking in structured proteins Chemical Crosslinking Mass Spectrometry Analysis of 
Protein Conformations and Supercomplexes in Heart Tissue Principles for Integrative Structural Biology Studies Integrative Modelling of Biomolecular Complexes Severe acute respiratory syndrome coronavirus nsp1 suppresses host gene expression, including that of type I interferon, in infected cells The nsp2 replicase proteins of murine hepatitis virus and severe acute respiratory syndrome coronavirus are dispensable for viral replication Dynamics of coronavirus replication-transcription complexes Protein secondary structure prediction based on position-specific scoring matrices Severe acute respiratory syndrome coronavirus nonstructural protein 2 interacts with a host protein complex involved in mitochondrial biogenesis and intracellular signaling It will change everything': DeepMind's AI makes gigantic leap in solving protein structures Expression, crystallization and preliminary crystallographic study of the C-terminal half of nsp2 from SARS coronavirus Combinatorial docking approach for structure prediction of large proteins and multi-molecular assemblies Geometry-based flexible and symmetric protein docking Modeling of Multimolecular Complexes Using Dali for Protein Structure Comparison Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein High-resolution structure and biophysical characterization of the nucleocapsid phosphoprotein dimerization domain from the Covid-19 severe acute respiratory syndrome coronavirus 2 The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein Phosphoregulation of Phase Separation by the SARS-CoV-2 N Protein Suggests a Biophysical Basis for its Dual Functions SARS CoV-2 nucleocapsid protein forms condensates with viral genomic RNA. 
bioRxiv (2020) Molecular Architecture of the SARS-CoV-2 Virus Architecture and self-assembly of the SARS-CoV-2 nucleocapsid protein Crystal structure of the nucleosome core particle at 2.8 A resolution Structural characterization of nonstructural protein 1 from SARS-CoV-2. iScience Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2 SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation A SARS-CoV-2 protein interaction map reveals targets for drug repurposing Efficient and robust proteome-wide approaches for cross-linking mass spectrometry Expanding the chemical cross-linking toolbox by the use of multiple proteases and enrichment by size exclusion chromatography MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification The Perseus computational platform for comprehensive analysis of (prote)omics data Subunit order of eukaryotic TRiC/CCT chaperonin by cross-linking, mass spectrometry, and combinatorial homology modeling In-Search Assignment of Monoisotopic Peaks Improves the Identification of Cross-Linked Peptides GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit Parmbsc1: a refined force field for DNA simulations The missing term in effective pair potentials The PRIDE database and related tools and resources in 2019: improving support for quantification data