key: cord-0995325-aunryj0j
authors: Noske, G. D.; Nakamura, A. M.; Gawriljuk, V. O.; Fernandes, R. S.; M. A. Lima, G.; V. D. Rosa, H.; Pereira, H. D.; C. M. Zeri, A.; A. F. Z. Nascimento, A.; C. L. C. Freire, M.; Fearon, D.; Douangamath, A.; von Delft, F.; Oliva, G.; Godoy, A. S.
title: A Crystallographic Snapshot of SARS-CoV-2 Main Protease Maturation Process
date: 2021-06-24
journal: J Mol Biol
DOI: 10.1016/j.jmb.2021.167118
sha: a51fcd8c27fede4f0b3831b267f9b31ca7e6caf7
doc_id: 995325
cord_uid: aunryj0j

SARS-CoV-2 is the causative agent of COVID-19. The dimeric form of the viral Mpro is responsible for the cleavage of the viral polyprotein at 11 sites, including its own N- and C-termini. The lack of structural information for intermediate forms of Mpro is a setback for understanding its self-maturation process. Herein, we used X-ray crystallography combined with biochemical data to characterize multiple forms of SARS-CoV-2 Mpro. For the immature form, we show that extra N-terminal residues cause conformational changes in the positioning of domain three over the active site, hampering dimerization and diminishing activity. We propose that this form precedes the cis- and trans-cleavage of the N-terminal residues. Using fragment screening, we probed new cavities in this form that can be used to guide therapeutic development. Furthermore, we characterized a serine site-directed mutant of Mpro bound to its endogenous N- and C-terminal residues during the dimeric association stage of the maturation process. We suggest this form is a transitional state during the C-terminal trans-cleavage. These data shed light on the structural modifications of the SARS-CoV-2 main protease during its self-maturation process.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of COVID-19, a highly infectious disease that spread rapidly, causing a global pandemic.
SARS-CoV-2 is an enveloped RNA virus belonging to the β-lineage of coronaviruses, which includes SARS-CoV and the Middle East respiratory syndrome coronavirus (MERS-CoV) [1-3]. The viral genome is a positive-sense single-stranded RNA of about 30,000 nucleotides that shares 82% sequence identity with SARS-CoV [4]. The replicase gene (ORF1ab) encodes two overlapping polyproteins (pp1a and pp1ab) that are required for viral replication and transcription [5]. The main protease (Mpro), also known as 3C-like protease (3CLpro), is a viral cysteine protease specific for glutamine at the S1 subsite, showing variable recognition preferences at the S2 (Leu/Phe/Met/Val) and S2' (Ser/Ala/Gly/Asn) subsites [6]. Mpro is responsible for the maturation of pp1a and pp1ab at no fewer than 11 characterized sites, including its auto-processing at the N- and C-termini, which is essential for its activity and dimerization [4, 7, 8]. Due to its essential role in viral replication, Mpro is one of the best characterized non-structural proteins of SARS-CoV-2. In addition, its unique cleavage-site recognition and the absence of closely related homologues in humans identify Mpro as a major target for antiviral drug development [4, 9, 10].

Although Mpro activity is crucial to viral biology, its self-maturation process is still poorly understood. Several biochemical and crystallographic studies on native and mutated forms of SARS-CoV Mpro have tried to elucidate its maturation mechanism (reviewed in [11]) by evaluating whether the N- and C-terminal processing occurs within a dimer (cis-cleavage) or between two distinct dimers (trans-cleavage). The first model, from 2005, suggested that Mpro probably forms a small amount of active dimer after autocleavage, which immediately enables the catalytic site to act on other cleavage sites in the polyprotein [12].
In 2010, based on the observation that dimerization of mature Mpro is enhanced by the presence of substrates, Li and colleagues proposed that, after translation, two Mpro protomers form a transient dimer that is stabilized by binding the N-terminal site of its substrate (another Mpro in the polyprotein), which is then cleaved to free its N-terminus [13]. In addition, Chen et al. suggested that the N-terminal autocleavage might only need two immature forms of Mpro in monomeric polyproteins to form an intermediate dimer that is not related to the active dimer of the mature enzyme [14].

Herein, we used X-ray crystallography integrated with biochemical techniques to investigate the self-maturation process of SARS-CoV-2 Mpro. The construct of Mpro containing N-terminal insertions produced an immature form of the enzyme (IMT Mpro) that was unable to form a dimer and showed reduced enzymatic activity. We used fragment screening to probe new cavities for drug development in this construct. The inactive mutant C145S with inserted native N-terminal residues (C145S Mpro) produced a form of the protein that behaves as monomers, dimers, trimers and tetramers in solution. Crystals of the tetrameric form revealed details of the dimeric association of Mpro during self-processing of its N- and C-terminal residues. All forms revealed important conformational changes of the enzyme during maturation, which can guide the development of direct-acting drugs.

A general strategy to produce SARS-CoV-2 Mpro is to maintain its self-cleavage N-terminal portion and add the HRV-3C cleavage site with a histidine tag at the C-terminal portion. We successfully used ammonium sulfate precipitation followed by ion exchange chromatography to obtain pure Mpro, simplifying the protocol to one that takes less than 8 h, with a final yield of ~2.5 mg/L of culture.
The SARS-CoV-2 IMT Mpro was obtained by adding a non-cleavable sequence (Gly-Ala-Met) before the N-terminal Ser1 of Mpro, and was purified by a similar protocol. IMT Mpro was produced as a soluble protein, yielding ~80 mg/L of culture. To further investigate the role of N-terminal residues in the maturation of Mpro, we designed a construct carrying the C145S mutation with the native N-terminal cleavage peptide of Mpro (Ser-4, Ala-3, Val-2, Leu-1, Gln-0↓) preceding Ser1 (Figure 1a and S1a). During gel filtration, two Mpro peaks were identified with masses consistent with a monomer and a tetramer (Figure S1). (Table 1).

As previously reported, the Mpro N-terminus is fundamental for dimerization, and any additional residues reduce or even abolish its activity [4, 15-18]. As expected, C145S Mpro showed only residual activity (Figure 1b). All three Mpro constructs exhibited similar thermal-stability profiles, indicating similar folding (Figure 1c). Analysis in solution using SEC-MALS suggests that Mpro behaves as a dimer under the tested conditions, as expected (Figure 1c) [10]. For IMT Mpro, the additional N-terminal residues seem to prevent dimerization completely (Figure 1c). For C145S Mpro, however, the additional residues allow the protein to adopt multiple oligomeric states ranging from monomers to tetramers (Figure 1c).

Despite the site-directed mutation in C145S Mpro, this enzyme exhibited residual proteolytic activity, which allowed us to observe the self-processing of the monomeric peak of C145S Mpro by SDS-PAGE over the course of two days (Figure 1e). By quantifying the band intensities, we estimate that about 30% of the protein was self-cleaved at the end of the two-day incubation (Figure 1e). Using SEC-MALS, we also monitored the formation of dimers by the monomeric C145S Mpro sample after 0 h, 24 h, 48 h and 72 h of incubation at room temperature (Figure 1f).
At 0 h, the monomer/dimer mass recovery ratio was 14.9, which decreased to 1.04 at 24 h and 0.09 at 48 h of incubation, with complete disappearance of the monomer peak after 72 h (Table 2). These data indicate that cleavage of the N-terminus is directly linked to the formation of dimers in solution, highlighting the importance of N-terminal processing for the assembly of Mpro. To investigate the effect of mature Mpro on N-terminal processing, we added Mpro to the C145S Mpro samples at a ratio of 1:6000. On SDS-PAGE, the sample containing Mpro showed an increased rate of protein cleavage after 20 h when compared with the previous experiment (Figure 1e). By SEC-MALS, the monomer/dimer mass recovery ratio for this sample was 12.9 at 0 h, then 0.5 at 24 h and 0.02 at 48 h of incubation, also with complete disappearance of the monomer peak after 72 h (Figure 1g and Table 2). The data suggest that adding Mpro to the C145S Mpro sample at a 1:6000 ratio increased the speed of N-terminal processing and dimer formation by roughly 50% after 24 h.

Crystal structure of Mpro in monoclinic and orthorhombic crystal systems

Mpro was crystallized in the monoclinic crystal system in several conditions, and its X-ray structure was determined at 1.46 Å in the C2₁ space group, like the majority of PDB deposits. All 306 residues were modelled into the electron density and refined to a final Rwork/Rfree of 0.16/0.18, with 99% of residues in Ramachandran favored positions (Table S2). The crystal asymmetric unit contains one monomer, which can be symmetry-expanded to the biological dimer, following the same pattern as the majority of known structures deposited in the PDB (r.m.s.d. of 0.2 Å vs PDB 5RGG, for all 306 Cα atoms). The Mpro protomers are formed by three domains (DI, DII and DIII), with the catalytic region located between the beta-barrels comprising DI and DII [4] (Figure S2).
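The r.m.s.d. values quoted when comparing structures follow the usual definition over paired Cα atoms. As a minimal sketch, assuming the two coordinate sets have already been superposed and paired residue by residue (the superposition step is not shown, and the function name and toy coordinates are illustrative rather than taken from the deposited models), the calculation could look like this:

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z)
    C-alpha coordinates in angstroms. Assumes both sets are already
    superposed; no fitting is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Toy example (hypothetical coordinates): three atoms shifted by 0.3 A along z.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.3), (3.8, 0.0, 0.3), (7.6, 0.0, 0.3)]
print(f"r.m.s.d. = {ca_rmsd(model_a, model_b):.2f} A")
```

In practice the coordinates would be read from the PDB entries and optimally superposed first; a uniform shift as in the toy example gives an r.m.s.d. equal to the shift itself.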
Using seeds from IMT Mpro, we were able to obtain a new crystal form in the orthorhombic space group P2₁2₁2₁ at a final resolution of 1.86 Å. This structure was refined to a final Rwork/Rfree of 0.19/0.22, with 98.33% of residues in Ramachandran favored positions (Table S2). This crystal form shows the full dimer in the asymmetric unit, and its packing appears to offer advantages for soaking compounds into the Mpro active site when compared with the canonical C2₁ form, especially for compounds targeting subsites S3-S4, which are less constrained by crystal packing in the orthorhombic form (Figure S14). This is being explored by the COVID Moonshot initiative and will be described later in a separate manuscript.

The crystal structure of IMT Mpro at 1.6 Å was determined using three merged datasets (Figure S3, S4, Table S1) in the P2₁2₁2₁ space group, with two molecules in the asymmetric unit, packed in a shape similar to the known biological unit of Mpro. The structure was refined to a final Rwork/Rfree of 0.20/0.22, with 97% of residues in Ramachandran favored positions (Table S2). In the recently published structures of GM-Mpro, both apo and ligand complexes exhibited minor differences from the mature form [15]. However, in our structure there are distinguishable differences in the overall structure, especially in the position of the DIII helices. Phe140 rises to the surface of the molecule, leading to major conformational alterations of the residues surrounding the chain B active site, such as Glu166, Pro168 and Gln189 (Figure 2 and S6). As DIII is known for being extremely flexible [20], we compared the structure of IMT Mpro with the structure of Mpro in the orthorhombic crystal system in order to investigate whether the displacement of DIII was being promoted by the distinct crystal packing.
In fact, the structure of Mpro in the orthorhombic crystal system is much more similar to the canonical Mpro (r.m.s.d. of 0.52 Å for 604 Cα atoms) than to IMT Mpro (r.m.s.d. of 0.91 Å for 604 Cα atoms), indicating that the DIII displacement is indeed caused by the extra N-terminal residues (Figure S13).

The plasticity of the SARS-CoV-2 Mpro active site was already reported when apo X-ray structures collected at cryogenic and room temperatures were compared [21], and is expected given the broad spectrum of endogenous substrates that Mpro has to process. However, IMT Mpro revealed major structural alterations in the oxyanion hole, likely affecting enzyme processing. The cascade effect of the steric hindrance caused by the extra N-terminal residues affects the position of Ser1, Phe140, Glu166 and Pro168, disrupting the shape of subsites S1, S2 and S4 (Figure 2). Of these, S1 seems to be the most affected, assuming an unusual flattened configuration that seems to disrupt the cavity responsible for recognizing the glutamine side chain, likely affecting substrate recognition (Figure 2). This not only explains the diminished activity of this construct, but also shows the importance of full N-terminal processing for the correct folding of Mpro. Despite the significant changes in the active site, the relative position of the catalytic dyad Cys145-His41 remains unchanged in this form (Figure 2).

Gln0 and Ser1 are non-covalently bound in the amino region, clearly indicating that the N-terminal cleavage was completed. At the S1 subsite, Gln0 NE2 interacts with Glu166 OE1 through a hydrogen bond (2.7 Å), while Gln0 interacts with Ser145 in the position of the native oxyanion hole (Figure 4, Figure S8-S9).
To accommodate the hydrophobic side chain of Leu-1 at P2, Met49 and Met165 are pushed further apart (Figure S10), leading to a more open groove at this subsite relative to the apo state, which explains the ability of this subsite to accommodate a variety of hydrophobic residues, such as Leu, Met, Ile, Val and Phe [6, 24]. Yet, from the eleven endogenous [25]. During this event, two C145S Mpro dimers appear to be linked by the interaction of a C-terminus with the respective active site, revealing details of the dimeric association in a non-closed complex (Figure 5). The electron density of this dataset indicates that chain B Ser145 OG is covalently bound to Gln306 C from the crystallographic symmetry-related chain B (distance of 1.4 Å), with the loss of one oxygen by Gln306 (Figure S9). We believe the diminished activity of the mutant allowed the formation of these crystals after almost 20 days, from which we were able to capture this intermediate state of the maturation. We highlight here that the deposited model does not depict this covalent bond, as we found it impossible to link two atoms outside the asymmetric unit (even after consulting software developers). Within the active site, Gln306 occupies the respective position of Gln0 at S1, while S2 is occupied by Phe305, increasing the distance between Met49 and Met165 relative to chain A bound to the N-terminus (Figure S10). As for the N-terminal residues, the interactions of subsites S3-S5 with the C-terminus are mainly maintained by hydrogen bonds between main chains (Figure 4b).

Mpro is first produced as the Nsp5 domain of the viral polyproteins before they are proteolytically processed into 15 or 16 non-structural proteins [11]. Immediately after translation, the immature form of Mpro would contain both N- and C-terminal insertions and requires self-processing to generate the mature form of the enzyme [12].
In Mpro, the cleaved N-termini are sandwiched between the two protomers of the dimeric enzyme, being part not only of the dimer interface but also of the respective protomer active site. In IMT Mpro, the extra N-terminal amino acids seem to disrupt the active site shape at the S1-S3 subsites (Figure 5), affecting its capacity to recognize and process the substrate. In addition, the extra N-terminal residues seem to affect the enzyme's ability to form dimers by pushing the reciprocal DIII away from its native conformation. The same process appears to occur for C145S Mpro monomers with native N-terminal inserted residues, although in this case the slow cleavage of the N-terminus results in the formation of dimers over time (Figure 1e). Incubating these samples to allow dimerization appears to significantly enhance the residual enzymatic activity of this construct, indicating that the dimeric form is important for activity even for this serine mutant (Figure 6). By monitoring the formation of dimers over time, we saw that the monomeric enzyme is capable of processing its N-terminus, suggesting cis-cleavage as a mechanism for the first step of the maturation process (Figure 1e). However, when we compare the results from both experiments, we notice that the rate of dimer formation seems to be far higher than the rate of N-terminal processing (Figure 1e-g). This is in partial agreement with the model proposed by Li and colleagues (2010), in which two Mpro protomers form a transient dimer that is stabilized by binding the N-terminal site of its substrate (another Mpro in the polyprotein), which is then cleaved to free its N-terminus [13]. It is also another argument for our model of the immature form, in which a partially cleaved Mpro would result in a constrained packing of Mpro with diminished activity (Figure 2).
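The monomer/dimer mass recovery ratios reported for the SEC-MALS time course (Table 2) can be compared with simple arithmetic. The sketch below uses only the ratios quoted in the text; the helper names are hypothetical, and converting a ratio into a monomer fraction assumes that only the monomer and dimer peaks contribute to the recovered mass.

```python
# Monomer/dimer mass recovery ratios quoted in the text (Table 2), keyed by
# incubation time in hours. The monomer peak is no longer detected at 72 h.
ratio_alone = {0: 14.9, 24: 1.04, 48: 0.09}
ratio_with_mpro = {0: 12.9, 24: 0.5, 48: 0.02}

def monomer_fraction(ratio):
    """Fraction of recovered mass in the monomer peak, assuming only
    monomer and dimer are present (ratio = monomer mass / dimer mass)."""
    return ratio / (1.0 + ratio)

def relative_reduction(reference, treated):
    """Fractional drop of the treated ratio relative to the reference."""
    return (reference - treated) / reference

for hours in sorted(ratio_alone):
    alone = ratio_alone[hours]
    with_mpro = ratio_with_mpro[hours]
    print(
        f"{hours:>2} h: monomer fraction {monomer_fraction(alone):.2f} alone, "
        f"{monomer_fraction(with_mpro):.2f} with Mpro; "
        f"ratio lower by {relative_reduction(alone, with_mpro):.0%}"
    )
```

At 24 h the ratio drops from 1.04 to 0.5, a reduction of about 52%, which matches the roughly 50% effect described for the sample supplemented with Mpro.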
During our analysis we also observed that when Mpro is added to C145S Mpro, the N-terminal cleavage and dimer formation seem to be enhanced significantly (Figure 1e

X-ray data for apo IMT Mpro were collected from three isomorphous independent crystals and processed by XDS via autoPROC [29, 30]. These data were also used to confirm the reliability of the beamline (Figure S4 and Table S1). Datasets were then scaled and merged using Aimless [31], and the resulting dataset was used for structure determination of IMT Mpro by molecular replacement with Phaser [32], using PDB 5RGQ as template. The model was refined with COOT [33] and BUSTER [34] at 1.6 Å and deposited under the code 7KFI. X-ray data for mature Mpro and for monomeric and tetrameric C145S Mpro were processed by XDS via autoPROC [29, 30] and scaled using Aimless [31]. Mature Mpro, tetrameric C145S Mpro and monomeric C145S Mpro were solved by molecular replacement with Phaser [32] using template models 5RGQ, 7KFI and 5R8T, respectively. Mature Mpro and monomeric and tetrameric C145S Mpro were refined with COOT [33] and phenix.refine [35], and are deposited under the codes 7KPH (at 1.4 Å), 7N5Z (at 1.7 Å) and 7N6N (at 2.8 Å), respectively. Details of data processing parameters and statistics are given in Table S2.

For the fragment screening of IMT Mpro, we used the plated fragment libraries FragMAXlib (Talibov et al., to be published) and F2XEntry [36, 37], for a total of 192 fragments tested. In those plates, the content of each drop well was resuspended in 1.0 µL of 0.1 M MES pH 6.7, 5% DMSO (v/v), 8% PEG 4,000 (w/v), 30% PEG 400 (w/v), and crystals were added afterwards. After 4 h of soaking at room temperature, crystals were manually harvested and flash-cooled for data collection. During the commissioning phase of MANACA, 166 of those crystals were tested, leading to 77 usable datasets.
To analyze the data, a simplified version of FragMAXapp was configured on our laboratory end-station computer [38]. Within FragMAXapp, restraint libraries were generated by phenix.eLBOW [39] using the RM1 force field for geometry optimization, and datasets were processed through autoPROC/STARANISO or DIALS via XIA2 [29, 40, 41]. Molecular replacement and initial refinement were performed with DIMPLE [42], using PDB 7KFI as template. To highlight the electron density of weak binding events, map averaging and statistical modelling were performed with PanDDA [43]. Models were refined with COOT [33] and phenix.refine [35]. Details of data processing and refinement statistics are given in Table S3. Polder maps of fragments are available in Figure S12 [44].

All enzymatic assays were carried out using FRET-based substrate DABCYL- Table S4. For dimer formation monitoring, in-solution oligomeric states of the C145S Mpro monomer peak were determined similarly by SEC-MALS using a Superdex 75 Increase Figure S7.
- Crystal structure of the immature form of Mpro (1.6 Å)
- Immature form of Mpro in complex with 5 fragments, one new cavity in the dimerization interface
- Crystal structure of Mpro C145S mutant bound to its N- and C-terminal endogenous residues at 2.8 Å (this is the first structure of N-terminal residues, for which the enzyme has the highest affinity among all endogenous viral recognition sites)
- Description of the tetrameric intermediate formed during C-terminal cleavage
- Description of a maturation process with support of biochemistry data

) and Fundação de Amparo à Pesquisa do Estado de São Paulo

A novel coronavirus from patients with pneumonia in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin
A new coronavirus associated with human respiratory disease in China
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
Genome Organization, Replication, and Pathogenesis of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)
Biosynthesis, Purification, and Substrate Specificity of Severe Acute Respiratory Syndrome Coronavirus 3C-like Proteinase
Identification of key interactions between SARS-CoV-2 main protease and inhibitor drug candidates
Coronavirus main proteinase (3CLpro) Structure: Basis for design of anti-SARS drugs
Structural basis for the inhibition of SARS-CoV-2 main protease by antineoplastic drug carmofur
Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease
Activation and maturation of SARS-CoV main protease
Mechanism of the maturation process of SARS-CoV 3CL protease
Maturation mechanism of severe acute respiratory syndrome (SARS) coronavirus 3C-like proteinase
Liberation of SARS-CoV main protease from the viral polyprotein: N-terminal autocleavage does not depend on the mature dimerization mode
Both Boceprevir and GC376 efficaciously inhibit SARS-CoV-2 by targeting its main protease
Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain
Production of Authentic SARS-CoV Mpro with Enhanced Activity: Application as a Novel Tag-cleavage Endopeptidase for Protein Overproduction
Structure and inhibition of the SARS-CoV-2 main protease reveal strategy for developing dual inhibitors against Mpro and cathepsin L
Inference of macromolecular assemblies from crystalline state
pH-dependent conformational flexibility of the SARS-CoV main proteinase (Mpro) dimer: Molecular dynamics simulations and multiple X-ray structure analyses
Structural plasticity of SARS-CoV-2 Mpro active site cavity revealed by room temperature X-ray crystallography
Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease
Targeting the Dimerization of the Main Protease of Coronaviruses: A Potential Broad-Spectrum Therapeutic Strategy
SARS-CoV-2 Mpro inhibitors and activity-based probes for patient-sample imaging
Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site
Ligation-independent cloning of PCR products (LIC-PCR)
Protein production by auto-induction in high density shaking cultures
The Sirius project
Data processing and analysis with the autoPROC toolbox
How good are my data and what is the resolution?
Phaser crystallographic software
Features and development of Coot
Exploiting structure similarity in refinement: Automated NCS and target-structure restraints in BUSTER
Towards automated crystallographic structure refinement with phenix.refine
F2X-Universal and F2X-Entry: Structurally Diverse Compound Libraries for Crystallographic Fragment Screening
FragMAX: the fragment-screening platform at the MAX IV Laboratory
FragMAXapp: crystallographic fragment-screening data-analysis and project-management system
Electronic ligand builder and optimization workbench (eLBOW): A tool for ligand coordinate and restraint generation
DIALS: Implementation and evaluation of a new integration package
Advances in automated data analysis and processing within autoPROC, combined with improved characterisation, mitigation and visualisation of the anisotropy of diffraction limits using STARANISO
Overview of the CCP4 suite and current developments
A multi-crystal method for extracting obscured crystallographic states from conventionally uninterpretable electron density
Polder maps: Improving OMIT maps by excluding bulk solvent

Aline Nakamura: Conceptualization, Methodology, Software, Data curation, Writing-Original draft preparation.
Victor Gawriljuk: Conceptualization, Methodology, Software, Data curation, Writing-Original draft preparation.
Rafaela Fernandes: Conceptualization, Writing-Original draft preparation Investigation.
Ana Zeri: Investigation.
Andrey Nascimento: Investigation.
Marjorie Freire: Writing-Original Draft.
Daren Fearon: Investigation.
Alice Douangamath: Investigation.
Frank von Delft: Investigation.
Glaucius Oliva: Supervision, Funding acquisition Software, Data curation, Investigation, Writing-Original draft preparation, Writing-Review & Editing, Software, Validation

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.