key: cord-0998331-op31hwwg authors: Akbayrak, Ibrahim Yagiz; Caglayan, Sule Irem; Durdagi, Serdar; Kurgan, Lukasz; Uversky, Vladimir N.; Ulver, Burak; Dervisoğlu, Havvanur; Haklidir, Mehmet; Hasekioglu, Orkun; Coskuner‐Weber, Orkid title: Structures of MERS‐CoV macro domain in aqueous solution with dynamics: Impacts of parallel tempering simulation techniques and CHARMM36m and AMBER99SB force field parameters date: 2021-05-26 journal: Proteins DOI: 10.1002/prot.26150 sha: 833a34097c10cb36e30e46224ccd2b1c549315b6 doc_id: 998331 cord_uid: op31hwwg A novel virus, severe acute respiratory syndrome Coronavirus 2 (SARS‐CoV‐2), causing coronavirus disease 2019 (COVID‐19) worldwide appeared in 2019. Detailed scientific knowledge of the members of the Coronaviridae family, including the Middle East Respiratory Syndrome Coronavirus (MERS‐CoV) is currently lacking. Structural studies of the MERS‐CoV proteins in the current literature are extremely limited. We present here detailed characterization of the structural properties of MERS‐CoV macro domain in aqueous solution. Additionally, we studied the impacts of chosen force field parameters and parallel tempering simulation techniques on the predicted structural properties of MERS‐CoV macro domain in aqueous solution. For this purpose, we conducted extensive Hamiltonian‐replica exchange molecular dynamics simulations and Temperature‐replica exchange molecular dynamics simulations using the CHARMM36m and AMBER99SB parameters for the macro domain. This study shows that the predicted secondary structure properties including their propensities depend on the chosen simulation technique and force field parameter. We perform structural clustering based on the radius of gyration and end‐to‐end distance of MERS‐CoV macro domain in aqueous solution. We also report and analyze the residue‐level intrinsic disorder features, flexibility and secondary structure. Furthermore, we study the propensities of this macro domain for protein‐protein interactions and for the RNA and DNA binding. Overall, results are in agreement with available nuclear magnetic resonance spectroscopy findings and present more detailed insights into the structural properties of MERS CoV macro domain in aqueous solution. All in all, we present the structural properties of the aqueous MERS‐CoV macro domain using different parallel tempering simulation techniques, force field parameters and bioinformatics tools. Since the first outbreak of the severe acute respiratory syndrome (SARS) in 2003, a fatal viral disease-causing pneumonia and death was first reported in Saudi Arabia in 2012. This virus was named Middle East Respiratory Syndrome Coronavirus (MERS-CoV). 1 The current SARS-CoV-2 infection, coronaviruses and coronavirus-related infection aroused the attention of the entire world. A history of the SARS-CoV outbreak justifies these high-levels of attention. By the time the global SARS-CoV outbreak was contained, the virus spread to 26 countries, infected over 8000 people worldwide and killed almost 800. Similarly, even though MERS-CoV appeared initially in Saudi Arabia, the virus-that was new to humans-spread to several other countries in or near the Arabian Peninsula, Asia, Europe, and the United States. 2 The mortality of MERS was reported to be 4-fold higher than SARS-CoV. 3 In fact, at the end of 2019, there were a total of 2494 laboratory-confirmed cases of MERS-CoV world-wide and the MERS- MERS-CoV belongs to the lineage C of β-coronaviruses (β-CoVs) that includes CoVs isolated from bats and hedgehogs. CoVs use the RNA genome to encode several structural proteins, including the spike glycoprotein (S), membrane protein (M) and nucleocapsid protein (N), and various non-structural proteins (NSPs) to facilitate its fast replication processes. 5 A single large replicase gene encodes the proteins that play a role in viral replication. 4 This gene contains two open reading frames; ORF1a and ORF1b encoding the polyproteins pp1a and pp1b, with the production of pp1b requiring a À 1 ribosome frameshift at the 3 0 end of ORF1a. 6 ORF1a encodes viral proteases: main protease (M pro ) and papain like protease (PL pro ). These viral proteases play a central role in the cleavage of ORF1a and ORF1b gene products in order to produce functional NSPs. 7 The largest NSP member of the MERS-CoV genome is the ORF1a-encoded, multifunctional and multidomain protein NSP3 that serves as a major evolutionary selection target in β-CoVs. 8 processes. For instance, the SARS-CoV and MERS-CoV macro domains were shown to possess poly(AD)P-ribose binding affinity, which suggested that this domain regulates cellular proteins that are important for an apoptotic way via poly(ADP)-ribosylation to mediate the host response to infection. 4 Even though X-ray structure is available for the MERS-CoV macro domain in complex with adenosine monophosphate (AMP), such structure does not capture the impact of the bulk solvent environment on protein structure and dynamics and provides a rather limited view of the underlying structural and functional residue-level characteristics. A detailed understanding of the structural properties of MERS-CoV macro domain in solution will provide the lacking structural information on CoVs and may be used for comparison with SARS-CoV-2 macro domain. In the long run, the information gleaned from such structural studies could help to design more efficient treatments including vaccines and small molecule drugs. Therefore, we present the characterization of the structural properties of MERS-CoV macro domain in aqueous solution at body temperature with dynamics at the atomic level via linking parallel tempering simulations to bioinformatics. We combine these results with several residue-level analyses that focus on the structural flexibility, presence of intrinsically disordered regions, and functional features related to the predisposition for protein-protein and protein-nucleic acid interactions. However, the chosen simulation techniques, simulation protocols and force field parameters may impact the predicted aqueous MERS-CoV macro domain structural properties. Therefore, in this study, we conduct Hamiltonian-replica exchange molecular dynamics simulations and Temperature-replica exchange molecular dynamics simulations and we also look at the impacts of CHARMM36m and AMBER99SB parameters on the calculated structural properties of aqueous MERS-CoV macro domain. For this study, we conducted three extensive different sets of parallel tempering simulations. Many molecular simulation scenarios require ergodic sampling of conformations. Their energy landscapes may feature many minima and barriers between minima that can be difficult to cross at ambient temperatures over reachable simulation time scales. This means that the corresponding findings are confounded by the choice of initial conditions because such conditions determine the space region that is explored by a simulation. 9 On the other hand, replica exchange simulations seek to enhance the conformational sampling by running numerous independent replicas in different conditions, and periodically exchanging the coordinates of different ensembles (replicas). 9 Usually, temperature is used as the parameters which changes among replicas, which in turn enables conformations trapped in a local minima at a low temperature to escape by passing to a higher temperature replica. Potential energy overlap is required for efficient exchange between neighboring replicas, which results in simulations with large number of replicas especially when we investigate proteins in explicit water. Specifically, for covering a desired temperature range, replica number grows as the square root of the number of particles, which in turn introduces limitations to the method's potential by means of computational costs. Hamiltonian replica exchange molecular dynamics (H-REMD) 10 provides a possible solution for alleviating temperature-replica exchange molecular dynamics (T-REMD) simulations-based limitations in which the different replicas are treated at a constant temperature while the Hamiltonian are used as a parameter and is reported to be more efficient in protein conformational sampling than T-REMD. As an enhanced technique, it is based on executing simultaneous replicas with different Hamiltonians of the system and enabling exchanges at a given frequency between i and j replicas at neighboring scales m and n with a probability 10 of In the H-REMD simulations, we used the CHARMM36m parameters 13 for the MERS-CoV macro domain and the TIP3P model for water. 14 We isolated the initial structure for the MERS-CoV macro domain from the publicly available crystal structure (PDB ID: 5zu7). We applied a water layer of 10 Å with 11 827 water molecules to solvate the macro domain using a cubic box. Energy minimization was per- short-range Coulombic interactions were truncated at 12 Å, and the long-range electrostatic interactions were calculated using the particle mesh Ewald method. 17 The neighbor list was updated every 10 steps with a cut-off of 12 Å. The LINCS algorithm 18 was used to constrain all bond lengths during the H-REMD simulations. An exchange between neighboring replicas was attempted every 2 ps, and the coordinates were also saved every 5 ps. The H-REMD were tested for convergence of the replica at λ m = 1.0 from H-REMD simulations for further analysis (see Supplementary Materials section [Appendix S1]). T-REMD simulations 19 were performed between the temperatures ranging from 280 to 320 K using 32 replicas distributed exponentially between these temperatures. We used the CHARMM36m parameters 13 and the AMBER99SB parameters 20 for the MERS-CoV macro domain and the explicit TIP3P model for water 14 for studying the impact of these chosen force field parameters on the predicted structural properties of the macro domain in water. 21 After solvating the macro domain in water by using a 10 Å water layer (11 827 water molecules), we first conduct equilibration simulations for 20 ns (per replica) using the canonical ensemble and then for additional 20 ns (per replica) using the isothermal-isobaric ensemble. We run T-REMD simulations for a total simulation time of 6.4 μs (200 ns per replica). We perform exchanges between replicas every 5 ps with a time step of 2 fs. We save trajectories every 500 steps and the LINCS algorithm was used to constrain all bond lengths in both simulations. The electrostatic and van der Waals interactions were calculated using the particle mesh Ewald (PME) method and the real-space components truncated at 12 Å. We controlled the temperature and pressure using a velocity rescaling algorithm with a relaxation time of 0.1 ps and a Parrinello-Rahman barostat 16 We calculate the content of the secondary structure components per residue for the aqueous MERS-CoV macro domain utilizing the DSSP program both for data obtained from H-REMD simulations and different sets of T-REMD simulations using CHARMM36m and AMBER99SB parameters (see above). 22 Additionally, we determine the end-to-end distances (R EE ) and radius of gyration (R g ) of the MERS-CoV macro domain in water using all converged trajectories. Based on the relationship between the R g and R EE values, we apply the k-means clustering method to perform vector quantization and consequently to partition the structural observations into five clusters. We assign each observation to the cluster with the nearest cluster centroid that serves as a prototype of the cluster. 23 This way the structural data space is partitioned into Voronoi cells and the k-means clustering minimizes within cluster variances using squared Euclidean distances. Finally, we compute the root mean square fluctuations for each residue of the MERS-CoV macro domain in water. We compare these results to findings secured by using disorder predictors, which we describe next. In addition, we perform residue-level analysis of the intrinsic disorder predisposition of the MERS-CoV macro domain and selected functional features related to its protein and nucleic acid binding potential. We evaluate the intrinsic disorder predisposition using a set of commonly utilized and publicly available computational tools, such as PONDR VLXT, 24 PONDR VSL2, 25 PONDR FIT, 26 and IUPred capable of predicting long and short disordered regions. [27] [28] [29] Residue-level predisposition of this domain to interact with proteins was evaluated with the state-of-the SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method. 30 SCRIBER is currently the most accurate method that predicts protein-binding residues (PBRs), and the only tool that Arg152-Val157 (probability; 70%-100%). Additionally, we detect ten turn structure regions via H-REMD simulations with higher abundancies and these are Gly1-Glu10 (probability, 9%-95%), Ile14-Cys17 (probability; 7%-100%), Asp24-Gly33 (probability; 2%-100%), Asn42-Leu45 (probability; all at $100%), Ala58-Gly61 (probability; 1%-100%), Ala73-Asp81 (probability; 0.2%-100%), Gly87-Asn93 (probability; 1%-100%), Asp101-Lys105 (probability; all at $64%), Ala117-Leu123 (probability; 17%-100%) and Leu128-Gly135 (probability; 94%-100%). While the six α-helices were also observed in the NMR measurements, 34 they also annotate adjacent residues as helical, but this might be related to the buffer used in the experiments. NMR measurements 34 also detected seven β-sheet structure regions in MERS-CoV macro domain. However, we should mention here that the abundancies of α-helix conformations are higher with H-REMD simulations using the CHARMM36m parameters in comparison to NMR experiments. From the T-REMD simulations using the CHARMM36m parameters (Figure 2 ), we find again six α-helix regions and these are located at Ala25-Tyr32 (probability; 1%-100%), Gly50-Ser59 (probability; 91%-100%), Ala62-Lys74 (probability; 95%-100%), Val108-Asn119 (3%-98%), Pro138-Glu148 (probability; 84%-100%) and Gln160-Thr167 (11%-100%). We note that results yield the same resi- These are located at Glu10-Thr15 (probability; 34%-100%), Val18-Leu22 (probability; all at 100%), Ser35-Ala41 (probability; 6%-100%), Asp81-Gln86 (probability; 27%-100%), Asn93-Val98 (probability; 97%-100%), Leu123-Pro127 (probability; 25%-100%) and Arg152-Val157 (probability; 77%-100%). Moreover, we find 13 turn regions with high abundancies from these T-REMD simulations and these are located at Gly1-Glu10 (probability; 1%-99%), Ile14-Cys17 (probability; 5%-100%), Lys30-Ser35 (probability; 2%-100%), Asn42-Gly48 (probability; 3%-100%), Ser59-Gly61 (probability; 7%-100%), Gln78-Gly80 (probability; all at 100%), Gly87-Asn93 (probability; 3%-100%), Asp101-Lys105 (probability; all at 54%), Asp107-Ser109 (probability; 19%), Lys116-Leu123 (probability; 7%-98%), Leu128-Gly135 (probability; 94%-99%), Arg147-Thr151 (probability; 1%-3%) and Ser126-Thr167 (probability; 6%-7%). The turn structure abundancies per residue obtained from these T-REMD simulations vary from results obtained from our H-REMD simulations (see above). We find again six α-helix regions in the structures of MERS-CoV macro domain, which agree with NMR experiments, by T-REMD simulations using the AMBER99SB parameters ( Figure 2 ). These are located at Ala25-Tyr32 (probability; 9%-99%), Gly50-Ser59 (probability; 71%-100%), Ala62-Lys74 (probability; 91%-100%), Val108-Ala120 (probability; 1%-94%), Pro138-Glu148 (probability; 37%-100%) and Gln160-Leu166 (67%-100%). The abundancies of α-helix are slightly smaller at some residues (Pro138-Glu148) using the AMBER99SB parameters rather than the CHARMM36m parameters for the macro domain (see above) and show slightly better agreement with NMR experiments. The 3 10 -helix conformation is located at Pro5-Asn8 (probability; 31%), Ala102-Ala104 (probability; 58%), Val108-Ala120 (probability; 1%-50%), Gly132-Phe134 (probability; 7%) and represents four regions in the structures of the macro domain F I G U R E 2 Secondary structure elements and their residue-level probabilities recovered from the MERS-CoV macro domain structures calculated with dynamics in aqueous media: A, α-helix structure formation along with its abundances obtained from H-REMD simulations using the CHARMM36m parameters (blue), T-REMD simulations using the CHARMM36m parameters (green) and T-REMD simulations using the AMBER99SB parameters (yellow); B, 3 10 -helix structure formation along with its abundances obtained from H-REMD simulations using the CHARMM36m parameters (blue), T-REMD simulations using the CHARMM36m parameters (green) and T-REMD simulations using the AMBER99SB parameters (yellow); C, β-sheet structure formation along with its abundances obtained from H-REMD simulations using the CHARMM36m parameters (blue), T-REMD simulations using the CHARMM36m parameters (green) and T-REMD simulations using the AMBER99SB parameters (yellow); D, turn structure formation along with its abundances obtained from H-REMD simulations using the CHARMM36m parameters (blue), T-REMD simulations using the CHARMM36m parameters (green) and T-REMD simulations using the AMBER99SB parameters (yellow) [Color figure can be viewed at wileyonlinelibrary.com] (probability; 2%-100%), Gly87-Lys92 (probability; 97%-100%), Gly99 Current overview on MERS-CoV Emerging developments on pathogenicity, molecular virulence, epidemiology and clinical symptoms of current Middle East respiratory syndrome coronavirus (MERS-CoV) Systematic review of the efficacy and safety of antiretroviral drugs against SARS, MERS or COVID-19: initial assessment Macro domain from Middle East respiratory syndrome coronavirus (MERS-CoV) is an efficient ADP-ribose binding module: CRYSTAL STRUCTURE AND BIOCHEM-ICAL STUDIES MERS-CoV: understanding the latest human coronavirus threat. Viruses Unique and conserved features of genome and proteome of SARS-coronavirus, an early Split-off from the coronavirus group 2 lineage Supramolecular architecture of the coronavirus particle Extensive positive selection drives the evolution of nonstructural proteins in lineage C Betacoronaviruses Applications of molecular dynamics simulation in structure prediction of peptides and proteins Hamiltonian replica exchange molecular dynamics using soft-Core interactions GROMACS: a message-passing parallel molecular dynamics implementation PLUMED 2: new feathers for an old bird CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data Comparison of simple potential functions for simulating liquid water Canonical sampling through velocity rescaling Polymorphic transitions in single crystals: a new molecular dynamics method Particle mesh Ewald: an N Álog( N ) method for Ewald sums in large systems Metallic Systems: A Quantum Chemist's Perspective Replica-exchange molecular dynamics method for protein folding Comparison of multiple Amber force fields and development of improved protein backbone parameters How accurate are your simulations? Effects of confined aqueous volume and AMBER FF99SB and CHARMM22/CMAP force field parameters on structural ensembles of intrinsically disordered proteins: amyloid-β 42 in water Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers J. the global K-means clustering algorithm Sequence complexity of disordered protein Lengthdependent prediction of protein intrinsic disorder PONDR-FIT: a meta-predictor of intrinsically disordered amino acids IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics Review and comparative assessment of sequencebased predictors of protein-binding residues The authors acknowledge TRUBA resources because the numerical calculations reported in this study were performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). The peer review history for this article is available at https://publons. com/publon/10.1002/prot.26150. The data that support the findings of this study are available from the corresponding author upon reasonable request. https://orcid.org/0000-0002-7749-0314Orkid Coskuner-Weber https://orcid.org/0000-0002-0772-9350