key: cord-0964729-4jrro0gd authors: Du, Qi-Shi; Wang, Shu-Qing; Zhu, Yu; Wei, Dong-Qing; Guo, Hong; Sirois, Suzanne; Chou, Kuo-Chen title: Polyprotein cleavage mechanism of SARS CoV M(pro) and chemical modification of the octapeptide date: 2004-07-31 journal: Peptides DOI: 10.1016/j.peptides.2004.06.018 sha: d3a95f272935fd267310ab6f8fc247b62b531402 doc_id: 964729 cord_uid: 4jrro0gd
The cleavage mechanism of severe acute respiratory syndrome (SARS) coronavirus main proteinase (M(pro) or 3CL(pro)) for the octapeptide AVLQSGFR is studied using molecular mechanics (MM) and quantum mechanics (QM). The catalytic dyad His-41 and Cys-145 in the active pocket between domains I and II seems to polarize the π-electron density of the peptide bond between Gln and Ser in the octapeptide, leading to an increase of positive charge on C(CO) of Gln and of negative charge on N(NH) of Ser. The possibility of strengthening the chemical bond between Gln and Ser based on the “distorted key” theory [Anal. Biochem. 233 (1996) 1] is examined. The scissile peptide bond between Gln and Ser is found to be strengthened through a “hybrid peptide bond” obtained by changing the carbonyl group CO of Gln to CH(2) or CF(2). This breaks the π-bond system of the peptide bond, making the octapeptide (AVLQSGFR) a “distorted key” and a potential starting system for the design of anti-SARS drugs.
The spread of severe acute respiratory syndrome (SARS) [22, 26, 27, 31] in Asia, North America, and several countries in Europe prompted an unprecedented global effort to fight the disease. Researchers have identified crucial proteins of the SARS coronavirus, and thousands of compounds are being screened in an effort to find new drugs [7, 20, 32]. A homology model of SARS CoV M pro was constructed by Anand et al. [7] based on the experimental M pro structures of human coronavirus (HCoV) and of the porcine transmissible gastroenteritis virus (TGEV) complex. Recently, the crystal structures of SARS CoV M pro at different pH values were reported by Yang et al. [34].
The X-ray structures revealed some differences from the homology model obtained earlier [7]. However, the experimental structure of SARS CoV M pro folds in an arrangement similar to the HCoV and TGEV M pro structures, and both the homology model and the experimental structure have a His-Cys catalytic dyad between domains I and II [34]. The structures of the three coronavirus main proteinases reveal a remarkable degree of conservation of the substrate-binding sites and form the structural basis for rational drug design [7, 20].
At the same time, structure-based drug design has been progressing based on the molecular structure of SARS CoV M pro [20]. AG7088, suggested by Anand et al. [7], could well serve as a starting point for designing efficient inhibitors of SARS CoV M pro. AG7088 is a peptide inhibitor designed based on the structure of HCoV M pro and is being clinically tested by Pfizer for the treatment of the human common cold. Chou et al. [20] found some deficiencies of AG7088 in binding to SARS CoV M pro and suggested using its derivative KZ7088, which could form better interactions with the active pocket of SARS CoV M pro. Chou et al. also proposed an octapeptide inhibitor NH2-AVLQ↓SGFR-COOH (the cleavage site is indicated by ↓) based on molecular modelling. This suggestion was supported by the recent work of Yang et al. [34]. In their structure determination, these authors used the decapeptide NH2-TSAVLQ↓SGFR-COOH, which is quite similar to the octapeptide NH2-AVLQ↓SGFR-COOH. According to a very recent report by Gan et al. [23], the octapeptide originally proposed by Chou et al. [20] has been synthesized and found to be the most active in inhibiting replication of the SARS coronavirus among the compounds reported. Moreover, the octapeptide showed no toxicity in vivo at physiological concentration [23].
That a peptide is cleavable by a protease means that there is good binding between ligand and receptor at the active region of the protease and that the peptide has a scissile bond to be cleaved (see, e.g., [13] and a comprehensive review [12]). However, further chemical modification of the octapeptide is needed to stabilize its inhibitory power against SARS CoV M pro and to turn it into an effective drug. This may be realized based on the "distorted key" theory [12, 13]. To reach such a goal, a detailed understanding of the cleavage mechanism of the octapeptide by SARS CoV M pro, as well as of the 3D structure of the enzyme, is essential. Similar strategies have been used to investigate the Cdk5-Nck5a*-ATP complex [19, 36], apoptosis proteins [10, 11, 15, 17], and the beta-secretase zymogen [14]. Many useful insights have been gained through these studies. Accordingly, it is expected that the present study may also provide useful insights for the development of anti-SARS drugs.
The protease-susceptible sites in a given protein or peptide usually extend over an octapeptide region [12, 13]. The corresponding amino acid residues are sequentially symbolized by eight subsites R4, R3, R2, R1, R1′, R2′, R3′, R4′, and the eight matching binding positions of the protease are denoted by S4, S3, S2, S1, S1′, S2′, S3′, S4′ (see, e.g., [8, 9] as well as Fig. 3 of [12]). Occasionally, the susceptible sites in some proteins may contain one subsite less or more [13]; however, eight amino acid residues is the most common case. Although the protein being cleaved contains many more than eight amino acid residues, usually only an octapeptide segment fits in the active site region of a protease. Therefore, our research will focus on the cleavability of an octapeptide.
Fig. 1. A schematic drawing to illustrate the "distorted key" theory [12, 13]: (a) the cleavage location in the octapeptide by the protease is the peptide bond between R1 and R1′; (b) after chemical modification, the scissile peptide bond changes to a strong "hybrid peptide bond" and the cleavage is difficult. Adapted from Chou [12] with permission.
As shown in Fig. 1a, the combination of a thin line and a dashed line is used to represent the conjugated character of the peptide bond. In Fig. 1b, however, the scissile peptide bond between R1 and R1′ is replaced by a strong "hybrid peptide bond" through a chemical modification, and the strengthened bond is no longer cleavable by the protease. According to the "distorted key" theory [12], a cleavable octapeptide can be likened to a key that fits well in the protease active region, leading to cleavage at its scissile bond. After a suitable chemical modification, the octapeptide can still bind to the active region, but its peptide bond can no longer be cleaved by the protease. Thus, the modified octapeptide can be vividly compared to a "distorted key" that can be inserted into a lock but can neither open it nor be easily pulled out of it [12]. In view of this, the modified octapeptide naturally becomes a stable competitive inhibitor and a potential drug candidate. The octapeptide AVLQSGFR is the first octapeptide designed [20] based on the molecular structure of SARS CoV M pro and has been proved cleavable experimentally. In this study, we use molecular mechanical and quantum mechanical simulations to investigate the cleavage mechanism, the properties of the chemical bonds concerned, and the catalytic interaction between the octapeptide and SARS CoV M pro.
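The subsite notation introduced above can be made concrete with a minimal Python sketch. It simply pairs the eight residues of AVLQSGFR with the labels R4 to R4′ (written with an ASCII apostrophe) and reads off the scissile pair. The function names `label_subsites` and `scissile_pair` are hypothetical and serve only as illustration.

```python
# Subsite labels in N-to-C order; the scissile bond lies between R1 and R1'.
SUBSITE_LABELS = ["R4", "R3", "R2", "R1", "R1'", "R2'", "R3'", "R4'"]


def label_subsites(octapeptide):
    """Map an 8-residue peptide (one-letter codes) onto subsites R4..R4'."""
    if len(octapeptide) != len(SUBSITE_LABELS):
        raise ValueError("expected exactly eight residues")
    return dict(zip(SUBSITE_LABELS, octapeptide))


def scissile_pair(octapeptide):
    """Return the residues flanking the scissile bond (R1, R1')."""
    subsites = label_subsites(octapeptide)
    return subsites["R1"], subsites["R1'"]


subsites = label_subsites("AVLQSGFR")
for label, residue in subsites.items():
    print(label, residue)
print("scissile bond between:", scissile_pair("AVLQSGFR"))
```

For AVLQSGFR the scissile pair is Gln (Q) at R1 and Ser (S) at R1′, which is the Gln-Ser bond examined throughout the paper.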
The study is performed in the following four steps: (1) using molecular mechanics to minimize the energy of the complex of SARS CoV M pro with the octapeptide, starting from the structure obtained in docking studies; (2) computing the atomic charge distribution around the binding pocket of SARS CoV M pro using ab initio quantum mechanics at the energy-minimized conformation; (3) computing the molecular energy, chemical bond properties, and atomic charges of the octapeptide in the background charge distribution [24] of SARS CoV M pro using ab initio quantum mechanics; and (4) in the same background charge distribution, computing the molecular energy, chemical bond properties, and atomic charges of the modified octapeptide.
The docking of the octapeptide AVLQSGFR to SARS CoV M pro was performed based on the homology structure [7, 20] using the MOE (molecular operating environment) program package [28]. Twenty-five docking structures were obtained, and the one with the best docking score was used for further energy minimization. Fig. 2 shows the energy-refined docked structure obtained in step 1. In contrast to the common serine proteases, which have a Ser-His-Asp catalytic triad, SARS CoV M pro has a His-Cys catalytic dyad (His-41 and Cys-145), similar to TGEV M pro (His-41 and Cys-144) and HCoV M pro (His-41 and Cys-144) [7, 29]. According to Chou et al. [20], the catalytic active region is located within the pocket between domains I and II of the SARS coronavirus main protease and contains the following 23 amino acid residues: Cys-22, Gly-23, Thr-24, Thr-25, Leu-27, His-41, Val-42, Cys-44, Thr-45, Ala-46, Glu-47, Asp-48, Met-49, Leu-50, Asn-51, Pro-52, Tyr-54, Cys-145, His-164, Met-165, Asp-187, Arg-188, and Gln-189. Fig. 3a shows the location of the catalytic dyad His-41 and Cys-145 in SARS CoV M pro.
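As a small bookkeeping aid, the active-site residue list quoted above can be parsed and checked programmatically. The sketch below is illustrative only: the helper `parse_residues` is a hypothetical name, and the residue string is copied from the text.

```python
import re

# Active-site residues as listed in the text (after Chou et al. [20]).
ACTIVE_SITE = (
    "Cys-22, Gly-23, Thr-24, Thr-25, Leu-27, His-41, Val-42, Cys-44, "
    "Thr-45, Ala-46, Glu-47, Asp-48, Met-49, Leu-50, Asn-51, Pro-52, "
    "Tyr-54, Cys-145, His-164, Met-165, Asp-187, Arg-188, Gln-189"
)


def parse_residues(text):
    """Turn 'His-41, Cys-145' into [('His', 41), ('Cys', 145)]."""
    residues = []
    for token in text.split(","):
        match = re.fullmatch(r"([A-Z][a-z]{2})-(\d+)", token.strip())
        if match is None:
            raise ValueError(f"unrecognized residue label: {token!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues


pocket = parse_residues(ACTIVE_SITE)
dyad = {("His", 41), ("Cys", 145)}

print("residues in pocket:", len(pocket))
print("catalytic dyad present:", dyad.issubset(pocket))
```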
The active cleft of SARS CoV M pro accommodates the octapeptide well, and the ligand binds to the receptor through six hydrogen bonds (Fig. 3b), fully consistent with the results reported by Chou et al. [20]. In order to study the influence of SARS CoV M pro on the chemical bonds of the octapeptide, we considered a small region of the catalytic cleft surrounding the octapeptide, as shown in Fig. 3b. The catalytic dyad His-41 and Cys-145 lies in front of the peptide bond Gln-Ser on the subsites R1 and R1′. The polar hydrogen Hε2 on Nε2 in the imidazole group of His-41 points toward the peptide bond Gln-Ser and has a large influence in the active region.
Electrostatic interaction plays the dominant role in ligand-receptor binding and must be taken into consideration in the quantum mechanical calculations of the influence of SARS CoV M pro on the chemical bonds of the octapeptide. For this purpose, we divide the amino acid residues in the active cleft between domains I and II of SARS CoV M pro into six segments and compute the atomic charges of each segment separately using ab initio quantum mechanics (Fig. 4). The 62 amino acid residues in the six segments are listed in Table 1. After deducting the overlapping atoms, there are a total of 953 atomic background charges, including all atoms in the catalytic cleft. In Table 2, we list the Mulliken atomic charges q_Mull and the electrostatic potential equivalent charges q_ESP, which are obtained by fitting atomic charges to the electrostatic potential at the van der Waals surface. The atomic charges from semi-empirical methods [25], especially q_ESP of AM1, are not reasonable. Because q_ESP reproduces the quantum mechanical electrostatic potential on the molecular surface, in this research we use q_ESP from HF/6-31G* calculations to illustrate our points. It can be seen from Table 2 that the two polar hydrogen atoms on the imidazole group have large atomic charges q_ESP.
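The bookkeeping behind the background charges (charges computed per segment, overlapping atoms counted once, and the resulting point charges acting on the ligand) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the atom identifiers and coordinates are invented, every charge except the 0.4201 quoted in the text for the polar hydrogen of His-41 is a placeholder, and keeping the first occurrence of an overlapping atom is an arbitrary choice. The potential is evaluated in atomic units, with distances in bohr.

```python
import math


def merge_segments(segments):
    """Combine per-segment atom lists so that each atom appears once.

    Each atom is (atom_id, (x, y, z), charge). When an atom occurs in
    several segments, the first occurrence is kept (an assumption).
    """
    merged = {}
    for segment in segments:
        for atom_id, xyz, charge in segment:
            merged.setdefault(atom_id, (xyz, charge))
    return merged


def potential_at(point, charges):
    """Coulomb potential (atomic units) of the point charges at `point`."""
    return sum(q / math.dist(point, xyz) for xyz, q in charges.values())


# Two toy segments sharing one atom (hypothetical coordinates and charges,
# apart from 0.4201, the q_ESP of the polar hydrogen on His-41).
segment_a = [
    ("His41:HE2", (0.0, 0.0, 0.0), 0.4201),
    ("His41:NE2", (1.9, 0.0, 0.0), -0.30),
]
segment_b = [
    ("His41:NE2", (1.9, 0.0, 0.0), -0.30),
    ("Cys145:SG", (0.0, 6.0, 0.0), -0.25),
]

background = merge_segments([segment_a, segment_b])
print("background charges:", len(background))
print("potential at probe point:", potential_at((0.0, 0.0, 4.0), background))
```

In the paper the same idea is applied to six segments covering 62 residues, giving 953 background charges after the overlapping atoms are deducted.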
The atomic charge (0.4201) of the polar hydrogen Hε2 is slightly smaller than the atomic charge (0.4288) of the proton H⁺δ1 on nitrogen Nδ1, and Hε2 lies close to the peptide bond Gln-Ser on subsites R1 and R1′. The polar hydrogen Hε2 attracts π-electron density from the peptide bond Gln-Ser and thereby weakens this chemical bond.
We calculated the cleavage reaction energy from the octapeptide AVLQSGFR to the two tetrapeptides AVLQ and SGFR using ab initio HF/6-31G* in the gas phase. The molecular energies of the octapeptide and the two tetrapeptides are shown in Table 3, and the hydrolysis reaction energy is 110.8 kJ/mol. The peptide bond is considered a pseudo π-bond, i.e., a partial π-bond consisting of three atoms and four electrons [33]. Table 4 shows the atomic coordinates of the six atoms at the two ends of the peptide bond (Gln)Cα-CO-NH-Cα(Ser). The X-coordinates of the six atoms are almost the same, i.e., the six atoms are nearly coplanar. The carbonyl group CO of glutamine and the nitrogen atom N(NH) of serine form a Π₃⁴ bond (three centers, four electrons). The catalytic dyad His-41 and Cys-145 in the active pocket between domains I and II attracts π-electron density from the peptide bond Gln-Ser, increasing the positive charge on C(CO) of glutamine and the negative charge on N(NH) of serine, so that the electrophilic proton H+ attacks N(NH) of serine and the nucleophilic OH− attacks C(CO) of glutamine. The catalytic functional group is the imidazole ring of His-41, which plays the acid-base catalytic role. The pK value of the imidazole group of histidine is 6.0, the concentration of H+ is the same as in water, and hence His-41 serves as a good proton donor under physiological conditions [33]. The electron density contours surrounding atoms N(NH) and C(CO) form two triangles characteristic of sp2 hybrid orbitals; therefore, the 2p_x orbitals of the three atoms, which are perpendicular to the plane, form a Π₃⁴ bond system.
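A reaction energy such as the 110.8 kJ/mol quoted above is obtained as the sum of product energies minus the sum of reactant energies, converted from hartree to kJ/mol. The following sketch shows that arithmetic only. The total energies are hypothetical placeholders (the Table 3 values are not reproduced here), and including a water molecule on the reactant side is an assumption about how the hydrolysis was balanced.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol


def reaction_energy_kj(reactants, products):
    """Reaction energy in kJ/mol from total energies given in hartree."""
    return (sum(products) - sum(reactants)) * HARTREE_TO_KJ_PER_MOL


def kj_to_hartree(energy_kj):
    """Convert an energy in kJ/mol back to hartree."""
    return energy_kj / HARTREE_TO_KJ_PER_MOL


# Hypothetical total energies in hartree, for illustration only.
e_octapeptide = -3000.00
e_water = -76.00
e_avlq = -1500.00
e_sgfr = -1575.96

delta = reaction_energy_kj([e_octapeptide, e_water], [e_avlq, e_sgfr])
print(f"illustrative reaction energy: {delta:.1f} kJ/mol")

# The value reported in the text, expressed in hartree.
print(f"110.8 kJ/mol = {kj_to_hartree(110.8):.5f} hartree")
```

The conversion shows that the reported reaction energy corresponds to a difference of only a few hundredths of a hartree between very large total energies, so all species must be computed at the same level of theory for the difference to be meaningful.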
In Table 4, we list the atomic charges of the six atoms on both sides of the peptide bond Gln-Ser obtained from ab initio HF/6-31G* calculations in the gas phase and with the background charges of SARS CoV M pro. The negative charge of N(NH) in serine increases to −0.8689 in the SARS CoV M pro background charges from −0.8344 in the gas phase. The positive charge of the carbonyl carbon C(CO) of glutamine increases to 0.8074 in the background charges from 0.7706 in the gas phase, which is favorable to the cleavage reaction. Fig. 5 is the contour map of the electron density difference of the peptide bond Gln-Ser in the octapeptide AVLQSGFR, obtained by subtracting the electron density in the gas phase from the electron density in the background charges [24] of SARS CoV M pro. In Fig. 5, the grey bold line is the zero-value contour, along which the electron densities in the gas phase and in the protease background charges are equal. The solid thin lines show the regions where the electron density is greater in the SARS CoV M pro background charges than in the gas phase, and the dashed thin lines show the regions where it is smaller. We find that along the peptide bond (Gln)C-N(Ser) the electron density increases on the N(Ser) side and decreases on the (Gln)C side. This change is favorable for the nucleophilic attack of the anion OH− on (Gln)C and the electrophilic attack of the cation H+ on N(Ser).
SARS CoV M pro has a very high selectivity [7, 34], and in the polyprotein cleavage sites the subsite R1 is invariably occupied by Gln. A simple, routine way to turn the octapeptide AVLQSGFR into an effective inhibitor is to change the scissile peptide bond into a strong single bond by chemical modification [12, 30].
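The charge shifts quoted above can be tabulated directly. The short sketch below uses only the four charges given in the text (C(CO) of Gln and N(NH) of Ser, in the gas phase and in the protease background charges) and computes the shift on each atom together with the charge separation across the C-N bond. The names `charge_shift` and `bond_polarity` are illustrative, and the charge separation is only a simple indicator, not a quantity defined in the paper.

```python
# q_ESP charges quoted in the text (HF/6-31G*).
gas = {"C(CO) Gln": 0.7706, "N(NH) Ser": -0.8344}
protease = {"C(CO) Gln": 0.8074, "N(NH) Ser": -0.8689}


def charge_shift(reference, perturbed):
    """Per-atom change in charge when going from reference to perturbed."""
    return {atom: perturbed[atom] - reference[atom] for atom in reference}


def bond_polarity(charges):
    """Charge separation across the scissile bond: q(C) - q(N)."""
    return charges["C(CO) Gln"] - charges["N(NH) Ser"]


shift = charge_shift(gas, protease)
for atom, dq in shift.items():
    print(f"{atom}: {dq:+.4f}")

print(f"charge separation, gas phase: {bond_polarity(gas):.4f}")
print(f"charge separation, protease:  {bond_polarity(protease):.4f}")
```

The carbon becomes more positive and the nitrogen more negative in the protease environment, so the charge separation across the scissile bond grows, in line with the density-difference map of Fig. 5.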
If we replace the carbonyl group CO of glutamine on subsite R1 with a CH2 or CF2 group, the π-bond system is broken and the modified octapeptide AVLQSGFR [20] may become a competent inhibitor of SARS CoV M pro and an effective drug candidate against SARS. Here we show the possibility of chemically modifying the octapeptide AVLQSGFR through computational modeling. Table 5 lists the atomic charges of the six atoms on both sides of the hybrid peptide bond Gln-Ser after changing the carbonyl group CO of Gln to a CH2 group. Comparing with Table 4, we find that the carbon atom in the CH2 group of the hybrid peptide bond becomes negative (−0.1040), compared with the positive charge of 0.7706 in the CO group (see Table 4), and hence nucleophilic attack by OH− is essentially impossible. On the other hand, the negative charge of N(NH) on the Ser side decreases in magnitude from −0.8689 (Table 4) to −0.7771, and hence electrophilic attack by H+ is more difficult. The third row in Table 5 gives the atomic charges in the background charge distribution of SARS CoV M pro. The atomic charge of C(CH2) on the Gln side decreases further to −0.1211, and hence nucleophilic attack by OH− is, indeed, more difficult.
The octapeptide NH2-Ala-Val-Leu-Gln-Ser-Gly-Phe-Arg-COOH is the first one designed based on the molecular structure of SARS CoV M pro [20] and has been proved experimentally to be cleavable. The last eight amino acid residues of the decapeptide used by Yang et al. [34] are exactly the same as the octapeptide AVLQSGFR originally investigated by Chou et al. [20]. A cleavable octapeptide could be changed into an effective inhibitor of SARS CoV M pro, or a candidate anti-SARS drug, by a proper chemical modification that replaces the scissile peptide bond with a hybrid peptide bond. The modified octapeptide loses its cleavability but can still bind to the active site, thus becoming a stable inhibitor [16] or a "distorted key" [12].
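The argument above rests on the sign of the charge on the carbon at the Gln side of the bond. As a rough illustration, the sketch below collects the four carbon charges quoted in the text and applies a deliberately crude criterion: a carbon is treated as a candidate site for attack by OH− only if its charge is positive. The function name `is_electrophilic_center` and the criterion are assumptions made for illustration, not a method from the paper, and reading −0.1040 as the gas-phase value follows from the comparison with 0.7706 in the text.

```python
# Charges on the Gln-side carbon as quoted in the text.
carbon_charge = {
    ("CO", "gas phase"): 0.7706,
    ("CO", "protease background"): 0.8074,
    ("CH2", "gas phase"): -0.1040,
    ("CH2", "protease background"): -0.1211,
}


def is_electrophilic_center(charge):
    """Crude illustrative test: only a positive carbon attracts OH-."""
    return charge > 0.0


for (group, environment), charge in carbon_charge.items():
    verdict = "susceptible" if is_electrophilic_center(charge) else "protected"
    print(f"{group:>3} in {environment:<20} q = {charge:+.4f}  {verdict}")
```

With these numbers the CO carbon is flagged as susceptible in both environments and the CH2 carbon as protected in both, which is the qualitative point of the "hybrid peptide bond".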
The number of possible octapeptides is huge (20^8 = 2.56 × 10^10) [12, 13], and the octapeptide AVLQSGFR may not be the best cleavable one. Accordingly, in searching for potential inhibitors, a matter of paramount importance is to discern what kinds of peptides can be cleaved by SARS CoV M pro and what kinds cannot. Even when limited to octapeptides, this question is by no means easy to answer. It would be exhausting to test such a huge number of possible octapeptides experimentally. A good cleavable octapeptide could be found by using existing prediction algorithms, such as the discriminant function algorithm [12, 18], the vectorized sequence-coupled algorithm [13], and some other relevant algorithms [8, 9, 35]. These prediction methods have been successfully applied to help inhibitor design for HIV protease [1] [2] [3] [4] [5] [6] 12, 16]. Any statistical prediction method depends on training sets of known samples [21, 37]. For the prediction of octapeptides cleavable by SARS CoV M pro, we need two types of sets: one is a positive training set consisting of cleavable octapeptides, and the other is a negative training set consisting only of uncleavable octapeptides. Building the negative training set is easy. For example, if we know that a protein consisting of 129 amino acid residues is not cleavable by SARS CoV M pro, we immediately obtain 129 − 7 = 122 non-cleavable octapeptides. However, building a positive training set requires experiments. Computer simulation and molecular modeling are good ways to accelerate this procedure. The methods and conclusions of this research should be helpful for this purpose.
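The two counts used above, 20^8 possible octapeptides and 129 − 7 = 122 octapeptide windows in a 129-residue protein, follow from simple enumeration. A minimal sketch is given below. The function name `octapeptide_windows` is hypothetical, and the 129-residue sequence is a randomly generated placeholder rather than a real protein.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def octapeptide_windows(sequence, width=8):
    """All overlapping windows of `width` residues, in sequence order."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


# Size of the octapeptide sequence space.
print(f"possible octapeptides: {len(AMINO_ACIDS) ** 8:,}")

# Placeholder 129-residue sequence standing in for a non-cleavable protein.
random.seed(0)
protein = "".join(random.choice(AMINO_ACIDS) for _ in range(129))

negatives = octapeptide_windows(protein)
print("octapeptide windows:", len(negatives))
```

If the protein is known not to be cleaved anywhere, every window can be added to the negative training set, which is why that set is cheap to build compared with the positive one.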
Kinetic studies with the nonnucleoside HIV-1 reverse transcriptase inhibitor U-88204E
Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E
Steady-state kinetic studies with the polysulfonate U-9843, an HIV reverse transcriptase inhibitor
Kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-90152E
The benzylthio-pyrididine U-31355 is a potent inhibitor of HIV-1 reverse transcriptase
The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase
Coronavirus main proteinase (3CLpro) structure: basis for design of anti-SARS drugs
A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus protease-cleavable sites in proteins
Predicting cleavability of peptide sequences by HIV protease via correlation-angle approach
Solution structure of BID, an intracellular amplifier of apoptotic signaling
Solution structure of the RAIDD CARD and model for CARD/CARD interaction in caspase-2 and caspase-9 recruitment
Review: prediction of HIV protease cleavage sites in proteins
A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins
Prediction of the tertiary structure of the beta-secretase zymogen
Prediction of the tertiary structure and substrate binding site of caspase-8
Review: Steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases
Prediction of the tertiary structure of a caspase-9/inhibitor complex
Predicting HIV protease cleavage sites in proteins by a discriminant function method
A Model of the complex between cyclin-dependent kinase 5(Cdk5) and the activation domain of neuronal Cdk5 activator
Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS (erratum: ibid
Review: Prediction of protein structural classes
Identification of a novel coronavirus in patients with severe acute respiratory syndrome
Synthesis and biological evaluation of a novel SARS CoV Mpro inhibitor
Exploring chemistry with electronic structure methods: a guide to using Gaussian
Ab initio molecular orbital theory
A novel coronavirus associated with severe acute respiratory syndrome
N-terminal domain of the murine coronavirus receptor CEACAM1 is responsible for fusogenic activation and conformational changes of the spike protein
Molecular operating environment (MOE)
SARS coronavirus BJ03 isolate genome sequence
Principles of protein structure
Severe acute respiratory syndrome and influenza: virus incursions from southern China
Virtual screening for SARS CoV protease based on KZ7088 pharmacophore points
The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
An alternate-subsite-coupled model for predicting HIV protease cleavage sites in proteins
Identification of the N-terminal functional domains of Cdk5 by molecular truncation and computer modeling
Subcellular location prediction of apoptosis proteins
This work is supported by grants from the Tianjin Commission of Sciences and Technology under the contract number 023618211 and the Chinese National Science Foundation under the contract no. 20373048.