key: cord-308583-vtmwv8zl authors: Du, Qishi; Wang, Shuqing; Wei, Dongqing; Sirois, Suzanne; Chou, Kuo-Chen title: Molecular modeling and chemical modification for finding peptide inhibitor against severe acute respiratory syndrome coronavirus main proteinase date: 2005-02-15 journal: Analytical Biochemistry DOI: 10.1016/j.ab.2004.10.003 sha: doc_id: 308583 cord_uid: vtmwv8zl Abstract Severe acute respiratory syndrome (SARS) is a respiratory disease caused by a newly found virus, called SARS coronavirus. In this study, the cleavage mechanism of the SARS coronavirus main proteinase (Mpro or 3CLpro) on the octapeptide NH2-AVLQ↓SGFR-COOH was investigated using molecular mechanics and quantum mechanics simulations based on the experimental structure of the proteinase. It has been observed that the catalytic dyad (His-41/Cys-145) site between domains I and II attracts the π electron density from the peptide bond Gln–Ser, increasing the positive charge on C(CO) of Gln and the negative charge on N(NH) of Ser, so as to weaken the Gln–Ser peptide bond. The catalytic functional group is the imidazole group of His-41 and the S in Cys-145. Nδ1 on the imidazole ring plays the acid–base catalytic role. Based on the “distorted key theory” [K.C. Chou, Anal. Biochem. 233 (1996) 1–14], the possibility to convert the octapeptide to a competent inhibitor has been studied. It has been found that the chemical bond between Gln and Ser will become much stronger and no longer cleavable by the SARS enzyme after either changing the carbonyl group CO of Gln to CH2 or CF2 or changing the NH of Ser to CH2 or CF2. The octapeptide thus modified might become an effective inhibitor or a potential drug candidate against SARS. nucleotides. The replicase gene alone encompasses more than 21,000 nucleotides, 2/3 of the whole genome sequence. The replicase gene encodes two overlapping polyproteins, pp1a (4377 a.a.) and pp1ab (7072 a.a.). SARS CoV main proteinase (M pro ) cleaves the polyprotein at no less than 11 conserved sites, a process initiated by the enzymeÕs own autolytic cleavage from pp1a and pp1ab [7, 8] . The released polypeptides mediate all of the functions required for SARS viral replication and transcription. The functional importance of the M pro 0003-2697/$ -see front matter Ó 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ab.2004. 10.003 in the viral life cycle makes it an attractive target for the drugs design against SARS [7] [8] [9] [10] . Anand et al. [7] suggested that the rhinovirus 3C pro inhibitor AG7088 could well serve as a starting point for an anti-SARS drug based on the theoretical homology model of SARS CoV M pro . Chou et al. [9] found the fitting problem of AG7088 to the binding pocket of SARS CoV M pro , and they suggested its derivative KZ7088 as a better starting point. Sirois et al. [11] did virtual screening for SARS-CoV protease based on KZ7088 pharmacophore points. Chou et al. [9] also suggested an octapeptide NH 2 -AVLQ fl SGFR-COOH that can be converted to an inhibitor of SARS CoV M pro by some chemical modification. This octapeptide was synthesized, proved to be a cleavable peptide, and showed strong inhibiting activity in a tube test [12] . Du et al. [13] analyzed the cleavage mechanism of the octapeptide by SARS CoV M pro and discussed the possible chemical modification of the octapeptide using molecular dynamical and quantum mechanical simulations based on the homology structure of SARS CoV M pro , to convert it into a stable and effective drug. This progress on finding inhibitors against the SARS enzyme is also summarized in a recent review [10] . The experimental structure of SARS CoV M pro was first determined by RaoÕs research group [8] . They provided several crystal structures of SARS CoV M pro at different pH conditions and the complex structure of SARS CoV M pro with ligand hexapeptidyl CMK [7, 8] . The experimental structure and the theoretical homology structure of SARS CoV M pro share many structural similarities, such as the catalytic active region and the Cys-His cleavage dyad [7, 8] . However, there are several differences between the theoretical and the experimental structures. For examples, the experimental structure of SARS CoV M pro is a dimer, the protomer A has catalytic activity, and the protomer B has no activity. However, the N finger of protomer B, which plugs in the active cleft of protomer A, plays an important role in both dimerization and maintenance of the active form of the enzyme [8] . These new findings from the experimental structure of SARS CoV M pro provide a reliable basis for inhibitor design. The octapeptide NH 2 -AVLQ fl SGFR-COOH shows strong inhibiting activity in vitro [12] . However, the use of native peptide for clinical applications has been hampered by their intrinsic metabolic instability (indigestion by enzymes) and poor absorption through membrane. A cleavable peptide of SARS CoV M pro means that it possesses a competent binding with its receptor and a breakable peptide bond in the cleavage site. To convert a cleavable peptide into a stable and effective inhibitor, the peptide has to be modified according to the ''distorted key'' theory [14, 15] . One important step for the chemical modification is a good understanding of the cleavage mechanism, particularly the changes in chemical bonds [16] [17] [18] [19] [20] [21] [22] . The protease-susceptible sites in a given protein or peptide usually extend to an octapeptide region [14, 15] . The corresponding amino acid residues are sequentially symbolized by eight subsites R 4 , R 3 , R 2 , R 1 , R 1 0 , R 2 0 , R 3 0 , R 4 0 and the eight matching positions in the protease by S 4 , S 3 , S 2 , S 1 , S 1 0 , S 2 0 , S 3 0 , S 4 0 , respectively (see Fig. 3 of [15] ). Occasionally, the susceptible sites in some proteins may contain one subsite less or one more. However, eight amino acid residues are the most common case. Although the protein being cleaved contains much more than eight amino acid residues, only an octapeptide segment fits in the active region of the protease. Therefore, our research will focus on the cleavability of octapeptides. As shown in Fig. 1 , the cleavage point is always on the peptide bond between R 1 and R 1 0 . In Fig. 1A we use the combination of a thin line and a dashed line to represent the conjugate p property in the peptide bond. In Fig. 1B the breakable peptide bond between R 1 and R 1 is converted to a strong ''hybrid peptide bond'' through a chemical modification and hence cannot be cleaved by the protease. The modified octapeptide can still bind to the active site, but it cannot be cleaved by the enzyme, just like a distorted key that can neither open the lock nor be pulled out. The octapeptide thus modified will automatically become a desired inhibitor candidate. The octapeptide AVLQSGFR is the first suggested [9] based on the molecular structure of SARS CoV M pro and has been proved cleavable by experiments. In this research we study the cleavage mechanism, the properties of the relevant chemical bonds, and the catalytic interactions between the octapeptides and the SARS CoV M pro using molecular mechanical and quantum chemical simulations to provide useful insights for the chemical modification. The study was carried out in the following four steps: (1) docking the octapeptide to SARS CoV M pro followed by an energy minimization for the complex structure using molecular mechanics, (2) computing the atomic charge distribution in the binding pocket for the energy-minimized structure using ab initio quantum mechanics, (3) computing the molecular energy, chemical bond properties, and atomic charges of the octapeptide with the background charge distribution [23] of SARS CoV M pro using ab initio quantum chemistry, and (4) with the same background charge distribution, computing the molecular energy, chemical bond properties, and atomic charges of the modified octapeptide. The computations have been carried out with the molecular dynamics and standard ab initio quantum chemistry at the HF/6-31G* level [23] . The experimental structure (1uj1.pdb) of SARS CoV M pro [8] is used in the calculation and shown in Fig. 2A . Each protomer structure of the SARS CoV M pro dimer has three domains. The two protomers A and B form a right angle and the join part is in the domain III. The N finger of protomer B plugs into the domain II of protomer A and plays an important role in the catalytic activity. The PDB data of SARS CoV M pro contains 300 hydration water molecules, which provide the solvation ion environment with some water molecules joining the catalytic reaction. In Fig. 2A the green signs ''<'' represent the solvation water molecules. The docking calculation between SARS CoV M pro and the octapeptide AVLQSGFR was performed using the molecular although still bound to the active site, the peptide has lost its cleavability after its scissile bond was modified from a hybrid peptide bond to a strong bond. Adapted from Chou [15] with permission. operating environment program package [24] . A total of 25 docking conformations were obtained, and the one with the lowest energy was used for further energy minimizing calculation. Fig. 2B shows the complex structure of SARS CoV M pro with the ligand octapeptide. The positions of catalytic dyad His-41 and Cys-145 are indicated in Fig. 2B . According to Anand et al. [7] , Yang et al. [8] , and Chou et al. [9] , the catalytic active region is in the pocket between domains I and II of protomer A. During calculation the catalytic active region on which we were focusing contains the amino acid residues Cys-22, Gly-23, Thr-24, Thr-25, Leu-27, His-41, Val-42, Cys-44, Thr-45, Ala-46, Glu-47, Asp-48, Met-49, Leu-50, Asn-51, Pro-52, Tyr-54, Phe-140, Leu-141, Asn-142, Gly-143, Ser-144, Cys-145, Glu-163, His-164, Met-165, Glu-166, His-172, Asp-187, Arg-188, and Gln-189 from promoter A and the residues Phe-3, Arg-4, Lys-5, Met-6, and Ala-7 from promoter B. But in the practical calculation, the actual region involved contains much more than the above-listed amino acid residues. It includes all the amino acid residues within 5 Å surrounding the octapeptide [9, 18] . water molecule plays an important role in the proteolytic reaction. These results are derived from the complex structure obtained by following the aforementioned docking and energy-minimization procedures. The hydrophilic and hydrophobic complementary interaction is important in the ligand-receptor interaction. According to the a sphere model [25, 26] , the active region of proteinase consists of a series of a spheres. Each interaction site between the ligand and receptor contains one or more a spheres. There must be at least one hydrophobic interaction site. Figs. 4A and B are the hydrophilic and hydrophobic surfaces of octapeptide AVLQSGFR and the catalytic active region of SARS CoV M pro , respectively. In the figures the green parts are hydrophobic surfaces and the blue parts are hydrophilic surfaces. There is good hydrophilic and hydrophobic complementary interaction between ligand and receptor: the hydrophilic parts of the ligand correspond to the hydrophilic parts of the receptor and the hydrophobic parts of the ligand correspond to the hydrophobic parts of the receptor. The octapeptide NH 2 -AVLQ fl SGFR-COOH has four hydrophobic amino acid residues: R4(Ala), R3(Val), R3 0 (Phe), and R4 0 (Arg). Among them R4(Ala) and R3(Val) are exposed in solvent and R3 0 (Phe) and R4 0 (Arg) are covered by hydrophobic surfaces of the proteinase. The peptide bond to be cleaved between R1(Gln) and R1 0 (Ser) is just in the join point between the exposed part and the covered part. The calculations were focused on the effect of SARS CoV M pro on the peptide bond to be cleaved. How to properly treat a biomacromolecule and the solvent effects is a nontrivial problem in quantum chemical calculations. Electrostatic interaction plays the dominant role in ligand-receptor combinations and must be taken into account in the quantum mechanical study. In view of this, we treated the SARS CoV M pro as a background with many point charges. The amino acid residues in the catalytic region between domains I and II of protomer A were divided into seven segments, and their atomic charges were separately calculated using the ab initio quantum chemical HF/6-31G* method. The 5 amino acid residues of the N finger of protomer B were taken into account also. The 76 amino acid residues in eight segments are listed in Table 1 . There are 1148 atomic charges in the background, including all atoms in the catalytic cleft; the overlapped atoms are not counted. Another difficult problem for quantum mechanical calculation for a biological system in solution is the solvent effect. In the PDB data [8] of SARS CoV M pro there are 300 solvation water molecules. These water molecules not only provide the solvation environment but also indicate that some water molecules are involved in the catalytic reaction. In our calculation the 300 solvation water molecules were treated as the background charges. Each water molecule has 3 point charges. Therefore, there are 1148 + 300 · 3 = 2048 background point charges. In Table 2 we list the Mulliken atomic charges q Mull and electrostatic potential equivalent charges q ESP of His-41, which are obtained by fitting atomic charges to the electrostatic potential at the van der Waals surface. The atomic charges from semiempirical method AM1 are quite different from the results by ab initio HF/6-31G* [23] ; in particular; the q ESP charges of AM1 do not appear reasonable. Because the HF/6-31G* q ESP charges reproduce the quantum mechanical electrostatic potential on the molecular surface, in this research we use q ESP from the HF/6-31G* calculations to illustrate our points. The imidazole group of His-41 is a nucleophilic agent and its pK = 6.0, near the concentration of proton [H + ] in pure water. The imidazole ring is both a proton donor and a receptor. It has good catalytic activity in the physiological condition [27] . It can be seen from Table 2 that the negative charge of N d1 (À0.4094) and the positive charge of H þ d1 (0.4318) are large in magnitude and are close to the peptide bond Gln-Ser. Therefore, the proton H þ d1 attracts the p electron density from the peptide bond Gln-Ser so as to weaken this chemical bond. The polar nitrogen N d1 attracts one hydrogen H 1 of bridge water and makes the water molecule dissociated, causing the nucleophilic attack of hydroxyl group OH À to the C(CO) and the electrophilic attack of H + to N(NH) on the two sides of peptide bond Gln-Ser. The sulfur atom S in Cys-145 forms a hydrogen bond with H(NH) and helps the proteolytic reaction (Fig. 3B) . The peptide bond is considered a pseudo p bond. The six atoms in the peptide bond (Gln)Ca CO-NHCa (Ser) are almost in a plane. Fig. 5A shows the electron density contour map of the peptide bond Gln-Ser in the gaseous phase on the p plane containing the carbonyl group CO of Gln and the NH group of Ser. The electron density contours surrounding atoms N(NH) and C(CO) form two triangles reminiscent of the hybrid orbits of the sp 2 type, and the 2p z orbits, which are perpendicular to the plane and form a p 4 3 bond system. In Table 3 , we list the calculated atomic charges of the six atoms on both sides of the peptide bond Gln-Ser obtained from the ab initio HF/6-31G* calculations in the gaseous phase with the background charges of SARS-CoV M pro . The negative charge of N(NH) of serine in the octapeptide increases to À0.8612 in the background of proteinase charges from the original value of À0.8534 in the gaseous phase. Similarly, the corresponding positive charge of the carbonyl carbon C(CO) in glutamine of the octapeptide increases to 0.8190 from the original 0.8050. This is favorable to the electrophilic attack of OH À to C(CO) and the nucleophilic attack of H + to N(NH) in the proteolytic reaction. Fig. 5B is the contour map of differential electronic density of the peptide bond Gln-Ser in the octapeptide AVLQSGFR, obtained by subtracting the electron density in the gaseous phase from the electron density in the background [23] of SARS CoV M pro and solvent water molecules. In Fig. 5B the bold gray line is the 0 value line where the electron densities in the gaseous phase and in the protease background are equal. The solid thin lines show the regions where the computed electron densities are greater in the SARS CoV M pro background than in the gaseous phase, and the dashed thin lines show the areas where computed electron densities are smaller in the proteinase background than in the gaseous phase. We find that along the peptide bond between (Gln)C-N(Ser) the electron densities increase on the N(Ser) side and decrease on the C(Gln) side. This change is favorable for a nucleophilic attack of anion OH À to (Gln)C and for an electrophilic attack of cation H + to N(Ser). SARS CoV M pro has very high cleavage specificity and is highly conserved in the coronavirus family [7] . The cleavage specificity of SARS CoV M pro concentrates on the sites R2, R1, and R1 0 . The development of peptides as clinically useful drugs is limited by their poor stability and bioavailability, which is due in part to their inability to readily cross membrane barriers such as the intestinal and blood-brain barriers. Systematic chemical modification strategies that convert peptides into drugs are an attractive research topic in current medicinal chemistry [28] . For the peptide inhibitor of proteinase, the chemical modification to cleavable octapeptide should focus on the scissile peptide bond between R 1 and R 1 0 to be cleaved by SARS CoV M pro . A successful example shows that after the peptide bond CO@NH is replaced by reduced single bond CH 2 ANH, the modified octapeptide rennin inhibitors have shown greatly increased affinity for rennin and strong resistance to enzyme hydrolysis [29] . In this study we suggest a similar modification to octapeptide AVLQSGFR. In the peptide bond (Gln) CO@NH(Ser) the carboxyl group CO of glutamine on subsite R 1 is replaced by CH 2 or CF 2 or the NH of serine on subsite R 1 0 is replaced by CH 2 ; then the p bond system is broken and the modified octapeptide AVLQ SGFR may become a stable inhibitor for SARS-CoV M pro and an effective drug candidate against SARS. Here we show the possibility of the chemical modification through computational modeling. Table 4 lists the atomic charges of the six atoms on both sides of the hybrid peptide bond Gln-Ser after the chemical modification. Comparing with Table 3 , we find that after CO is replaced by CH 2 , the charge of the carbon atom in CH 2 turns negative (À0.6572) from originally a positive value (0.8612) in the CO group; hence the nucleophilic attack by OH À becomes impossible. Furthermore, the negative charge of N(NH) on the Ser side decreases to À0.2892 from À0.8190 in Table 3 , and hence the electrophilic attack by H + also becomes more difficult. Likewise, the modifications CO fi CF 2 and NH fi CH 2 will make the proteolytic reaction more difficult also. Searching for cleavable peptides is the first step in finding potential inhibitors against the proteinase. Even within a limited range of octapeptides, it is by no means an easy job because the possible different patterns of octapeptides are as many as 20 8 = 2.56 · 10 10 . However, some algorithms, such as the vectorized sequence-coupled algorithm [14] and the discriminant function algorithm [15] , can be successfully used to narrow the search scope. Meanwhile, similar methods that can be used to search the cleavable peptides by SARS enzyme have been developed [5, 6] . The octapeptide NH 2 -Ala-Val-Leu-Gln-Ser-Gly-Phe-Arg-COOH is the first designed on the basis of the molecular structure of SARS-CoV M pro [9] and it has been experimentally demonstrated to be a cleavable peptide with high bioactivity [12] . A detailed study of the cleavage mechanism needs a great deal of experimental work. However, computational simulation may provide useful insights for an in-depth understanding of the mechanism. The cleavable octapeptide could be changed into an effective inhibitor of SARS-CoV M pro or a drug candidate against SARS after a proper chemical modification to replace the cleavable peptide bond with a noncleavable bond. The octapeptide thus modified will lose its susceptibility although it can still tightly bind to the active site and become a competent inhibitor, just like the role of a distorted key in blocking the access of other keys to a lock, as elucidated by Chou [15] in a review paper. Identification of a novel coronavirus in patients with severe acute respiratory syndrome A novel coronavirus associated with severe acute respiratory syndrome Severe acute respiratory syndrome and influenza: virus incursions from southern China N-terminal domain of the murine coronavirus receptor CEACAM1 is responsible for fusogenic activation and conformational changes of the spike protein ZCURVE_CoV: a new system to recognize protein coding genes in coronavirus genomes, and its applications in analyzing SARS-CoV genomes Prediction for proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing SARS-CoV genomes Coronavirus main proteinase (3CL pro ) structure: basis for design of anti-SARS drugs The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS Review: structural bioinformatics and its impact to biomedical science Virtual screening for SARS-CoV protease based on KZ7088 pharmacophore points Synthesis and biological evaluation of a novel SARS-CoV M pro inhibitor Polyprotein cleavage mechanisms of SARS CoVM Pro and chemical modification of the octapeptide A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins Review: prediction of HIV protease cleavage sites in proteins Prediction of the tertiary structure and substrate binding site of caspase-8 Solution structure of the RAIDD CARD and model for CARD/CARD interaction in caspase-2 and caspase-9 recruitment A Model of the complex between cyclin-dependent kinase 5(Cdk5) and the activation domain of neuronal Cdk5 activator Solution structure of BID, an intracellular amplifier of apoptotic signaling Prediction of the tertiary structure of a caspase-9/inhibitor complex Prediction of the tertiary structure of the beta-secretase zymogen Identification of the N-terminal functional domains of Cdk5 by molecular truncation and computer modeling Exploring chemistry with electronic structure methods: a guide to using Gaussian A computational procedure for determining energetically favorable binding sites on biologically important macromolecules Functionality maps of binding sites: a multiple copy simultaneous search method Converting a peptide into a drug: strategies to improve stability and bioavailability A potent new renin inhibitor. In vitro and in vivo studies This work was supported by grants from the Chinese National Science Foundation under Contract No. 20373048 and the Tianjin Commission of Sciences and Technology under Contract Nos. 033801911 and 023618211.