key: cord-0894888-y2it4zoq authors: Ishola, Ahmed Adebayo; Adedirin, Oluwaseye; Joshi, Tanuja; Chandra, Subhash title: QSAR modeling and Pharmacoinformatics of SARS coronavirus 3C-like protease inhibitors date: 2021-05-11 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2021.104483 sha: 1838432a9597c25fbd6e6383f30dc5070773a528 doc_id: 894888 cord_uid: y2it4zoq The search for effective treatment against novel coronavirus (COVID-19) remains a global challenge due to controversies on available vaccines. In this study, data of SARS coronavirus 3C-like protease (3CLpro) inhibitors; a key drug target in the coronavirus genome was retrieved from CHEMBL database. Quantitative Structure-Activity Relationship (QSAR) studies, Molecular docking, Absorption-Distribution-Metabolism-Excretion-Toxicity (ADMET) and molecular dynamics simulation (MDS) were carried out using these 3CLpro inhibitors. QSAR model constructed using the data had correlation coefficient R(2) value of 0.907; cross-validated correlation coefficient Q(2) value of 0.866 and test set predicted correlation coefficient R(2)(pred) value of 0.517. Variance inflation factor (VIF) values for descriptors contained in the model ranged from 1.352 to 1.68, hence, these descriptors were orthogonal to one another. Therefore, the model was statistically significant and can be used to screen and design new molecules for their inhibitory activity against 3CLpro. Molecular docking showed that seven of the compounds (inhibitors) used in the study had a remarkable binding affinity (-9.1 to 10.0 Kcal/mole) for 3CLpro. ADMET study revealed that five (CHEMBL Accession IDs 19438, 196635, 377150, 208763, and 210097) of the seven compounds with good binding ability obeyed Lipinski’s rule of five. Hence, they were compounds with drug-like properties. MDS analysis revealed that 3CLpro-compound 21, 3CLpro-compound 22, 3CLpro-compound 40 complexes are very stable as compared to the reference 3CLpro-X77 complex. Therefore, this study identified three potent inhibitors of 3CLpro viz. CHEMBL194398, CHEMBL196635, and CHEMBL210097 that can be further explored for the treatment of COVID-19. Coronavirus re-emerged recently in the Wuhan region of China as a novel coronavirus (CoVID- 19) causing severe upper respiratory tract infection with symptoms that include; fever, pneumonia, dyspnea, and asthenia reported by people in Wuhan [1] [2] [3] . Since then, the virus has spread to almost all countries in the world prompting several lockdown measures by governments to curb the spread of this disease. Even with increasing attention on the development of vaccines to stop the daily mortality recorded, all effort so far has proved abortive with about 98.2 million reported cases and over 2.1 million deaths globally [4] . Coronavirus is a positive-stranded RNA virus with the largest genome of all know RNA viruses with a length of about 26 to 32 kb [5] . The coronavirus genome encodes 4 crucial structural proteins namely; the spike (S) protein, nucleocapsid (N) protein, membrane (M) protein, and the envelope (E) protein, all of which are essential for the production of a structurally complete viral particle [6, 7] . Besides encoding structural proteins, a significant part of the coronavirus genome is transcribed and translated into a polypeptide, which encodes proteins essential for viral replication and gene expression [8] . One of the best-characterized drug targets among coronaviruses is the chymotrypsin-like protease (3CLpro) [9] . Together with the papain-like protease (PLpro), 3CLpro is crucial for processing the translated polyproteins from the viral RNA [10] . The highly conserved 3CLpro consisting of about 306 amino acids, is a key enzyme for coronavirus replication. Consequently, it is a vital target for the development of vaccines against coronavirus. The design and development of pharmaceutical agents for the control of coronavirus infections is of utmost importance in our world and scientist are leaving no stone unturned in this endeavor. To this effect, scientists are into the traditional methods of obtaining new drugs by screening numerous compounds (either synthesized or extracted phytochemical agents) using non-living systems or simplified living systems such as rats, until a suitable lead is identified. These processes are the arbiter of truth in any scientific stride; however, they are time-consuming and costly. Any procedure that can assist in reducing the cost, time and still maintain scientific integrity is a welcome development. This is where computer-aided drug design methodologies (quantitative structure-activity relationship (QSAR), molecular docking, molecular dynamics simulation, and so on) comes in. QSAR establishes a mathematical relationship between chemical structures of compounds with defined biological activity and their biological activities. This relationship can be used to screen or design new molecules for better biological activity [11] . QSAR is an effective method for optimizing or correlating specific structural features or molecular descriptors like polarizability, lipophilicity, electronic and steric properties within an analogous series of molecules with their biological activities [12] . Also, molecular docking elucidates the binding of small compounds (drugs or ligands) with a known macro-molecular target (receptor) [13] . Chemical structures with the inhibitory activity against 3C-like protease deposited in the CHEMBL database were used in this study with the aim of developing a QSAR model that will reveal the structural feature of the molecules that relates to their inhibitory activity. Besides, molecular docking was carried out to show the interaction between the compounds and amino acids in the binding pocket of 3CLpro. Compounds with the most negative binding affinity were subjected to ADMET studies followed by a 100 ns molecular dynamics simulation to determine the stability of the lead compounds. J o u r n a l P r e -p r o o f The geometries of the spatial data (SDF) files of the dataset compounds were optimized in order to make the conformations have the least potential energy using the GROMOS96 force field in Swiss-PDB viewer [14] . The optimized structures were imported into PaDEL-Descriptors [15] , which calculated about 1875 molecular descriptors for each molecule. The calculated descriptors and their corresponding activity values for each molecule were arranged in an n × m matrix format (Supplementary Table 1 ). This constituted the dataset used in the study, where n is the number of molecules and m is the number of descriptors. Descriptor values were normalized to values between 0 and 1 using the equation (2) below in order to convert them to values with a similar unit which is needed for regression analysis and to reduce skewness in the measured values [16] . The selection of combinations of important descriptors that best explain the variability in the activity values of the compounds was done using a genetic algorithm (GA) which divides the descriptors into proper subsets from where models can be generated. Multiple linear regression (MLR) method available in MLRplus Validation 1.3 software was used to construct the model. The quality of the model developed in this study was assessed using validation parameters calculated by the MLRplus Validation 1.3 software. These parameters include determination coefficient R 2 , adjusted determination coefficient R 2 adj, Variance ratio F, Standard errors of estimate SEE, and Golbraikh and Tropsha criteria for an acceptable model [17] . Furthermore, co-linearity between the descriptors contained in the model was checked using the descriptors correlation matrix and their corresponding variance inflation factor. The model variance inflation factor (VIF) also known as inverse of tolerance [18] was calculated using the equation below: In equation 3, R 2 j , is the coefficient of determination of the regression of descriptor j on other descriptors contained in the model. A model cannot be used to predict the biological activity for the entire chemicals in the universe except for those in its region of reliable/acceptable prediction, which is defined in terms of J o u r n a l P r e -p r o o f descriptors contained in the model. This region is known as the applicability domain (AD) of the model. In this study, the AD of the developed model was defined using the extent of the extrapolation method. This method employs leverage h values of dataset molecules and the standardized prediction residual (SDR) of the models to define their AD. The result of this method is often visualized by the plot of h versus SDR (Williams plot). Leverage h is a special type of distance measures used to show similarity/dissimilarity among objects and it's obtained as the diagonal element of a hat matrix H: In equation 4, X is the model's descriptor matrix and X T is the transpose of matrix X. Generally, AD of models in the study was defined by a square area with vertical boundary 0 500 g/mol) which is acceptable. Lipinski's rule states that, generally, an orally active drug has no more than one violation of the following criteria: (1) Not >5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms). (2) Not >10 hydrogen bond acceptors (nitrogen or oxygen atoms) (3) A molecular mass < 500 g/mol and (4) an octanol-water partition coefficient log P not greater than 5 [40] . Consequently, compound 57 was not considered for further studies due to two Lipinski violations i.e. Mw > 500 g/mol and HBA > 10 ( Table 6 ). OSIRIS server identified 2, 4dichlorotoluene scaffold in compound 42 as a possible mild anti-reproductive unit while the server also highlighted the presence of dapsone constituent in compound 57 that is capable of mild tumorigenic effect (Table 6 ). The docked 3CLpro-compound complexes were subsequently used to study the detailed dynamic, structural, as well as binding behaviors to know how it targets the active site of SARS-CoV2 3CLpro. The MDS trajectories of 100 ns simulations were examined to study the detailed structural and (Table 7) . All the 3CLpro-compound complexes exhibited overall lower RMSF than the 3CLpro-X77 complex during the simulation ( Figure 6B ). The RMSF results predicted that all the 3CLpro-compound complexes were stable and can act as potential drug candidates against SARS-CoV2. The Rg of the protein and protein-ligand complex indicates the degree of compactness and rigidity of the protein. Therefore, we investigated the Rg of 3CLpro and 3CLpro-compound complexes to know how they show their compactness during the simulation run. For this, we have calculated the Rg of 3CLpro and 3CLpro-compound complexes during the 100 ns simulation time. Figure 6C showed that all the 3CLpro-compound complexes have almost J o u r n a l P r e -p r o o f similar stability as the 3CLpro protein and 3CLpro-X77 complex. (Table 7) . From Rg profiles, it has been observed that the 3CLpro-compound complexes exhibited a more compact behavior than the 3CLpro protein without ligand. The lower RMSD, reduced residuewise fluctuation, and higher compact nature in the 3CLpro-compound complexes indicate their overall stability as well as convergence. The H-bonds are essential for drug specificity, metabolization, and stability. Therefore, the Hbonding pattern was evaluated to understand the H-bond and its contributions to the overall stability of the systems. From Figure 7A , it can be observed that the maximum numbers of intermolecular hydrogen bond interactions were found to be 4 for, 3CLpro-X77 complex, J o u r n a l P r e -p r o o f The Gibbs energy plots were generated from the PC1 and PC2 coordinates and are shown in Figure 8 . In these plots, ΔG values ranging from 0 to12.5 kJ mol -1 , 0 to 14.8 kJ mol -1 , 0 to 12.9 kJ mol -1 , 0 to 13.5 kJ mol -1 , 0 to 12.4 kJ mol -1 , and 0 to 12.9 kJ mol -1 for 3CLpro-X77 complex, 3CLpro-compound 21 complex, 3CLpro-compound 22 complex, 3CLpro-compound 23 complex, 3CLpro-compound 31 complex, and 3CLpro-compound 40 complex, respectively. All the 3CLpro-compound complexes represent significantly similar energy as the 3CLpro-X77 complex, which indicates that these compounds follow the energetically favorable transitions during the MDS. To determine how strongly compounds bind to 3CLpro and their respective associated binding modes, the binding free energies were calculated using the MM-PBSA approach. The MD trajectories were analyzed through MM-PBSA to know the binding free energy values and their energy component. For this purpose, the last ns trajectory was investigated to calculate binding energies and insights into the binding modes of compounds with 3CLpro. The reference molecule X77 was found to display binding energy of -57.380 kJ mol -1 for 3CLpro ( Table 8) . Computation of the binding energy of compounds for the 3CLpro revealed that compound 40, For the last 1 ns of MD simulation trajectories, a per residue interaction energy profile was also developed using the MM-PBSA approach to identify the essential residues involved in ligand binding toward 3CLpro protein. Figure 9 shows a per-residue decomposition plot of the total binding energy of the 3CLpro-ligand complexes. Only residues that contribute most to overall binding energy are illustrated in the figure for a better representation of the results. The plot showed that the strongly involved amino acids in all complexes were Thr25, Leu27, His41, Cys44, Met49, Gly143, Cys145, Met165, Pro168, and Asp187. The per-residue interaction plot revealed that the majority of residues had negative binding energy, while only a few had positive binding energy. The residues with a negative binding affinity were important in maintaining the stable protein-ligand complex. Figure 10 depicts the per-residue decomposition plot of active site residues of 3CLpro in various 3CLpro-ligand complexes. When compared to other active site residues, Thr25, Leu27, Met49, Gly143, Cys145, Met165, and Pro168 showed higher binding J o u r n a l P r e -p r o o f affinity. The overall results revealed that Met49, Cys145, and Met165 play the most significant roles in 3CLpro-ligand stabilization, which is consistent with previous research [41] . The present study aimed to identify novel inhibitors against the SARS-CoV-2 3CLpro. In this study, quantitative structure-activity relationship and molecular docking were used to evaluate the relationship between molecular properties and inhibitory activity of selected compounds were identified against SARS-CoV-2 3CLpro and these can be considered as a possible treatment for COVID-19. However, further in-vitro and in-vivo studies are necessary to investigate the efficacy of these potential compounds against SARS-CoV-2. The authors declare that there is no conflict of interest regarding the publication of this paper. J o u r n a l P r e -p r o o f Green dotted line represents hydrogen bond, the faint green dotted line represents a carbon-hydrogen bond, the deep pink dotted line represents π-π stacking, the faint pink dotted line represents πalkyl interaction, cyan dotted line represents π-fluoride interaction, the yellow dotted line The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health -The latest 2019 novel coronavirus outbreak in Wuhan, China Outbreak of pneumonia of unknown etiology in Wuhan, China: The mystery and the miracle A novel coronavirus from patients with pneumonia in China World Health Organization, Coronavirus disease (COVID-19) Weekly Epidemiological Update Coronavirus envelope protein: Current knowledge Efficient assembly and release of SARS coronavirus-like particles by a heterologous expression system MERS-CoV virus-like particles produced in insect cells induce specific humoural and cellular imminity in rhesus macaques Potential inhibitors against 2019-nCoV coronavirus M protease from clinically approved medicines Coronavirus main proteinase (3CLpro) Structure: Basis for design of anti-SARS drugs, Science (80-. ) From SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design A practical overview of quantitative structure-activity relationship Quantitative structureactivity relationship (QSAR) studies on a series of 1,3,4-thiadiazole-2-thione derivatives as tumor-associated carbonic anhydrase IX inhibitors Advances and challenges in Protein-ligand docking The GROMOS96 manual and user guide PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints Computational studies on αaminoacetamide derivatives with anticonvulsant activities Beware of q2! QSAR studies on chalcones and flavonoids as anti-tuberculosis agents using genetic function approximation (GFA) method Automated docking of flexible ligands: Applications of AutoDock DataWarrior: An open-source program for chemistry aware data visualization and analysis Open Babel: An Open chemical toolbox Autodock4 and AutoDockTools4: automated docking with selective receptor flexiblity AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading Molecular complexes at a glance: Automated generation of two-dimensional complex diagrams ILOGP: A simple, robust, and efficient description of noctanol/water partition coefficient for drug design using the GB/SA approach A BOILED-Egg To Predict Gastrointestinal Absorption and Brain Penetration of Small Molecules SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules 5: A highthroughput and highly parallel open source molecular simulation toolkit CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields Accuracy limit of rigid 3-point water models g _ mmpbsa -A GROMACS tool for MM-PBSA and its optimization for high-throughput binding energy calculations Quantitative structure activity relationship studies on some N-benzylacetamide and 3-(phenylamino) propanamide derivatives with anticonvulsant properties Chance Correlations in Structure-Activity Studies Using Multiple Regression Analysis Best practices for QSAR model development, validation, and exploitation QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm-multiple linear regressions On some aspects of validation of predictive quantitative structure-activity relationship models Dissection study on the severe acute respiratory syndrome 3C-like protease reveals the critical role of the extra domain in dimerization of the enzyme. Defining the extra domain as a new target for design of highly specific protease inhibitors Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors The catalysis of the SARS 3C-like protease is under extensive regulation by its extra domain Drug-like properties and the causes of poor solubility and poor permeability Structure-based screening of novel lichen compounds against SARS Coronavirus main protease (Mpro) as potentials inhibitors of COVID-19 The authors are thankful to the Head Department of Botany, Kumaun University, Nainital for providing the facility, space, and resources for this work. The authors also acknowledge Kumaun No