key: cord-0790843-7r0hamk9 authors: Adhikari, Nilanjan; Banerjee, Suvankar; Baidya, Sandip Kumar; Ghosh, Balaram; Jha, Tarun title: Ligand-based quantitative structural assessments of SARS-CoV-2 3CL(pro) inhibitors: An analysis in light of structure-based multi-molecular modeling evidences date: 2021-11-29 journal: J Mol Struct DOI: 10.1016/j.molstruc.2021.132041 sha: 74392fd02a1b3e74fab9017689f8a04b7419b4e9 doc_id: 790843 cord_uid: 7r0hamk9
Due to COVID-19, the whole world is undergoing a devastating situation, yet no drug candidate has been established exclusively for its treatment. In that context, 69 diverse chemicals with potential SARS-CoV-2 3CL(pro) inhibitory property were used to build different internally and externally validated linear (SW-MLR and GA-MLR), non-linear (ANN and SVM) QSAR, and HQSAR models to identify the important structural and physicochemical characteristics required for SARS-CoV-2 3CL(pro) inhibition. Importantly, the 2-oxopyrrolidinyl methyl and benzylester functions, and the methylene (hydroxy) sulphonic acid warhead group, were crucial for retaining higher SARS-CoV-2 3CL(pro) inhibition. The GA-MLR and HQSAR models were also applied to predict the activity of some already repurposed drugs. As per the GA-MLR model, curcumin, ribavirin, saquinavir, sepimostat, and remdesivir were found to be the most potent, whereas according to the HQSAR model, lurasidone, saquinavir, lopinavir, elbasvir, and paritaprevir were the highly effective SARS-CoV-2 3CL(pro) inhibitors. The binding modes of those repurposed drugs were also supported by the molecular docking, molecular dynamics (MD) simulation, and binding energy calculations conducted by several groups of researchers. This current work, therefore, may help to identify important structural parameters and accelerate COVID-19 drug discovery in the future.
In December 2019, many people in Wuhan, China, were affected by a serious, unknown pneumonia-like disorder that did not respond to any antibiotics [1]. The disease spread at lightning speed to the rest of the world, and the World Health Organization (WHO) named this infection coronavirus disease 2019 (COVID-19) [2]. The disease is caused by infection with the novel coronavirus, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) [3] [4]. This novel coronavirus is closely related to a similar virus, SARS-CoV, which spread during 2003 [1]. Currently, COVID-19 has greatly affected the whole world, and the infection has been found in around 222 countries and territories. More than 232 million COVID-19 cases and over 4.7 million deaths were reported as of May 2021 [5]. Severe COVID-19 presents as acute respiratory distress syndrome (ARDS) and multiple organ failure, possibly induced by the uncontrolled immune response in the host cells [6] [7]. Coronavirus (CoV) infection in humans, as well as in other animals, has resulted in a variety of highly frequent and serious diseases, including SARS and the Middle East respiratory syndrome (MERS) [8]. SARS was first found in China in November 2002. In 2003, the WHO identified SARS-CoV as its causative agent [9]. SARS-CoV can also be found in some animals such as civet cats and horseshoe bats [10]. The source of MERS-CoV was primarily sought in bats, but the dromedary camels of Oman and the Canary Islands were shown to have a high frequency of MERS-CoV-neutralizing antibodies [9]. Interestingly, in line with the debate on its actual source, viruses closely related to SARS-CoV-2 were found in bats (RaTG13) and pangolins [11].
Pangolins and civet cats served as intermediary hosts, and researchers have detected similarity among the strains of CoVs that are closely related to SARS-CoV-2 [1]. Coronaviruses have a single-stranded, positive-sense RNA genome that is among the largest of the viral RNA genomes [12] [13]. Recent studies have demonstrated that the SARS-CoV-2 genome resembles those of other Betacoronaviruses. The SARS-CoV-2 genome comprises the spike protein (S), membrane protein (M), open reading frame 1ab (ORF1ab), which encodes the non-structural proteins (nsps), envelope protein (E), 5′-untranslated region (UTR), 3′-untranslated region (UTR), nucleocapsid protein (N), and other unidentified non-structural ORFs that are useful for the construction of viral particles [14]. A typical coronavirus genome contains at least six open reading frames (ORFs) and gives rise to more than 20 proteins [13, 15]. Moreover, the first ORF (ORF1a/b) occupies about two-thirds of the genome and encodes 16 non-structural proteins (nsp1-nsp16), although the Gammacoronavirus lacks nsp1 [13]. Several coronaviral proteins have already been established as promising targets for COVID-19 drug discovery. These include helicase, RNA-dependent RNA polymerase (RdRp), and hemagglutinin esterase, along with the structural and other non-structural proteins that can be targeted for anti-SARS-CoV-2 drug discovery [3, 16]. Among these proteins, there are two proteases, namely the papain-like protease (PLpro) and the main protease or 3-chymotrypsin-like protease (Mpro/3CLpro), that are essential for the replication of SARS-CoV-2. PLpro and 3CLpro are responsible for cleaving the two polyproteins, i.e., PP1A and PP1AB, into several functional components. PLpro cleaves the N-terminal region of the viral precursor protein at three sites, whereas 3CLpro cleaves the C-terminal region of the precursor protein at 11 sites [17].
The viral proteases 3CLpro (Mpro) and PLpro are therefore responsible for generating the nsps by processing these polyproteins [13]. Thus, Mpro and PLpro are promising targets for anti-viral drug development. Enormous efforts have been made to analyze their significance and to develop effective therapeutics against SARS-CoV. Similar approaches have been adopted for other pathogenic coronaviruses (e.g., MERS-CoV) because of the similarity of their active sites and enzymatic mechanisms to those of SARS-CoV [18] [19]. Interestingly, SARS-CoV-2 3CLpro shares 96% sequence identity with SARS-CoV 3CLpro [20]. This similarity indicates that targeting SARS-CoV-2 3CLpro is a feasible approach for anti-COVID-19 drug development. Besides the development of newer anti-SARS-CoV-2 agents, the study of previously reported SARS-CoV inhibitors can be helpful. In this scenario, drug repurposing has emerged as a major approach in the development of anti-SARS-CoV-2 treatment [21]. Several repurposed drugs such as lopinavir, ritonavir, azvudine, favipiravir, remdesivir, chloroquine, hydroxychloroquine, methylprednisolone, tocilizumab, ribavirin, and oseltamivir are being tested in clinical studies against SARS-CoV-2 [22] [23]. Our group has already performed several extensive studies on the previous SARS-CoV inhibitors [3, 18, 19, 24-30]. In this study, we report the structural analysis of 69 diverse SARS-CoV-2 3CLpro inhibitors by regression-based quantitative structure-activity relationship (QSAR) methodologies to identify the fundamental structural features that have crucial effects on SARS-CoV-2 3CLpro inhibition. The outcomes of this research were also validated against the ligand interactions observed in the available SARS-CoV-2 3CLpro crystal structures.
This may be beneficial for the development of newer anti-coronaviral drugs, the optimization of previous SARS-CoV inhibitors, and the screening of newer drugs for COVID-19 treatment.
A total of 69 structurally diverse chemical entities (Supplementary Table S1) with a wide range of in vitro SARS-CoV-2 3CLpro inhibitory activity (IC50 values ranging from 0.01 µM to 124.93 µM) were collected from the literature [20, 31-38]. To maintain the uniformity of the dataset, the mean SARS-CoV-2 3CLpro inhibitory activity (IC50 in µM) values were transformed to the negative logarithmic scale. A set of 1,444 2D molecular descriptors was calculated for each compound using the PaDEL descriptor software [39], followed by dataset pre-treatment to eliminate the highly correlated descriptors. The dataset was divided by the Kennard-Stone (KS) method using DTC Lab software [40], maintaining an approximately 3:1 ratio between the training and the test set (N Train = 51, N Test = 18). For the multiple linear regression (MLR) model development, two different techniques, namely the stepwise (SW) and genetic algorithm (GA) methods, were used to reduce the dimension of the predictor parameters [41] [42]. The best subset selection process was applied to this reduced data to identify the best regression model consisting of five molecular descriptors using DTC Lab software [40, 42]. The best models were selected based on their squared correlation coefficient (R²), Leave-One-Out (LOO) cross-validated R² (Q²), and externally validated R² (R²Pred) values [42] [43]. Both internal and external cross-validations were conducted to justify the reliability and predictive ability of the selected MLR models. Statistical validation parameters, namely the adjusted R² (R²A), standard error of estimate (SEE), predicted residual sum of squares (PRESS), and variance ratio (F) at a specified degree of freedom, were estimated.
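The activity transformation and the Kennard-Stone division described above can be sketched as follows. This is a minimal illustration, not the DTC Lab implementation: it assumes that the negative logarithm is taken of the molar concentration (pIC50 = 6 - log10 of IC50 in µM), which the text does not state explicitly, and that Euclidean distance in descriptor space is used. The function names `pic50_from_ic50_um` and `kennard_stone` are hypothetical.

```python
import numpy as np

def pic50_from_ic50_um(ic50_um):
    """Convert IC50 in micromolar to pIC50, assuming pIC50 = -log10(IC50 in molar)."""
    return 6.0 - np.log10(np.asarray(ic50_um, dtype=float))

def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone algorithm.

    Assumption: Euclidean distance on the (pre-treated) descriptor matrix X.
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the training set with the two most distant compounds.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining compound to its nearest selected compound.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the compound that is farthest from everything selected so far.
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

With 69 compounds and `n_train=51`, the 18 compounds left over form the test set, matching the split sizes reported above.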
To rationalize the robustness of the selected MLR models, the internal cross-validation parameter Q² was calculated, which indicates the internal predictability of the MLR models for the training population. The external predictability of these MLR models was evaluated using the R²Pred values [43]. Additionally, the Y-randomization test (cRp²), the rm² metrics, and the Golbraikh-Tropsha model acceptability criteria were also evaluated for these MLR models [42] [43] [44].
Non-linear QSAR models, namely the artificial neural network (ANN) and the support vector machine (SVM), were developed using the descriptors that were used to build the MLR models. This was performed to validate these MLR models further as well as to investigate the machine learning capability of the selected molecular descriptors. The artificial neural network (ANN) method is a popular machine learning technique for QSAR model development. It imitates biological neuronal functions and relies on feed-forward and back-propagation of error algorithms [45] [46]. An ANN model generally comprises three different layers, namely the input, hidden, and output layers. Analogous to the feed-forward network of neurons in the brain, in the ANN method the input is provided to the input layer and forwarded to the successive hidden layer. In this hidden layer, the information is conveyed to the nodes present in it, and the output is finally sent from the hidden layer to the output layer [45] [46] [47]. In this study, to investigate the machine learning capability of these MLR models, the Autoweka software [48] was used to optimize the learning process (optimization of parameters like learning time, number of epochs, momentum, and learning rate). The final ANN model was constructed in Weka 3.8 software [49] using the back-propagation of error algorithm and was both internally and externally cross-validated.
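The internal (Q²) and external (R²Pred) validation statistics used for the MLR models can be illustrated with a short sketch. The text does not spell out the formulas, so the commonly used QSAR definitions are assumed here: Q² = 1 - PRESS / Σ(y - ȳ_train)² with leave-one-out predictions, and R²Pred = 1 - Σ(y_test - ŷ_test)² / Σ(y_test - ȳ_train)². All function names are hypothetical, and ordinary least squares via NumPy stands in for the DTC Lab software.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for descriptor rows X from fitted coefficients."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def r2_fit(X, y):
    """Squared correlation coefficient (R2) of the fitted model on its own data."""
    resid = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)

def q2_loo(X, y):
    """Leave-one-out cross-validated R2 (assumed definition, see lead-in)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # drop compound i
        coef = fit_mlr(X[keep], y[keep])  # refit without it
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0])**2
    return 1.0 - press / np.sum((y - y.mean())**2)

def r2_pred(X_train, y_train, X_test, y_test):
    """External R2Pred, referenced to the training-set mean (assumed definition)."""
    coef = fit_mlr(X_train, y_train)
    resid = y_test - predict_mlr(coef, X_test)
    return 1.0 - np.sum(resid**2) / np.sum((y_test - y_train.mean())**2)
```

For a least-squares model, each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, so Q² never exceeds R²; a large gap between the two is a common sign of overfitting.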
The support vector machine (SVM) model was first introduced by Vapnik [50] and follows the structural risk minimization principle of statistical learning theory [46]. SVM is used to solve problems through either regression or classification-based analyses. In SVM, the input samples are separated into classes by constructing a hyperplane with the maximum margin. For data that cannot be separated linearly, the kernel function [K(x, y)] projects the variable matrix into a higher-dimensional feature space in which such a hyperplane can be constructed [46] [47] [51]. Here, the radial basis function (RBF) kernel was used to transform the data. As in the ANN model development, the SVM model parameters, namely the complexity parameter (C) and the kernel width (γ), were optimized using Autoweka software [48]. The final SVM models were constructed using Weka 3.8 software [49].
Fragment-based approaches in drug design and discovery have provided several successes over time and have become more popular over the last decade [52] [53]. The hologram QSAR (HQSAR) method uses the molecular hologram, i.e., specialized molecular fingerprints of different lengths, as the independent variable and correlates it with the biological response of compounds with the help of the partial least squares (PLS) technique [54]. In this study, the HQSAR models were constructed on the training set molecules using SYBYL-X 2.0 software [55]. The best model among these HQSAR models was chosen based on the highest Leave-One-Out (LOO) cross-validated R² (Q²) and the lowest standard error (SE) value. Moreover, this best HQSAR model was subjected to external validation on the test set molecules [43, 47].
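To make the role of the kernel width (γ) in the SVM models concrete, the RBF kernel can be written as K(x, y) = exp(-γ‖x - y‖²). The sketch below computes the kernel matrix with NumPy; it only illustrates the kernel and is not the Weka implementation used in the study. The function name `rbf_kernel` is hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||x_i - y_j||^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))
```

A larger γ makes the kernel decay faster with distance, so each training compound influences a smaller neighbourhood of descriptor space, while the complexity parameter C controls how strongly training errors are penalized.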
Both final MLR models (SW-MLR and GA-MLR) were constructed using 5 molecular descriptors and were selected as the final models based on their Q² and R²Pred values. The developed models are denoted SW-MLR (Eq. 1) and GA-MLR (Eq. 2). The SW-MLR model (Eq. 1) was found to explain 67% and predict 63.6% of the variation in SARS-CoV-2 3CLpro inhibition of the dataset molecules, whereas the GA-MLR model (Eq. 2) was capable of explaining 71.1% and predicting 67.4% of the activity variation of the dataset molecules. The meanings of the descriptors used to build the SW-MLR and GA-MLR models are provided in Table 1. Additionally, the MLR models also passed the Golbraikh and Tropsha model acceptability criteria [44] (Table 2).
Table 1 (fragment): MATS3e, Moran autocorrelation of lag 3 weighted by Sanderson electronegativities.
The observed versus predicted activity plots for the SW-MLR and GA-MLR models are given in Figure 1A and 1B, respectively. The robustness of both the final SW-MLR (Eq. 1) and GA-MLR (Eq. 2) models was further validated. Fifty different training and test set combinations were made and, keeping the same descriptors, the models were rebuilt and evaluated statistically. Interestingly, all 50 models generated (Supplementary Tables S8 and S9) were statistically valid, as evidenced by their internal and external cross-validation parameters. This further supports the selection of the descriptors used to develop the SW-MLR and GA-MLR models. In As both these SW-MLR (Eq. The AD of Eq. 3 was also tested in the same way as for the SW-MLR (Eq. 1) and GA-MLR (Eq. 2) models (Figure 2B). The Golbraikh and Tropsha model acceptability parameters [42] for Eq. 3 are provided in Table 1 Supplementary Tables S10-S12. Several compounds (compounds 4-5, 7, 9-11, 20-21, 27, 40, and 42) possess the benzyl acetate amide function in their molecular structure and are therefore effective SARS-CoV-2 3CLpro inhibitors.
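The robustness check with 50 training and test set combinations can be sketched as below. The text does not say how the combinations were generated, so random 51/18 partitions are assumed here, and the same descriptor columns are refitted on every partition. The leave-one-out residuals are obtained from the hat matrix (e_i / (1 - h_ii)), an exact shortcut for least-squares models. All names are hypothetical, and the Q² and R²Pred definitions are the commonly used ones rather than formulas quoted from the text.

```python
import numpy as np

def loo_q2(X, y):
    """LOO cross-validated R2 for OLS, using the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(X)), X])
    H = A @ np.linalg.pinv(A)                 # hat (projection) matrix
    resid = y - H @ y                         # ordinary residuals
    loo_resid = resid / (1.0 - np.diag(H))    # leave-one-out residuals
    return 1.0 - np.sum(loo_resid**2) / np.sum((y - y.mean())**2)

def r2_pred(X_train, y_train, X_test, y_test):
    """External R2Pred, referenced to the training-set mean."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    y_hat = coef[0] + X_test @ coef[1:]
    return 1.0 - np.sum((y_test - y_hat)**2) / np.sum((y_test - y_train.mean())**2)

def repeated_split_check(X, y, n_train=51, n_repeats=50, seed=0):
    """Refit the same descriptors on random splits (assumed scheme).

    Returns an array of shape (n_repeats, 2) holding (Q2, R2Pred) per split.
    """
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        tr, te = perm[:n_train], perm[n_train:]
        results.append((loo_q2(X[tr], y[tr]), r2_pred(X[tr], y[tr], X[te], y[te])))
    return np.array(results)
```

If the selected descriptors carry real signal, Q² and R²Pred should stay acceptable across most partitions; statistics that collapse for many splits would suggest that the original division was unusually favourable.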
On the other hand, compounds bearing the 2-oxopyrrolidinyl methyl group (compounds 11, 19-37) are also potent SARS-CoV-2 3CLpro inhibitors. It was interesting to note that the compounds containing both these functional moieties are the most potent compounds in this series (compounds 19-21). Therefore, both these functional moieties must be taken into consideration while designing highly potent SARS-CoV-2 3CLpro inhibitors.
Here, the descriptors used to construct the SW-MLR (Eq. 1) and GA-MLR (Eq. 2) models were employed to generate the ANN models (SW-ANN and GA-ANN). To find the optimal parameters for ANN model development, an extensive search over the different ANN model parameters, i.e., the number of nodes in the hidden layer, learning rate, momentum, and the number of learning epochs, was performed using the Autoweka software [48]. For both the SW-ANN (Eq. 1) and GA-ANN (Eq. 2) models, the parameter search provided an optimal learning epoch of 0.1. For the SW-ANN model, the parameter search provided an optimal number of 3 hidden layer nodes with an optimal training time of 300 (Figure 3A and 3C). For the GA-ANN model, the optimal number of nodes in the hidden layer was found to be 1, along with an optimal learning time of 100 (Figure 3B and 3D). Additionally, the predictability of the ANN models is given in Table 3.
Similar to the ANN models, the SW-SVM and GA-SVM models were constructed using the descriptors of the SW-MLR and GA-MLR models. The search for the optimal SVM parameters, i.e., the kernel width (γ) and complexity (C), was also performed using Autoweka software [48] by grid searching (Figure 4C and 4D Table 3). The observed versus predicted activity of the ANN models is depicted in Figure 4C and 4D. The predicted activities from the SW-ANN and SW-SVM models are given in Supplementary Table S2, whereas the predicted activities from the GA-ANN and GA-SVM models are given in Supplementary Table S4.
Regarding the HQSAR analysis, all the probable model combinations were investigated using different fragment distinction parameters (A, B, C, H, Ch, and DA) to generate 50 HQSAR models (Table 4). Figure 2C. The best HQSAR model (model 47B) indicates the importance of the good and bad fragments of the molecules. The good fragments of these compounds are shown in green and blue-green colors, whereas the bad fragments are shown in red and orange-red colors. The white-colored fragments make moderate contributions towards biological activity. The most active compound (Compound 21) shows the significance of the good fragments as depicted in Figure 5A. Figure 5A). Again, the chiral carbon atom to which the 2-oxopyrrolidinyl methyl group is attached shows a good contribution. Moreover, the adjacent carbonyl oxygen atom shows a good contribution towards SARS-CoV-2 3CLpro inhibitory activity. One of the terminal methyl carbon atoms of the i-butyl moiety, adjacent to the benzyloxy carboxamido function, also contributes favorably to SARS-CoV-2 3CLpro inhibition. The positive influence of the benzyl and benzyloxy carboxamido functions on the biological activity is also noticed for several other molecules (Compounds 5, 7, 19-20, 40, and 43). This result is in agreement with our earlier observation [30], where the crystallographic data analysis showed that the i-butyl group enters the S2 subsite of the SARS-CoV-2 3CLpro enzyme surrounded by the amino acid residues His41, Met49, and Met169. Apart from that, the carboxybenzyl group occupies the S4 pocket, whereas the carboxy group can form a potential hydrogen bonding interaction. The other potent compounds of this series (Compounds 9, 11, 19-20, 26-27, and 37) show several good fragments and no bad fragments.
Several FDA-approved drugs and other compounds which have already been repurposed by several groups of researchers [3, were considered here to judge whether our QSAR models can explain/predict these molecules or not. A number of compounds were predicted to be highly active (IC50 < 1 µM) by our SW-MLR (Eq. 1) and GA-MLR (Eq. 2) models. Among these compounds, five were predicted as highly active SARS-CoV-2 3CLpro inhibitors by both of these models (Figure 6 and Figure 7). Various molecular docking studies have already been performed with curcumin and the SARS-CoV-2 3CLpro enzyme [74, 77-81]. Khaerunnisa et al. [82] reported that curcumin had a strong binding energy (ΔG = -7.05 kcal/mol) with the SARS-CoV-2 3CLpro enzyme (PDB: 6LU7). It forms potential hydrogen bond interactions with Cys145, Leu141, Gly143, Ser144, and Thr190 and a π-sulfur interaction with Met165. The results are also in agreement with the observation of Ibrahim and co-workers [78], who reported a docking score of -9.2 kcal/mol (Figure 8A). On the other hand, the second most active repurposed drug, ribavirin, displayed good docking interaction (relative docking score = 2.01, relative ligand efficiency = 3.21, relative glide lipo = 0.37, and relative glide Hbond = 4.36) at the SARS-CoV-2 3CLpro active site [74]. It formed potential hydrogen bonding interactions with the backbone amino acid residues Thr25 and Gln189 (Figure 8B). Deshpande et al. [83] also showed that ribavirin bound effectively to the active site of SARS-CoV-2 3CLpro (PDB: 6Y84). Again, Gupta et al. [84] demonstrated that ribavirin binds strongly to SARS-CoV-2 3CLpro (PDB: 6LU7) and forms favorable hydrogen bonds with Leu141, Gly143, Arg188, and Gln189. The results obtained by Kumar et al. [85] varied slightly: ribavirin displayed a docking score of -6.813 and a binding energy of -35.63 kcal/mol during docking with SARS-CoV-2 3CLpro (PDB: 6LU7).
Moreover, it showed potential hydrogen bonding with His164, Glu166, Gln189, and Thr190 at the active site. The third most active compound, saquinavir, was studied through molecular docking analysis by different groups of researchers [58, 63, 86-94]. Hall et al. [86] reported a docking score of -7.285 kcal/mol for binding to SARS-CoV-2 3CLpro (PDB: 6LU7). Again, Talluri et al. [87] showed that saquinavir had a binding affinity of -9.2 kcal/mol for the active site (PDB: 6LU7). It formed hydrogen bonding interactions with His163, His164, Gly143, Ser144, Cys145, and Glu166. A similar binding energy (-9.6 kcal/mol) for SARS-CoV-2 3CLpro (PDB: 6LU7) was found in the molecular docking study conducted by Ortega et al. [88]. Raphael et al. [58] found that saquinavir had a binding energy of -9.0 kcal/mol with SARS-CoV-2 3CLpro (PDB: 6LU7), making several hydrogen bonding interactions with Glu166, His41, His164, and Gln189 along with several π-alkyl interactions. Al-Khafaji et al. [63] showed that saquinavir, when docked into the SARS-CoV-2 3CLpro enzyme (PDB: 6LU7), displayed the highest docking score (-9.856 kcal/mol) and the lowest MM-GBSA binding energy (-72.17 kcal/mol). Saquinavir formed five hydrogen bonds with Glu166, Gln189, and His164 as well as covalent bonding with Cys145 (Figure 8C). Furthermore, the molecular dynamics (MD) simulation study revealed that saquinavir has an average RMSD of 0.0186 as well as lower fluctuation, and the binding became stable at 50 ns. Sepimostat was the fourth-highest predicted molecule as per the GA-MLR model. Tsuji et al. [64] showed that sepimostat fit effectively into the active site of the SARS-CoV-2 3CLpro enzyme (PDB: 6Y2G), with an RDOCK score of -58.121 kcal/mol and a Vina score of -7.9 kcal/mol.
The carbonyl moiety of sepimostat was located close to Cys145, whereas the dihydroimidazole ring was located close to His41 at the enzyme active site (Figure 8D). Various molecular modeling studies [83, 95-101] were conducted on remdesivir, which was the fifth-highest predicted molecule in our analysis. Murugan et al. [96] reported that remdesivir showed a strong binding affinity for the viral SARS-CoV-2 3CLpro enzyme (-44.4 kcal/mol). Alajmi [97] showed that remdesivir had a docking score of - [96] showed that remdesivir forms hydrogen bonding interactions with His41, Arg188, and Thr190 of SARS-CoV-2 3CLpro (PDB: 6LU7). The post-MD simulation study by Naik et al. [101] showed that remdesivir formed potential hydrogen bonds with His163, Glu166, Cys145, and Gly143 (Figure 8E), along with several water molecules, to make stable interactions. The RMSF values showed lower atomic fluctuations in the binding site residues, which suggested smaller conformational changes and stable binding.
The HQSAR model also predicted several repurposed drug molecules as potential SARS-CoV-2 3CLpro inhibitors. Among these compounds, the five highly active molecules were lurasidone (predicted IC 50 Thurakkal et al. [102] showed that lurasidone had a strong binding energy (-8.4 kcal/mol) for the SARS-CoV-2 3CLpro enzyme (PDB: 6Y84). The MD simulation study revealed that the complex had a low average backbone RMSD (1.54 Å). Elmezayen et al. [103] reported a binding energy score of -11.17 kcal/mol and an inhibition constant (Ki) of 6.52 nM for lurasidone at the active site of the SARS-CoV-2 3CLpro enzyme (PDB: 6LU7). Lurasidone formed hydrogen bonding interactions with His41 and Glu166, alkyl interactions with Met165 and Met49, and π-alkyl interactions with Pro168, Met165, and His41 (Figure 9A).
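The binding energy and the inhibition constant quoted for lurasidone are two views of the same quantity, linked by ΔG = RT ln(Ki). The sketch below performs this conversion. The temperature is an assumption (298.15 K, not stated in the text), and the function names are hypothetical. Under that assumption, -11.17 kcal/mol corresponds to a Ki of roughly 6.5 nM, close to the reported 6.52 nM.

```python
import math

R_KCAL = 1.98720e-3  # gas constant in kcal/(mol*K)

def ki_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Inhibition constant (molar) from binding free energy: Ki = exp(dG / RT).

    Assumption: temperature defaults to 298.15 K.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def delta_g_from_ki(ki_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from an inhibition constant in molar."""
    return R_KCAL * temperature_k * math.log(ki_molar)

# Lurasidone example from the text: -11.17 kcal/mol, converted to nanomolar.
ki_nm = ki_from_delta_g(-11.17) * 1e9
```

Because of the exponential relationship, a difference of about 1.4 kcal/mol in a docking energy changes the estimated Ki by roughly one order of magnitude at room temperature, which is worth keeping in mind when comparing scores from different docking programs.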
Wang et al. [62] showed that elbasvir had a docking score of -9.9 and an MM-PBSA-WSAS binding free energy of ΔG = -6.5 kcal/mol. Elbasvir bound strongly to the active site of SARS-CoV-2 3CLpro, interacting with Thr25, Thr26, His41, Met49, Met165, Glu166, Gln189, and Thr190 (Figure 9B). According to Tripathi et al. [104], lopinavir exhibited effective SARS-CoV-2 3CLpro inhibition at a dose of 16 µM. Deshpande et al. [83] showed that lopinavir had a good binding energy (ΔG = -9.9 kcal/mol) with SARS-CoV-2 3CLpro (PDB: 6Y84). Similarly, Gyebi et al. [105] showed that lopinavir had a docking score of - respectively. The molecular docking study of lopinavir with SARS-CoV-2 3CLpro (PDB: 5R81) showed that lopinavir formed hydrogen bonds with Glu166 and His41. Moreover, His41 formed a π-π stacking interaction with the phenyl ring of lopinavir (Figure 9C). Paritaprevir is another molecule that was predicted well by the HQSAR modeling conducted in this study. Bahadur Gurung et al. [60] showed that paritaprevir formed hydrogen bonds with Glu166 and Asn142 along with several hydrophobic interactions with Thr25, Thr26, His41, Met49, Gly143, Cys145, Met165, Gln189, and Gln192. Alamri et al. [107] showed that paritaprevir formed hydrogen bonding interactions with Gly143 and Cys145. It also formed an amide-π stacking interaction with Thr45 and π-alkyl interactions with Met49, Met165, and Pro168. The MD simulation study resulted in an RMSD value of 1.2 Å. Again, the binding free energy calculation gave a favorable MM-GBSA total energy of -47.15 kcal/mol (Figure 9D).
The world is currently going through a devastating situation due to COVID-19, and SARS-CoV-2 has badly affected the whole world. To date, no small molecule drug candidate has been approved exclusively for the treatment of COVID-19, and thus there is an urgent need for such drugs.
In this work, a robust, regression-based molecular modeling study was conducted on 69 diverse compounds possessing SARS-CoV-2 3CLpro inhibitory activity. Different linear (SW-MLR and GA-MLR) and non-linear (ANN and SVM) QSAR models, as well as HQSAR models, were constructed to extract the important structural features responsible for imparting SARS-CoV-2 3CLpro inhibition. All these QSAR models were validated by internal and external cross-validation methods. For the SW-MLR and GA-MLR models, 50 different test and training set combinations were used to validate the robustness of these models. These QSAR models suggest that the E-State indices related to sp3-hybridized carbon atoms, the molecular distance edge between all secondary nitrogen atoms, the molecular distance edge between all secondary oxygen atoms, molecular complexity, and the maximum E-state descriptors of strength for potential hydrogen bonds of path length 10 and 8 are crucial for retaining the SARS-CoV-2 3CLpro inhibitory activity. Another SW-MLR model with well-known physicochemical parameters and indicator variables showed that the 2-oxopyrrolidine and the benzylester functions are crucial for higher SARS-CoV-2 3CLpro inhibition. The non-linear ANN (SW-ANN, GA-ANN) and SVM (SW-SVM, GA-SVM) models were successfully able to optimize the linear QSAR models (SW-MLR and GA-MLR), as suggested by the statistical validation parameters. The HQSAR model was also able to identify the important good and bad structural fragments that modulate SARS-CoV-2 3CLpro inhibition. It also showed that the 2-oxopyrrolidinyl methyl function, the benzylester function, and the methylene (hydroxy) sulphonic acid warhead group are important for retaining higher SARS-CoV-2 3CLpro inhibitory activity. Nevertheless, the GA-MLR and HQSAR models were also tried to predict externally
Conceptualization, Writing - review and editing, Supervision.
Tarun Jha: Conceptualization, Writing - review and editing, Supervision. The authors declare no conflict of interest.