key: cord-0807862-097nv80j authors: Umar, Haruna Isiyaku; Ajayi, Adeola; Bello, Ridwan Opeyemi; Alabere, Hafsat Olateju; Sanusi, Afees Akinbode; Awolaja, Olamide Olusegun; Alshehri, Mohammed Mansour; Chukwuemeka, Prosper Obed title: Novel Molecules derived from 3-O-(6-galloylglucoside) inhibit Main Protease of SARS-CoV 2 In Silico date: 2021-10-05 journal: Chem Zvesti DOI: 10.1007/s11696-021-01899-y sha: 5a6f344a5dec534f9bae88c610660236e4549d6c doc_id: 807862 cord_uid: 097nv80j

The ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV 2) has led to more than 168 million confirmed cases and 3.5 million deaths across 218 countries as of 28 May 2021. The virus has a cysteine protease called main protease (Mpro) that is essential to its life cycle and is regarded as a suitable target for novel antivirals. In this computer-assisted study, we designed 100 novel molecules through an artificial neural network-driven platform called LigDream (https://playmolecule.org/LigDream/), using 3-O-(6-galloylglucoside) as the parent molecule. The molecules were screened for druglikeness against five (5) different rules, and those without a single violation were virtually screened against Mpro using AutoDock Vina. The in silico pharmacokinetic features of the hits were then predicted and, finally, a quantum mechanics/molecular mechanics (QM/MM) study was carried out with Molecular Orbital Package 2016 (MOPAC2016) on the overall hit compound and the controls to determine the stability and reactivity of the lead molecule. Eight (8) novel molecules violated none of the druglikeness rules, and three (3) of them (C33, C35 and C54) showed the best binding affinity among the novel molecules, −8.3 kcal/mol against Mpro. C33 showed good in silico pharmacokinetic features, with an acceptable level of stability and better reactivity than our controls based on the analysis of quantum chemical descriptors.
However, further research on these novel molecules is urgently needed for the fight against the disease.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11696-021-01899-y.

In early 2020, the World Health Organization (WHO) declared the disease caused by the most recent coronavirus (COVID-19) a pandemic and a global emergency, following its emergence in Wuhan, China, in December 2019 (Kamaz et al. 2020; Anjorin 2020). The ongoing pandemic has resulted in more than 168 million confirmed cases with 3.5 million deaths (as of 28 May 2021) across 218 countries (Sakurai et al. 2021). The causative virus is severe acute respiratory syndrome coronavirus 2 (SARS-CoV 2), a spherical, single-stranded, positive-sense RNA virus. Its genome is approximately 30,000 bases long and contains about eleven open reading frames (ORFs) that encode numerous structural and non-structural proteins important to the viral life cycle (Anjorin 2020; Alazmi and Motwalli 2020; Yoshimoto 2020). Once SARS-CoV 2 enters the host cell, the viral RNA is released into the cytoplasm, which results in expression of the replicase gene. The replicase gene contains two overlapping open reading frames, ORF1a/1b, which cover two-thirds of the viral genome. These ORFs are expressed to yield polyproteins 1a and 1ab. The polyproteins are cleaved by two cysteine proteases, 3-chymotrypsin-like protease or main protease (Mpro/3CLpro) and papain-like protease (PLpro), releasing 16 non-structural proteins (nsp1 to nsp16). 3-Chymotrypsin-like protease (nsp5) is a three-domain (I to III) protease that is essential for sustaining the viral life cycle, since its maturation, after self-cleavage from the precursor polyprotein, leads to the production of nsp4 to nsp16 (Dai et al. 2020; Jin et al. 2020).
The cleft between domains I and II features a noncanonical Cys-His dyad that recognizes amino acids in substrates from the N terminus to the C terminus (Dai et al. 2020; Jin et al. 2020). In its active form, Mpro is a homodimer of two protomers (A and B) (Jin et al. 2020). There are no homologs of Mpro in humans, which makes it an ideal target for antiviral drug design. Additionally, the role it plays in the replication and proliferation of SARS-CoV 2 cannot be overlooked. The process of discovering and developing drugs is time-consuming and costly, hence the need for computer-aided therapeutic discovery (CATD), which embraces structure-based, system-based and ligand-based therapeutic design (Dai et al. 2020; Romano and Tatonetti 2019; Prieto-Martínez et al. 2019; Umar et al. 2021a). Today, the key role played by computational methods in therapeutic design, discovery and development cannot be overemphasized, because they are used at many stages: assembling data, processing it, and then evaluating and interpreting it (Dai et al. 2020; Romano and Tatonetti 2019; Prieto-Martínez et al. 2019; Umar et al. 2021a). In our earlier computational study, we showed that 3-O-(6-galloylglucoside) is a potential inhibitor of the main protease of SARS-CoV 2. However, although it showed a good binding score with the main protease, this compound was found to have poor druglike properties that might affect it further down the drug discovery pipeline (Umar et al. 2021a). In the current study, we designed 100 novel molecules through an artificial neural network-driven platform called LigDream (https://playmolecule.org/LigDream/) (Skalic et al. 2019), using 3-O-(6-galloylglucoside) as the parent molecule. These new compounds were subjected to stringent druglikeness screening to select those that could serve as oral drugs. A virtual screening was then carried out to identify the best hit compounds against Mpro.
The hit compounds were examined for their pharmacokinetic properties in silico. Finally, a quantum mechanics/molecular mechanics study was carried out on the overall hit compound and the controls to determine the stability and reactivity of the lead molecule.

The ligands selected for this in silico study are 3-O-(6-galloylglucoside) (seed molecule, PID: 73157749), remdesivir (control antiviral drug, PID: 121304016) and the novel compounds (Fig S1 and Table S1). Three-dimensional conformers of the seed molecule and control drug in structure data format (SDF) were sourced from the PubChem chemical repository (https://pubchem.ncbi.nlm.nih.gov/compound/).

A total of 100 novel compounds were generated using the LigDream module of the PlayMolecule server, an artificial neural network-driven platform (https://playmolecule.org/LigDream/) (Skalic et al. 2019). The SMILES string of 3-O-(6-galloylglucoside) was uploaded to the server, which was run to generate 100 new SMILES strings for different compounds. The platform uses two networks, an auto-encoder and a captioning network, which together can design many different compounds (here 100) starting from a single seed compound (3-O-(6-galloylglucoside)).

These 100 novel compounds were then filtered for druglikeness using five rules, namely Lipinski's (Lipinski et al. 2012), Veber's (Veber et al. 2002), Muegge's (Muegge et al. 2001), Egan's (Egan et al. 2000) and Ghose's (Ghose et al. 1999), together with the bioavailability score, via the SwissADME server (http://www.swissadme.ch/index.php) (Daina et al. 2017). The SMILES strings of the novel molecules were uploaded onto the server, which was run to evaluate their druglikeness. Compounds that violated none of the five rules and had a bioavailability score of 0.55 or above were taken forward to computer-aided molecular docking against the main protease of SARS-CoV 2.

The three-dimensional (3D) structure of the main protease (Mpro/nsp5) of SARS-CoV 2 (PDB ID: 6LU7) (Jin et al.
2020) was retrieved from the Protein Data Bank (PDB) (http://www.pdb.org/pdb). All heteroatoms were removed from the protein, which was then minimized using the protein preparation and minimization tools in Cresset Flare© software, version 4.0 (https://www.cresset-group.com/flare/). The minimization was executed under the General Amber Force Field (GAFF), with a gradient cutoff of 0.200 kcal/mol/Å and a maximum of 2000 iterations (Stroganov et al. 2011).

The molecular docking protocol was validated, as done earlier (Umar et al. 2021a, 2021b), to confirm its accuracy and consistency. The intent is to reproduce the binding pose of a ligand that was co-crystallised with the protein by re-docking it. Thus, the co-crystallised ligand (N-leucinamide) was detached from Mpro and prepared for re-docking using Cresset Flare software. The ligand was then re-docked into the binding region of Mpro using AutoDock Vina integrated in Python Prescription (PyRx) (Trott and Olson 2010). The docked complex was aligned in PyMOL with the cognate crystal structure of Mpro bearing the co-crystallised ligand to obtain the root mean square deviation (RMSD) value.

The novel compounds that showed druglikeness and a good bioavailability score were sketched using MarvinSketch© (ver. 15.11.30) and converted into their most stable, lowest-energy conformations with the Merck molecular force field (MMFF94) (Halgren 1996) using Open Babel integrated within Python Prescription (version 0.8). Molecular docking was performed with the flexible docking procedure (Trott and Olson 2010) previously used by Umar et al. (2021b). PyRx 0.8, a suite integrated with AutoDock Vina, was used for the molecular docking study.
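The re-docking validation above rests on an RMSD between the crystal pose and the re-docked pose of the same ligand. The study obtained this value in PyMOL; the following is only a minimal sketch of the underlying arithmetic, assuming both poses are already in the same coordinate frame and that atoms are listed in the same order. The function name and the coordinates are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], docked: Sequence[Coord]) -> float:
    """Root mean square deviation between two matched sets of atom coordinates.

    Assumes atom i in `reference` corresponds to atom i in `docked` and that
    no further superposition is required.
    """
    if len(reference) != len(docked) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (rx - dx) ** 2 + (ry - dy) ** 2 + (rz - dz) ** 2
        for (rx, ry, rz), (dx, dy, dz) in zip(reference, docked)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical coordinates in angstroms, for illustration only.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
# Every atom shifted by 1 angstrom along x, so the RMSD is 1.0.
redocked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]

print(f"RMSD = {rmsd(crystal_pose, redocked_pose):.3f} angstrom")
```

A low RMSD indicates that the docking settings reproduce the experimental pose; a cutoff of about 2 Å is a commonly used rule of thumb, although the text does not state which threshold was applied here.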
The target site on the receptor, corresponding to the substrate-binding region, was defined using a grid box with dimensions (18.08 × 26.45 × 26.30) Å. The centre was set on the substrate-binding site of the protein, which consists of the following amino acids: Thr25, Thr26, His41, Cys44, Met49, Tyr54, Phe140, Leu141, Gly143, Cys145, Asn142, His163, His164, Met165, Ser144, Glu166, Pro168, His172, Val186, Asp187, Arg188, Gln189, Phe185, Thr190, and Gln192 (Dai et al. 2020; Jin et al. 2020; Umar et al. 2021a). At the end of the experiment, the compounds with docking scores similar to those of the seed molecule and control drug were subjected to molecular interaction analysis with PyMOL© Molecular Graphics (version 2.4, 2016, Schrödinger LLC) and LigPlot+ (Laskowski and Swindells 2011).

ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) analysis is important for assessing the pharmacokinetics of a proposed molecule that could be used as a drug. The admetSAR server was used to predict the ADMET properties of the compounds with the best hits after molecular docking analysis (Cheng et al. 2012; Yang et al. 2018). SMILES strings of the ligands from PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/) were entered into the search bar of the server and their properties were predicted.

Quantum reactivity descriptors of the lead molecule, seed molecule and control drug were calculated using the Molecular Orbital Package (MOPAC2016). The computations were executed with the semi-empirical method Parametric Method 7 (PM7) (Kishor and Bhoop 2013). The output file generated from the geometry optimization of each molecule was used to calculate quantum reactivity descriptors such as the Highest Occupied Molecular Orbital (HOMO), Lowest Unoccupied Molecular Orbital (LUMO), energy gap, chemical hardness and softness, molecular surface electrostatic potential (MEP) and electronic energy.
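The druglikeness filter described in the methods (no violation of any rule, plus a bioavailability score of at least 0.55) can be illustrated with a small sketch. It is not the SwissADME implementation: it covers only two of the five rules (Lipinski and Veber), uses the commonly quoted thresholds for those rules, and runs on hypothetical, precomputed descriptor values. SwissADME's own descriptor definitions and cutoffs may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (all values here are hypothetical)."""
    name: str
    mol_weight: float       # g/mol
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float             # topological polar surface area, in square angstroms
    bioavailability: float  # bioavailability score as reported by a web server


def lipinski_violations(d: Descriptors) -> int:
    # Commonly quoted rule-of-five limits: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10.
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def veber_violations(d: Descriptors) -> int:
    # Commonly quoted Veber limits: rotatable bonds <= 10, TPSA <= 140.
    return sum([d.rotatable_bonds > 10, d.tpsa > 140])


def passes_filter(d: Descriptors, min_bioavailability: float = 0.55) -> bool:
    """Keep a molecule only if it violates no rule and meets the score cutoff."""
    return (
        lipinski_violations(d) == 0
        and veber_violations(d) == 0
        and d.bioavailability >= min_bioavailability
    )


# Two made-up entries standing in for a generated library.
library: List[Descriptors] = [
    Descriptors("small_druglike", 410.3, 2.1, 2, 7, 5, 105.0, 0.55),
    Descriptors("large_polar", 640.5, -1.2, 9, 15, 8, 260.0, 0.17),
]

selected = [d.name for d in library if passes_filter(d)]
print(selected)
```

In a full workflow the remaining rules (Ghose, Egan, Muegge) would be added as further violation counters and combined in the same way.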
We used 3-O-(6-galloylglucoside) to produce 100 new compounds through the LigDream platform (https://playmolecule.org/LigDream/). The value of such platforms for common drug discovery problems lies in their ability to prevent the over-utilization of resources (Skalic et al. 2019). The new compounds show new scaffolds and functional groups that cover new regions of chemical space while retaining lead-like characteristics (Supplementary Figure S1). The stringent druglikeness screening showed that eight (8) new molecules passed without violating any of Lipinski's, Ghose's, Veber's, Egan's and Muegge's rules, while also showing an Abbott bioavailability score of 0.55 (Table 1). Druglikeness is a qualitative assessment of the chance that a compound can become an oral drug with respect to bioavailability (Daina et al. 2017). In the early stages of drug discovery, these assessments are routinely used to filter chemical libraries [in our case, a library of 100 novel molecules derived from 3-O-(6-galloylglucoside)] and remove molecules whose properties are incompatible with an acceptable pharmacokinetic profile. The result indicates that the 8 novel molecules could serve as orally active drugs.

We employed virtual docking to define the potential binding and interactions of the 8 new molecules, 3-O-(6-galloylglucoside) (3O6G) and remdesivir with the main protease of SARS-CoV 2. 3O6G had a binding affinity of −8.4 kcal/mol, while remdesivir displayed a binding affinity of −8.2 kcal/mol (Fig. 1 and Table 2). The novel molecules (C1, C2, C17, C23, C30, C33, C35 and C54) also showed good binding (Figs. 1, 2), with C33, C35 and C54 showing the best binding affinity among them, −8.3 kcal/mol. The molecular interaction studies of our compounds using PyMOL and LigPlot+ showed that they interact with amino acid residues located within the substrate-binding domain of Mpro (Figs.
1, 2). Of note is the ability of our compounds to interact with His41 and Cys145, either by hydrogen bonding or by hydrophobic interaction. This interaction can be key to stopping Mpro from processing the replicase polyprotein, and thereby to blocking the maturation of the virus, because these two residues are central to the interaction with the enzyme's normal substrate. The binding of 3-O-(6-galloylglucoside) to the SARS-CoV 2 main protease observed in this study agrees with our previous study, which looked at the binding of gallic acid derivatives to five non-structural proteins (nsps) of SARS-CoV 2 (Umar et al. 2021a). Das et al. (2020) demonstrated that most molecules docked against the main protease of SARS-CoV 2 interacted with His41 and Cys145. Similarly, the in silico study of Umar et al. (2021) showed that mangiferin bound to Mpro through interactions with key amino acid residues important for stopping the activity of Mpro, especially the catalytic dyad residues His41 and Cys145. The molecular interaction fingerprints of the lead molecules in this in silico work are in agreement with those of previous studies (Kamaz et al. 2020; Jin et al. 2020; Putu et al. 2020; Zhao et al. 2020).

For a drug to be effective, a potent small molecule should reach its target in the body in sufficient concentration and remain there in its bioactive form long enough to exert its therapeutic effect. It is therefore pertinent to assess the absorption, distribution, metabolism, excretion, and toxicity (ADMET) parameters of molecules early in drug development, as this effort considerably reduces attrition. In this in silico work, we assessed the ADMET-linked parameters of the three best hit molecules (C33, C35 and C54), 3-O-(6-galloylglucoside) and remdesivir computationally using the admetSAR server. Our findings are presented in Table 3.
The new compounds, especially C33, showed better ADMET properties than our seed molecule (3O6G) and remdesivir. This could be linked to the earlier outcome of the stringent druglikeness screening. Notably, C33 was predicted not to inhibit some key liver enzymes (cytochrome P450 isoforms), the human ether-a-go-go-related gene (hERG) channel (i.e. it might not cause a prolonged QT interval) or the P-glycoprotein transporter, and was also predicted to be non-carcinogenic and non-mutagenic. Furthermore, C33 was predicted to be orally available and absorbable in the intestine.

The quantum reactivity descriptors of our lead molecule (C33), 3-O-(6-galloylglucoside) and remdesivir were obtained from MOPAC2016 software. We generated the energy of the highest occupied molecular orbital (E_HOMO) and the energy of the lowest unoccupied molecular orbital (E_LUMO), which describe the charge distribution of the frontier molecular orbitals of our compounds (Fig. 3A-C). In these plots, the blue color portrays the negative phase of the orbitals, while the red color indicates the positive phase. We observe that the HOMO and LUMO are mostly located on the aromatic rings, the sugar moiety (remdesivir), and the amino, fluoro, carbonyl, phospho and methylene groups. The values of HOMO (high values favor the capacity of a molecule to donate electrons to an acceptor species) and LUMO (low values favor the ability of a molecule to accept electrons) can determine the stability and reactivity of our compounds (Benbouguerra et al. 2021; Mubarik et al. 2021). C33 was seen to be able both to donate electrons (HOMO = −8.898 eV) and to accept electrons (LUMO = −1.636 eV). The energy gap (ΔG) of −7.262 eV showed that the lead compound is more reactive than the seed molecule and control drug (Table 4) while remaining moderately stable. This indicates that there is charge transfer between the atoms of C33.
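As a check on the sign convention, the gap quoted for C33 follows directly from the two orbital energies given above. The subscripted symbol below is introduced here for illustration only; the text writes the gap as ΔG.

```latex
\[
\Delta E_{\mathrm{gap}} = E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}}
  = (-1.636) - (-8.898) = 7.262\ \mathrm{eV}
\]
```

The reported value of −7.262 eV has the same magnitude and corresponds to the opposite ordering, E_HOMO − E_LUMO. Under either convention, a smaller magnitude is conventionally read as higher reactivity and lower kinetic stability.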
Other parameters, termed global reactivity descriptors, such as chemical potential, electrophilicity index and chemical hardness, which are required to analyze the reactivity of the inhibitor molecules, were also determined. In addition, descriptors such as ionization potential and electron affinity were determined (Table 4). The vertical ionization potential (I) and electron affinity (A) were obtained from the HOMO and LUMO values and were used to define the global electrophilicity value ω (Table 4), which is a measure of the energy stabilization of the system (Siddiqui et al. 2012). Chemical hardness quantifies the resistance to change in the electron distribution in a collection of nuclei and electrons (Siddiqui et al. 2012; Kalaiarasi and Manivarman 2017). The calculated values of the chemical potential, chemical hardness and global electrophilicity of C33 are −5.267 eV, 3.631 eV and 3.820 eV, respectively. The small value of η for C33 indicates that it is relatively soft on the scale of hardness. By definition, the electrophilicity index measures the susceptibility of a chemical species to accept electrons (Siddiqui et al. 2012; Kalaiarasi and Manivarman 2017).

The active regions of our three compounds were identified by computing their molecular surface electrostatic potentials (MEP) through quantum chemical calculations (Benbouguerra et al. 2021; Mubarik et al. 2021). This evaluation locates the sites of chemical reactivity of our molecules, which are represented by the 3D maps at optimized geometry in Fig. 3D-F. Differences in electrostatic potential are indicated by color gradients, which are valuable for investigating the link between molecular structure and the physicochemical properties of molecules, including biomolecules and drugs.
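The descriptor values quoted above for C33 can be reproduced from its HOMO and LUMO energies. The sketch below assumes the usual Koopmans-type approximations (I = −E_HOMO, A = −E_LUMO) and the standard definitions μ = −(I + A)/2, η = (I − A)/2 and ω = μ²/(2η). The text does not spell out its formulas, so these are assumptions, although they do reproduce the reported numbers. The function name is hypothetical.

```python
from typing import Dict


def reactivity_descriptors(e_homo: float, e_lumo: float) -> Dict[str, float]:
    """Global reactivity descriptors (eV) from frontier orbital energies (eV).

    Assumes Koopmans-type approximations for I and A; these definitions are an
    assumption of this sketch, not a statement of the exact method in the text.
    """
    ionization_potential = -e_homo            # I = -E_HOMO
    electron_affinity = -e_lumo               # A = -E_LUMO
    chemical_potential = -(ionization_potential + electron_affinity) / 2
    hardness = (ionization_potential - electron_affinity) / 2
    electrophilicity = chemical_potential ** 2 / (2 * hardness)
    return {
        "ionization_potential": ionization_potential,
        "electron_affinity": electron_affinity,
        "gap": e_lumo - e_homo,               # positive by this convention
        "chemical_potential": chemical_potential,
        "hardness": hardness,
        "electrophilicity": electrophilicity,
    }


# Orbital energies reported for C33 in the text.
c33 = reactivity_descriptors(e_homo=-8.898, e_lumo=-1.636)
for key, value in c33.items():
    print(f"{key:>22s}: {value: .3f} eV")
```

With the C33 energies, this gives a chemical potential of −5.267 eV, a hardness of 3.631 eV and an electrophilicity index of 3.820 eV, matching the values quoted in the text.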
Based on the color gradients, the positive potentials, corresponding to the nucleophilic reaction sites, are depicted in blue, while the regions of negative potential, which relate to the electrophilic reaction sites, are shown in yellow and red. Areas of zero potential are presented in green (Fig. 3D-F). The variation in the electrostatic potential generated by our molecule is largely responsible for its binding to the Mpro binding domain, since the binding region as a whole is expected to present regions of opposite electrostatic potential. The MEP of C33 clearly indicates that the major negative potential sites cover the fluorine atoms of the phenyl and benzyl rings and the oxygen atoms of the morpholine ring, oxoacetamide chain and carboxylic group, which appear in yellow to red and are the sites for electrophilic attack. The hydrogen atoms of the amino and carboxylic groups bear the highest positive potential, while the rest of the compound appears to have an almost neutral electrostatic potential (Fig. 4). The IUPAC name of the lead molecule in this computational study was generated using ChemSketch and MarvinSketch, and both programs returned a similar name for the molecule, 2-({[(2S)-2-(2-amino-3,4,5-trifluorophenyl)morpholin-4-yl](oxo)acetyl}amino)-4,5-difluorobenzoic acid (see supplementary information).

This study found that eight (8) of the 100 novel molecules generated by an artificial neural network-based platform showed the ability to serve as oral drugs. Of these, C33, C35 and C54 displayed a good binding affinity of −8.3 kcal/mol against Mpro. The ADMET profiling indicates that C33 was better than C35 and C54. Finally, the analysis of quantum chemical reactivity descriptors showed that C33 was moderately stable and more reactive than the controls.
However, there is an urgent need to carry out Molecular Dynamics Simulation (MDS) of at least 100 ns, synthesis of the lead

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

In silico virtual screening, characterization, docking and molecular dynamics studies of crucial SARS-CoV-2 proteins
The coronavirus disease 2019 (COVID-19) pandemic: a review and an update on cases in Africa
New α-hydrazinophosphonic acid: synthesis, characterization, DFT study and in silico prediction of its potential inhibition of SARS-CoV-2 main protease
admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties
Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease
SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules
An investigation into the identification of potential inhibitors of SARS-CoV-2 main protease using molecular docking study
Prediction of drug absorption using multivariate statistics
A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases
Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94
Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors
AB initio (DFT) and vibrational studies of the synthesized heterocyclic compound 2-6-oxo-2-thioxotetrahydropyrimidin-41H-ylidene hydrazine carbothioamide
Screening of common herbal medicines as promising direct inhibitors of Sars-Cov-2 in silico
Theoretical Studies of Vibrational Spectral modes and HOMO, LUMO Studies of Some Synthetic Organic Compounds
LigPlot+: multiple ligand-protein interaction diagrams for drug discovery
Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings
Computational study of structural, molecular orbitals
Simple selection criteria for drug-like chemical matter
Computational Drug Design Methods-Current and Future Perspectives. In: In Silico Drug Design
Active compounds activity from the medicinal plants against SARS-CoV-2 using in silico assay
Informatics and computational methods in natural product drug discovery: a review and perspectives
2021) 5-amino levulinic acid inhibits SARS-CoV-2 infection in vitro
Nantasenamat C (2020) Towards reproducible computational drug discovery
Electronic structure, nonlinear optical properties, and vibrational analysis of gemifloxacin by density functional theory. Spectrosc an
Shape-based generative modeling for de Novo drug design
TSAR, a new graph-theoretical approach to computational modeling of protein side-chain flexibility: modeling of ionization properties of proteins
AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading
In-silico analysis of the inhibition of the SARS-CoV-2 main protease by some active compounds from selected African plants
Molecular docking studies of some selected gallic acid derivatives against five non-structural proteins of novel coronavirus
Siraj B (2021b) In silico molecular docking of bioactive molecules isolated from Raphia taedigera seed oil as potential anti-cancer agents targeting vascular endothelial growth factor receptor-2
Molecular properties that influence the oral bioavailability of drug candidates
admetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties
The proteins of severe acute respiratory syndrome coronavirus-2 (SARS CoV-2 or n-COV19), the cause of COVID-19
Exploration on Shufeng Jiedu capsule for treatment of COVID-19 based on network pharmacology and molecular docking

The authors declare no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.