key: cord-0061263-q9w2je7l
authors: Broni, Emmanuel; Kwofie, Samuel K.; Asiedu, Seth O.; Miller, Whelton A.; Wilson, Michael D.
title: A Molecular Modeling Approach to Identify Potential Antileishmanial Compounds Against the Cell Division Cycle (cdc)-2-Related Kinase 12 (CRK12) Receptor of Leishmania donovani
date: 2021-03-18
journal: Biomolecules
DOI: 10.3390/biom11030458
sha: 08021742c2f79244b18d6e5d7a36686fe5204c7a
doc_id: 61263
cord_uid: q9w2je7l

The huge burden of leishmaniasis caused by the trypanosomatid protozoan parasite Leishmania is well known. This illness was included in the list of neglected tropical diseases targeted for elimination by the World Health Organization. However, the increasing evidence of resistance to existing antimonial drugs has made the eradication of the disease difficult to achieve, thus warranting the search for new drug targets. We report here studies that used computational methods to identify inhibitors of receptors from natural products. The cell division cycle-2-related kinase 12 (CRK12) receptor is a plausible drug target against Leishmania donovani. This study modelled the 3D molecular structure of the L. donovani CRK12 (LdCRK12) and screened for small molecules with potential inhibitory activity from African flora. An integrated library of 7722 African natural product-derived compounds and known inhibitors were screened against the LdCRK12 using AutoDock Vina after performing energy minimization with GROMACS 2018. Four natural products, namely sesamin (NANPDB1649), methyl ellagic acid (NANPDB1406), stylopine (NANPDB2581), and sennecicannabine (NANPDB6446) were found to be potential LdCRK12 inhibitory molecules. The molecular docking studies revealed two compounds NANPDB1406 and NANPDB2581 with binding affinities of −9.5 and −9.2 kcal/mol, respectively, against LdCRK12 which were higher than those of the known inhibitors and drugs, including GSK3186899, amphotericin B, miltefosine, and paromomycin. All the four compounds were predicted to have inhibitory constant (Ki) values ranging from 0.108 to 0.587 μM. NANPDB2581, NANPDB1649 and NANPDB1406 were also predicted as antileishmanial with Pa and Pi values of 0.415 and 0.043, 0.391 and 0.052, and 0.351 and 0.071, respectively. Molecular dynamics simulations coupled with molecular mechanics Poisson–Boltzmann surface area (MM/PBSA) computations reinforced their good binding mechanisms. Most compounds were observed to bind in the ATP binding pocket of the kinase domain. Lys488 was predicted as a key residue critical for ligand binding in the ATP binding pocket of the LdCRK12. The molecules were pharmacologically profiled as druglike with inconsequential toxicity. The identified molecules have scaffolds that could form the backbone for fragment-based drug design of novel leishmanicides but warrant further studies to evaluate their therapeutic potential.

Leishmaniasis is a worldwide menace that exists in all continents except Oceania and it is endemic in the tropical and subtropical areas in Eastern Africa, Southern Europe, the Middle East, South-eastern Mexico, and Central and South America [1] . Approximately one million new cases and between 26,000 to 65,000 deaths occur annually [2] . Leishmaniasis is a neglected tropical disease caused by the trypanosomatid protozoan Leishmania parasites transmitted to humans through the bites of infected phlebotomine sand flies [3] [4] [5] [6] [7] . The disease manifests in three major forms, namely cutaneous leishmaniasis (CL), mucocutaneous leishmaniasis (MCL), and visceral leishmaniasis (VL) [8, 9] . During the last decade, leishmaniasis has been observed with cases of co-infections in areas including the Mediterranean region, France, Italy, Portugal, Spain, Thailand, and Brazil [10] [11] [12] . Moreover, VL co-infection with HIV-infected patients living in Asia (especially India) and some African countries have been reported [13] .

Leishmaniasis mostly affects people living in poor areas and places further economic stress on scanty financial resources [14] [15] [16] . The savings of most households are depleted to get treatment, while the few others incur debt. Leishmaniasis impacts negatively on the psychological and social status of infected persons. The disfiguring scars lead to various forms of social stigmatization and exclusion from community activities [17] .

Currently, the dearth of effective and affordable drugs is a major problem hindering the eradication of leishmaniasis. Existing drugs are expensive, ranging from USD 30 to 1500 [17] . Paromomycin (PM) is the cheapest option in India, while liposomal amphotericin B (AmBisome) and miltefosine (Milt) costs USD 162-229 and USD 119 per patient, respectively [18] .

Drug resistance is also a major issue facing the existing therapeutic options, hence the need to identify new drug targets. The cell division cycle (CDC)-2-related kinases CRK3, CRK6, and CRK12, which are cyclin-dependent kinases (CDKs) have recently been identified as plausible targets [19, 20] . The overexpression of both CRK12 and the cyclin protein CYC9 have been reported to increase the resistance of L. donovani to pyrazolopyrimidines [20] . However, CRK12 has been reported to exist in a complex with CYC9 [19] [20] [21] . In bloodstream trypanosomes, both CRK12 and CYC9 are critical for proliferation in vitro [21] . Computational modelling studies showed that the most promising compound (GSK3186899), which inhibited the L. donovani parasites in a mouse model, binds to the CRK12 in the ATP binding pocket [19, 20] . Mutation studies also suggested that GSK3186899 binds to CRK12 and not CYC9 since the effectiveness of GSK3186899 was reduced in a mutant version of the CRK12 [19, 20] . The CRK12 is an essential gene for L. donovani and Leishmania mexicana promastigotes [20, 22] and critical in the bloodstream stage of Trypanosoma brucei [21] . It also plays an essential role in the survival of trypanosomatids of Trypanosoma brucei [21] , which corroborates CRK12 as a drug target for parasitic kinetoplastids belonging to the Trypanosoma genus [20, 22] . In addition, the depletion of CRK12 results in the expansion of the flagellar pocket and impairment of endocytosis [21, 23] .

Computer-aided drug design (CADD) has become more advantageous than the traditional approach of high-throughput screening (HTS) as it has helped reduce the wastage of resources in terms of cost, effort, and time by significantly decreasing the number of compounds and filtering out only hits for further HTS. Natural products remain an untapped reservoir of new drug candidates for combating various kinds of diseases. The African flora is rich in biodiversity [24] and can be exploited to produce novel drug candidates from their natural sources. Therefore, the identification of new bioactive compounds via in silico drug design is vital in unravelling novel leads that have the potential to inhibit the activity of L. donovani by targeting the Leishmania donovani cell division cycle (CDC)-2-related kinase 12 (LdCRK12) .

This study seeks to model a reasonably accurate 3D structure of the LdCRK12 and identify potential natural product-derived LdCRK12 inhibitory compounds through virtual screening. It also sought to characterize the mechanisms of binding between the LdCRK12 and potential inhibitory molecules using molecular dynamics (MDs) simulations integrated with molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) [25, 26] . In addition, it undertakes predictive pharmacokinetic and physicochemical profiling as well as the biological activity of compounds to identify potential novel drug-like leads.

A schematic pipeline detailing the step-by-step techniques employed in the study is described in Figure 1 . After modelling and validating the 3D structure of the LdCRK12, structure-based virtual screening (SBVS) was performed to identify compounds with high binding affinity to the LdCRK12 protein. Additionally, the selected hits were docked against the human cyclin-dependent kinase 9 (CDK9) since it is a homologue of the kinase domain of the LdCRK12. Molecular interactions between the receptors and the compounds were investigated using MD studies including MM/PBSA. Chemical absorption, distribution, metabolism, excretion, and toxicity (ADMET) predictions were performed to evaluate the toxicity of the compounds. Thereafter, the biological activity of identified biomolecules was predicted using machine learning-based Open Bayesian techniques [27, 28] . Methodology schema employed in this study for predicting potential antileishmanial compounds. Three modelling techniques comprising Modeller [29, 30] , I-TASSER [31] [32] [33] [34] and Robetta [35] [36] [37] were used to predict potential LdCRK12 structures. Evaluation of the predicted protein structures revealed the reasonably best model. Natural compounds from the African Natural Product Database (AfroDB), as well as the North African Natural Product Database (NANPDB) and known antileishmanial compounds, were docked against LdCRK12 and the human CDK9 receptors. The potential lead compounds were subjected to absorption, distribution, metabolism, excretion, and toxicity (ADMET), biological activity predictions, and molecular dynamics (MDs) computations.

Since the experimental 3D structure of the LdCRK12 does not exist, there was the need to employ modelling techniques to predict a reasonably accurate structure. The protein sequence of the LdCRK12 with UniProtKB ID: A0A3S7WQK2 was retrieved from UniProtKB, a repository for amino acid sequences of proteins [38] [39] [40] .

Three different modelling approaches were employed in this study, comprising I-TASSER Suite [31] [32] [33] [34] , Robetta [35] [36] [37] and Modeller 9.20 [29, 30] to predict the 3D structures of the LdCRK12 protein. The structure of the human CDK9 was retrieved from the protein data bank (PDB) with PDB ID 4BCF. The details used to generate a reasonably valid structure of the LdCRK12 via Modeller 9.20, I-TASSER, and Robetta are described.

The sequence of the LdCRK12 was uploaded into SWISS-MODEL which performed a basic local alignment search tool (BLAST) search to obtain suitable templates that were identical to the target sequence [41] . A BLAST search was also conducted on the kinase domain using the BLAST option in UniProtKB. The most suitable template was then selected for modelling.

EasyModeller 4.0 [42] , a graphical user interface (GUI) for Modeller was used to model the structure of LdCRK12. The sequence and the selected template (PDB ID 4BCF) were imported into EasyModeller 4.0. Sequence alignment was performed to predict the secondary structure of the protein by using the selected template and the sequence of the LdCRK12. Modeller then used the outcome to generate five models from which the best is selected based on their discrete optimized protein energy (DOPE) scores. DOPE is a statistical potential score used to evaluate homology models in protein structure prediction. For the same target, the model with the lowest DOPE score was chosen as the best [30, 43] .

I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/; accessed on 22 October 2019) was employed to predict the structure of LdCRK12. The amino acid sequence of the LdCRK12 protein was uploaded into the I-TASSER platform and 5 protein structures were predicted using default parameters.

The LdCRK12 amino acid sequence was also uploaded into Robetta (https://robetta. bakerlab.org; accessed on 27 February 2020) and the "comparative modelling (CM) only" option was selected. Robetta then parsed the sequence into putative domains and built models for the domains which are homologues to solved protein structures using comparative modelling [37] . Five protein structures were predicted using default parameters.

The quality of the generated models was assessed via SAVESv5.0 (http://servicesn. mbi.ucla.edu/SAVES/; accessed on 9 March 2020) along with Ramachandra plots from PROCHECK (https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html; accessed on 6 February 2021) [44] . The z-score obtained from ProSA-web [45, 46] , an indication of the overall model quality of the structures, was also determined. The z-score determines whether the input model is of X-ray or NMR quality. The local model quality of the structures was also determined by plotting the energies as a function of protein residue position. The positive values signify problematic or erroneous parts of the input structure. The reasonably best structure was selected based on the quality assessments performed.

Computed Atlas of Surface Topography of proteins (CASTp) [47, 48] was used to predict potential binding sites of the LdCRK12 protein. Chimera and PyMOL were used to assess the features of the predicted binding sites [49] [50] [51] .

The ligands were obtained from the African Natural Product Database (AfroDB) and the North African Natural Product Database (NANPDB) [52, 53] . A total of 6842 compounds were obtained in 2D spatial data file (sdf) format from the NANPDB and were converted to 3D structures using Open Babel's "gen3d" option. Additionally, 880 compounds were retrieved from the AfroDB in 3D sdf format. A total of 7722 compounds were obtained for this study by combining the two databases and removing duplicates. Additionally, the compound libraries were filtered based on Lipinski's rule of five.

Compounds labelled 5, 7, and 8 which showed very good half-maximum effective concentration (EC 50 ) values in a mouse model of visceral leishmaniasis by inhibiting CRK12 were used in this study [20] . Amphotericin B, miltefosine, and paromomycin were also included in the study. GSK3186899 (also known as compound 7 or DDD85365), amphotericin B, miltefosine, and paromomycin were retrieved from PubChem with compound identifiers (CIDs) 122429808, 5280965, 3599, and 165580, respectively. MarvinSketch 17.17.0 was used to generate the 3D sdf of compounds 5 and 8. Additionally, a 2-amino-4-heteroarylpyrimidine inhibitor (Code: T6Q), an inhibitor of the human CDK9 was extracted from the complex and saved in sdf format. All ligand structures were then energy minimized using the universal force field (UFF) under the Conjugate Gradient algorithm in 200 steps before being converted to the partial charge and atom type (pdbqt) file format of the Protein Data Bank using Open Babel.

Both LdCRK12 and the human CDK9 were energy minimized using the Optimized Potentials for Liquid Simulations (OPLS)/All Atom (AA) force field in GROMACS 2018. PyMOL (PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC) was used to visualize the energy minimized structures and to remove the water molecules surrounding the protein. The protein structures were then saved in the Protein Data Bank format (pdb) using PyMOL. The protein structures were then converted to AutoDock Vina's compatible pdbqt format using the "make macromolecule" option in PyRx.

Autodock Vina was employed for the virtual screening process [54] . The pre-filtered library and the known drugs were screened against the LdCRK12 using a grid box dimension of 91.21 × 93.45 × 78.24 Å 3 and centered at (74.47, 128.44, 81.76) Å to cover the kinase domain. Compounds that possessed binding energies higher than −8.5 kcal/mol were not selected. A more stringent threshold was used herein since a previous study showed that −7.0 kcal/mol which was defined for AutoDock users can significantly distinguish between putative specific and non-specific protein-ligand bonds [55] . The result was then inspected visually using PyMOL to select the best docked ligands.

The known ligands and the selected compounds were re-docked to the human CDK9 using AutoDock Vina. The CDK9 protein was remodelled using the existing CDK9 structure (PDB ID: 4BCF) as a template via Modeller before molecular docking studies due to missing residues. A grid box with the dimension of 80.86 × 62.73 × 91.07 Å 3 and center (81.89, 80.83, 70.34) Å was specified for the CDK9. Compounds that demonstrated a higher binding affinity to the human CDK9 than 2-amino-4-heteroaryl-pyrimidine were not considered for downstream analysis.

The interactions between LdCRK12 and the ligands were determined and analyzed via LigPlot + v1.4.5 using default parameters [56] . Additionally, the human CDK9-ligand interactions were investigated.

Selected compounds with high binding affinities with the LdCRK12 protein and low binding affinities to the human CDK9 were subjected to absorption, distribution, metabolism, and excretion (ADME) evaluation using SwissADME [57] . The toxicity profiles of the selected compounds were evaluated using OSIRIS Property Explorer in DataWarrior 5.0.0 [58] . DataWarrior uses features of chemical structures to predict physicochemical properties. The algorithm in the OSIRIS Property Explorer predicts the likelihood of a drug being a mutagenic, tumorigenic, irritant, and possessing a reproductive effect. Prediction of activity spectra for substances (PASS) was used to predict the biological activity of the compounds. PASS predicts the biological activity spectra of compounds using the simplified molecular input line entry system (SMILES) files of the structures based on the Bayesian approach [27, 28] .

The inhibitory constant (Ki) was calculated using the binding energies of the selected compounds along with other metrics consisting of ligand efficiency (LE), LE scale (LE_Scale), fit quality (FQ), and LE-dependent lipophilicity (LELP). The abovementioned metrics were determined using the method described previously [59, 60] .

A 10 ns MD simulation was performed for LdCRK12 and protein-ligand complexes using GROMACS 2018 [61, 62] . Xmgrace [63] was used to plot the graphs generated from the MD simulations. The binding free energies of the complexes were calculated using the MM/PBSA method [25] . MM/PBSA calculations of the complexes were carried out using g_MM/PBSA, which calculates binding energy components and the individual energy contributions of the residues [25] . The graphs from the MM/PBSA computations were generated using the R programming package [64] .

The results of the molecular modelling, molecular docking, ADMET evaluation, prediction of antileishmanial activity and MD simulations are presented.

There was the need to model the structure of the LdCRK12 since there is no available structure in the protein data bank. An earlier study modelled the structure of the LdCRK12 using Molecular Operating Environment (MOE version 2014.09; Chemical Computing Group, Inc.) [20] . MOE-Homology combines segment-matching and methods of inserting or deleting regions to model protein structures. Advanced knowledge-based loop searching and sidechain rotamer selection methods are then employed to build models by default. An average model is then generated by MOE for a user-controlled energy minimization [65] .

Studies have compared the quality of protein structures generated using different modelling techniques [65] [66] [67] . No technique has been found to be superior in every aspect to the others [65, 66] . The protein family and the sequence identity between the query and template structures influence the quality of a model built using a homology modelling technique [66] . A comparison study of various homology modelling algorithms including MOE, I-TASSER, Rosetta, PRIME, SWISS-MODEL, Composer, and ORCHESTRAR reported that all the techniques produced high quality models when the sequence identity between the query and the template is greater than 35% [66, 67] . However, for low sequence identities, it becomes difficult for the modelling algorithms to produce high-quality structures [66] . It is therefore imperative that different modelling techniques are used to build protein structures that have relatively low sequence identities to their templates. The quality of the modelled structures must be assessed to select the reasonably best model.

Herein, three freely accessible and widely used techniques comprising Modeller, I-TASSER and Robetta were employed to predict structures of the LdCRK12. The present study compares the structures from these three techniques to select the reasonably best model, as carried out previously [68] [69] [70] .

A BLAST search was performed to retrieve identical structures as suitable templates for modelling the LdCRK12 structure. The BLAST search via SWISS-MODEL revealed 5449 templates with a sequence identity lower than 30%. A further BLAST search was conducted on the kinase domain (amino acid residues 459-833) using the BLAST option (BLASTP 2.9.0+) by selecting BLOSUM62, the most commonly used scoring matrix in BLAST [71] . The search revealed six reviewed protein structures that were identical to the kinase domain of the LdCRK12 ( Table 1 ). One of the most widely used template selection criteria is to select the model with the highest sequence identity to the protein sequence. The quality of the experimentally determined structure is also an important factor to consider in the template selection. The reasonably best template was selected based on the E-value, sequence identity, query coverage, and the availability of a 3D structure. The human CDK9 was thus selected as the template for modelling the LdCRK12 via Modeller 9.2 as described previously [20] . Although, sequences O14098 and Q9TVL3-2 had sequence identities of 36% and 35% and BLAST scores of 356 and 348, respectively, they were not selected due to their relatively low coverage to the LdCRK12 (Table 1) . Cyclin-dependent kinase 9 (CDK9) of humans, rats, and mice had the same E-value, BLAST score, and sequence identity of 7.4 × 10 −34 , 345, and 31.3%, respectively. The three proteins also had better coverage of the LdCRK12. However, the human CDK9 was the only protein with a solved 3D structure. [20] . The human cyclin-dependent kinase 9 (CDK9) is a cdc2-like serine/threonine kinase whose related pathways have been associated with various human malignancies and cardiomyocyte hypertrophy. The sequence of the LdCDK12 was aligned to the template sequence and five structures were modelled using Modeller 9.2.

The qualities of the five generated models were evaluated using the DOPE and genetic algorithm 341 (GA341) scores. The GA341 score, which is derived from statistical potential, assesses the reliability of a model [72] . A model can be said to be reliable when the GA341 score is higher than the determined threshold of 0.7. The five generated models using Modeller 9.2 had a GA341 score lower than the 0.7 cut-off, thus the DOPE score was used to select the most suitable model. The DOPE score is also a statistical potential score used to assess predicted models. The reasonably best model is selected by choosing the structure with the least DOPE value [30, 43] . The DOPE and GA341 scores of the five predicted models from Modeller 9.2 are shown ( Table 2 ). For the Modeller generated structures, model MOD5 was selected as the most suitable structure of the LdCRK12 due to its very low DOPE score of −50486.88281 (Table 2 and Supplementary file 1). I-TASSER was used to generate five protein structures of the LdCRK12. Based on the magnitude regarding the threading template alignments and the convergence parameters of the structure assembly simulations, I-TASSER computed a confidence rating for each model, which is known as the C-score. A higher C-score value represents a model with higher confidence and is usually in the range of (−5, 2) [31] [32] [33] [34] . Out of the five generated I-TASSER structures, model ITAS5 was selected as the most suitable model due to its high C-score of −2.66 (Table 3 and Supplementary file 2). Table 3 . Predicted I-TASSER models and C-scores.

Robetta was also employed to model five structures of the LdCRK12. Robetta uses the ROSETTA to model protein structures either by comparative modelling or ab initio. For the LdCRK12, Robetta used comparative modelling to predict plausible structures (Table 4 ). ROB1 was considered as the reasonably best model since the predicted models are ranked based on the model quality assessment method available in ProQ2 after clustering. The predicted b-factors by color representation of the models were also visualized in Pymol. The b-factor, which influences the local quality of a model, shows the parts of the structure that were remodelled and not covered by a template. These regions are the least accurate and have the most variation between models. All five predicted structures showed similar b-factor coloration. Therefore, the five models were further evaluated using SAVES v5.0 (Table 4 ). ROB1 had a VERIFY score of 82.97%, which was the highest; ERRAT quality factor of 88.0579; PROVE score of 0.0% and four PROCHECK errors, three warnings, and two passes (Table 4 ). ROB1 was thus selected as the most acceptable structure from Robetta (Supplementary file 3). 

The quality of the best models from each of the three techniques was assessed using SAVES v5.0. Modelled protein structure MOD5 had poor values for all the quality metrics (Table 5) . MOD5 had VERIFY, ERRAT, and PROVE scores of 41.20%, 10.0536, and 16.1%, respectively. MOD5 was also predicted by PROCHECK to have five errors, two warnings, and one pass (Table 5 ). ITAS5 had very good VERIFY and ERRAT scores of 85.36% and 80.2158, respectively. Although ITAS5 had the highest VERIFY score, it was predicted using PROVE to be 9.5% erroneous ( Table 5) . PROCHECK also predicted ITAS5 to have six errors, two warnings, and one pass. ROB1 had the highest ERRAT quality factor of 88.0579 and 0.0% erroneous parts, as predicted by PROVE ( Table 5 ). The ERRAT error plots for MOD5, ITAS5, and ROB1 were generated ( Figure S1 ). MOD5 had the most erroneous or misfolded regions (Figures 2a and S1A) , while ROB1 had the lowest error rate for protein folding (Figures 2c and S1C) . Furthermore, the kinase domain of the LdCRK12 (residues 459-833) in the ROB1 structure was not predicted to have any misfolded or erroneous regions ( Figure  S1C ). ITAS5 was also observed to have few misfolded portions (Figures 2b and S1B ). The Ramachandran plots of all the three shortlisted models were obtained using PROCHECK which evaluates the stereochemistry of protein models by determining residue-by-residue geometry and overall structure geometry [44] . A protein structure is considered as quality based on the percentage of residues in the most favored (core), additionally allowed, generously allowed, and disallowed regions [73] . Protein structure MOD5 had 79.7%, 15.5%, 3.0%, and 1.8% of residues in the most favored, additionally allowed, generously allowed, and disallowed regions, respectively (Table 6 and Figure S2A ). ITAS5 was also predicted to have 61.0%, 29.8%, 5.9%, and 3.3% of residues in the most favored, additionally allowed, generously allowed, and disallowed regions, respectively (Table 6 and Figure S2B ). For the ROB1 structure, 82% of the amino acid residues were present in the most favored region, 17.1% residues were found in the additionally allowed regions, 0.4% of residues were in the generously allowed regions, and 0.4% in the disallowed region (Table 6 and Figure 3 ). The Ramachandran plots revealed that the model ROB1 had the most reasonably good structure (Table 6 , and Figures 3 and S2A,B) . Table 6 . Ramachandran plot statistics for the best models from the 3 modelling techniques. For all 3 models, the number of end residues (excluding Gly and Pro) = 2, Glycine residues = 65, Proline residues = 85, and the total number of residues = 881. The quality of the overall best model (ROB1) was evaluated using the z-score from ProSA-web [45, 46] . The overall best model was predicted to be of X-ray quality and had a zscore of −9.7 (Figure 4a ). The local model quality of the chosen model was also determined by plotting the energies as a function of amino acid residue position. Most of the residues were predicted to have negative energy values, signifying a very good model ( Figure 4b ). Generally, positive values signify problematic or erroneous parts of the input structure.

A binding site is a region on a protein that binds to a ligand or another macromolecule with specificity [74] . CASTp was employed to predict the binding sites of the LdCRK12. At the active site, a ligand or a substrate binds to an enzyme to induce a chemical reaction [75] . CASTp uses the Delaunay triangulation, alpha shape, and discrete flow methods to identify topographic features, measure areas and volumes [76, 77] .

CASTp predicted 127 binding sites for the chosen LdCRK12 protein model. The predicted binding cavities with no openings and with relatively small volumes and areas such that no ligand could fit were ignored [59, 78] . Since the modelled structure had many disordered regions from residues Met1 to about Ala400, only binding cavities predicted to border the kinase domain (459-833) were considered. A total of 14 binding sites were selected after visualization in Chimera 1.12 and Pymol. The residues lining each of the 14 binding sites are shown (Table 7 ). Pocket 7 was observed to overlap with pocket 3 (Table 7) . Aligning the 3D structures of the LdCRK12 and the human CDK9 in complex with T6Q revealed that pocket 1 is the ATP binding site of the kinase domain ( Figure 5 and Table 7 ). 

A total of 7722 African natural compounds were used as the screening library [52, 53] . Additionally, Lipinski's rule of five was used to filter the library to obtain 4409 compounds comprising 3872 and 537 ligands from the NANPDB and AfroDB, respectively.

Three known antileishmanial drugs, namely amphotericin B, miltefosine, and paromomycin, were also retrieved from PubChem with CIDs 5,280,965, 3599, and 165,580, respectively. Three inhibitors of LdCRK12 comprising compounds 5, 7, and 8, which had very good EC 50 values in mouse models of visceral leishmaniasis ranging from 0.005 µM to 2 µM, were also used. The 3D structure of GSK3186899 was downloaded from Pub-Chem with CID 122,429,808 whereas those of compounds 5 and 8 were generated using MarvinSketch 17.17.0. Additionally, a 2-amino-4-heteroaryl-pyrimidine inhibitor (Code: T6Q), complexed with the human CDK9 (PDB ID: 4BCF) was extracted from the complex and saved in sdf format. All ligand structures were energy minimized using the universal force field (UFF) under the Conjugate Gradient algorithm in 200 steps and converted to the partial charge and atom type (pdbqt) format using Open Babel before the virtual screening.

Autodock Vina was used for the virtual screening process [54] . The compounds were first screened against the LdCRK12. Compounds with good pose and low binding energies against the LdCRK12 were re-docked against the human CDK9 to select compounds that are less likely to interact with critical residues of the human CDK9.

The pre-filtered library comprising a total of 4409 compounds and the known inhibitors were screened against the energy minimized LdCRK12 using a grid box dimension of 91.21 × 93.45 × 78.24 Å 3 and centered at (74.47, 128.44, 81.76) Å to cover the kinase domain of the protein. A total of 4369 compounds were successfully screened against the LdCRK12. A stringent threshold of −8.5 kcal/mol was used to select the compounds after the virtual screening process. This threshold was used since it has been shown that an AutoDock score of −7.0 kcal/mol differentiates well between certain and uncertain protein-ligand interactions [55] . A total of 290 compounds had binding energies less than or equal to −8.5 kcal/mol. AutoDock Vina uses a negative function to rank the output in the order of decreasing binding affinity, thus, the higher the negativity, the more plausible the candidate as a potential lead compound.

The protein-ligand complexes were then inspected visually using PyMOL to select the best docked ligands. A total of 17 compounds were eliminated since they did not dock deep into the LdCRK12. Additionally, based on the generated protein-ligand interaction profiles, 27 compounds that did not exhibit any hydrogen bonding with LdCRK12 were excluded. A total of 246 compounds were thus selected from the virtual screening output. Of the 246 compounds, ZINC000095485940 demonstrated the least binding energy to the LdCRK12 with a value of −10.1 kcal/mol. NANPDB1406, NANPDB2581 and NANPDB6446 also demonstrated low binding energies of −9.5, −9.2 and −9.1 kcal/mol, respectively. ZINC000095485940, NANPDB1406, NANPDB2581, and NANPDB6446 demonstrated a higher binding affinity to the LdCRK12 than all the known inhibitors used in this study (Table 8 ). Among the known inhibitors, the compound 8 and TQ6 demonstrated the least binding energy of −9.1 kcal/mol to the LdCRK12. Compound 8 was reported to inhibit the Leishmania parasite with EC 50 values of 0.025 µM and 0.075 µM in the axenic and intramacrophage assays, respectively. GSK3186899, paromomycin, and compound 5 also had binding energies of −8.5, −7.9, and −7.2 kcal/mol, respectively (Table 8 ). These three compounds demonstrated binding energies lower than the −7.0 kcal/mol threshold defined for AutoDock users [55] . This implies that these compounds have the potential to demonstrate significant inhibitory activities against the parasite as exhibited by compounds 5 and 7 previously [20] . Miltefosine demonstrated the highest binding energy of −5.0 kcal/mol to the LdCRK12 (Table S1 ).

Since the kinase domain is conserved and the human CDK9 is homologous to the LdCRK12, there was the need to screen the shortlisted compounds against the CDK9. A total of 246 were re-docked against the human CDK9 to select compounds with a relatively low binding affinity to the CDK9, which were less likely to interact with critical residues of the human CDK9. Before the virtual screening, the CDK9 was remodelled with PDB structure 4BCF as a template using Modeller 9.2 due to missing residues. Residues 1-5, 89-96, 177-181, and 327-330 were missing in the human CDK9 structure. The complete sequence of the human CDK9 was retrieved from UniProt with ID P50750 [38] [39] [40] . The sequence was aligned to the 4BCF structure and five models were generated using Modeller 9.2. The qualities of the five models were assessed using the DOPE and GA341 scores. All the modelled structures had a GA341 score of 1, thus the structure with the lowest DOPE (−38809.11328) score was chosen.

Ligands that docked into the ATP binding site of the human CDK9 were not considered for downstream analysis. Additionally, compounds with similar binding energy against the CDK9 as T6Q were eliminated to prevent the likelihood of drug off-target binding. T6Q had a binding energy of −8.6 kcal/mol when docked into the ATP binding site of the human CDK9 (Table 8) .

Compounds 8, GSK3186899, and 5 were observed to have binding energies of −9.0, −8.8 and −8.6 kcal/mol against CDK9, respectively (Table 8 ). However, GSK3186899 had an IC 50 value higher than 20 µM against the human CDK9 [20] . Miltefosine had the lowest binding affinity to CDK9, with a binding energy of −5.6 kcal/mol (Table 8) .

ZINC000095486260 demonstrated the highest binding energy (−6.4 kcal/mol) against CDK9, followed by NANPDB4609 and NANPDB328, with both having a binding energy of −6.6 kcal/mol each (Table 8) . ZINC000095485940, NANPDB1406, NANPDB2581, and NANPDB6446, which had the highest binding affinity to LdCRK12, had binding energies of −7.7, −7.3, −7.5 and −7.3 kcal/mol with the human CDK9, respectively (Table 8) . A total of 133 compounds with a high binding affinity against the CDK9 were eliminated.

The protein-ligand interactions were determined for both LdCRK12and the human CDK9-ligand complexes using LigPlot + v1.4.5 [56] .

Most compounds were observed to dock into the ATP binding pocket, consistent with pocket 1 ( Table 7; Table 8 ) with paromomycin and T6Q docking into the ATP binding cavity. Compounds 5 and 8 docked into pocket 14 and formed hydrogen bonds with Leu723 ( Table 7; Table 8 ). Compound 5 formed 2 hydrogen bonds with Leu723 of lengths 2.98 and 3.07 Å, and interacted with Gly724, Pro725, Leu726, Pro727, Pro728, Val731, Tyr732, Leu743, Asn763, Trp764, Gln815, Leu816, Asp817, and Gln820 via hydrophobic bonds. Compound 8, which had a binding affinity of −9.1 kcal/mol with the LdCRK12 interacted via one hydrogen bond with Leu723 (2.83 Å), and formed hydrophobic contacts with Gly724, Pro725, Leu726, Pro727, Pro728, Val731, Leu743, Glu747, Asn763, Trp764, Gln815, and Leu816. The interactions between compounds 5 and 8 with these residues may account for their high L. donovani inhibitory activity.

GSK3186899, which docked into pocket 1, interacted with Ser466 (2.96 Å), Gly468 (3.19 Å), Lys488 (3.03 Å), Ser544 (3.27 Å), Thr625 (3.12 Å), Asp626 (3.31 Å, 3.3 Å), and Tyr691 (2.98 Å) via hydrogen bonding, and Gly468, Thr469, Tyr470, Val473, Ala486, Lys488, Phe563, Lys610, Asp612, Leu615, Asp626, and Tyr691 via hydrophobic bonding (Figures 6d and S3D , and Tables 7 and 8 ). The multiple hydrogen bonding formed between GSK3186899 and the LdCRK12 may be a key influencer of its activity [79] . Table 8 ). ZINC000095485940 also formed hydrophobic contacts with Leu465, Ser466, Thr469, Val473, Ala486, Lys488, Ser544, Phe563, Asp612, Asn613, Leu615, and Thr625 (Figures 6a and S3A , and Table 8 ). NANPDB1406 interacted with Lys488, Ala566, and Ser569 via hydrogen bonds and also formed hydrophobic bonds with Leu465, Ser466, Gly468, Val473, Ala486, Tyr565, Thr567, Ala568, Asp612, Leu615, and Asp626 (Figures 6b and S3B , and Table 8 ). NANPDB2581 formed a hydrogen bond with Lys610 with a bond length of 3.08 Å and hydrophobic contacts with Leu465, Ser466, Thr469, Tyr470, Ala568, Ser569, Asp612, Asn613, Leu615, and Asp626 (Figures 6c and S3C , and Table 8 ). NANPDB6446 also interacted with the LdCRK12 via hydrogen bonds with Ser569 and Arg575, and hydrophobic bonds with Leu465, Ser466, Ala568, Gly572, Asp612, and Asp626. The formation of multiple hydrogen bonds between an enzyme and a molecule influences the activity of the compound [79] . Leu465, Ser466, Thr469, Ala486, Ala568, Ser569, Asp612, Asn613, Leu615, and Asp626 are predicted as critical residues for ligand binding in the ATP binding pocket.

Amphotericin B docked into binding pocket 3 forming hydrogen bonds with Arg603, Pro635, Tyr845, Gln846, and Arg847 (Table S1 ). Amphotericin B also interacted with Pro635, Gly636, Thr642, His643, Glu669, Lys670, Thr823, Glu826, Tyr845, Gln846, Arg847, and Leu849 via hydrophobic contacts. These residues were found to line binding pocket 3 (Table 7) . NANPDB2521 and NANPDB1011 also formed interactions with the aforementioned residues.

Miltefosine formed 2 hydrogen bonds with Gly422 with bond lengths of 3.05 and 3.1 Å, and interacted with Leu181, Gly344, Ile345, Thr396, Arg397, Ala399, Pro401, Thr418, Pro419, Tyr420, Pro421, Gly422, Tyr428, and Arg432 via hydrophobic bonds (Table 8) , which lined pocket 9 ( Table 7) . Pockets 3 and 9 are worthy of further experimental exploration.

The human CDK9-ligand interactions were also investigated ( Table 8 and Table S1 ). Ligands which interacted with the critical residues of the human CDK9 (Ile25, Ala46, Lys48, Phe103, Glu107, Asp109, Asp145, Leu156, and Asp167) were not considered for downstream analysis due to the possibility of drug off-target activity [80, 81] . Previous studies have reported on the crystal structures of analogues of 4-(thiazol-5-yl)-2-(phenylamino)pyrimidine-5-carbonitrile bound to CDK9/cyclin T [82, 83] . The compounds demonstrated Ki values ranging from 6-43 nM with an increase in the thermal stability of CDK9/cyclin T [82] . It was reported that the thiazole, pyrimidine, and aniline moieties docked into the ATP binding site and formed a hydrogen bond with the hinge region of the kinase [82] . The pyrimidine ring was observed to lie between Ala46 and Leu156 while the C5-carbonitrile was reported to form a lone pair−π interaction with an average distance of 3.7 Å with the gatekeeper residue Phe103 [82] . Hydrogen-bonds were also formed between the compounds and residues Ile25, Lys48, Asp145, with Glu107 and Asp167 [82] . Other studies have corroborated the above listed residues as being critical to CDK9-ligand binding [84, 85] . A molecular docking study involving CDK9 and BAY-958 also reported BAY-958 to form a hydrogen-bond with Asp109 [84] .

A total of 94 compounds that interacted with the critical residues of the human CDK9 were eliminated from this study. A total of 19 compounds with a relatively high binding affinity to the LdCRK12 and did not interact with the critical residues of the human CDK9 were obtained.

Though the screening library was pre-filtered using Lipinski's rule, Veber's rule was further applied to the 19 identified compounds, of which two failed. NANPDB4609 and NANPDB3239 violated Veber's rule due to their high total polar surface area (TPSA) values of 151.96 and 145.91, respectively. Veber's rule requires a TPSA of no more than 140 Å 2 [86] . Compounds with a TPSA not more than 140 Å 2 are considered to have good oral bioavailability [86] . TPSA values are considered as good indicators of excellent human intestinal absorption (HIA) and Caco-2 permeability [87] . The calculated logP (cLogP) values were also determined using the OSIRIS DataWarrior 5.0.0 (Table S2) .

Most of the compounds were predicted to be moderately soluble, including compounds 5, 7, and 8 (Table S2) . Compound 5 was shown experimentally to have poor solubility and is metabolically unstable although it was the most potent against LdCRK12 with an EC 50 value of 0.014 µM in the intra-macrophage assay [20] . NANPDB6446 was predicted to be very soluble while NANPDB1406 was predicted to be moderately soluble. ZINC95485940, NANPDB1406, and NANPDB1649 were also predicted to be soluble (Table S2) .

The potential of a drug to move across the blood-brain barrier to the brain is referred to as BBB permeation. Only NANPDB2581, NANPDB2582, NANPDB3614, NANPDB1649 and ZINC000095485880 were predicted to have permeation into the brain-blood barrier (BBB) [ Table S2 ]. In the brain, the drug binds to specific receptors to activate certain signaling pathways. Additionally, for a drug to exhibit the desired pharmacological activities with the brain parenchyma, it needs to be able to permeate the BBB [88] . T6Q, compound GSK3186899, compound 5, NANPDB4609, NANPDB3239, amphotericin B, paromomycin, and miltefosine were predicted to have low gastrointestinal (GI) absorption, which suggests a low probability of successful absorption into the bloodstream (Table S2) . Another factor considered was the likelihood of the compounds to be non-Pglycoprotein (P-gp) substrates. P-gp aids in the removal of drugs or xenobiotics from the central nervous system (CNS) by functioning as a biological barrier by removing toxins and xenobiotics from cells. It is also crucial in the absorption and distribution of drugs [89] . All the inhibitors or drugs used in this study were predicted to be P-gp substrates (Table S2) . Of the top 19 hits, 10 compounds were predicted to be non-P-gp substrates (Table S2 ) and may likely have desirable distribution in the circulatory system upon administration.

The toxicity profiles of the 17 hits and the known drugs were determined using OSIRIS DataWarrior 5.0.0 (Table S3) . Of the 17 hits, 13 compounds were predicted to be non-tumorigenic, non-mutagenic and non-irritant, and to have no reproductive effects (Table S3 ). NANPDB6446 was predicted to be highly mutagenic, tumorigenic, and irritant. NANPDB6446 can serve as a scaffold for fragment-based drug design due to its relatively low molecular weight of 365.381 g/mol.

NANPDB3614 and ZINC000000828203 were also predicted to be highly tumorigenic while NANPDB3284 was predicted to have reproductive effects (Table S3 ). Compounds 5, 8, amphotericin B, miltefosine, paromomycin, and T6Q were also predicted to have no mutagenicity, tumorigenicity, irritancy, and reproductive effect (Table S3 ). GSK3186899 was predicted to possess low tumorigenicity though it was non-mutagenic, non-irritant, and had no reproductive effect (Table S3 ). GSK3186899 was selected as the preclinical candidate due to its effectiveness, efficacy, pharmacokinetics, and safety profile [20] . GSK3186899 was reported to possess L. donovani inhibitory activity in cidal axenic amastigote and intra-macrophage assays with EC 50 values of 0.1 and 1.4 µM, respectively [20] .

The biological activities of the 17 identified hits were determined using PASS, an Open Bayesian machine learning technique. Structure descriptors, which are also referred to as multilevel neighborhoods of atoms (MNAs) descriptors, were generated as inputs [27] .

A total of 13 compounds were predicted to possess antiprotozoal activity, of which 10 were predicted to be antileishmanial (Table S4) . NANPDB1406, NANPDB2521, NAN-PDB3435, NANPDB3284 and ZINC000095486260 were predicted as kinase inhibitors (Table S4 ). Since the LdCRK12 has a kinase domain, these predictions necessitate the in vitro testing of these compounds to validate their anti-LdCRK12 and antileishmanial properties. Fifteen of the hits were predicted to possess antineoplastic (anticancer) activity (Table S4) . A review on the in vitro leishmanicidal potential of anticancer compounds suggested the use of antineoplastic compounds for the treatment of leishmaniasis [90] . Protein kinase inhibitors such as sunitinib, sorafenib, and lapatinib which are used for treating cancers were reported to be active against Leishmania donovani amastigotes in murine macrophages with IC 50 values of 1.1, 3.7, and 2.5 µM, respectively, showing similar efficacy to that of miltefosine (IC 50 = 1.0 µM) [91] . Sunitinib, sorafenib, and lapatinib were also reported to be non-toxic to mammalian cells [91] . NANPDB1011, NANPDB3949, ZINC000095486260, NANPDB3435, NANPDB3284 and NANPDB2521 were predicted to possess dermatological activities. These compounds may be beneficial in treating post kala-azar dermal leishmaniasis (PKADL). NANPDB1649 (sesamin) has been reported to be active against Leishmania amazonensis with an IC 50 value of 15.8 µg/mL and was not cytotoxic to macrophage cells with CC 50 values greater than 100 µg/mL [92] . Additionally, ZINC000000828203 (diphyllin) isolated from Haplophyllum bucharicum (Rutaceae) has been reported to demonstrate antileishmanial activity against Leishmania infantum promastigotes and intracellular amastigotes with IC 50 values of 14.4 µM and 0.2 µM, respectively [93] . NANPDB3614 (justicidin B) has also been shown to be a potential antiprotozoal agent by showing antitrypanosomal activities against Trypanosoma brucei rhodesiense and Trypanosoma cruzi with IC 50 values of 0.2 and 2.6 µg/mL, respectively [94] . Since Leishmania and Trypanosoma are trypanosomatids, repurposing NANPDB3614 for the development of therapeutic agents for leishmaniasis can be explored.

Quality metrics for the top compounds such as the inhibitory constant (Ki), ligand efficiency (LE), fit quality (FQ), LE scale (LE_scale), and LE-dependent lipophilicity (LELP) were determined as described previously [59, 60] . The predicted Ki values ranged from 0.039 to 0.587 µM (Table 9 ). ZINC000095485940 demonstrated the lowest predicted Ki value of 0.039 µM while NANPDB1649 (sesamin) showed the highest Ki value of 0.587 µM against the LdCRK12 (Table 9 ; Table 10 ). Sesamin was shown to inhibit L. amazonensis with an IC 50 value of 15.8 µg/mL (44.588 µM) [92] . The relatively low Ki values indicate the potential inhibitory activities of the selected compounds [95] . Table 9 . Ligand quality assessment metrics for selected compounds. The metrics include inhibitory constant (Ki), ligand efficiency (LE), LE scale (LE_scale), fit quality (FQ), LE-dependent lipophilicity (LELP), and calculated logP (cLogP).

Binding The ligand efficiency (LE) of the selected compounds ranged from 0.327 to 0.413 (Table 9) which are very close to the average ligand efficiency values reported for fragmentlike compounds (0.38). LE is used to assess the binding affinity, taking into account the number of heavy atoms (NHA) of a molecule [96, 97] . Herein, NANPDB1649 demonstrated the lowest LE value of 0.327. ZINC000095485940, NANPDB6446, NANPDB2581 and NANPDB1406 had LE values of 0.347, 0.380, 0.404 and 0.416, respectively (Table 9;  Table 10 ). Similarly, these LE values are close to the average LE values of fragment-like molecules (0.38) [97] .

The LE_Scale takes into consideration size dependency, which is a limitation of the LE metric. The computed LE_Scale values ranged from 0.347 to 0.416 (Table 9 ), in concordance with the LE_Scale values of similar active compounds with the same number of heavy atoms [98, 99] . ZINC000095485940 had the lowest LE_Scale value of 0.347, while NANPDB1406 had the highest value of 0.416. NANPDB2581, NANPDB6446 and NANPDB1649 also had LE_Scale values of 0.404, 0.380 and 0.380, respectively ( Table 9) .

The fit quality (FQ), which is a more accurate metric used to assess ligand efficiency, is determined as a ratio of the observed LE to the LE_Scale of a compound. The closer the FQ to 1, the more ideal the ligand. The calculated FQ values ranged from 0.861 to 1.003 (Table 9 ), suggestive that the selected molecules have plausible binding to the LdCRK12 receptor [97] .

Another important metric, ligand-efficiency-dependent lipophilicity (LELP) was also computed for the selected molecules. For a promising compound, the recommended LELP should be between 0 and 7.5, although molecules that satisfy Lipinski's rule are reported to have LELP values less than 16.5 [100] . The LELP values of all proposed molecules ranged between 0.521 and 9.861, which suggests that the selected molecules have a good affinity to LdCRK12, considering lipophilicity. ZINC000095485940, NANPDB1406 and

The ligand efficiency (LE) of the selected compounds ranged from 0.327 to 0.413 (Table 9) which are very close to the average ligand efficiency values reported for fragment-like compounds (0.38). LE is used to assess the binding affinity, taking into account the number of heavy atoms (NHA) of a molecule [96, 97] . Herein, NANPDB1649 demonstrated the lowest LE value of 0.327. ZINC000095485940, NANPDB6446, NANPDB2581 and NANPDB1406 had LE values of 0.347, 0.380, 0.404 and 0.416, respectively (Table 9; Table 10 ). Similarly, these LE values are close to the average LE values of fragment-like molecules (0.38) [97] .

The LE_Scale takes into consideration size dependency, which is a limitation of the LE metric. The computed LE_Scale values ranged from 0.347 to 0.416 (Table 9 ), in concordance with the LE_Scale values of similar active compounds with the same number of heavy atoms [98, 99] . ZINC000095485940 had the lowest LE_Scale value of 0.347, while NAN-PDB1406 had the highest value of 0.416. NANPDB2581, NANPDB6446 and NANPDB1649 also had LE_Scale values of 0.404, 0.380 and 0.380, respectively ( Table 9) .

The fit quality (FQ), which is a more accurate metric used to assess ligand efficiency, is determined as a ratio of the observed LE to the LE_Scale of a compound. The closer the FQ to 1, the more ideal the ligand. The calculated FQ values ranged from 0.861 to 1.003 (Table 9 ), suggestive that the selected molecules have plausible binding to the LdCRK12 receptor [97] .

Another important metric, ligand-efficiency-dependent lipophilicity (LELP) was also computed for the selected molecules. For a promising compound, the recommended LELP should be between 0 and 7.5, although molecules that satisfy Lipinski's rule are reported to have LELP values less than 16.5 [100] . The LELP values of all proposed molecules ranged between 0.521 and 9.861, which suggests that the selected molecules have a good affinity to LdCRK12, considering lipophilicity. ZINC000095485940, NANPDB1406 and NANPDB6446 had LELP values of 0.521, 3.761 and 2.370, respectively (Table 9 ).

Molecular dynamics studies the motion of atoms along the course of time by the integration of Newton's equations of motions [101] . Molecular dynamics simulations were performed using GROMACS 2018 to elucidate the dynamic behavior of selected compounds within the active sites of the LdCRK12 protein. The root mean square deviation (RMSD), the radius of gyration (Rg), and root mean square fluctuation (RMSF) were analyzed for the unbound protein and the protein-ligand complexes (Figure 7a-c) .

To evaluate the stability of the LdCRK12-ligand complexes, the RMSD plots of the unbound protein and the LdCRK12-ligand complexes were analyzed (Figure 7a) . The RMSD is a frequently used measure of the differences between the structures sampled during the simulation and the reference structure [102] . MD simulations require systems to be close to their native conformation. The time trajectory of RMSD shows the deviation of a protein structure from a reference structure as a function of time [102] .

The RMSD values of all nine structures experienced a gradual rise from 0 to 3 ns. (Figure 7a ). The LdCRK12-NANPDB2581 complex experienced the longest stability with an average RMSD value of 0.9 nm until about 7 ns where a gradual rise was observed (Figure 7a ). LdCRK12-ZINC000095485940, LdCRK12-NANPDB1649, and LdCRK12-NANPDB6446 complexes were unstable from 0 to about 6 ns where they maintained stable RMSDs with averages of 1.5, 1.55, and 1.5 nm, respectively, until the end of the 10 ns simulation period (Figure 7a ).

This study analyzed the compactness and folding of the unbound protein and the protein-ligand complexes by plotting the radius of gyration over simulation time. The loss of compactness affects the stability of the complex by introducing weak intermolecular bonds. When the Rg of a complex is higher, the compactness of the protein-ligand complex is lower, causing the interactions between ligand and protein to be weaker [103] . A stably folded protein will maintain a relatively steady Rg while the Rg value is likely to change over time if the protein unfolds [104] . The RMSF trajectories of the unbound LdCRK12 structure and LdCRK12-ligand complexes were also investigated. The RMSF reveals the flexibility of different regions of a protein, which can be related to crystallographic B-factors [102] . Residues contributing to the complex structural fluctuation can be assessed by this stability profile analysis. Higher RMSF values imply greater fluctuations. Protein regions involved in ligand binding and catalysis are known to demonstrate greater fluctuations [105] . Adaptive variation in flexibility lies principally in these regions of the protein sequence that affect the conformational stabilities of the protein-ligand complex [105] .

The RMSF plots revealed that all eight compounds caused some degree of fluctuations in similar regions of the LdCRK12 (Figure 7c ). Fluctuations were observed at regions from residue index 30-50, 60-150, 280-350, and 700-800. The highest fluctuation was observed between residues 60-150 followed by residues 280-350, implying they could be involved in ligand binding.

The molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) computation was employed to determine the binding free energies of the LdCRK12-ligand complexes. At a quantitative level, simulation-based methods provide substantially more accurate estimates of ligand binding free energies than other computational approaches such as docking [106] . The calculation of the binding free energy ∆G bind , which is the free energy difference between the ligand-bound state and the corresponding unbound states of protein and ligand, is used to quantify the affinity of a ligand to its target. Assessing the ∆G bind of a series of ligands against a particular target can reveal those ligands with higher binding affinities to the target. Thus, the ∆G bind calculations are important to gain in-depth knowledge about the binding modes of the hits in drug design [107] .

The MM/PBSA calculations showed that compound 8 had the lowest binding free energy of −68.609 kJ/mol (Table 11 ). Compound 5 was also observed to have a binding free energy of −54.023 kJ/mol while GSK3186899 had −27.382 kJ/mol (Table 11) . Compounds 5 and 8 demonstrated better inhibitory activities against L. donovani than GSK3186899, although GSK3186899 was selected as the preclinical candidate due to pharmacokinetics and safety concerns [20] . NANPDB1649 had the lowest binding free energy of −50.434 kJ/mol among the five selected hits (Table 11 ). NANPDB2581, NANPDB6446 and NANPDB1406 also demonstrated low binding free energies of −49.374, −37.179 and −24.518 kJ/mol, respectively. These compounds exhibited binding affinities similar or better than that of the preclinical candidate (GSK3186899), thus are worthy of further experimental validation.

Even though ZINC000095485940 was predicted to have the lowest binding energy to the LdCRK12 (−10.1 kcal/mol) by Autodock Vina, it had the highest binding free energy of 0.593 kJ/mol from the MM/PBSA computations (Table 11) , thereby potentially limiting its lead-likeness. In a previous study, compounds with high binding free energies have been shown to demonstrate inhibitory activity against receptors due to their very low electrostatic energies and very high polar energies [108] . ZINC000095485940 demonstrated high polar solvation energy of 136.331 kJ/mol and electrostatic energy of −29.485 kJ/mol (Table 11 ). Previous studies have reported that electrostatic and van der Waals forces contribute predominantly and continuously to the binding energy along with simulations that favored the binding of complexes [26, 109] . All compounds demonstrated very low van der Waal's energies, ranging from −84.419 kJ/mol to −138.191 kJ/mol (Table 11 ).

The MM/PBSA method can be used to calculate free binding energies by per-residue decomposition. This involves the decomposition of each residue by including the interactions in which each residue is involved. These provide useful insight into important interactions of key residues in free energy contribution. Residues contributing binding free energy greater than 5 kJ/mol or less than −5 kJ/mol are worth considering as key residues for the binding of a ligand to a protein [110] . The per-residue energy decomposition computations for each complex were performed (Figures 8 and S4A-G) . From the protein-ligand interactions, residues Leu465, Ser466, Thr469, Ala486, Lys488, Ala568, Ser569, Asp612, Asn613, Leu615, and Asp626 were considered as key residues for ligand binding in the ATP binding site (Section 3.6). From the MM/PBSA per residue decomposition computations for the LdCRK12-GSK3186899 complex, it was observed that only Lys488 and Arg575 contributed individual energies beyond the ±5 kJ/mol threshold with energy values of 10.1287 and 5.8145 kJ/mol, respectively ( Figure S4B ). For the LdCRK12-NANPDB1406 complex, Val473, Lys488 and Leu615 contributed energies of −5.0135, 14.2430, and −7.2060 kJ/mol, respectively (Figure 8 ). Only Lys488 was observed to contribute individual energy above the ±5 kJ/mol threshold with values of 7.8042 and 13.3733 kJ/mol in the LdCRK12-NANPDB1649 and LdCRK12-NANPDB2581 complexes, respectively ( Figure S4D,E) . Additionally, Asp612 was the only residue that contributed individual energy beyond the ±5 kJ/mol with an energy value of 5.4536 kJ/mol in the LdCRK12-NANPDB6446 complex ( Figure S4F ). For the LdCRK12-ZINC000095485940 complex, Lys488 and Asp626 contributed 17.8578 and 9.9136 kJ/mol, respectively ( Figure S4G ). From the per-residue energy decomposition computations, it is suggested that Lys488 is a very crucial residue for ligand binding in the ATP binding site, which warrants further experimental validation to determine its role.

For the ligand binding in pocket 14, residues Leu723, Gly724, Pro725, Leu726, Pro727, Pro728, Val731, Leu743, Asn763, Trp764, and Gln815 were identified as key. From the MM/PBSA per residue decomposition of LdCRK12-compound 8, Pro728 and Trp764 were observed to contribute energies beyond the ±5 kJ/mol threshold with individual energy values of −6.7576 and −6.5709 kJ/mol, respectively ( Figure S4C ). No residue was observed to contribute energy beyond the ±5 kJ/mol threshold in the LdCRK12-compound 5 complex ( Figure S4A ).

This study modelled a reasonable structure of LdCRK12 with good quality parameters which has been made available to the scientific community to enrich work on structurebased drug discovery. Additionally, small molecules with the potential to inhibit the activity of LdCRK12 were identified, which could serve as the building blocks for the design of novel biotherapeutics. The study further proposed suitable molecules with negligible toxicity. Since the study is entirely computational, making available structures and compounds enable synthesis and screening to ascertain their potency as antileishmanial molecules. These predicted compounds can help stimulate the pace of searching for effective antileishmanial drugs globally.

In order to identify polypharmacological agents against leishmaniasis, it warrants investigating the inhibitory potential of the identified biomolecules against other CDC-2-related kinases of Leishmania, especially CRK3 and CRK6 [111] . CRK3 is essential for cell cycle progression and growth in Leishmania mexicana [112, 113] , while the role of CRK6 remains unclear [113, 114] , it has accessory functions in the cell cycle in T. brucei [114] .

Natural products have shown the potential to be repurposed as effective L. donovani CRK12 inhibitors. This study sought to identify potential Leishmania inhibitors from the African flora by targeting the LdCRK12. The study identified four potential bioactive compounds comprising NANPDB1406, NANPDB2581, NANPDB6446 and NANPDB1649 with binding affinities of −9.5, −9.2, −9.1 and −8.5 kcal/mol, respectively. NANPDB1406, NANPDB2581 and NANPDB6446 demonstrated higher binding affinities than the preclinical compound (GSK3186899) which had the binding energy of −8.5 kcal/mol [20] . This study suggests Lys488 as a very crucial residue for ligand binding in the ATP binding site. MD simulations, including MM/PBSA, corroborated the potential inhibition of LdCRK12 by the compounds. Physiochemical and toxicological profiling predicted the compounds to be drug-like and have insignificant toxicity concerns. Ligand quality metrics comprising inhibitory constant (Ki), ligand efficiency (LE), fit quality (FQ), LE scale (LE_scale), and LE-dependent lipophilicity (LELP) also indicated that the potential antileishmanial compounds could serve as templates for fragment-based drug design for Leishmania inhibitors. The predicted Ki values of the potential drug candidates ranged from 0.108 to 0.587 µM. Furthermore, the molecules were predicted as antileishmanial molecules, necessitating experimental evaluation to corroborate their bioactivity.

Supplementary Materials: The following are available online at https://www.mdpi.com/2218-2 73X/11/3/458/s1, Figure S1 : ERRAT error plots of the selected models: (A) ERRAT error plot for MOD5, (B) ITAS5, and (C) ROB1. Red bars represent the misfolded regions, yellow bars demonstrate the error region between 95% and 99%, and green bars indicate the region with a lower error rate for protein folding, Figure S2 Table S1 : The binding energies and intermolecular bonds between LdCRK12 and compounds, Table S2 : ADME Prediction of top 19 hits and known drugs for Gastrointestinal (GI); Blood Brain Barrier (BBB); Estimated Solubility (ESOL) class, P-glycoprotein (Pgp) and TPSA, Table S3 : Toxicological profiles of the 17 hits and the known drugs, Table S4 : Predicted biology activity of the lead compounds using Prediction of Activity Spectra for Substances (PASS). Pa and Pi represent probable activity and probable inactivity, respectively. Supplementary_file_1: 3D model of the LdCRK12 generated using Modeller (MOD5). Supplementary_file_2: The 3D model of the LdCRK12 generated using I-TASSER (ITAS5). Supplementary_file_3: The 3D model of the LdCRK12 generated using Robetta (ROB1). 

Leishmaniasis: A review

Incidence and Trends of Leishmaniasis and Its Risk Factors in Humera, Western Tigray

Deciphering the Leishmania exoproteome: What we know and what we can learn

Transmission of Leishmania metacyclic promastigotes by phlebotomine sand flies

Phlebotomine sand flies and Leishmania parasites: Friends or foes? Trends Parasitol

Leishmania development in sand flies: Parasite-vector interactions overview

Concomitant infection with leishmania donovani and L. major in single ulcers of cutaneous leishmaniasis patients from sudan

Protozoan Diseases: Leishmaniasis

The relationship between leishmaniasis and AIDS: The second 10 years

Prevalence and risk factors associated with Leishmania infection in Trang Province, southern Thailand

Report: First coinfection report of mixed leishmania infantum/leishmania major and human immunodeficiency virus-acquired immune deficiency syndrome: Report of a case of disseminated cutaneous Leishmaniasis in Iran

Serosurvey and factors associated with Leishmania donovani infection in febrile HIV infected individuals attending Abuja Teaching Hospital

The poorest of the poor: A poverty appraisal of households affected by visceral leishmaniasis in Bihar

Neglected tropical diseases in sub-Saharan Africa: Review of their prevalence, distribution, and disease burden

Social and economic burden of human leishmaniasis

Combination therapy for visceral leishmaniasis

Drug candidate and target for leishmaniasis

Cyclin-dependent kinase 12 is a drug target for visceral leishmaniasis

Identification and Functional Characterisation of CRK12:CYC9, a Novel Cyclin-Dependent Kinase (CDK)-Cyclin Complex in Trypanosoma brucei

Systematic functional analysis of Leishmania protein kinases identifies regulators of differentiation and survival

Cyclin-Dependent Kinase CRK9, Required for Spliced Leader trans Splicing of Pre-mRNA in Trypanosomes, Functions in a Complex with a New L-Type Cyclin and a Kinetoplastid-Specific Protein

African indigenous plants with chemotherapeutic potentials and biotechnological approach to the production of bioactive prophylactic agents

G-mmpbsa-A Gromacs tool for high-throughput MM-PBSA calculations

MMPBSA decomposition of the binding energy throughout a molecular dynamics simulation of amyloid-beta (Aß10-35) aggregation. Molecules

PASS: Prediction of activity spectra for biologically active substances

PASS biological activity spectrum predictions in the enhanced open NCI Database Browser

MODELLER: Generation and Refinement of Homology-Based Protein Structure Models

Comparative Protein Structure Modeling Using Modeller

New development for protein structure and function predictions

I-TASSER server for protein 3D structure prediction

A unified platform for automated protein structure and function prediction

The I-TASSER Suite: Protein structure and function prediction

Structure prediction for CASP8 with all-atom refinement using Rosetta

High-resolution comparative modeling with RosettaCM

Protein structure prediction and analysis using the Robetta server

The UniProt Consortium UniProt: A hub for protein information

The Universal Protein Resource (UniProt)

UniProt Knowledgebase: A hub of integrated protein data

The SWISS-MODEL Repository-new features and functionality

EasyModeller: A graphical interface to MODELLER

Statistical potential for assessment and prediction of protein structures

PROCHECK: A program to check the stereochemical quality of protein structures

ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins

Recognition of errors in three-dimensional structures of proteins

Computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues

Computed Atlas of Surface Topography of proteins

UCSF Chimera-A visualization system for exploratory research and analysis

Introduction to PyMOL

Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr

NANPDB: A Resource for Natural Products from Northern African Sources

AfroDb: A Select Highly Potent and Diverse Natural Product Library from African Medicinal Plants

AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, EfficientOptimization, and Multithreading

Analysis of HIV wild-type and mutant structures via in silico docking against diverse ligand libraries

Multiple Ligand À Protein Interaction Diagrams for Drug Discovery

SwissADME: A free web tool to evaluate pharmacokinetics, druglikeness and medicinal chemistry friendliness of small molecules

DataWarrior: An open-source program for chemistry aware data visualization and analysis

Pharmacoinformatics-based identification of potential bioactive compounds against Ebola virus protein VP24

Identification of promising anti-DNA gyrase antibacterial compounds using de novo design, molecular docking and molecular dynamics studies

GROMACS: Fast, flexible, and free

Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

Version 5.1.19.; Center for Coastal and Land-Margin Research

R: A Language and Environment for Statistical Computing. R Found

Homology modeling in drug discovery: Overview, current applications, and future perspectives

Comparison of common homology modeling algorithms: Application of user-defined alignments

A comparative study of available software for high-accuracy homology modeling: From sequence alignments to structural models

Structural and Population-Based Evaluations of TBC1D1 p

Modeling of the human rhinovirus C capsid suggests a novel topography with insights on receptor preference and immunogenicity

Deciphering the molecular recognition mechanism of multidrug resistance staphylococcus aureus nora efflux pump using a supervised molecular dynamics approach

Where did the BLOSUM62 alignment score matrix come from?

Statistical potentials for fold assessment

cis-9-Hexadecenal, a Natural Compound Targeting Cell Wall Organization, Critical Growth Factor, and Virulence of Aspergillus fumigatus

How Proteins Work

Active Sites and their Chemical Properties

CASTp 3.0: Computed atlas of surface topography of proteins

Measuring proteins and voids in proteins

Cheminformatics-Based Identification of Potential Novel Anti-SARS-CoV-2 Natural Compounds of African Origin

Specific plant terpenoids and lignoids possess potent antiviral activities against severe acute respiratory syndrome coronavirus

Novel Computational Approach to Predict Off-Target Interactions for Small Molecules

Building the process-drug-side effect network to discover the relationship between biological Processes and side effects

Comparative structural and functional studies of 4-(thiazol-5-yl)-2-(phenylamino)pyrimidine-5-carbonitrile CDK9 inhibitors suggest the basis for isotype selectivity

Substituted 4-(thiazol-5-yl)-2-(phenylamino)pyrimidines are highly active CDK9 inhibitors: Synthesis, X-ray crystal structures, structure-activity relationship, and anticancer activities

Identification of Atuveciclib (BAY 1143572), the First Highly Selective, Clinical PTEFb/CDK9 Inhibitor for the Treatment of Cancer

The CDK9 tail determines the reaction pathway of positive transcription elongation factor b

Molecular Properties That Influence the Oral Bioavailability of Drug Candidates

Integrated computational approach for virtual hit identification against ebola viral proteins VP35 and VP40

Computational prediction of blood-brain barrier permeability using decision tree induction

Role of P-glycoprotein in pharmacokinetics: Clinical implications

Anticancer Compounds as Leishmanicidal Drugs: Challenges in Chemotherapy and Future Perspectives

Activity of anti-cancer protein kinase inhibitors against Leishmania spp

Antileishmanial activity of compounds isolated from sassafras albidum

In vitro antileishmanial activity of diphyllin isolated from Haplophyllum bucharicum

Antifungal, antiprotozoal, cytotoxic and piscicidal properties of justicidin B and a new arylnaphthalide lignan from Phyllanthus piscatorum

Group Additivity in Ligand Binding Affinity: An Alternative Approach to Ligand Efficiency

Ligand efficiency: A useful metric for lead selection

Ligand efficiency as a guide in fragment hit selection and optimization

Ligand efficiency based approach for efficient virtual screening of compound libraries

Ligand binding efficiency: Trends, physical basis, and implications

The influence of lead discovery strategies on the properties of drug candidates

Molecular dynamics: Survey of methods for simulating the activity of proteins

Ivanov, I. Molecular dynamics

Ligand-based and structure-based investigation for Alzheimer's disease from traditional Chinese medicine

Classification of VUS and unclassified variants in BRCA1 BRCT repeats by molecular dynamics simulation

Structural flexibility and protein adaptation to temperature: Molecular dynamics analysis of malate dehydrogenases of marine molluscs

Advances in free-energy-based simulations of protein folding and ligand binding

Molecular dynamics-driven drug discovery: Leaping forward with confidence

MM-PBSA and per-residue decomposition energy studies on 7-Phenyl-imidazoquinolin-4(5H)-one derivatives: Identification of crucial site points at microsomal prostaglandin E synthase-1 (mPGES-1) active site

Elucidating the energetics of entropically driven protein-ligand association: Calculations of absolute binding free energy and entropy

Molecular docking and dynamics simulation studies predict munc18b as a target of mycolactone: A plausible mechanism for granule exocytosis impairment in Buruli Ulcer Pathogenesis

In silico methods to address polypharmacology: Current status, applications and future perspectives

The CRK3 protein kinase is essential for cell cycle progression of Leishmania mexicana

Protein kinases as drug targets in trypanosomes and Leishmania

Pairwise knockdowns of cdc2-related kinases (CRKs) in Trypanosoma brucei identified the CRKs for G1/S and G2/M transitions and demonstrated distinctive cytokinetic regulations between two developmental stages of the organism

The authors are grateful to the West African Center for Cell Biology of Infectious Pathogens (WACCBIP) at the University of Ghana for the use of Zuputo, a Dell EMC high performance-computing cluster for this study.

The authors declare no conflict of interest.