key: cord-0002053-fw1cofsq authors: Mondal, Shakhinur Islam; Ferdous, Sabiha; Jewel, Nurnabi Azad; Akter, Arzuba; Mahmud, Zabed; Islam, Md Muzahidul; Afrin, Tanzila; Karim, Nurul title: Identification of potential drug targets by subtractive genome analysis of Escherichia coli O157:H7: an in silico approach date: 2015-12-08 journal: Adv Appl Bioinform Chem DOI: 10.2147/aabc.s88522 sha: 53f9b02ce4cc149507a9d402dcf9d4c191b3bb29 doc_id: 2053 cord_uid: fw1cofsq Bacterial enteric infections resulting in diarrhea, dysentery, or enteric fever constitute a huge public health problem, with more than a billion episodes of disease annually in developing and developed countries. In this study, the deadly agent of hemorrhagic diarrhea and hemolytic uremic syndrome, Escherichia coli O157:H7 was investigated with extensive computational approaches aimed at identifying novel and broad-spectrum antibiotic targets. A systematic in silico workflow consisting of comparative genomics, metabolic pathways analysis, and additional drug prioritizing parameters was used to identify novel drug targets that were essential for the pathogen’s survival but absent in its human host. Comparative genomic analysis of Kyoto Encyclopedia of Genes and Genomes annotated metabolic pathways identified 350 putative target proteins in E. coli O157:H7 which showed no similarity to human proteins. Further bio-informatic approaches including prediction of subcellular localization, calculation of molecular weight, and web-based investigation of 3D structural characteristics greatly aided in filtering the potential drug targets from 350 to 120. Ultimately, 44 non-homologous essential proteins of E. coli O157:H7 were prioritized and proved to have the eligibility to become novel broad-spectrum antibiotic targets and DNA polymerase III alpha (dnaE) was the top-ranked among these targets. Moreover, druggability of each of the identified drug targets was evaluated by the DrugBank database. In addition, 3D structure of the dnaE was modeled and explored further for in silico docking with ligands having potential druggability. Finally, we confirmed that the compounds N-coeleneterazine and N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9-oxo-9H-xanthene-2-sulfon-amide were the most suitable ligands of dnaE and hence proposed as the potential inhibitors of this target protein. The results of this study could facilitate the discovery and release of new and effective drugs against E. coli O157:H7 and other deadly human bacterial pathogens. Enteropathogenic Escherichia coli and enterohemorrhagic E. coli (EHEC) infections in humans are a major source of morbidity and mortality in both developing and developed countries. 1 Among various pathogenic E. coli strains that cause intestinal or extra-intestinal diseases in humans, the most devastating are shiga toxins producing EHEC strains, because they cause not only diarrhea and hemorrhagic colitis but also life-threatening hemolytic uremic syndrome and encephalopathy. 2 Over 100 serotypes of shiga toxin-producing E. coli have been associated with human infections and the most common serotype is E. coli O157:H7. Several deadly outbreaks of E. coli O157:H7 were reported in Canada, United States, Great Britain, and Japan. [3] [4] [5] submit your manuscript | www.dovepress.com Mondal et al However, the massive outbreak in Sakai city, Japan, in 1996 is of great concern as a number of deaths from the infection were reported. 6 The genome of EHEC O157:H7 Sakai strain was sequenced in 2001. 7 The sequence analysis revealed that this strain contains 18 prophages (Sp1 to Sp18), six prophage-like elements (SpLE1 to SpLE6), and two plasmids (pO157 and pOSAKl). To know the mechanism underlying pathogenicity of this bacterium, a substantial number of virulence-related genes or functions associated with various stages of infection have been identified. 7 However, lack of details of functional annotations often limit the possibility to use them as targets for designing new drugs against this pathogen. The treatment of E. coli O157:H7 mostly relies on conventional antibiotic therapy although some studies have highlighted that there is no evidence that this improves the course of disease and antibiotic treatment of patients with E. coli O157:H7 infection increases the risk of hemolytic-uremic syndrome. 8, 9 Moreover, an increase of antibiotic resistance has been reported in E. coli O157:H7 over the last 30 years which is also alarming. [10] [11] [12] [13] [14] The accumulated results strongly suggest that there is an urgent and continuing need to find new drug and vaccine candidates to tackle this deadly pathogen. Drug target identification is the first step in the drug discovery process. 15 However, traditional drug discovery methods are time-consuming, expensive, and often yield few drug targets. In contrast, advances in complete genome sequencing, bioinformatics, and cheminformatics represent an attractive alternative approach to identify drug targets worthy of experimental follow-up. Because of the availability of both pathogen and host-genome sequences, it has become easier to identify drug targets at the genomic level for any given pathogen. 16, 17 In recent years, computational methods have been used widely for the identification of potential drug and vaccine targets in different pathogenic microorganisms. [18] [19] [20] [21] Subtractive and comparative genomics approach combined with metabolic pathway analysis was found to be an efficient way to identify the protein-set essential for the pathogen's survival but absent in the host. 22 Subtraction of the host genome from essential genes of pathogens helps in searching for non-human homologous targets which ensures no interaction of drugs with human targets. On the other hand, comparative genomics method emphasizes the selection of conserved proteins amongst several species as most favorable targets. [23] [24] [25] [26] The use of advanced bioinformatics tools with integrated genomics, proteomics, and metabolomics may ensure the discovery of potential drug targets for most of the infectious diseases. Once the target(s) have been identified, the in silico virtual screening of different chemical databases could provide unprecedented opportunity to select and design the best possible inhibitor(s). 27 In this study, we took an in-depth in silico approach to identify novel therapeutic targets in E. coli O157:H7 Sakai strain by combining analysis of metabolomics and genomics data. Instead of analysis of whole genome, we particularly considered the key essential or survival proteins of the pathogen which are non-homologous to the host. We elucidated a good number of novel targets in E. coli O157:H7 to design effective drugs against broad-spectrum pathogenic bacteria. Moreover, we provided a modeled 3D structure of DNA polymerase III alpha (dnaE) which was selected as the best possible target for inhibition and designing potential drugs. To the best of our knowledge this was the first in silico identification of drug targets in E. coli O157:H7. Pathway analysis and protein retrieval Figure 1 showed the strategies for identification of suitable drug targets and prediction of inhibitor used. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database 28, 29 was searched for metabolic pathways for both human genomes and E. coli O157:H7 Sakai strain. The identification numbers of all pathways from both organisms were listed. A manual comparison was made, and pathways that did not appear in the human genomes but were present in the pathogen, according to the KEGG database annotations, were selected as unique to E. coli O157:H7, while the remaining pathways were listed as common. Amino acid sequences of proteins from common and unique pathways were obtained from UniprotKB. 30 Identification of essential proteins and non-homologous proteins in humans Database of Essential Genes (DEG) 31 was used to identify the essential proteins involved in host-pathogen pathways. The DEG 6.8 database was retrieved from http://www.tubic. tju.edu.cn/deg/. The BioEdit Sequence Alignment Editor (version 7.1.3) was used for Protein Basic Local Alignment Search Tool (BLASTP) search to screen for and eliminate the probable essential proteins of the organism setting e-value cut off 10 −4 , sequence identity .35%, bit score .100 and others as default. All human protein sequences were retrieved from Refseq database ftp site (ftp://ftp.ncbi.nlm.nih.gov/genomes/H_ sapiens/protein/) and essential proteins were subjected to BLASTP search against the human proteins with BioEdit. Only the non-hit proteins at e-value cut off 10 −10 , were Drug target identification by subtractive genome analysis approach selected as non-homologous proteins to avoid any functional similarity with host proteome. Subcellular localization prediction of the essential non-human proteins was done by PSORTb version 3.0.2, 32 which predicts three types of localization such as: cytoplasmic, membrane, and extracellular proteins for Gram-negative bacteria. The potential drug targets were evaluated by several molecular and structural criteria 33 Mondal et al the 3D ligand binding site of any human protein structure on the web server SMAP-WS at a cut off value of 30% sequence identity. 35 Moreover, druggability is another important prioritization criterion for therapeutic targets; that is defined as the likelihood of being able to modulate the activity of the therapeutic target protein with a small-molecule drug. 36, 37 The druggability of identified drug targets was measured by mining DrugBank contents. BLASTP with default parameters was performed against the list of targets of compounds found within DrugBank to align the potential drug targets from E. coli O157:H7. Alignments with e-values less significant than 10 −25 were removed as described previously as selection criteria for filtering BLAST results in identifying drug targets of bacterial genomes. 38 Identification of novel targets and searching for common proteins To identify novel targets among the potential targets, databases DrugBank, SuperTarget, and Therapeutic Target Database, were searched for similarity with the cytoplasmic proteins. [39] [40] [41] Parameters were set as e-value ,10 −5 , sequence identity .35%, and bit score .100. The non-hit proteins at the threshold value were selected as novel drug targets. To search for the common proteins amongst pathogenic bacteria, all protein sequences of 73 different strains of pathogenic bacteria were retrieved from PATRIC database. 42 The novel targets were subjected to BLASTP against these proteomes at e-value cut off 10 −5 , sequence identity .35%, bit score .100 with BioEdit software. The proteins that were found to be common in at least 40 pathogenic strain proteomes were listed as broad-spectrum targets. Different bacterial species were used as references. As no exact protein data bank (PDB) structure was available for dnaE in PDB, it was subjected to BLAST search against PDB structures using 0.001 e-value cut off. The template for homology modeling was chosen considering X-ray diffraction resolution and highest sequence similarity. Homology modeling was done on ESyPred3D server. 43 The modeled structure was assessed through SWISS-MODEL structure assessment tool 44 and ANOLEA (atomic non-local environment assessment) 45 assessing the packing quality of the models. PROCHECK suite of programs 46 checked the stereochemical quality of protein structures. Energy minimization was carried out by GROMOS96 with default parameters implemented in Swiss PDB Viewer (version 4.0.4). 47 Active site of the modeled structure was determined by CASTp server. 48 Virtual screening, drug likeliness, and toxicity analysis For visual analysis and comparison of the active site interaction with the ligands, 24 ligand molecules of E. coli K-12 dnaE subunit were extracted from BindingDB database and docked with the subject receptor. 49 Virtual screening was done with a total 6,460 molecules, 5,040 experimental and 1,447 approved molecules deposited in DrugBank, based on selected active sites into the dnaE. Virtual screening was performed on Linux (Ubuntu 10.04) based cluster with 32 core systems. The top 100 molecules were selected based on lowest binding energy after being docked several times. The selected molecules were analyzed by Lipinski's rule of five. The ligand interaction analysis and visualization was done with the help of Pymol and Discovery Studio (Accelrys, San Diego, CA, USA). Absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction was carried out with PreADMET server. PreADMET predicts mutagenicity and carcinogenicity of a compound and helps to avoid toxic compound. Oral bioavailability was predicted with FAF-Drugs2 program of Mobyle@ RPBS server. 50 Here we report the first computational comparative and subtractive genomics analysis of different metabolic pathways from E. coli O157:H7, for the identification of potential drug targets. A systematic workflow was defined involving several bioinformatics tools, databases, and drug target prioritization parameters ( Figure 1) , with the goal of obtaining information about proteins that were involved in various metabolic pathways of E. coli O157:H7, but absent in its host, therefore avoiding any potential side effects. When we searched the KEGG database for pathogen metabolic pathways, a total of 105 different pathways appeared. In order to identify drug targets involved in pathogen-specific metabolic pathways, comparative analysis of the metabolic pathways of the host and pathogen was performed. Detailed pathways analyses revealed that a total of 35 pathways were present only in the pathogen and termed as pathogen-specific pathways, and the remaining 70 pathogen pathways were defined as common host-pathogen pathways as listed in Table 1 . Pentose phosphate pathway ecs00920 Sulfur metabolism ecs00040 Pentose and glucuronate interconversions ecs00970 Aminoacyl-tRNA biosynthesis ecs00051 Fructose and mannose metabolism ecs01040 Biosynthesis of unsaturated fatty acids ecs00052 Galactose metabolism ecs02010 ABC transporters ecs00053 Ascorbate and aldarate metabolism ecs03010 Ribosome ecs00061 Fatty acid biosynthesis ecs03018 RNA degradation ecs00071 Fatty acid metabolism ecs03020 RNA polymerase ecs00130 Ubiquinone and other terpenoid-quinone biosynthesis ecs03030 DNA replication ecs00190 Oxidative phosphorylation ecs03060 Protein export ecs00230 Purine metabolism ecs03410 Base excision repair ecs00240 Pyrimidine metabolism ecs03420 Nucleotide excision repair ecs00250 Alanine, aspartate and glutamate metabolism ecs03430 Mismatch repair ecs00260 Glycine, serine and threonine metabolism ecs03440 Homologous recombination ecs00270 Cysteine and methionine metabolism ecs04122 Sulfur relay system ecs00280 Valine, leucine and isoleucine degradation ecs00561 Glycerolipid metabolism ecs00290 Valine, leucine and isoleucine biosynthesis ecs00562 Inositol phosphate metabolism ecs00300 Lysine biosynthesis ecs00564 Glycerophospholipid metabolism ecs00310 Lysine degradation ecs00590 Arachidonic acid metabolism ecs00330 Arginine and proline metabolism ecs00592 alpha-Linolenic acid metabolism ecs00340 Histidine metabolism ecs00600 Sphingolipid metabolism ecs00350 Tyrosine metabolism ecs00620 Pyruvate metabolism ecs00360 Phenylalanine metabolism ecs00630 Glyoxylate and dicarboxylate metabolism ecs00380 Tryptophan metabolism ecs00640 Propanoate metabolism ecs00400 Phenylalanine, tyrosine and tryptophan biosynthesis ecs00650 Butanoate metabolism ecs00410 beta-Alanine metabolism ecs00670 One carbon pool by folate ecs00430 Taurine and hypotaurine metabolism ecs00730 Thiamine metabolism ecs00450 Selenocompound metabolism ecs00740 Riboflavin metabolism ecs00460 Cyanoamino acid metabolism ecs00750 Vitamin B6 metabolism ecs00471 d-Glutamine and d-glutamate metabolism ecs00760 Nicotinate and nicotinamide metabolism ecs00480 Glutathione metabolism ecs00770 Pantothenate and CoA biosynthesis ecs00500 Starch and sucrose metabolism ecs00780 Biotin metabolism ecs00511 Other glycan degradation ecs00785 Lipoic acid metabolism ecs00520 Amino sugar and nucleotide sugar metabolism ecs00790 Folate biosynthesis Pathogen-specific pathway ecs00281 Geraniol degradation ecs00627 Aminobenzoate degradation ecs00361 Chlorocyclohexane and chlorobenzene degradation ecs00633 Nitrotoluene degradation ecs00362 Benzoate degradation ecs00642 Ethylbenzene degradation ecs00363 Bisphenol degradation ecs00660 C5-Branched dibasic acid metabolism ecs00364 Fluorobenzoate degradation ecs00680 Methane metabolism ecs00401 Novobiocin biosynthesis ecs00903 Limonene and pinene degradation ecs00440 Phosphonate and phosphinate metabolism ecs00910 Streptomycin biosynthesis ecs01053 Biosynthesis of siderophore group non-ribosomal peptides ecs00523 Polyketide sugar unit biosynthesis ecs01110 Biosynthesis of secondary metabolites ecs00540 Lipopolysaccharide biosynthesis ecs01120 Microbial metabolism in diverse environments ecs00550 Peptidoglycan biosynthesis ecs02020 Two-component system ecs00621 Dioxin degradation ecs02030 Bacterial chemotaxis ecs00622 Xylene degradation ecs02040 Flagellar assembly ecs00623 Toluene degradation ecs02060 Phosphotransferase system ecs00624 Polycyclic aromatic hydrocarbon degradation ecs03070 Bacterial secretion system ecs00625 Chloroalkane and chloroalkene degradation ecs05130 Pathogenic Escherichia coli infection ecs00626 Naphthalene degradation To be an effective drug target, the protein should be crucial for the survival of pathogen in the host body but non-homologous to human proteins and this criterion is a prerequisite for avoiding cross binding of drugs with human proteins, and drug side effect probability. 51 Unique pathways are those that are specific to the pathogen but absent in its host. Proteins in these pathways can also be considered as unique to the pathogen and might serve as potential drug and vaccine targets. 34 Moreover, several unique proteins are known to be present in common pathways as identified during our analysis of E. coli O157:H7 (data not shown) and in several previous studies on other bacteria. 19, 20, 52 Additionally, we also identified that a single unique protein can also take part in multiple pathways. Proteins that are involved in more than one pathway could be more effective drug targets when, in addition, they are non-homologous proteins. Nevertheless, being unique or non-human and involved in metabolic pathways are not the sole criteria for selecting favorable drug targets. It is possible that a bacterial protein showing no similarity to host proteins might be involved in multiple metabolic pathways, but its disruption might be of no therapeutic benefit. The reasons may include presence of paralogs, isoenzymes, and most importantly, being nonessential for the pathogen's survival. Again, not all essential proteins are non-homologous in nature. Therefore, pathogen proteins that fulfill the criteria of being unique and essential at the same time represent more attractive drug targets. 34 Based on the criteria discussed above, we identified 780 probable essential proteins from host-pathogen common pathways, and 234 from pathogen unique pathways (Supplementary materials). These proteins showed good similarity with the experimentally proven essential proteins recorded in DEG database. BLASTP search against human proteome narrowed down the target proteome to only 220 and 130 proteins from common pathways and unique pathways respectively resulting in 350 proteins which were essential for the pathogen's survival and non-homologous to the host (Supplementary materials). Localization of the proteins in the cell is an important factor for identification of suitable and effective drug targets. Membrane localized proteins are difficult to purify and assay 53 and therefore, cytoplasmic proteins are more favorable as drug targets. Other major factors are: accessibility value of a target protein; preferably low MW (,100 kDa); whether a potential drug is a transmembrane protein; and availability of 3D structural information. 20 Based on these essential features, the identified non-homologous essential proteins of E. coli O157:H7 were further characterized. Most of the proteins had MW less than 100 kDa indicating the possibility to experimentally study these proteins for drug development. From the common pathways, 152 proteins were found to be cytoplasmic, 54 proteins to be membrane localized, and the other 14 proteins to be of unknown localization. From pathogen unique pathways, 70, 52, and eight proteins were found to be cytoplasmic, membrane localized, and of unknown localization respectively (Supplementary materials). Based on these results, 152 and 70 cytoplasmic proteins from common pathways and unique pathways respectively were considered for further analysis to identify suitable drug targets ( Figure 2) . The DrugBank is a unique bioinformatics and cheminformatics resource which combines detailed drug data with comprehensive information about drug targets. By utilizing the DrugBank database, druggability of non-homologous essential proteins of E. coli O157:H7 was measured by sequence similarity to the targets of small-molecule drugs and BLASTP search was performed to align the non-homologous essential proteins to the list of drug-targeted proteins from DrugBank. A total of 129 proteins of E. coli O157:H7 showed high similarities with the binding partners of US Food and Drug Administration (FDA)-approved drugs, experimental small-molecule compounds, or nutraceutical compounds supporting the potential of comparative genomics in drug discovery. Among these, 80 proteins and 49 proteins were from common and unique pathways respectively (Supplementary materials). By this comparison with drug-targeted proteins additionally a list of approved drug and drug-like compounds was identified that bind to proteins with similar sequences to those of E. coli O157:H7. It is reasonable that careful filtering Drug target identification by subtractive genome analysis approach of this set could reveal a number of potential compounds that were primed for optimization and derivatization using traditional medicinal chemistry although protein sequence similarity does not guarantee identical structures or binding pockets. 38 We searched the presence of 3D structures of the non-host essential proteins of E. coli O157:H7. Such information could greatly facilitate a structure-based drug design, including homology modeling, docking, virtual screening or pharmacophore-based screening. 54 The PDB and ModBase were used as sources for the 3D structural information. Out of 350 non-host essential proteins, ten proteins were identified as having experimentally determined 3D structures in PDB and 312 were found to have 3D models in ModBase (Supplementary materials). Proteins which showed significant similarity with the databases were discarded and the remaining protein sequences were taken as novel targets. Forty-four proteins from pathogen unique pathways and 76 proteins from common pathways totaling 120 proteins were defined as novel proteins (Supplementary materials). Metabolic pathway analysis indicated that these 120 proteins were involved in 12 biological processes unique to pathogens and 49 biological processes that were common in both host and pathogen. Moreover, all of these 61 biological processes were classified into 12 classes: amino acid metabolism, carbohydrate metabolism, energy metabolism, glycan biosynthesis and metabolism, lipid metabolism, metabolism of cofactors and vitamins, metabolism of other amino acids, nucleotide metabolism, genetic information processing, environmental information processing, cellular processes, and others (Supplementary materials). Figure 3 showed the percentage distribution of novel drug targets involved in different biological process. Our study revealed that 44 proteins were uniquely involved in pathogen-specific 12 unique pathways and these were lipopolysaccharide (LPS) biosynthesis, peptidoglycan biosynthesis, methane metabolism, C5-branched dibasic acid metabolism, nitrogen metabolism, phosphonate and phosphinate metabolism, bacterial secretion system, phosphotransferase system (PTS), flagellar assembly, two-component system, biosynthesis of siderophore group non-ribosomal peptides, and bacterial chemotaxis. Three enzymes of LPS core biosynthesis and three enzymes of lipid A pathway were found uniquely present in LPS biosynthesis pathway (KEGG Pathway: map00540). In LPS of the Enterobacteriaceae, the core oligosaccharide is responsible for many of the biological properties of the antigenic O-polysaccharide. 55 On the other hand, enzymes of lipid A pathway required for bacterial growth could be excellent targets for the development of new antibiotics. 56 Moreover, murF; EC 6.3.2.10 was the only enzyme present in peptidoglycan biosynthesis (KEGG Pathway: map00550). In Gram-positive bacteria, the cell wall composed of peptidoglycan macromolecules and many surface proteins of Gram-positive bacteria is thought to be important for survival within an infected host. 57 Advances and Applications in Bioinformatics and Chemistry 2015:8 submit your manuscript | www.dovepress.com In methane metabolism (KEGG Pathway: map00680), AckA; EC 2.7.2.1 was found to be uniquely present. Methanotrophs involved in the global methane cycle consume methane as their sole source of carbon and energy for growing 58 whereas methanogens can obtain energy for growth by converting a limited number of substrates to methane. 59 The protein ilvH, acetolactate synthase III small subunit, is a unique protein present in C5-branched dibasic acid metabolism (KEGG Pathway: map00660) which provides alternative sources of carbon and energy. 60 There was a unique presence of four proteins in nitrogen metabolism (KEEG Pathway: map00910). In the nitrogen cycle, different reductive or oxidative reactions are utilized by prokaryotes for energy conservation. 61 In case of oxygen deprived growth conditions for E. coli, anaerobic respiration nitrate is the preferred electron acceptor. 62 The putative resistance protein was identified as unique protein present in phosphonate and phosphinate metabolism (KEGG Pathway: map00440). Natural products containing carbon-phosphorous bonds, so-called C-P compounds have been found in many organisms, but only protists and bacteria, mostly actinobacteria, have biosynthetic capacity. Moreover, the secB; protein-export protein secB and etpN; type 4 prepilin-like proteins leader peptide-processing enzyme, that are the parts of bacterial secretion system (KEGG Pathway: map03070), were found as unique. Many proteins implicated in efflux of different toxins and drugs; virulence and biogenesis of different organelles (pili and flagella) are secreted. 63 The secB is involved in efficient export of proteins across the cytoplasmic membrane in E. coli. 64 In PTS (KEGG Pathway: map02060), n9 proteins were found to be uniquely present. It has been known that PTS is involved in transportation of more than 20 carbohydrates in bacteria and plays a major role in phosphorylation and uptake of carbohydrates and controlling their metabolism. 65, 66 Flagellar FliJ protein which is part of flagellar assembly (KEGG Pathway: map02040) as well as a putative general chaperone and cytoplasmic protein 67 was identified as unique in the pathogen. The bacterial flagellum extending from the cytoplasm to the cell exterior serves as both a motor organelle and a protein export/assembly apparatus. 68 A total of 19 proteins were identified as unique in the pathogen-specific pathway in a two-component system (KEGG Pathway: map02020). Bacterial two-component system is required for adaptation to external stimuli and can affect changes in cellular physiology. 69 A single protein was identified to be involved in biosynthesis of siderophore group non-ribosomal peptides pathway. Polyketide synthases and non-ribosomal polypeptide synthetases are known to be responsible for the biosynthesis of several siderophores such as enterobactin in E. coli. 70 Two proteins were detected in case of bacterial chemotaxis. In chemotaxis, bacteria sense chemical gradients in the environment and move toward favorable conditions. The pathway is arguably best characterized in the case of E. coli. 71 Novel targets in common host-pathogen pathways We also identified 76 proteins in 49 host-pathogen common metabolic pathways as novel targets. The pathways were grouped as metabolism of amino acid, carbohydrate, energy, lipid, other amino acids, nucleotide, cofactors and vitamins, and genetic information processing, environmental information processing, cellular processes, and others (Supplementary materials). Common proteins among several species would be well broad-spectrum antibiotic targets. 72 We used 73 species as reference and proteins which were common in at least 40 different species were listed as broad-spectrum targets (Supplementary materials). Forty-four proteins were identified as broad-spectrum targets (Supplementary materials). In addition, broad-spectrum proteins involved in multiple pathways would be better targets as their inhibition of activity will hamper more than one system in the pathogen. 73 dnaE subunit and AckA were involved in a maximum number of pathways. AckA catalyzes the reversible reaction of formation of acetyl phosphate from acetate and ATP. Whereas dnaE participates in some critical pathways of the pathogen like purine metabolism, pyrimidine metabolism, DNA replication, mismatch repair, and homologous recombination. There is no resolved X-ray crystallography structure for both dnaE (E. coli O157:H7 Sakai strand) and AckA (E. coli O157:H7 Sakai strand) as we intended to do homology modeling. However, dnaE subunit (Uniprot ID: Q8X8X5) was preferred over AckA (Uniprot ID: P0A6A5) based on suitability of homology modeling and docking studies. During the BLAST search against PDB structures (threshold e-value ,0.001), dnaE subunit showed 98% sequence identity with E. coli replicative dnaE subunit (PDB ID: 2HNH), whereas 94% for AckA with Salmonella enterica subspecies enterica serovar Typhimurium AckA (PDB ID: 3SK3 and 3SLC) (data not shown). Furthermore, an inter-domain motion during ligand binding as well as a ligand binding pocket located at the Advances and Applications in Bioinformatics and Chemistry 2015:8 Drug target identification by subtractive genome analysis approach dimeric interface of form-II AckA has been reported 74 that is very hard to address through in silico docking software. In case of dnaE subunit, literature searching helped us to identify the pocket for ligand binding as well as important and conserved residues within the pocket that are important for catalytic activity. 75 Moreover, we conducted BLAST searching for dnaE (Uniprot ID: Q8X8X5) with e-value cut off of 0.001 against UniprotKB. We selected the organisms whose proteins showed at least 75% sequence identity with dnaE. The organisms were searched in PATRIC database to check their host and pathogenicity. Only the human hosts were considered and predicted to be pathogenic if involved in disease(s) according to PATRIC database. 42 We found that all the organisms are pathogenic except Lelliottia, Kluyvera, Hafnia, Ewingella, Cedecea, and Yokenella. However, literature searching helped us to conclude that Kluyvera, Yokenella, Ewingella, Hafnia, and Cedecea are also involved in diseases. [76] [77] [78] [79] [80] [81] We did not find sufficient information about Lelliottia. Thus it is clearly demonstrated that E. coli O157:H7 dnaE does not share any sequence similarity with non-pathological bacterial dnaE. Therefore, due to the above advantages, dnaE was selected for homology modeling and subsequent structure-based drug designing. DNA polymerase III holo-enzyme has ten different peptides arranged in an asymmetric dimer and contains a 3′-5′ exonuclease activity. The alpha subunit is at the core enzyme and mainly functions in the polymerase activity. The best hit of similarity search identified the crystal structure of the catalytic alpha subunit of E. coli replicative dnaE DNA polymerase III (PDB ID: 2HNH). 82 It showed 99.98% sequence similarity with our target sequence and was used as the template for homology modeling. The modeled structure was shown in Figure 4 . Advancing toward the way of in silico drug design with protein models depends largely on the quality of the models. Inspection of the Psi/Phi Ramachandran plot analysis showed that the model built by ESyPred3D has residues in most favored regions 92.4%, residues in additional allowed regions 7.1%, and residues in generously allowed regions 0.3%. A good quality model would be expected to have over 90% in the most favored regions ( Figure 5A ). This structure was also verified with ANOLEA. The y-axis of the plot represents the energy for each amino acid of the protein chain and it showed that maximum residues are in favorable energy environment ( Figure 5B ). As a result the structure modeled was considered as a good quality model for further analysis. To obtain a better refined model energy minimization was done using Swiss PDB viewer. It minimizes energy using GROMOS96. The force-field energies of the overall structure before and after minimization were -13,816.546 KJ/mol and -36,224.391 KJ/mol respectively. Active site is the region on the surface of an enzyme to which a specific substrate (ligand) or set of substrates (ligands) binds. The properties of the active site are determined by the sequences of amino acids and the 3D arrangement of the polypeptide chains of the enzyme. Identification of the active site of E. coli O157:H7 Sakai strain dnaE was done by CASTp server. This server calculates the surface area and volume of the pocket of the given structure. In addition, it also shows the active site residues. The active site residues with a volume of 17,116 were shown in Figure 6 . Literature searching helped us to identify the important residues within the pocket. The structure of catalytic alpha subunit of E. coli is known to have three conserved residues of aspartate (Asp 402, Asp 404, and Asp 556 in our model, ensured by aligning with the reference structure, data not shown) and four conserved Virtual screening, drug likeliness, and toxicity analysis dnaE. 84 Information on these molecules' binding interaction was taken as reference to predict convenient inhibitors of dnaE of E. coli O157:H7 Sakai strand from the molecules of DrugBank ( Figure 7A ). The top 100 hits from DrugBank showing lower binding energies after being docked five times were filtered for Lipinski's rule of five 85 that reduced the compounds to 59. To avoid off target binding, the compounds having human targets were excluded. The top two hits showed stable or nearly stable binding energy and good clustering performances from the docking results (Table 3) and structures are shown in Figure 8 . Among the final high affinity binding molecules, DB04118 (N-coeleneterazine) and DB04698 (N-(1,4-dihydro-5H-tetrazol-5-ylidene)-9oxo-9H-xanthene-2-sulfonamide) were found to interact with important residues ( Figure 7B) and DB04698 respectively that are considered middle and low permeability respectively. Both drugs exhibit good oral bioavailability and mostly negative result in Ames test as predicted by FAF-Drugs2 and PreADMET toxicity prediction (data not shown) and no evidence of carcinogenic activity for only DB04698 as predicted by PreADMET rodent carcinogenicity prediction. We also analyzed other ADMET properties like blood-brain barrier penetration, skin permeability, MDCK cell, and P-gp inhibition, and in all of these cases results are positive for our proposed compounds (data not shown). The overall picture emerging from this study was the identification of broad-spectrum antibiotic target as well as prediction of potential inhibitors of DNA III alpha of deadly pathogen E.coli 0157:H7 using extensive in silico tools. Bacterial replicative dnaE is a member of C family of polymerases 87,88 that are unique in terms of sequence. Bacterial replicative DNA polymerase III share no sequence similarity and is strikingly different from canonical DNA polymerases including those of eukaryotic replicative polymerases. 89 It has been reported that DNA polymerases are responsible for pathogen survival and drug resistance so they have been considered to be a drug target in a broad group of Grampositive pathogens such as Staphylococcus, Streptococcus, Enterococcus, and Mycoplasma. 90 It has also been reported that DNA polymerase III inactivation would impede survival of Mycobacterium tuberculosis within the host. 91 Since bacterial evolution aims to acquire resistance against single or multiple antibiotics to ensure their survival in the environment, the development, not only of new conventional antibiotics but also of novel compounds and alternative strategies for the battle against bacterial infections, is becoming a topical and widely recognized need. In this study, a number of criteria such as essentiality of proteins, dissimilarity with host, conservation among pathogens, Table 3 Lowest docking energies, important residues of the binding site observed to be interactive with the ligands from DrugBank, percentage of human intestinal absorption and plasma protein binding, Caco-2 cell permeability and carcinogenicity in rats Drug target identification by subtractive genome analysis approach availability in drug databases, virtual screening etc, were utilized to explore potential drug targets as well as to predict a drug that can block dnaE of deadly E coli O157:H7 strain. This in silico strategy can be used for screening novel and alternative targets in a way to design and develop new drugs against other emerging human pathogens. Pathogenic Escherichia coli Virulence factors of Escherichia coli O157 and other Shiga toxin-producing E. coli A statewide outbreak of Escherichia coli O157:H7 infections in Washington State The epidemiology of infections caused by Escherichia coli O157:H7, other enterohemorrhagic E. coli, and the associated hemolytic uremic syndrome An outbreak of diarrhea and hemolytic uremic syndrome from Escherichia coli O157:H7 in freshpressed apple cider Massive outbreak of Escherichia coli O157:H7 infection in schoolchildren in Sakai City, Japan, associated with consumption of white radish sprouts Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12 Effect of subinhibitory concentrations of antibiotics on extracellular Shiga-like toxin I The risk of the hemolytic-uremic syndrome after antibiotic treatment of Escherichia coli O157:H7 infections Antimicrobial resistance of Shiga toxin (verotoxin)-producing Escherichia coli O157:H7 and non-O157 strains isolated from humans, cattle, sheep and food in Spain Antibiotic resistance of Escherichia coli O157:H7 and O157:NM isolated from animals, food, and humans Antibiotic resistance and distribution of tetracycline resistance genes in Escherichia coli O157:H7 isolates from humans and bovines Characteristics of antibioticresistant Escherichia coli O157:H7 in Washington State, 1984-1991 Antimicrobial resistance of Escherichia coli O157 isolated from humans, cattle, swine, and food Recent advances and method development for drug target identification Bacterial genome sequencing and drug discovery Deductive genomics: a functional approach to identify innovative drug targets in the post-genome era Putative drug and vaccine target protein identification using comparative genomic analysis of KEGG annotated metabolic pathways of Mycoplasma hyopneumoniae In silico identification of common putative drug targets in Leptospira interrogans In silico identification of candidate drug and vaccine targets from various pathways in Neisseria gonorrhoeae Design of an epitope-based peptide vaccine against spike protein of human coronavirus: an in silico approach Identification of potential drug targets by subtractive genome analysis of Bacillus anthracis A0248: An in silico approach Searching for drug targets in microbial genomes A genomic perspective on protein families Beyond complete genomes: from sequence to structure and function Essential Bacillus subtilis genes Virtual screening strategies in drug discovery: a critical review From genomics to chemical genomics: new developments in KEGG KEGG for representation and analysis of molecular networks involving diseases and drugs UniProt Knowledgebase: a hub of integrated protein data DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes Genomic-scale prioritization of drug targets: the TDR Targets database Comparative genomics analysis of Mycobacterium ulcerans for the identification of putative essential genes and therapeutic candidates SMAP-WS: a parallel web service for structural proteome-wide ligand-binding site comparison Structure-based maximal affinity model predicts small-molecule druggability A practical view of 'druggability' Computational prediction of essential genes in an unculturable endosymbiotic bacterium, Wolbachia of Brugia malayi DrugBank 3.0: a comprehensive resource for 'omics' research on drugs SuperTarget goes quantitative: update on drug-target interactions Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species ESyPred3D: Prediction of proteins 3D structures The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling Assessing protein structures with a non-local atomic interaction energy PROCHECK: A program to check the stereochemical quality of protein structures SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling Computed Atlas of Surface Topography of proteins BindingDB: a webaccessible database of experimentally determined protein-ligand binding affinities RPBS: a web resource for structural bioinformatics A novel genomics approach for the identification of drug targets in pathogens, with special reference to Pseudomonas aeruginosa In silico analysis of Burkholderia pseudomallei genome sequence for potential drug targets Predicting conserved essential genes in bacteria: in silico identification of putative drug targets A comparative chemogenomics strategy to predict potential drug targets in the metazoan pathogen, Schistosoma mansoni Molecular basis for structural diversity in the core regions of the lipopolysaccharides of Escherichia coli and Salmonella enterica Discovery of new biosynthetic pathways: the lipid A story Surface proteins of gram-positive bacteria and mechanisms of their targeting to the cell wall envelope Facultative methanotrophs revisited Loss of the mtr operon in Methanosarcina blocks growth on methanol, but not methanogenesis, and reveals an unknown methanogenic pathway The conservation and evolutionary modularity of metabolism Nitrate reduction and the nitrogen cycle in archaea NapGH components of the periplasmic nitrate reductase of Escherichia coli K-12: location, topology and physiological roles in quinol oxidation and redox balancing Mechanisms of protein export across the bacterial outer membrane Escherichia coli SecB protein associates with exported protein precursors in vivo Identification of a phosphotransferase system of Escherichia coli required for growth on N-acetylmuramic acid Bacterial phosphotransferase system (PTS) in carbohydrate uptake and control of carbon metabolism Analysis of an engineered Salmonella flagellar fusion protein, FliR-FlhB How bacteria assemble flagella Functional characterization in vitro of all two-component signal transduction systems from Escherichia coli The myxochelin iron transport regulon of the myxobacterium Stigmatella aurantiaca Sg a15 The molecular basis of excitation and adaptation during chemotactic sensory transduction in bacteria Comparative genomic assessment of novel broadspectrum targets for antibacterial drugs Silico Subtractive Genomics for Target Identification in Human Bacterial Pathogens. Drug Development Research Structural and mechanistic investigations on Salmonella typhimurium acetate kinase (AckA): identification of a putative ligand binding pocket at the dimeric interface Crystal structure of the catalytic α subunit of E. coli replicative DNA polymerase III Yokenella regensburgei infection in India mimicking enteric fever Infections caused by Kluyvera species in humans Hafnia alvei in stool specimens from patients with diarrhea and healthy controls Gangrenous cholecystitis caused by Hafnia alvei: a case report and review of literature Ewingella americana: an emerging true pathogen. Case Rep Infect Dis Atrophic rhinitis caused by Cedecea davisae with accompanying mucocele Crystal structure of the catalytic alpha subunit of E. coli replicative DNA polymerase III Identification of the acidic residues in the active site of DNA polymerase III journal Advances and Applications in Bioinformatics and Chemistry is an international, peer-reviewed open-access journal that publishes articles in the following fields: Computational biomodeling; Bioinformatics; Computational genomics; Molecular modeling; Protein structure modeling and structural genomics Chemoinformatics and Drug Design; In silico ADME/Tox prediction. The manuscript management system is completely online and includes a very quick and fair peer-review system, which is all easy to use Antibacterial activity and mechanism of action of a novel anilinouracil-fluoroquinolone hybrid compound Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structureactivity relationship (QSAR) with the Abraham descriptors Structure and mechanism of DNA polymerases DNA polymerase III holoenzyme: structure and function of a chromosomal replicating machine Distinct double-and single-stranded DNA binding of E. coli replicative DNA polymerase III α subunit DNA polymerase III of Gram-positive eubacteria is a zinc metalloprotein conserving an essential finger-like domain In Silico Screening for Novel Inhibitors of DNA Polymerase III Alpha Subunit of Mycobacterium tuberculosis (Mtb DnaE2, H 37 R v) We acknowledge Professor Dr Muhammed Zafar Iqbal for his cooperation and support throughout the research. We would like to thank the Department of Genetic Engineering and Biotechnology, and the Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh for providing us research facilities. The authors report no conflicts of interest in this work.