key: cord-029957-q7v5gli8 authors: Prabhu, D.; Rajamanikandan, S.; Anusha, S. Baby; Chowdary, M. Sushma; Veerapandiyan, M.; Jeyakanthan, J. title: In silico Functional Annotation and Characterization of Hypothetical Proteins from Serratia marcescens FGI94 date: 2020-07-31 journal: Biol DOI: 10.1134/s1062359020300019 sha: doc_id: 29957 cord_uid: q7v5gli8
Serratia marcescens, a rod-shaped Gram-negative bacterium, is classified as an opportunistic pathogen in the family Enterobacteriaceae. It causes a wide variety of infections in humans, including urinary, respiratory, ocular lens and ear infections, osteomyelitis, endocarditis, meningitis and septicemia. Over the past decade, antibiotic resistance has become a serious health care issue, and effective means to control the dissemination of S. marcescens resistance are urgently needed. The whole genome sequence of S. marcescens strain FGI94 encodes 4434 proteins, of which 690 (15.56%) are classified as hypothetical proteins (HPs). In the present study, we applied various bioinformatics tools, based on protein family comparison, motifs, functional properties of amino acids and genome context, to assign possible functions to the HPs. Pseudo sequences (protein sequences that contain ≤100 amino acid residues) were eliminated from the study. Although functions were predicted for 483 proteins, a high level of confidence could be assigned to only 108 of them. The predicted HPs were classified into various classes such as enzymes, transporters, binding proteins, cell division proteins, cell regulatory proteins and other proteins. The outcome of the study could help in understanding the molecular mechanisms of bacterial pathogenesis and also provide insight into the identification of potential targets for drug and vaccine development.
The family Enterobacteriaceae comprises more than 250 published bacterial species, and Serratia is one of its most clinically important genera (Alnajar and Gupta, 2017). Bacteria belonging to the Enterobacteriaceae are among the most commonly encountered organisms and are isolated from water, soil and clinical specimens. Serratia species are frequently found in association with animals and plants, with either detrimental or beneficial effects. Within the genus, Serratia marcescens (S. marcescens), a Gram-negative bacterium, has been recognized as a major cause of healthcare-associated (nosocomial) infections in humans (Parente et al., 2016). Several strains of S. marcescens have been reported to date. The complete genome of Serratia strain FGI94 has been sequenced, and its 16S rRNA shows the highest nucleic acid identity (99%) with the pathogenic strain Serratia rubidaea JCM1240 (Aylward et al., 2013). In recent times, the majority of healthcare-associated urinary, respiratory and eye infections, osteomyelitis, wound infections, endocarditis and pulmonary infections have been attributed to S. marcescens (Padmavathi et al., 2014). Resistance to antimicrobial agents through intrinsic and acquired processes is a notable feature of S. marcescens. A wide variety of gene cassettes carrying resistance determinants have been identified in the chromosome and plasmids of S. marcescens. Certain strains of S. marcescens have shown resistance to antibiotics (gentamicin, cephalosporins, fluoroquinolones, cefotaxime and ceftazidime), which already complicates antibiotic therapy (Iguchi et al., 2014). According to the World Health Organization (WHO) report of 2018, Serratia spp. were classified under class-1 priority. The categorization was based on the antimicrobial resistance (AMR) mechanisms of the pathogens (WHO Priority Pathogens List for R&D of New Antibiotics, 2018).
AMR has emerged as a serious threat to public health authorities at the global level, particularly in intensive care and surgical units. Resistance leads to long-term medication, higher medical costs, prolonged hospitalization and, depending on severity, death (Vazirianzadeh et al., 2013). Antibiotic resistance is emerging rapidly worldwide: at least 2 million people acquire antibiotic-resistant infections each year, and around 23,000 deaths were documented in the CDC report (CDC-Antibiotic Resistant Threats Report, 2013). According to the WHO report, AMR is one of the three major challenges facing the human race (WHO Antimicrobial resistance, 2018). The global distribution of resistance plasmids among several bacterial species has raised worldwide alarm about AMR (Finley et al., 2013). Recent technological innovations in Next Generation Sequencing (NGS) generate large amounts of genomic data from a wide range of bacterial species. However, complete proteome information is available for very few bacterial species, even though sound knowledge of the microbial proteome is essential to understand disease pathogenesis, virulence determinants, and bacterial survival and propagation (Singh et al., 2015). For instance, in most bacterial species, nearly 30-40% of the genes in the genome are marked as unknown or hypothetical (Hoskeri et al., 2010). Proteins with unknown function, or conserved putative proteins that show only a limited connection with functionally annotated proteins, are termed Hypothetical Proteins (HPs). HPs are predicted from translated nucleic acid sequences on the basis of sequence similarity, but their existence still has to be confirmed by experimental functional and biochemical characterization (Shahbaaz et al., 2013). Moreover, HPs also include low-identity proteins and proteins with imprecisely described or vague functions (Hoskeri et al., 2010).
In general, HPs are classified into two major classes: (i) Uncharacterized Protein Families (UPFs), and (ii) Domains of Unknown Function (DUFs). Proteins whose structures are available but which have not been functionally characterized or linked to any known functional gene are referred to as UPFs, whereas proteins whose existence has been shown experimentally but which are not related to any known functional or structural domain are referred to as DUFs (Varma et al., 2015). Annotating the HPs has many advantages: it can reveal new structure-function relationships and novel protein structures, and it helps in decoding additional pathways for a better understanding of the pathogens. Functionally annotated HPs may serve as potential biomarkers for the screening and diagnosis of diseases. They can also be used as pharmacological targets to design and discover novel drugs (Singh et al., 2015). The function of a protein can be predicted using various strategies, such as phylogenetic profiling, mass spectrometry identification, SAGE analysis, lethal analysis, etc. (Varma et al., 2015; Gasperskaja and Kučinskas, 2017). To minimize time and investment cost, various computational tools have been developed to aid the functional annotation process (Hawkins and Kihara, 2007). In the current study, we used computational tools to annotate the possible functions of the HPs in S. marcescens. Computational prediction of HP functions has been widely and successfully used in proteome annotation for various bacterial genomes (V. cholerae, N. gonorrhoeae, C. difficile, S. aureus, M. tuberculosis, H. influenzae) (Shahbaaz et al., 2013; Singh et al., 2015; Costa et al., 2018). Deciphering the function of all gene coding regions in the genome is essential to fill the gaps in the proteome and to fully understand the pathogenicity and genome plasticity of S. marcescens.
Sequence Retrieval
The complete set of protein sequences of S. marcescens FGI94 was identified by searching the NCBI (http://www.ncbi.nlm.nih.gov/) database. The S. marcescens genome possesses 4434 protein-coding genes, of which 690 proteins were marked as HPs. All the HP sequences were retrieved for biological functional assignment using various functional annotation servers. To minimize misinterpretation in the functional annotation pipeline, pseudo sequences (proteins having fewer than 100 amino acid residues) were excluded from this study. The tools used in the functional prediction of HPs in S. marcescens are tabulated in Table 1. Physicochemical characterization of the HPs was performed using ExPASy proteomics tools (Gasteiger et al., 2005). Parameters such as molecular weight (Mw), theoretical isoelectric point (pI), grand average of hydropathicity (GRAVY), aliphatic index and instability index were computed for the 483 HPs in S. marcescens. Localization of the HPs was predicted using PSORTb (Yu et al., 2010), PSLpred (Bhasin et al., 2005) and CELLO (Yu et al., 2004). Signal peptides were predicted using the SignalP server (Petersen et al., 2011). SecretomeP (Bendtsen et al., 2005) was used to identify HPs involved in the non-classical secretory pathway. Transmembrane information about the HPs was predicted using the TMHMM (Krogh et al., 2001) and HMMTOP (Tusnády and Simon, 1998) servers. Functions of the HPs were predicted using web-based tools. NCBI-Blast (Altschul et al., 1990), SMART (Letunic et al., 2012), EBI-Interproscan (Hunter et al., 2011) and Motif (Kanehisa, 1997) were used to identify the functional domains/motifs present in the HPs. Family-level classification of the HPs was carried out using Pfam (Finn et al., 2014), SCOP-Superfamily (Gough et al., 2001) and PANTHER (Thomas et al., 2006). Structural-level classification of protein superfamilies was predicted using the CATH database (Orengo et al., 1997).
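The length-based exclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the toy records and the `min_exclusive` parameter are hypothetical, and the cutoff follows the Results wording (sequences with more than 100 residues are kept), whereas the abstract and the methods phrase the cutoff slightly differently.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {accession: sequence}.

    The accession is taken as the first whitespace-delimited token of the header.
    """
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def drop_short_sequences(records: Dict[str, str], min_exclusive: int = 100) -> Dict[str, str]:
    """Keep only sequences longer than `min_exclusive` residues (assumed cutoff)."""
    return {acc: seq for acc, seq in records.items() if len(seq) > min_exclusive}


# Toy input with hypothetical accessions: one 120-residue and one 80-residue sequence.
example = [
    ">HP_A hypothetical protein",
    "M" * 60,
    "A" * 60,
    ">HP_B hypothetical protein",
    "M" * 80,
]
kept = drop_short_sequences(parse_fasta(example))
print(sorted(kept))
```

Exposing the cutoff as a parameter makes it easy to test how sensitive the retained set is to the exact threshold.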
Domain architecture of the HPs was predicted using SVMProt (Cai, 2003), CDART (Geer et al., 2002) and ProtoNet (Rappoport et al., 2012). Virulence activity of the HPs was predicted using VirulentPred (Garg and Gupta, 2008) and VICMpred (Saha and Raghava, 2006).
Characterization
The 483 HPs having more than 100 amino acid residues were selected for physicochemical characterization and functional annotation. The physicochemical properties of the 483 HPs, predicted from their amino acid sequences, are tabulated in Supplementary Table 1. Molecular weight is an important criterion in protein functional characterization. The proteins AGB82318.1 and AGB83651.1 showed the lowest and highest Mw, 10648.04 and 137161.31 Da, respectively. The isoelectric point (pI) is the pH at which a protein carries no net charge, so that its charge-based mobility is zero. Prediction of pI is essential for developing buffer systems for purification and for isoelectric focusing. The predicted pI values of the HPs ranged from 3.53 to 11.83. Out of 483 HPs, 246 proteins were acidic, with predicted pI values ranging from 3.53 to 6.97. Similarly, 236 proteins with pI values between 7.01 and 11.83 are considered basic, and AGB83651.1 showed neutral charge. Extinction coefficients of the HPs were computed for water at 280 nm based on the content of cystine, tryptophan and tyrosine residues in the protein sequences. A higher number of these residues in a protein results in a higher extinction coefficient. Small proteins of just over 100 amino acid residues that contain few cystine, tryptophan and tyrosine residues showed very low extinction coefficient values. In particular, no extinction coefficient could be computed for AGB81485.1 and AGB84113.1 because they contain no cystine, tryptophan or tyrosine residues.
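The dependence of the extinction coefficient on tyrosine, tryptophan and cystine content can be made concrete with a short sketch. It assumes the molar extinction values used by the ExPASy ProtParam tool at 280 nm in water (1490 for Tyr, 5500 for Trp and 125 for cystine, in M^-1 cm^-1) and, for the oxidized case, that every pair of cysteines forms a cystine. The function name and the toy sequences are hypothetical.

```python
# Assumed molar extinction values at 280 nm in water (M^-1 cm^-1), as used by ProtParam.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient_280(seq: str, cysteines_paired: bool = True) -> int:
    """Estimate the molar extinction coefficient from residue counts.

    If `cysteines_paired` is True, every pair of Cys residues is assumed to
    form one cystine; otherwise all Cys residues are treated as reduced.
    """
    seq = seq.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2 if cysteines_paired else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


# Toy peptides (not sequences from the study).
print(extinction_coefficient_280("ACDWYC"))         # one Tyr, one Trp, one cystine
print(extinction_coefficient_280("ACDWYC", False))  # cysteines treated as reduced
print(extinction_coefficient_280("AGLK"))           # no Tyr, Trp or Cys
```

A sequence with no Tyr, Trp or Cys yields zero, which corresponds to the two proteins for which no extinction coefficient is reported.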
Computing the extinction coefficient helps in the quantitative analysis of protein-protein and protein-ligand interactions during the drug development process. The instability index of the HPs ranged from a minimum of -0.24 to a maximum of 119.78. The instability index indicates the stability of a protein in a test tube environment. The 228 HPs with instability index values greater than 40 are classified as unstable, and the 225 proteins with an instability index below 40 are classified as stable. The aliphatic index is directly proportional to the content of aliphatic amino acid residues in the protein. Proteins with a higher aliphatic index show higher thermal stability, especially globular proteins. The computed aliphatic index of the 483 HPs falls between 38.03 and 157.52. The stability of thermophilic proteins is mainly contributed by the aliphatic amino acids (A, V, I and L), which give higher aliphatic index values and allow the proteins to withstand a wide range of temperatures. The GRAVY value represents protein-water interactions and illustrates the hydrophilic nature of the protein. It is computed as the sum of the hydropathy values of all residues divided by the sequence length. The lowest GRAVY value among the HPs is -1.478 and the highest is 1.238. Characterization of proteins as drug or vaccine targets is based mainly on their subcellular localization. Proteins located in the cytoplasm can serve as potential drug targets, whereas both inner and outer membrane proteins can act as potential vaccine targets. Knowledge of localization is therefore a crucial step in determining the function of a protein. Using predictors built on trained data sets, the HPs were assigned to one of the following locations in the cell: cytoplasm, periplasm, extracellular space, inner membrane or outer membrane.
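The aliphatic index and GRAVY values discussed in this section follow simple closed-form definitions, sketched here in Python. The sketch assumes the standard formulas used by ProtParam: the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent, and GRAVY is the mean Kyte-Doolittle hydropathy per residue. Function names and the toy sequences are hypothetical. The instability index is omitted because it requires a dipeptide weight table.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: summed hydropathy divided by length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Toy peptides (not sequences from the study).
print(round(aliphatic_index("AVIL"), 2), round(gravy("AVIL"), 3))
print(round(gravy("RK"), 2))  # strongly hydrophilic, hence negative
```

Negative GRAVY values indicate a more hydrophilic protein and positive values a more hydrophobic one, which matches the range of -1.478 to 1.238 reported for the HPs.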
We predicted the localization of 356 of the 483 HPs under study based on concurrent hits across the prediction tools. Among these 356 HPs, 60% (214) were predicted to be present in the cytoplasm. The inner membrane and outer membrane accounted for 17.41% (62) and 2.52% (9) of the HPs, respectively. A further 14% (50) of the HPs were predicted to be present in the periplasm and 5.89% (21) in the extracellular matrix. The locations of the remaining 127 proteins could not be confirmed because the tools did not give concurrent results. Signal peptides are key players in directing proteins to their target location. Hence, prediction of signal peptides is essential for understanding the transport system of a particular protein and its cleavage sites. Apart from certain cytoplasmic proteins, all other proteins carry signal peptides that facilitate their transport in and out of the membrane to reach the target cellular location or organelle. We predicted the presence of signal peptide sequences in 116 of the 483 HPs. Similarly, 142 proteins were predicted to be involved in the non-classical secretory pathway. Proteins that are secreted by cells constitute the secretome, and these proteins are important for cell-cell communication, cell proliferation and pathogenesis. Membrane proteins are involved in various processes such as transport, signaling and energy transduction; hence half of the targets used for drug development are membrane proteins. Transmembrane helices were predicted in 136 proteins by the TMHMM server and in 235 proteins by the HMMTOP server. Prediction of membrane proteins is vital for a better understanding of drug targets and for developing potent drug molecules. The details for the 483 HPs are shown in Supplementary Table 2. Annotating the function of HPs is essential for a better understanding of the entire biological system and for the development of effective drugs. We analyzed 483 HPs of S. marcescens to predict their possible functions.
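The text does not spell out how agreement between the localization predictors was decided. As a rough illustration only, the sketch below assumes a simple majority rule: a location is accepted when at least two of the three tools (PSORTb, PSLpred, CELLO) report the same label, and the protein is left unassigned otherwise. The function name, the label strings and the `min_agree` threshold are hypothetical and may differ from what the authors actually did.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_location(calls: Dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return the location reported by at least `min_agree` tools, else None.

    `calls` maps a tool name to its predicted location; labels are compared
    case-insensitively, and empty predictions are ignored.
    """
    counts = Counter(loc.strip().lower() for loc in calls.values() if loc and loc.strip())
    if not counts:
        return None
    ranked = counts.most_common()
    top_label, top_votes = ranked[0]
    if top_votes < min_agree:
        return None
    # Refuse to choose when two labels are tied for first place.
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None
    return top_label


# Toy predictions (not results from the study).
agree = {"PSORTb": "Cytoplasmic", "PSLpred": "cytoplasmic", "CELLO": "Periplasmic"}
split = {"PSORTb": "Cytoplasmic", "PSLpred": "Extracellular", "CELLO": "Periplasmic"}
print(consensus_location(agree))
print(consensus_location(split))
```

Under this rule, proteins on which all three tools disagree are left unassigned, which is one plausible way to arrive at a set of proteins with unconfirmed locations.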
Further, the HPs were analyzed for the presence of functional domains and signature motifs in the context of their biological function. The results of the domain and motif analysis are shown in Supplementary Table 3. The sequence and structural features of the HPs are shown in Supplementary Table 4. We successfully annotated the function of 108 of the 483 HPs with high confidence. The 108 proteins and their assigned functions are listed in Table 2. Based on the analysis, functions were predicted with high confidence for only 22.36% of the HPs; the remaining HPs did not give concurrent results, which indicates that a suitable experimental strategy has to be coupled with the predictions for better functional assignment. The annotated proteins (Fig. 1) were classified into six major categories: enzymes, binding proteins, cell division proteins, cell regulatory proteins, transporters, and the remaining proteins involved in different biological processes. Bacterial enzymes act as catalysts in all metabolic and catabolic processes, supplying essential nutrients for growth, and are also responsible for the pathogenesis of the organism (Gurung et al., 2013). We characterized 34 proteins that act as enzymes, of which 9 were predicted to be transferases. Transferases catalyze the transfer of a functional group (for example, a methyl or glycosyl group) from one molecule (the donor) to another (the acceptor). Six proteins were identified as endonucleases, which function in destroying invading foreign DNA (Van den Broek et al., 2005). Two proteins, AGB81206.1 and AGB83112.1, were predicted to be members of the exonuclease-endonuclease-phosphatase domain superfamily, which plays a crucial role in intracellular signaling in bacteria (Dlakic, 2000).
Disrupting intracellular signaling either arrests the physiological role of the organism or kills it, and it is therefore widely considered a potential target for drug development (Kohanski et al., 2010). The protein AGB83394.1 is predicted to contain an SMR-like domain. SMR-like domain proteins are broadly classified into three subfamilies. Family-1 is closely related to the C-terminal domain of MutS and is found in the bacterial phyla Deltaproteobacteria, Firmicutes, Bacteroidetes and Epsilonproteobacteria and in plants. The proteins in this family are responsible for protecting cells from oxidative DNA damage. Family-2 is closely related to the C-terminal domain of MutS in eukaryotes, whereas Family-3 comprises the MutS proteins found in E. coli. Overall, these three families of proteins are involved in endonuclease activity and are also responsible for the formation of branched DNA structures (Fukui and Kuramitsu, 2011). The protein AGB81307.1 is predicted to be an acyl carrier protein phosphodiesterase, which catalyzes the hydrolytic cleavage of the 4′-phosphopantetheine residue from acyl carrier protein, generating apo-acyl carrier protein. It also plays a significant role in the regulation of fatty acid synthesis (Fischi and Kennedy, 1990). Regulation of fatty acid metabolism in bacteria is essential to maintain lipid homeostasis in the growth and stationary phases as well as in various physical and nutrient states (Fujita et al., 2007). AGB82203.1 is predicted to be the 2-methyl aconitate cis/trans isomerase PrpF. These proteins catalyze the interconversion of 2-methyl CAA and 2-methyl TAA in the 2-methylcitric acid cycle (Du et al., 2017). We predicted AGB83389.1 to be elongation factor P. Elongation factor P is required for the synthesis of proteins containing polyproline motifs and functions in relieving ribosomal stalling (Hersch et al., 2013).
AGB83655.1 is predicted to belong to the P-loop containing nucleoside triphosphate hydrolase superfamily. Proteins of this superfamily function as kinases in many pathways and also work as motors that drive reactions through conformational changes (Leipe et al., 2004). AGB84572.1 is predicted to contain an oxidoreductase molybdopterin-binding domain; such proteins are reported to be involved in H2 metabolism and bioenergetic pathways. The proteins involved in these pathways help to produce ATP through oxidation-reduction reactions, thereby providing the bacteria with energy for normal cellular activities (Li et al., 2009). The predicted functions of these HPs help to clarify the role of new proteins in bacterial growth, and the proteins can be considered potential targets for drug discovery. We identified 11 HPs as binding proteins. AGB80434.1 is classified as a PK beta barrel domain-like protein, which is reported to be involved in chromosome formation and to be a prerequisite for the initiation of chromophore maturation. PK beta barrel domain-like proteins play a significant role in a wide variety of cellular processes, including DNA replication, DNA repair and horizontal gene transfer (Stepanenko et al., 2013). Thus, such proteins are important for the survival of pathogens in the environment. We characterized AGB82209.1 and AGB84196.1 as Leptospira immunoglobulin-like protein B; these proteins play a major role in binding to fibrinogen, collagen, laminin and elastin and inhibit fibrin clot formation (Lin et al., 2011). Such binding affects the coagulation cascade, platelet aggregation, tissue regeneration and immune responses, which makes these host proteins targets for many pathogens (Choy et al., 2011). Cell division in bacteria is closely linked to bacterial multiplication.
Knowledge of the mechanism of cell division is essential to explore novel targets in drug development. AGB81074.1 and AGB83861.1 are predicted to be the cell division protein ZapD. Members of the ZapD family of proteins share a common role in the cell division machinery and mechanism (Durand-Heredia et al., 2012). The protein AGB84389.1 is predicted to be a capsule assembly Wzi family protein. This protein is classified as an outer membrane protein that is involved in extracellular capsule formation in many pathogenic bacteria (Bushell et al., 2013) and can be used as a novel drug target.
Cell Regulatory Process Protein
Gene regulation in prokaryotes and eukaryotes includes a wide range of mechanisms for the production of the desired gene product. This regulatory process is a complex network that controls the expression of the various transcriptional units in bacteria and thereby maintains microbial pathogenesis, growth and survival. The protein AGB80600.1 was predicted to be the DNA recombination protein RmuC. Although the function of this protein is not clearly understood, it shows a high level of sequence similarity with myosins and structural maintenance of chromosome proteins (Gaudermann et al., 2006). Protein AGB80981.1 is CreA (a DNA-binding protein), which functions in the repression and control of alc regulon expression, which is necessary for the ethanol utilization pathway (Panozzo et al., 1998). Protein AGB81139.1 is a recombination regulatory (RecA) protein and functions in DNA pairing, strand exchange and recombinational DNA repair (Cox, 1991). RecA proteins are multifunctional proteins found in all forms of living organisms. As a recombinase, the protein exhibits DNA-dependent ATPase activity and also plays a role in the regulatory system that controls the induction of the SOS response. As a nucleoprotein filament, it plays a role in DNA strand exchange (Gruenig et al., 2008).
We found that AGB82748.1 is related to the stage V sporulation protein R. Beall and Moran (1994) described the involvement of SpoVR from Bacillus subtilis in spore cortex formation. Proteins AGB83410.1 and AGB83544.1 belong to toxin-antitoxin systems (RelE/ParE), which are present in bacterial genomes and play a significant role in the formation of persister cells, in biofilm formation and in the pathogenesis of the organisms (Wen et al., 2014). We identified AGB84710.1 as the Der GTPase-activating protein YihI; this family of proteins is well conserved in Eubacteria and is involved in cell survival and bacterial growth (Verstraeten et al., 2011). These findings suggest that inhibiting these proteins would disrupt normal cellular processes and thereby reduce bacterial pathogenicity. Bacteria contain various transport proteins that import and export substances such as nutrients, ions, metabolites and amino acids through the cell membrane, exclude unwanted by-products, and adjust the cytoplasmic content of protons and salts needed for the growth and development of the microorganism. We predicted AGB81989.1 and AGB83037.1 to be type VI secretion system proteins; these proteins play a major role in virulence and antibacterial activity and also participate in metal ion uptake, conferring an advantage during bacteria-bacteria competition (Gallique et al., 2017). AGB82622.1 belongs to the multidrug resistance efflux transporter EmrE family. EmrE belongs to the SMR family of small multidrug transporters (Ninio et al., 2004). Recent data suggest that overexpression of SMR transporters makes bacteria resistant to a wide range of antiseptics and antibiotics, such as ethidium bromide, methyl viologen and intercalating dyes (Ma and Chang, 2004). AGB82887.1 is predicted to be the manganese efflux pump MntP.
Metals such as manganese, copper and zinc are essential for all microorganisms, and appropriate maintenance of metal homeostasis is important to prevent toxicity and mismetallation in both eukaryotes and bacteria (Procheronet et al., 2013). Such pumps, identified in various bacteria such as E. coli (MntP) (Martin et al., 2015), Streptococcus pneumoniae (MntE) (Turner et al., 2015) and Neisseria spp. (MntX) (Veyrier et al., 2011), play an important role in removing excess manganese before its level becomes toxic. The protein AGB83371.1 is predicted to be a tripartite tricarboxylate transporter permease. This kind of protein is most abundant in beta-proteobacteria and plays a significant role in the transport of carboxylates (Rosa et al., 2018). We identified AGB84290.1 as the lipopolysaccharide export ABC transporter periplasmic protein LptC. Lipopolysaccharide acts as a hydrophilic barrier to a wide range of hydrophobic antibiotics and plays a part in the pathogenicity of the organisms (Hicks and Jia, 2018). We grouped 39 hypothetical proteins that function in different biological processes under a single category named other proteins. The protein AGB80728.1 was predicted to be part of the Translocation and Assembly Module (TAM), which plays a major role in outer membrane biogenesis and virulence mechanisms in bacteria (Josts et al., 2017). AGB8163.1 is predicted to contain an SH3 domain, which has been identified in various bacteria and acts as a targeting domain involved in bacterial cell wall recognition and metal binding (Kamitori and Yoshida, 2015). We identified AGB82063.1 as containing a BRCT domain, which was first recognized in the C-terminal region of the BRCA1 gene product and participates in signal transduction and acts as a protein-targeting motif in the DNA damage response system (Woods et al., 2012).
Further, structural analysis of the BRCT domain reveals that it may appear either as a singleton (single BRCT) or as a tandem pair (double BRCT), and that it has a phosphate-binding site which binds the phosphate group either of a DNA end or of a phosphopeptide (Sheng et al., 2011). We found that the protein AGB82090.1 belongs to the YceI protein family, a subgroup of the lipocalin superfamily, and plays a role in isoprenoid quinone metabolism, transport and storage (Handa et al., 2005). The proteins AGB81427. involving sugar modification and also play a role in oxalate metabolism. We predicted AGB82764.1 to be an outer membrane lipoprotein of the Slp family, which is encoded by the gene XAC1113 (Ferreira et al., 2016). Bacterial lipoproteins are a class of membrane-anchored proteins that play important roles in bacterial physiology and pathogenesis (Zuckert, 2014). De Souza et al. (2004) reported the involvement of Slp in biofilm formation. The protein AGB83581.1 is predicted to contain a tetratricopeptide repeat-like domain. Proteins of this kind are found in a wide variety of prokaryotic and eukaryotic organisms, play a vital role in cell processes and are associated with the virulence mechanisms of bacterial pathogens (Cerveny et al., 2013). AGB83707.1 is predicted to be the outer membrane lipoprotein RcsF, presumably involved in a signal transduction pathway (Shiba et al., 2012). We identified the protein AGB84181.1 as an antibiotic biosynthesis monooxygenase. Increased production of reactive oxygen species (ROS) causes oxidative damage to cells. The primary ROS generated within mitochondria are limited to flavoproteins and flavins that activate molecular oxygen. Monooxygenases perform a variety of functions in all living organisms and catalyze a wide range of reactions, such as participating in the respiratory chain within the cytoplasmic membrane and protecting the cell against external reactive oxygen species (Khan et al., 2017 jer, 2005).
These lipoproteins are universally distributed in the bacterial kingdom and account for about 1-3% of the total genome. They play pivotal roles in many physiological and cellular processes such as cell wall metabolism, antibiotic resistance, nutrient uptake, cell division, signal transduction and virulence. Thus, the identified HPs are predicted to play an important role in the survival and pathogenesis of the pathogen, and their predicted functions may provide clues to therapeutic targets for drug design and development. Virulence factors enable bacteria to invade and colonize the host, cause disease and overcome host defenses. Therefore, understanding the molecular mechanisms of microbial virulence is central to understanding the pathogenesis of the bacteria. We therefore used VICMpred and VirulentPred, which are SVM-based methods, to predict bacterial virulence factors among the 483 HPs. The predicted results are shown in Supplementary Table 5. We found 29 HPs to be virulence factors, 226 to be involved in cellular processes, 15 in information and storage, and 120 in metabolism. All these proteins can be considered potential targets for drug design and development. In general, about 30-50% of the proteins in a sequenced genome are referred to as hypothetical proteins, and the rapid accumulation of genome data containing HPs is one of the emerging challenges in modern biology. Uncharacterized HPs hamper the identification of novel drug targets that can act specifically on pathogens to combat their pathogenicity. In bacterial pathogens, characterizing these HPs is crucial for identifying potential drug targets and also enhances our understanding of virulence capacity and pathogenicity. In this study, we characterized the functions of 108 HPs from S. marcescens with a high level of confidence using various bioinformatics approaches.
Various types of enzymes, transporters, cell division proteins and binding proteins were characterized, which play an essential role in the growth, survival, virulence and pathogenesis of S. marcescens. Characterization of proteins on the basis of physicochemical properties and subcellular localization helps in differentiating vital drug targets from vaccine targets. In addition, we identified 18 virulence proteins that are predicted to play a crucial role in the pathogenesis of the organism. Hence, this study may facilitate future studies on the predicted HPs as novel therapeutic targets for drug and vaccine development.
The authors declare that they have no conflict of interest. This article does not contain any studies involving animals or human participants performed by any of the authors.
Phylogenomics and comparative genomic studies delineate six main clades within the family Enterobacteriaceae and support the reclassification of several polyphyletic members of the family
Basic local alignment search tool
Complete genome of Serratia sp. strain FGI 94, a strain associated with leaf-cutter ant fungus gardens
Cloning and characterization of spoVR, a gene from Bacillus subtilis involved in spore cortex formation
Non-classical protein secretion in bacteria
In silico functional annotation of a hypothetical protein from Staphylococcus aureus
PSLpred: prediction of subcellular localization of bacterial proteins
The molecular mechanism of bacterial lipoprotein modification-how, when and why
Wzi is an outer membrane lectin that underpins group 1 capsule assembly in Escherichia coli
SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence
Antibiotic resistant threats report
Tetratricopeptide repeat motifs in the world of bacterial pathogens: role in virulence mechanisms
The multifunctional LigB adhesin binds homeostatic proteins with potential roles in cutaneous infection by pathogenic Leptospira interrogans
The RecA protein as a recombinational repair system
Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance
Gene expression profile of the plant pathogen Xylella fastidiosa during biofilm formation in vitro
Functionally unrelated signaling proteins contain a fold similar to Mg2+ dependent endonucleases
Genetic and biochemical characterization of a gene operon for trans-aconitic acid, a novel nematicide from Bacillus thuringiensis
Identification of ZapD as a cell division factor that promotes the assembly of FtsZ in Escherichia coli
Unravelling potential virulence factor candidates in Xanthomonas citri subsp. citri by secretome analysis
The scourge of antibiotic resistance: the important role of the environment
Pfam: the protein families database
Isolation and properties of acyl carrier protein phosphodiesterase of Escherichia coli
Regulation of fatty acid metabolism in bacteria
Structure and function of the small MutS-related domain
The type VI secretion system: a dynamic system for bacterial communication, Front. Microbiol
VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens
The most common technologies and tools for functional genome analysis
Protein identification and analysis tools on the ExPASy Server
Analysis of and function predictions for previously conserved hypothetical or putative proteins in Blochmannia floridanus
CDART: protein homology by domain architecture, Genome Res
Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure
RecA-mediated SOS induction requires an extended filament conformation but no ATP hydrolysis
A broader view: microbial enzymes and their relevance in industries, medicine, and beyond
Crystal structure of a novel polyisoprenoid-binding protein from Thermus thermophilus HB8
Function prediction of uncharacterized proteins
Divergent protein motifs direct elongation factor P-mediated translational regulation in Salmonella enterica and Escherichia coli
Structural basis for the lipopolysaccharide export activity of the bacterial lipopolysaccharide transport system
Functional annotation of conserved hypothetical proteins in Rickettsia massiliae MTU5
InterPro in 2011: new developments in the family and domain prediction database
Genome evolution and plasticity of Serratia marcescens, an important multidrug-resistant nosocomial pathogen
The structure of a conserved domain of TamB reveals a hydrophobic β Taco fold
Structure-function relationship of bacterial SH3
Role of nanomaterials in plants under challenging environments
How antibiotics kill bacteria: from targets to networks
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer
SMART 7: recent updates to the protein domain annotation resource
A molybdopterin oxidoreductase is involved in H2 oxidation in Desulfovibrio desulfuricans G20
Leptospira immunoglobulin-like protein B (LigB) binding to the C-terminal fibrinogen αC domain inhibits fibrin clot formation, platelet adhesion and aggregation
Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli
The Escherichia coli small protein MntS and exporter MntP optimize the intracellular concentration of manganese
The membrane topology of EmrE-a small multidrug transporter from Escherichia coli
CATH-a hierarchic classification of protein domain structures
1-dimethylethyl) of marine bacterial origin inhibits quorum sensing mediated biofilm formation in the uropathogen Serratia marcescens
The CreA repressor is the sole DNA-binding protein responsible for carbon catabolite repression of the alcA in Aspergillus nidulans via its binding to a couple of specific sites
Serratia marcescens resistance profile and its susceptibility to photodynamic antimicrobial chemotherapy, Photodiagn.
Photodyn SignalP 4.0: discriminating signal peptides from transmembrane regions Iron, copper, zinc, and manganese transport and regulation in pathogenic Enterobacteria: correlations, between strains, site of infection and the relative importance of the different metals transport systems for virulence, Front ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree Tripartite ATP-independent periplasmic (TRAP) transporters and tripartite tricarboxylate transporters (TTT): from uptake to pathogenicity VICMpred: an SVM-based method for the prediction of functional proteins of gramnegative bacteria using amino acid patterns and composition Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20 Functional evolution of BRCT domains from binding DNA to protein Exploring the relationship between lipoprotein mislocalization and activation of the Rcs signal transduction system in Escherichia coli Functional annotation and classification of the hypothetical proteins of Neisseria meningitidis H44/76 Beta-barrel scaffold of fluorescent proteins: folding, stability and role in chromophore formation Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools Manganese homeostasis in group A Streptococcus is critical for resistance to oxidative stress and virulence Principles governing amino acid composition of integral membrane proteins: application to topology prediction DNAtension dependence of restriction enzyme activity reveals mechanochemical properties of the reaction pathway DNA binding: a novel function of Pseudomonas aeruginosa type IV pili The first report of drug resistant bacteria isolated from the brown-banded cockroach, Supella longipalpa The university conserved prokaryotic GTPases, Microbiol A novel metal transporter mediating manganese export (MntX) regulates the Mn toFe intracellular ration and Neisseria meningitidis virulence 
Toxin-antitoxin systems: their role in persistence, biofilm formation and pathogenicity Charting the landscape of tandem BRCT domain-mediated protein interactions Predicting subcellular localization of proteins for gram-negative bacteria by support vector machines based on n-peptide compositions PSORTb 3.0: Improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes Secretion of bacterial lipoproteins: through the cytoplasmic membrane, the periplasm and beyond Supplementary materials are available for this article at https://doi.org/10.1134/S1062359020300019 and are accessible for authorized users.