key: cord-0780948-up1maluy authors: Joshi, Amit; Krishnan, Sunil; Kaushik, Vikas title: Codon usage studies and epitope-based peptide vaccine prediction against Tropheryma whipplei date: 2022-03-07 journal: J Genet Eng Biotechnol DOI: 10.1186/s43141-022-00324-5 sha: 2d9aa87cb6e8c8b55958ab728506f5e2085af26e doc_id: 780948 cord_uid: up1maluy
BACKGROUND: Tropheryma whipplei causes disease in Homo sapiens ranging from acute gastroenteritis to neuronal damage. Genomic and codon adaptation studies can advance the prediction of disease evolution and the prevention and treatment of the disease. Codon usage data and codon usage measurement tools were used to detect rare and very rare codons and to characterize synonymous codon usage. The high effective number of codons (ENC) indicates low codon usage bias in T. whipplei, including in its 23S and 16S ribosomal RNA genes.
RESULTS: T. whipplei was found to have low codon bias across its genomic sets. The base content at the third position of synonymous codons was calculated as A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for 23S and 16S rRNA, respectively.
CONCLUSION: Amino acids such as valine, aspartate, leucine, and phenylalanine have high codon usage frequency and are also present in the epitopes KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, which were screened from the proteins excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG, respectively. This method opens new ways to identify epitope-based peptide vaccines against different pathogenic organisms.
Tropheryma whipplei is an actinobacterial pathogen that causes Whipple's disease in Homo sapiens. The infection has been associated with gastroenteritis, endocarditis, and neuronal damage in Caucasian individuals [1]. Its lethal impact has also been seen in canines [2]. The credit for its name and discovery goes to the Nobel laureate G.
H. Whipple, who carried out many investigations of the lipodystrophy (impaired lipid biosynthesis and absorption) caused by T. whipplei [3], which produces a broad-spectrum infection. Caucasian populations, children, and sewage and farm workers were found to be most affected by this disease. The bacterium causes immunomodulation with increased IL-16 secretion, IL-10 synthesis, and dysregulation of mucosal T-helper cells. Further immunological abnormalities have been described, reflecting the complexity of Whipple's disease [4]. Clinical symptoms of the infection include severe diarrhea, loss of body weight, and weakness [5]. T. whipplei attacks the lamina propria of the gastrointestinal tract and targets macrophages for its replication [6].
Sequencing of two strains of T. whipplei (Twist and TW 08/27) was successfully carried out by French researchers, which opened scope for genomic analysis and the development of better treatment strategies for this lethal disease; their study found that this actinobacterium has a low GC content (46%) in comparison with other members of the same order [7]. Current drugs such as doxycycline, hydroxychloroquine, and trimethoprim/sulfamethoxazole must be taken for almost 2 years, with lifetime follow-up for patients [8, 9]. Recent in silico studies on epitope-based vaccine design may offer a possible prophylaxis for Whipple's disease [10]. This actinobacterium encodes a large number of surface proteins, some of which are associated with a large content of noncoding repetitive DNA. The genome also shows variability in genomic sets, including phase variations that alter cell proteins; this points to the importance of immune evasion and interaction with the host genome [1, 7].
Such unusual genomic features of the bacterium open wide scope for studying codon usage patterns to reveal natural and mutational selection. A codon consists of 3 nucleotides in sequence and codes for a particular amino acid or acts as a STOP signal for translation. Differences in how often synonymous codons are used are termed codon usage bias. Synonymous codon usage in many unicellular prokaryotes is consistently associated with directional mutational bias and translational selection [11]. Other factors, such as replication-translation selection and protein hydropathy, can also have a significant effect [12]. In some microbial pathogens, mutational bias was found to be strand specific, and those organisms show varied synonymous and nonsynonymous codon usage [13].
This analysis not only gives insight into the natural and mutational selection pressures acting at the genomic level in T. whipplei but also offers a better understanding of evolutionary developments in this host-adapted bacterium. The computational analysis provided data on highly expressed proteins and enzymes of this bacterium, and on the amino acids that could be considered in epitope-based prophylaxis design to inhibit bacterial activity in the host or to develop better treatments, as in recent immunoinformatics-based studies [14, 15]. Ribosomal RNA (16S and 23S) codon usage patterns were analyzed here to determine the changes associated with evolutionary or phylogenetic patterns of the bacterium. In this study, we also identified epitope-based peptide vaccine candidates against Tropheryma whipplei. The aim of the study was to determine codon usage patterns in T. whipplei and, on that basis, to predict epitope-based vaccine candidates using current bioinformatics tools.
To measure codon usage bias, codon usage tables were retrieved from the codon and codon-pair usage tables (CoCoPUTs) database. This database gives the relative frequency with which different codons are used in genes in the T. whipplei RefSeq data. Similarly, the codon-pair usage tables give the counts of each codon pair in the CDSs of the T. whipplei genomic data (RefSeq), from which codon-pair usage bias was calculated. The complete nucleotide sequences of T. whipplei strains and the selected FASTA sequences of the Twist 16S and 23S ribosomal RNA genes were retrieved from the NCBI RefSeq database (https://www.ncbi.nlm.nih.gov/nuccore). The codon usage dataset was retrieved from the Codon Usage Database (http://www.kazusa.or.jp/codon/).
For sequence optimization, each codon in the original sequence of the T. whipplei strains was replaced with the corresponding synonymous codon having the highest codon usage frequency. The ATGme tool [16] was used to identify rare codons and optimize the genomic sequences accordingly (http://www.atgme.org/). Genomic sequences in FASTA format were pasted into the search box, the codon usage table was pasted into the corresponding field, and the data were processed to identify rare codons and optimize the sequence.
Nucleotide composition was computed from the selected ribosomal RNA gene sequences. The G + C composition at the 1st, 2nd, and 3rd codon positions (GC1s, GC2s, and GC3s) was determined to obtain the frequency and mean frequency. The frequencies of each base at the synonymous third codon position (A3, T3, G3, and C3) and the corresponding percentages (%A3s, %C3s, %T3s, and %G3s) were calculated. To measure the bias of synonymous codons, the effective number of codons (ENC) was determined. Additionally, codon usage, codon usage per thousand, and relative synonymous codon usage (RSCU) were calculated using the CAIcal tool, available from https://ppuigbo.me/programs/CAIcal/.
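The composition measures described above can be illustrated with a minimal Python sketch. It is not the CAIcal implementation: the function names are hypothetical, the standard genetic code is assumed, and the third-position percentages are taken as the simple share of each base among synonymous codons (excluding ATG, TGG, and stop codons). Dedicated tools may normalize these indices differently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILY_SIZE = Counter(AMINO)  # number of codons per amino acid (or stop)


def split_codons(seq):
    """Return in-frame codons, dropping a trailing partial codon and
    any codon that contains a symbol other than A, C, G, T."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return [c for c in codons if c in CODON_TABLE]


def gc_by_position(seq):
    """Percent G+C at codon positions 1, 2, and 3 over all codons."""
    codons = split_codons(seq)
    if not codons:
        raise ValueError("no complete codons in sequence")
    return tuple(
        100.0 * sum(c[pos] in "GC" for c in codons) / len(codons)
        for pos in range(3)
    )


def third_position_composition(seq):
    """Percent of each base at the third position of synonymous codons."""
    third = Counter(
        c[2]
        for c in split_codons(seq)
        if CODON_TABLE[c] != "*" and FAMILY_SIZE[CODON_TABLE[c]] > 1
    )
    total = sum(third.values())
    if total == 0:
        raise ValueError("no synonymous codons in sequence")
    return {base: 100.0 * third[base] / total for base in "ACTG"}


# Illustrative toy sequence (not T. whipplei data): GCT GCC ATG TTT TGG
demo = "GCTGCCATGTTTTGG"
print(gc_by_position(demo))
print(third_position_composition(demo))
```

In the toy sequence, ATG and TGG are skipped by `third_position_composition` because methionine and tryptophan have no synonymous alternatives.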
Proteomic data for Tropheryma whipplei were accessed from the NCBI GenBank database, and allergenicity was then estimated with the AllergenFP server [17]. The NetMHCIIpan-4.0 server [18] was used to screen the selected proteins for epitopes that can interact with human leukocyte antigen (HLA) proteins. The VaxiJen 2.0 tool [19] was used to assess the antigenicity of the screened epitopes. Epitope structures were predicted with PEP-FOLD 3.5 [20], and the structure of the HLA allelic determinant HLA-DRB1*01:01 (PDB ID: 1AQD) was retrieved from the RCSB PDB database. Biochemical properties of the epitopes were calculated with the ProtParam tool of the ExPASy web server. Molecular docking between epitopes and HLA determinants was performed with PatchDock [21], FireDock, and the DINC web tool [22]. These tools not only assist in docking in a user-friendly way but also calculate
The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The T. whipplei taxonomy ID, or taxid (2039), was verified with NCBI's taxonomy tool, and the taxonomy is illustrated in Fig. 1. The log-transformed codon-pair frequency heat map generated from the data analysis is shown in Fig. 2. ENC values range from 20 to 61 [25]. A value of 20 means that only one codon is used for each amino acid, whereas a value of 61 means that all synonymous codons are used equally for each amino acid. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid. The ENC value should be ≤ 35 for significant codon bias [26], so the higher ENC value indicates low codon usage bias in T. whipplei. The ENC values are given in Table 1. The codon usage details are summarized in Table 2, and the codon usage frequency per 1000 codons is illustrated in Fig. 3. The RefSeq data (n = 859) for T. whipplei comprised 88597 CDSs and 28006357 codons.
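To make the 20 to 61 range of the ENC concrete, the following Python sketch follows the general form of Wright's estimator [25]: a homozygosity F = (n Σp² − 1)/(n − 1) is computed per amino acid, averaged within each degeneracy class, and combined as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. This is an illustrative simplification, not the tool used in the study; the function and variable names are hypothetical, and the fallback for a missing isoleucine class and the cap at 61 are assumptions about how edge cases are handled.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of its synonymous codons (stop codons excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def effective_number_of_codons(seq):
    """Estimate ENC for an in-frame coding sequence (simplified sketch)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    homozygosity = defaultdict(list)  # degeneracy class -> F values
    for aa, codons in SYNONYMS.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; n < 2 is undefined
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    # Assumption: if isoleucine is absent, approximate F3 from F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    missing = [k for k in (2, 3, 4, 6) if k not in mean_f]
    if missing:
        raise ValueError(f"too few codons for degeneracy classes {missing}")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # values above 61 are capped


# Toy input: every amino acid encoded by a single codon -> maximal bias.
biased = "".join(codons[0] * 3 for codons in SYNONYMS.values())
print(effective_number_of_codons(biased))
```

With one codon per amino acid every F equals 1 and the estimate is 20; when synonymous codons are used evenly the F values approach 1/k and the estimate approaches (and is capped at) 61.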
The complete 23S and 16S ribosomal RNA gene sequences of Tropheryma whipplei strain Twist comprised 3102 base pairs and 1521 base pairs, respectively. The CDS, codon, frequency per thousand, and codon count details for the T. whipplei Twist strain are summarized in Tables 3 and 4. These codon usage tables were used for the identification of rare codons and for sequence optimization. The analysis produced usage data, the original sequence, and the optimized sequence. For the T. whipplei strain Twist 23S ribosomal RNA gene sequence, the usage data showed that GTT and GAT (36.7% and 36.3%) had high codon usage frequency. TAA, TAG, and TGA, which code for "STOP", had the lowest usage frequencies (0.9%, 1.0%, and 1.1%) and were classed as very rare codons. The rare codons were CGA, TGC, CGG, TGT, CAC, ACG, CCC, and TCG. Stop codons terminate protein translation [27]. The details of the rare and very rare codons (amino acid coded, count, and percentage usage frequency) for 23S and 16S rRNA are summarized in Tables 5 and 6. Compositional properties were calculated for the coding sequences. Figures 5 and 6 show rRNA characteristic features such as length and nucleotide composition. Figure 7 gives the rRNA synonymous codon percentages, and Fig. 8 gives the codon measurements.
The in silico analysis revealed two epitopes of 15 amino acid residues (KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) that interact well with HLA-DRB1*01:01 (an MHC class II allelic determinant). Table 7 lists the retrieved sequences with their accession numbers, together with allergenicity as predicted by the AllergenFP tool (which reports a Tanimoto similarity index). Epitopes were determined with the NetMHCIIpan-4.0 server, which draws its core data from the IEDB database and uses artificial neural networks (ANN) to assess the interaction of peptide stretches with HLA allelic determinants.
Amino acids such as valine, aspartate, leucine, and phenylalanine have high codon usage frequency and are also present in the epitopes screened from excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG. Table 8 gives the VaxiJen and NetMHCIIpan-4.0 scores for the 10 peptides with good VaxiJen scores, out of a total of 2151 epitopes discovered. The VaxiJen score indicates the antigenicity of a peptide. ProtParam results showed that only the two finalized epitopes were stable (Table 9). Epitope structures were predicted with PEP-FOLD 3.5 [20], and the structure of the HLA allelic determinant HLA-DRB1*01:01 (PDB ID: 1AQD) was retrieved from the RCSB PDB database for molecular docking analysis. Molecular docking of the selected epitopes with HLA-DRB1*01:01 showed favorable interaction (Table 10). Figure 9 shows the docked complexes of the selected epitopes with HLA-DRB1*01:01, visualized in PyMOL.
Tropheryma whipplei causes disease in Homo sapiens ranging from acute gastroenteritis to neuronal damage. Genomic and codon adaptation studies can advance the prediction of disease evolution and the prevention and treatment of the disease. The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid. The ENC value should be ≤ 35 for significant codon bias [26]. The CDS, codon, frequency per thousand, and codon count data for the T. whipplei Twist strain were used for the identification of rare codons and for sequence optimization. Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally [28]. The Codon Adaptation Index (CAI), which estimates the degree of bias, was 0.73 and 0.725 for 23S and 16S rRNA, respectively. The CAI ranges between 0 and 1; higher values indicate stronger codon usage bias and a higher gene expression level.
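The RSCU definition given above can be written as a short Python sketch. It assumes the standard genetic code and an in-frame input sequence; the function name `rscu` is hypothetical and this is not the CAIcal implementation. For a codon j of an amino acid with k synonymous codons, RSCU = k × (count of codon j) / (total count of all k codons), so a value of 1 means no bias.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of its synonymous codons (stop codons excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(seq):
    """Return {codon: RSCU} for every amino acid observed in the sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not present; RSCU undefined
        for c in codons:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example (not T. whipplei data): alanine encoded as GCT x3 and GCC x1.
for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
    print(codon, round(value, 2))
```

In the toy example GCT is used three times as often as expected under equal usage, GCC exactly as expected, and GCA and GCG not at all.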
In previous studies, membrane proteins were considered to be associated with considerable bias [29], whereas in the current study we identified rare codon bias associated with the entire genome of T. whipplei. The main value of a codon bias study is that it helps determine amino acid expression patterns that can be linked to epitope-based vaccine prediction. In recent studies, vaccine predictions were reported to be successful for SARS-CoV-2 [30, 31], dengue [32, 33], Nipah [34], Candida fungus [35], canine circovirus [36], and Zika virus [37]. So, codon usage pattern determination can be considered as [35] and human cytomegalovirus [38]. Similarly, bioinformatic approaches have made drug repurposing against harmful pathogens easier [39]. Likewise, for animal models, viral pathogenic proteomes have been screened for vaccine design using immunoinformatics [33, 36, 40]. This study is unique in saving time and money in peptide-based vaccine design.
The codon usage and amino acid usage analyses indicate that T. whipplei has a low codon bias across its genomic sets. The analysis could be applied to disease evolution prediction and to the development of drugs or vaccine candidates. We also found that two epitopes, KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, can possibly act as vaccine candidates against T. whipplei. Future work requires wet-lab validation of these epitopes, which are highly expressed in this bacterium and have the potential to form therapeutic peptides.
Fig. 9 Molecular docking results of epitopes with HLA-DRB1*01:01. (A) KPSYLSALSAHLNDK from protein excinuclease ABC subunit UvrC and (B) FKSFNYNVAIGVRQP from protein 3-oxoacyl-ACP reductase FabG
1. Tropheryma whipplei Twist: a human pathogenic actinobacteria with a reduced genome
2. Tropheryma whipplei as a commensal bacterium
3. Clinical manifestations, treatment, and diagnosis of Tropheryma whipplei infections
4. Impaired immune functions of monocytes and macrophages in Whipple's disease
5. Systemic Tropheryma whipplei: clinical presentation of 142 patients with infections diagnosed or confirmed in a reference center
6. Tropheryma whipplei, the Whipple's disease bacillus, induces macrophage apoptosis through the extrinsic pathway
7. Sequencing and analysis of the genome of the Whipple's disease bacterium Tropheryma whipplei
8. Evidence of lifetime susceptibility to Tropheryma whipplei in patients with Whipple's disease
9. Resistance to trimethoprim/sulfamethoxazole and
10. In-silico proteomic exploratory quest: crafting T-cell epitope vaccine against Whipple's disease
11. Trends in codon and amino acid usage in Thermotoga maritima
12. Absence of translationally selected synonymous codon usage bias in Helicobacter pylori
13. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces
14. Chikungunya virus vaccine development: through computational proteome exploration for finding of HLA and cTAP binding novel epitopes as vaccine candidates
15. T-cell epitope-based vaccine designing against Orthohantavirus: a causative agent of deadly cardio-pulmonary disease
16. ATGme: open-source web application for rare codon identification and custom DNA sequence optimization
17. AllergenFP: allergenicity prediction by descriptor fingerprints
18. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
19. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
20. PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides
21. PatchDock and SymmDock: servers for rigid and symmetric docking
22. DINC 2.0: a new protein-peptide docking webserver using an incremental approach
23. Codon and codon-pair usage tables (CoCoPUTs): facilitating genetic variation analyses and recombinant gene design
24. A new and updated resource for codon usage tables
25. The 'effective number of codons' used in a gene
26. Evolution of codon usage in Zika virus genomes is host and vector specific
27. Localized context-dependent effects of the "ambush" hypothesis: more off-frame stop codons downstream of shifty codons
28. Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons
29. Evolutionary constraints on codon and amino acid usage in two strains of human pathogenic actinobacteria Tropheryma whipplei
30. Epitope based vaccine prediction for SARS-COV-2 by deploying immuno-informatics approach
31. Immuno-informatics quest against COVID-19/SARS-COV-2: determining putative T-cell epitopes for vaccine prediction
32. Immunoinformatics designed T cell multi epitope dengue peptide vaccine derived from non structural proteome
33. T cell epitope designing for dengue peptide vaccine using docking and molecular simulation studies
34. In silico identification of epitope-based peptide vaccine for Nipah virus
35. In-silico design of a multivalent epitope-based vaccine against Candida auris
36. An immunoinformatics study: designing multivalent T-cell epitope vaccine against canine circovirus
37. In-silico prediction of peptide based vaccine against Zika virus
38. Design of a novel and potent multivalent epitope based human Cytomegalovirus peptide vaccine: an immunoinformatics approach
39. Molecular docking and simulation investigation: effect of beta-sesquiphellandrene with ionic integration on SARS-CoV2 and SFTS viruses
40. In-silico designing of epitope-based vaccine against the seven banded grouper nervous necrosis virus affecting fish species
Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
All the authors are thankful to the School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India.
Authors' contributions: AJ and VK, peptide identification using codon bias studies. VK, conception of the idea for this article, gap identification in existing studies, and editing of the paper. AJ and SKG, molecular dynamics simulation study and analysis. The authors read and approved the final manuscript.
All data are provided in the manuscript.
Ethics approval and consent to participate: Not applicable. This study has no impact on ethical standards and involved no human or animal subjects.
The authors declare that they have no competing interests.