key: cord-0316640-hsfdlcv1 authors: Gupta, Aeshna; Gangotia, Disha; Vasdev, Kavita; Mani, Indra title: In silico DNA codon optimization of the variant antigen-encoding genes of diverse strains of Nipah virus date: 2021-04-23 journal: bioRxiv DOI: 10.1101/2021.04.23.441071 sha: 9441168d57f57e93b329961e9b8f916641305107 doc_id: 316640 cord_uid: hsfdlcv1 Due to an increase in human and wildlife interaction, more and more zoonotic diseases are emerging. A prime example of this is the emergence of the Nipah virus (NiV). Due to high rate of mortality specifically in India and Bangladesh, there is an urgent need for accelerated research for NiV involving the development of vaccines or drugs. The genome of NiV consists of six genes (N, P, M, F, G and L) encoding yielding nucleoprotein, phosphoprotein, matrix, fusion, glycoprotein and large RNA polymerase. We have used these six genes for in silico assessment of DNA codon optimization in Escherichia coli. It was observed that the codon adaptation index (CAI) and GC content of the genes in optimized DNA were enhanced significantly as compared to wild-type strain. On an average, CAI and GC content of N gene in optimized DNA was enhanced by 2.3 (135.1%) and 1.2(9.9 %) fold respectively, while in P/V/C it was increased by 2.0 (98.3 %) and 1.1(7.8%) fold respectively. Further, the CAI and GC content in optimized DNA of M gene and F gene was enhanced by 2.0(99.0%) and 1.1(7.2%) fold respectively for gene M and 2.4(142.5 %), 1.2(15.4%) fold respectively for gene F. Gene G showed an increase of 2.1(114.8 %) fold for CAI, 1.1(11.2%) fold for GC content and gene L showed an increase of 2.4(143.7%) fold for CAI, 1.2(17.2%) fold for GC content. Our result demonstrates that the optimized genes could be useful for better expression in host without any truncated proteins and also useful for protein folding and function. Emerging zoonotic diseases are the products of socioeconomic and anthropogenic environmental changes, Nipah virus (NiV) being one of its best examples. Nipah is a zoonotic disease caused by Nipah virus. Fruit eating species such as Pteropus bats, popularly known as flying foxes, are supposed to be the natural hosts of the virus. NiV emerged as a new virus, causing severe morbidity and mortality in both humans and animals exactly 20 years ago and destroyed the pig-farming industry in Malaysia, and it continues to cause outbreaks in Bangladesh and India [1] . NiV is the second member of the genus Henipavirus in the family Paramyxoviridae [1] . Similar to other paramyxoviruses, NiV particles are pleomorphic, spherical to filamentous, having an RNA genome and range in size from 40 to 1,900 nm. Among the NiVs known to cause disease in humans, there are two major genetic lineages, i.e., NiV Malaysia (NiV-MY) and NiV Bangladesh (NiV-BD). Genome of the Malaysia NiV is 18,246 nucleotides (nt) in length, whereas that of the Bangladesh NiV is 18,252 nt [2] . The core of the virion contains a linear ribonucleoprotein (RNP) comprising of negative sense single stranded RNA. The genome consists of six genes (N, P, M, F, G and L) encoding nucleoprotein, phosphoprotein, matrix, fusion, glycoprotein and large RNA polymerase [3] . Nucelocapsid protein (N) is the most abundant protein present and necessary for capsid structure. Phosphoproteins (P) and large polymerase proteins (L) aid RNA polymerase in transcribing RNA to mRNA to antigenomic RNA. Traditional lipid bilayer envelopes the virion but it is "spiked" with fusion (F) and receptor-binding glycoproteins (G). Matrix proteins (M) are present on the underside of the lipid bilayer for structural support and regulating the budding process. The P gene encodes at least three nonstructural proteins (C, V, and W) in addition to the P protein. However, P protein is the only essential gene product for genome replication [3] . NiV entry and cell-to-cell spread are driven by two transmembrane glycoproteins, the attachment (G) and the fusion (F) proteins, that are exposed on the surface of viral particles and on infected cells to mediate attachment to the host cell receptor and membrane fusion, respectively [4] . As the world continues to struggle with the COVID-19 pandemic caused by SARS-CoV-2, it becomes all the more crucial to study the characteristics of Nipah virus that might increase its risk of causing a global pandemic in future. The route of infection of NiV from bats to humans is by ingestion and consumption of NiV-contaminated or partially eaten fruits, or by contact with infected animals such as pigs, cattle and goats [5] . Clinical presentation ranges from asymptomatic infection to fatal encephalitis [1] . As an RNA virus, it has an exceptionally high rate of mutation and if a human-adapted strain were to infect communities in South Asia, high population densities and global interconnectedness would rapidly spread the infection [6] . Hence there is a potential need for development of highly immunogenic vaccine against the virus. Currently, no drugs or vaccines exist for this virus, though many trials are in progress [7] . DNA vaccines also known as 'naked DNA' or nucleic acid vaccine, encode antigens of pathogenic organisms including viruses, bacteria, fungi and parasites [8] . However, these are shown to have low immunogenic properties in larger species such as primates and humans [9] . One promising approach for enhancing its immunogenicity is to maximize its expression in the immunized host [10] . Many organisms including viruses tend to have biases towards certain synonymous codon and codon pairs in their genes. A gene containing rarely used codons in one particular organism will show increased expression levels in heterologous system through codon optimization. There are many software tools and technologies which have been developed for gene expression studies and predicting the expression level of genes through computational methods. This is appealing as expensive and difficult experiments are not required [11] . DNA codon optimization is one such useful technology for improving the yields of expressed heterologous proteins. It is a technique to exploit the protein expression in living organism by increasing the translational efficiency of gene of interest by transforming DNA sequence of nucleotides of one species into DNA sequence of nucleotides of another species like plant sequence to human sequence, human sequence to bacteria or yeast sequences [12] . Variation in codon usage is considered as one of the important factors affecting protein expression levels, [13] since the presence of rare codons can reduce the translation rate and induce translation errors with a remarkable impact on the economics of recombinant microbe-based production processes [13, 14, 15] . Methods for optimizing genes are sophisticated and becoming increasingly popular for a variety of applications such as expression in prokaryotes, yeast, plants and mammalian cells [16] . The host specific epitopes have been earlier identified in influenza A virus [12] . Codon optimization of the Ag85B gene which encodes the secretory antigen of Mycobacterium tuberculosis has also proved beneficial [10] . Furthermore, the use of codon optimized genes has allowed notable increases in the production of many enzymes in a variety of hosts, including cellulases in Saccharomyces cerevisiae [17] , phytases in Aspergillus oryzae [18] , cutinases [19] , lignocellulases [20], and lipases [21] in Pichia pastoris and calf prochymosin in Escerichia coli [22] . Thus, it can be concluded that the list of products obtained by the expression of codon optimized genes in microorganisms is constantly growing and includes biofuels, pharmaceuticals, novel bio-based materials and chemicals, industrial enzymes, amino acids, and other metabolites [13] . Therefore, the aim of this study is to optimize the codons for over expression of all six target genes of NiV in E. coli using in silico tools for production of adequate amount of protein. The synonymous codons were specifically altered without any changes in the amino acid sequence so that antigenicity and functional activity of each protein remains exactly similar to its native type. The DNA codon optimization of the studied genes will be useful in increasing the expression level of desired proteins so as to ensure their efficient production for immunotherapy and immunodiagnostics purposes, without any bias. Nucleotide sequences (cds) of different genes of Nipah virus (Accession number: NC_002728.1, FJ513078.1, AY988601.1, AJ627196.1, AY029768.1, and MH523642.1) were retrieved from NCBI-GenBank (http://www.ncbi.nlm.nih.gov). Optimizer (http://genomes.urv.es/OPTIMIZER/) [23] is an on-line PHP application useful for predicting and optimizing the level of expression of a gene in heterologous gene expression host. It was used for optimization and calculation of codon adaptation index (CAI), G+C and A+T content of the retrieved DNA sequences with reference to E. coliK-12 MG1655 as it is a popular host for heterologous gene expression. CAI was also calculated for each gene of six different strains. GraphPad Prism (version 8.1) software was used for statistical analysis of genes to calculate mean, range and standard deviation. The values were tabulated and a graph was then plotted to compare the CAI of wild-type and optimized gene sequences among different strains of Nipah virus. The CAI, GC and AT of all 6 genes of Nipah virus were compared using Mann Whitney test. A twotailed probability p < 0.05 were considered to be statistically significant. Nucleotide sequence alignment was carried out using ClustalW between wild-type and optimized sequences for all 6 genes of strain NC_002728.1 FJ513078.1, AY988601.1, AJ627196.1, AY029768.1, and MH523642.1. Currently, no drugs or vaccines exist for Nipah virus, though many trials are in progress [7] . Treatment is limited to supportive care [24] . Ribavirin has shown some evidence for a reduction in mortality, but its efficacy against NiV disease has not yet been established [25] . Therefore, there is an urgent need for development of effective vaccines for which the current study, was undertaken using the DNA codon optimization method for producing adequate quantity of protein in the desired host. The codon usage for various genes of Nipah virus i.e. nucleocapsid protein, P/V/C, matrix protein, fusion protein, attachment glycoprotein and polymerase were summarized in Table 1 -6 respectively. Their codons were optimized with reference to E. coli. In the present study, we observed that the CAI of optimized sequences was more in comparison to the wild-type sequences. The CAI, GC and AT frequencies in six strains of wild- DNA was found to be significantly higher. The mean CAI and GC in optimized DNA were found to be 2.3 (135.1%) and 1.2(9.9%) fold higher than respective mean values of wild-type. However, mean of AT content in optimized DNA was decreased by 8.2 % compared to wild-type ( Table 1) . A graph was then plotted taking the CAI values on the x-axis while the number of strains studied on y-axis (Fig. 1) . The nucleocapsid gene sequences of the wild-type and codon-optimized were aligned as shown (Fig. S1) . Codon optimization did not change the amino acid sequence of nucleocapsid protein. Similarly, the CAI, GC and AT frequencies in strains of wild-type P/V/C range from 0. of AT content in optimized DNA was decreased by 6 % compared to wild-type ( Table 2) . A graph was then plotted for the same (Fig. 2) . The P/V/C gene sequences of the wild-type and codonoptimized were aligned as presented in (Fig. S2) . found to be 2.0(99.0%) and 1.1(7.2%) fold higher than respective wild-type strain. But, mean of AT content in optimized DNA was decreased by 5.4 % compared to wild-type ( Table 3) . A Graph was plotted similar to above mentioned graphs (Fig. 3) . The matrix protein gene sequences of the wild-type and codon-optimized were aligned as shown (Fig. S3) . Codon optimization did not show any modification in the amino acid sequence of matrix protein. found to be 2.4 (142.5 %) and 1.2 (15.4%) fold higher than respective wild-type strain. However, mean of AT content in optimized DNA was decreased by 9.5 % compared to wild-type ( Table 4) . Whereas the respective frequencies for attachment glycoprotein in optimized DNA range from 0.560 to 0.602, 44.2 to 44.9 and 55.1 to 55.8 with an average (±SD) of 0.582(±0.020), 44.5(±0.308) and 55.5(±0.308) respectively. The mean CAI and GC in optimized DNA were found to be 2.1(114.8 %) and 1.1(11.2%) fold higher than respective wild-type values. However, mean of AT content in optimized DNA was decreased by 7.5 % compared to wild-type ( Table 5 ). The graphs were then plotted for both the genes taking the CAI values on the x-axis and the number of strains studied on y-axis ( Fig. 4 and 5) . The fusion protein gene sequences of the wild-type and codon-optimized were aligned as shown in (Fig. S4) . The glycoprotein gene sequences of the wildtype and codon-optimized were aligned as presented (Fig. S5) . Codon optimization did not alter the amino acid sequence of glycoprotein. The mean CAI and GC in optimized DNA were found to be 2.4(143.7%) and 1.2(17.2%) fold higher than respective wild-type values. However, mean of AT content in optimized DNA was reduced by 10.3 % compared to wild-type ( Table 6) . A graph was plotted on similar basis for all strains (Fig. 6) . The polymerase gene sequences of the wild-type and codon-optimized were aligned as shown (Fig. S6) . It aims to increase the immunogenicity of epitope-based vaccines as it can enhance translational efficiency. Thus, modification of the codon bias of gene sequences is a promising tool of gene expression control. The applications of DNA codon optimization range from numerous animal tests to remove stop codons, to clone, in custom design of synthetic genes, to improve the functionality of genes, to increase protein expression level, for lower production costs, as well as in drug development. The However, AT content in all genes was significantly decreased, as compare to wild-type. A handful of vaccine candidates are in development that employ NiV glycoprotein (G) and fusion (F) proteins to stimulate a protective immune response in preclinical animal models. Some approaches target specific neutralizing antibody responses; others have been evaluated for both immune response and efficacy [31] . NiV envelope glycoprotein G was found to effectively induce specific antibody responses which could block NiV entry to susceptible cells [32] . The generation of neutralizing antibodies in response to G protein is suggestive of a robust adaptive immune response, which is an essential prerequisite of a good vaccine. While G may itself serve as a good vaccine antigen, the stable prefusion of both F and G antigens has shown to produce a higher and broader multivalent polyclonal antibody response in mice models, making it a more potent candidate for vaccine development [33] . Therefore, G protein or G/F prefusions , being more immunogenic, can serve as good vaccine candidates. Further, codon optimization can be used to increase the expression of these genes in cells in order to achieve high titres for large scale production. Thus, codon optimization coupled with immunology-based studies is important to produce effective vaccines, which have a potential of being upscaled at industrial level. Identifying and producing such vaccines will provide an excellent therapeutic strategy for fighting Nipah Nipah Virus Molecular epidemiology and phylogeny of Nipah virus infection: A mini review Genomic characterization of Nipah virus Hendra virus and Nipah virus animal vaccines Gene Electrotransfer of a Hepatitis C Virus DNA Vaccine in Nonhuman Primates Codon optimization of Candida rugosa lip1 gene for improving expression in Pichia pastoris and biochemical characterization of the purified recombinant LIP1 lipase Treatment of acute Nipah encephalitis with ribavirin Multiple effects of codon usage optimization on expression and immunogenicity of DNA candidate vaccines encoding the Human Immunodeficiency Virus Type 1 Gag Protein Rational design of peptide vaccines for the highly lethal Nipah and Hendra viruses. bioRxiv.pii: a011528 Nipah virus matrix protein influences fusogenicity and is essential for particle infectivity and stability Expression of codon optimized genes in microbial systems: current industrial applications and perspectives Rationals of Gene Design and de novo Gene Construction Codon bias and heterologous protein expression A family of thermostable fungal cellulases created by structure-guided recombination Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes Engineering DNA vaccines via co-delivery of co-stimulatory molecule genes Optimization of codon usage enhances the immunogenicity of a DNA vaccine encoding Mycobacterial antigen Ag85B Coding-sequence determinants of gene expression in Escherichia coli Nipah virus infection: current scenario Authors are thankful to the Principal, Gargi College for providing the infrastructural support.