key: cord-0077942-knrx8dos authors: Shafat, Zoya; Ahmed, Anwar; Parvez, Mohammad K.; Parveen, Shama title: Analysis of codon usage patterns in open reading frame 4 of hepatitis E viruses date: 2022-05-10 journal: Beni Suef Univ J Basic Appl Sci DOI: 10.1186/s43088-022-00244-w sha: 2c65213559048bfcf67faa70595580262ad56455 doc_id: 77942 cord_uid: knrx8dos

BACKGROUND: Hepatitis E virus (HEV) is a member of the family Hepeviridae and causes acute infections resulting in thousands of deaths worldwide. The zoonotic nature of HEV, together with its capacity for human-to-human transmission, has led scientists across the globe to study its different aspects. HEV is also associated with a mortality rate of about 30% in pregnant women. The genome of HEV is organized into three open reading frames (ORFs): ORF1, ORF2 and ORF3. A protein encoded by an additional reading frame, ORF4, has recently been discovered and is exclusive to genotype 1 (GT 1) isolates of HEV. ORF4 is suggested to play a crucial role in pregnancy-associated pathology and enhanced replication. Although studies have documented the importance of ORF4, the genetic features of ORF4 protein genes in terms of compositional patterns have not been elucidated. Because codon usage plays a critical role in establishing the host–pathogen relationship, the present study reports a codon usage analysis (based on nucleotide sequences of HEV ORF4 available in the public database) in three hosts, along with the factors influencing the codon usage patterns of the ORF4 protein genes of HEV.

RESULTS: The nucleotide composition analysis indicated that ORF4 protein genes showed overrepresentation of the C nucleotide, while the A nucleotide was the least represented, with random distribution of G and T(U) nucleotides. The relative synonymous codon usage (RSCU) analysis revealed a bias toward C/G-ended codons (over U/A) in all three natural HEV-hosts (human, rat and ferret).
It was observed that all the ORF4 genes were rich in GC content. Further, our results showed the occurrence of both coincident and antagonistic codon usage patterns among HEV-hosts. The findings further emphasized that both mutational and selection forces influenced the codon usage patterns of ORF4 protein genes. CONCLUSIONS: To the best of our knowledge, this is the first bioinformatics study evaluating codon usage patterns in HEV ORF4 protein genes. The findings from this study are expected to increase our understanding of the significant factors involved in evolutionary changes of ORF4. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s43088-022-00244-w.

Hepatitis E virus (HEV) is a small RNA virus belonging to the Hepeviridae family. Hepatitis E is a potentially serious acute disease caused by HEV [1, 2]. HEV is primarily transmitted through contaminated water sources or through the consumption of infected or undercooked meat products derived from animals (swine, deer or wild boar) [3, 4]. The HEV genome is a positive-sense, single-stranded RNA molecule of approximately 7.2 kb in length, flanked by 5′ and 3′ untranslated regions (UTRs) [5]. The genome possesses a 7-methylguanosine cap at the 5′ end and a poly(A) tail at the 3′ end and encodes three open reading frames (ORFs), i.e., ORF1, ORF2 and ORF3. ORF1 encodes the largest, non-structural polyprotein, which has multifunctional domains required for viral replication [6, 7]. ORF2 codes for the capsid protein [8]. ORF3 encodes a phosphorylated protein with multiple functions [9, 10]. HEV genotype 1 (GT 1) isolates have recently been found to carry an additional reading frame (ORF4), which encodes the ORF4 protein only during endoplasmic reticulum (ER) stress [11]. This newly identified ORF4 is exclusive to HEV GT 1 [11]. ORF4 has been demonstrated to play a significant functional role in the replication cycle of GT 1 HEV.
Evidence suggests that ORF4 interacts with multiple viral and host proteins to enhance virus replication [11, 12]. The present study analyzed the compositional bias of the HEV ORF4 protein genes in terms of nucleotide composition and synonymous codon usage patterns. The degeneracy of the genetic code allows more than one codon to encode a specific amino acid. Alternative codons encoding the same amino acid are termed synonymous codons. In viruses, the preference for some codons over others has been well documented. This phenomenon is referred to as codon usage bias (CUB) [13, 14]. CUB is considered an important force in the evolution of viral genomes. Factors influencing CUB include mutational pressure, natural selection, G + C content, secondary protein structure, and replication and selective transcription [15–18]. Previous reports have suggested that natural selection and directional mutation pressure are the two major mechanisms that account for codon usage variation among viral genomes [15, 19–21]. However, mutational bias, rather than natural selection, was found to be the dominant factor affecting codon usage patterns in some RNA viruses [22–25]. The development of a disease results from a complex interaction among various factors, including the pathogen's virulence, the host's defense response and environmental aspects [26, 27]. These factors, together with CUB, decide the outcome of the host–pathogen interaction [28, 29]. Pathogens can adapt better to their hosts and environment through evolutionary changes that are reflected in their CUB patterns. Moreover, the efficiency with which a pathogen infects its host depends significantly on codon optimization, because codon optimization affects the growth of a pathogen in its environment [28].
A similar codon usage pattern between a virus and its hosts may influence the virus's overall fitness, evasion of the host's immune system and evolution [30, 31]. Therefore, the study of codon usage in viruses can reveal important information about virus evolution, regulation of gene expression and protein synthesis. Despite the importance of the ORF4 region, its codon usage patterns have not been determined [32, 33]. This investigation was therefore carried out to analyze the codon usage patterns of the HEV ORF4 protein genes. Codon usage analysis has been extensively carried out for the protein genes of the other reading frames of HEV, i.e., ORF1, ORF2 and ORF3 [34]. Baha and colleagues evaluated the codon usage patterns of these ORFs, but the codon compositional constraints in ORF4 were not analyzed [34]. In this study, we performed a comprehensive analysis of nucleotide composition and synonymous codon usage, based on the available nucleotide sequences (in NCBI GenBank) of the ORF4 protein genes, to determine the evolutionary factors that could play an important role in shaping the codon usage patterns. To the best of our knowledge, our analysis provides the first insights into the codon usage patterns of ORF4 protein genes. This study also sheds light on the distinguishing genetic features of HEV present in the ORF4 sequences.

Nucleotide sequences of the ORF4 protein genes were retrieved from the GenBank database available at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). The retrieved sequences were selected based on the following inclusion criteria: (A) Selected sequences from the same or different countries at varying time intervals were assembled in order to avoid repetition. (B) Sequences were included from different hosts encompassing human, rat and ferret. (C) Accumulated sequences from GenBank were categorized into different datasets.
(D) Three datasets were prepared, one for each host organism (human, rat and ferret). (E) Multiple alignment was carried out for these datasets using the ClustalW algorithm installed in the BioEdit Sequence Alignment Editor 7.2.5 [35]. The complete lists of the sequences used for the present analysis in the different host organisms are given in the additional files (Additional files 1-3: Tables S1-S3).

The following nucleotide composition properties of the ORF4 sequences were calculated using MEGA X (version 10.1.7): (1) overall nucleotide frequencies (A%, C%, T/U% and G%); (2) nucleotide frequencies at the third codon site (A3%, C3%, U3% and G3%); and (3) G + C content at the different codon positions, i.e., the first (GC1), second (GC2) and third synonymous codon positions (GC3). Five non-biased codons were omitted from the nucleotide composition analysis: the three termination codons (UAG, UGA and UAA), as they do not code for any amino acid, and the two codons AUG and UGG, as they are the only codons for Met and Trp, respectively. These five codons therefore do not exhibit any codon bias.

The relative synonymous codon usage (RSCU) is the ratio between the observed usage frequency of a codon and the frequency expected if all synonymous codons for a specific amino acid were used equally [36]. The RSCU index was determined as follows:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j=1}^{n_i} g_{ij}} \times n_i$$

where RSCU is the relative synonymous codon usage value and $g_{ij}$ is the observed number of the jth codon for the ith amino acid, which has $n_i$ types of synonymous codons. Codons with RSCU values > 1.6 and < 0.6 were considered "overrepresented" and "underrepresented" codons, respectively, whereas codons with an RSCU value of 1 were regarded as not biased (average-level codons) [37].
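The composition and RSCU calculations described above can be sketched in a few lines of Python. This is a minimal illustration and not the MEGA X procedure used in the study: the function names (`split_codons`, `gc_by_position`, `rscu`, `classify`) and the toy sequence `TOY` are hypothetical, and GC1 to GC3 are computed here over all codons of the sequence, which is a simplifying assumption.

```python
from collections import Counter, defaultdict

# Standard genetic code in RNA alphabet, built from the canonical UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Stop codons plus the single-codon amino acids Met (AUG) and Trp (UGG).
EXCLUDED = {"UAA", "UAG", "UGA", "AUG", "UGG"}


def split_codons(seq):
    """Convert a coding sequence to RNA and cut it into complete codons."""
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def gc_by_position(seq):
    """Percentage of G or C at codon positions 1, 2 and 3 (all codons counted)."""
    codons = split_codons(seq)
    return {
        f"GC{pos + 1}": 100.0 * sum(c[pos] in "GC" for c in codons) / len(codons)
        for pos in range(3)
    }


def rscu(seq):
    """RSCU = observed count * family size / total count of the codon family."""
    counts = Counter(split_codons(seq))
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if codon not in EXCLUDED:
            families[amino_acid].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total:  # skip amino acids that do not occur in the sequence
            for c in family:
                values[c] = counts[c] * len(family) / total
    return values


def classify(value):
    """Apply the thresholds stated in the text (> 1.6 and < 0.6)."""
    if value > 1.6:
        return "overrepresented"
    if value < 0.6:
        return "underrepresented"
    return "intermediate"


# Hypothetical toy sequence (not an HEV sequence): 4 Ala codons and 4 Gly codons.
TOY = "GCCGCCGCCGCTGGCGGCGGGGGA"
toy_rscu = rscu(TOY)
toy_gc = gc_by_position(TOY)
```

In the toy sequence, GCC accounts for three of the four Ala codons, so its RSCU is 3 × 4 / 4 = 3.0 and it falls in the overrepresented class, while the unused codon GGU has an RSCU of 0 and falls in the underrepresented class.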
The mean RSCU values of the ORF4 protein genes were calculated using MEGA X (version 10.1.7) in order to reveal the codon usage patterns without the effect of amino acid composition and sequence length. The correlation between the overall nucleotide contents (A, T, G, C, GC) and their third-codon-position counterparts (A3, T3, G3, C3, GC3) was assessed to analyze whether natural selection and mutation pressure contributed individually or jointly to the evolution of ORF4 in the natural HEV hosts.

The nucleotide compositions of the ORF4 protein genes were calculated to analyze the effect imposed by compositional constraints on codon usage. The results of the nucleotide composition analysis are presented in Table 1 (Fig. 1).

H: The nucleotides C and G were found to be the most abundant in these coding sequences, with averages of 35.597% and 27.966%, respectively, compared with U (21.341%) and A (15.094%). The most frequent nucleotide at the third position was G3S (39.245%), followed by C3S (31.194%), A3S (16.352%) and U3S (13.207%). Thus, synonymous codons at the third position followed the trend G3S > C3S > A3S > U3S. The overall GC content (63.563%) was higher than the AU content (36.441%), which indicated a GC-biased composition. The overall GC content and the GC content at positions GC1, GC2 and GC3 averaged 63.563%, 52.829%, 67.421% and 70.439%, respectively (Additional file 1: Table S1) (Table 1).

Rat: The nucleotides C and U were found to be the most abundant in these coding sequences, with averages of 29.451% and 27.122%, respectively, compared with G (27.070%) and A (16.356%). The most frequent nucleotide at the third position was G3S (34.782%), followed by C3S (31.754%), U3S (17.003%) and A3S (16.459%). Thus, synonymous codons at the third position followed the trend G3S > C3S > U3S > A3S.
The overall GC content (56.498%) was higher than the AU content (43.478%), which indicated a GC-biased composition. The overall GC content and the GC content at positions GC1, GC2 and GC3 averaged 56.498%, 50.387%, 52.639% and 66.536%, respectively (Additional file 2: Table S2) (Table 1).

Ferret: The nucleotides C and U were found to be the most abundant in these coding sequences, with averages of 28.768% and 27.119%, respectively, compared with G (26.358%) and A (17.753%). The most frequent nucleotide at the third position was G3S (32.717%), followed by C3S (30.597%), U3S (20.706%) and A3S (15.978%). Thus, synonymous codons at the third position followed the trend G3S > C3S > U3S > A3S. The overall GC content (55.126%) was higher than the AU content (44.872%), which indicated a GC-biased composition. The overall GC content and the GC content at positions GC1, GC2 and GC3 averaged 55.126%, 51.63%, 50.434% and 63.314%, respectively (Additional file 3: Table S3) (Table 1).

From this initial analysis, nucleotide C was overrepresented, whereas nucleotide A was underrepresented, in HEV ORF4 protein genes. The nucleotides G and T (U) were distributed randomly. In addition, the GC content (> 50%) exceeded the AU content (< 50%) in the ORF4 protein genes of all three hosts.

The RSCU measure was used to evaluate the codon usage pattern of the ORF4 protein gene sequences. The RSCU values were computed for every codon in each gene sequence to determine the extent to which C-ended codons were preferred. The results are presented in Table 2 (Fig. 2).
Human: Out of 18 preferred codons (UCC, UCA, UCG, AGU, AGC, CCU, CCC, CCA, CCG, ACC, ACG, GCU, GCC, GCG, CAG, UGC, GGC and GGG), 13 were C/G-ending (C-ending: 7; G-ending: 6) and 5 were U/A-ending (U-ending: 3; A-ending: 2) (Additional file 4: Table S4) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 3 had RSCU values > 1.6, i.e., were overrepresented (CAG, UGC and GGC), while 14 had RSCU values > 0.6 and < 1.6 (UCC, UCA, UCG, AGU, AGC, CCC, CCA, CCG, ACC, ACG, GCU, GCC, GCG and GGG). One underrepresented (RSCU < 0.6) synonymous codon was found (CCU).

Rat: Out of 25 preferred codons (UUU, UUC, UUA, UUG, CUC, CUA, CUG, AUU, AUC, AUA, GUG, UCC, UCG, AGC, CCU, CCG, ACA, ACG, GCC, GCA, UGC, CGC, CGG, AGG and GGC), 17 were C/G-ending (C-ending: 9; G-ending: 8) and 8 were U/A-ending (A-ending: 5; U-ending: 3) (Additional file 5: Table S5) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 6 had RSCU values > 1.6, i.e., were overrepresented (GUG, AGC, GCC, CGC, AGG and GGC), while the remaining 18 had RSCU values > 0.6 [...]

Ferret: [...] (Additional file 6: Table S6) (Table 2). This indicated a preference for C- and G-ended codons over U- and A-ended codons in the gene sequences. Among these preferred codons, 7 had RSCU values > 1.6, i.e., were overrepresented (UUG, UCU, GCC, CAC, CAG, CGC and AGG), while the remaining 15 had RSCU values > 0.6 and < 1.6 (UUU, UUC, CUA, CUG, AUU, AUC, AUA, GUC, UCA, AGC, CCU, CCC, UAC, GAG and CGG). No underrepresented (RSCU < 0.6) synonymous codon was found.

The overall and host-specific RSCU analysis revealed that C/G-ending codons were preferred over U/A-ending codons in the ORF4 coding sequences across all host organisms. The number of preferred codons in each host followed the order: 25 (rat) > 22 (ferret) > 18 (human).
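The C/G-ending versus U/A-ending counts quoted above can be checked by tallying the last base of each listed preferred codon. The sketch below is illustrative only: it hard-codes the human and rat codon lists exactly as they are given in the text, and the names `PREFERRED` and `ending_tally` are hypothetical.

```python
from collections import Counter

# Preferred codons as listed in the text for human and rat.
PREFERRED = {
    "human": (
        "UCC UCA UCG AGU AGC CCU CCC CCA CCG ACC ACG GCU GCC GCG "
        "CAG UGC GGC GGG"
    ).split(),
    "rat": (
        "UUU UUC UUA UUG CUC CUA CUG AUU AUC AUA GUG UCC UCG AGC CCU CCG "
        "ACA ACG GCC GCA UGC CGC CGG AGG GGC"
    ).split(),
}


def ending_tally(codons):
    """Count codons by the nucleotide at their third position."""
    return Counter(codon[-1] for codon in codons)


tallies = {host: ending_tally(codons) for host, codons in PREFERRED.items()}

for host, tally in tallies.items():
    cg_ending = tally["C"] + tally["G"]
    ua_ending = tally["U"] + tally["A"]
    print(f"{host}: C/G-ending = {cg_ending}, U/A-ending = {ua_ending}")
```

For these two lists the tally gives 13 versus 5 for human and 17 versus 8 for rat, matching the counts stated in the text.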
Thus, our RSCU findings revealed both similarities and discrepancies in the codon usage patterns among HEV-hosts, including in the sets of overrepresented and underrepresented codons in each host.

Most amino acids are encoded by more than one codon. It has been documented that the usage of synonymous codons is not random [38]. Using the RSCU values of the HEV-hosts, we computed the preferred codon frequency for each amino acid. The frequency was determined to analyze the influence of selection pressure from hosts on the codon usage patterns of HEV. A list of the preferred codons, i.e., those encoding each amino acid with higher frequency than the other synonymous codons, is given for the HEV-hosts in Table 3 (Additional files 4-6: Tables S4-S6). Ten amino acids, Ile (I), Ala (A), Gln (Q), Asn (N), Lys (K), Asp (D), Glu (E), Cys (C), Arg (R) and Gly (G), showed the same preferred codon, i.e., AUU for Ile, GCC for Ala, CAG for Gln, AAC for Asn, AAG for Lys, GAU for Asp, GAG for Glu, UGC for Cys, CGC for Arg and GGC for Gly, in all three natural HEV-hosts, which indicated a phenomenon of "mutual codon preference". Therefore, these codons (AUU, GCC, CAG, AAC, AAG, GAU, GAG, UGC, CGC and GGC) represent the coincident portion of codon usage, i.e., they were shared by all the natural HEV-hosts. In addition, discrepancies between host organisms were observed for some preferred codons, i.e., preferred codons showed dissimilar usage among HEV-hosts. For instance, the three HEV-hosts (human, rat and ferret) each used a different preferred codon for Ser (UCG for human, AGC for rat and UCU for ferret).
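The host comparison in this part of the results can be tabulated programmatically. The sketch below hard-codes the preferred codons reported in the text (the ten shared amino acids, Ser, and the six amino acids for which one host differs from the other two) and sorts each amino acid into a category. The names `PREFERRED_BY_HOST` and `agreement`, and the three category labels, are hypothetical choices made for illustration.

```python
# Preferred codon per amino acid as (human, rat, ferret), taken from the text.
PREFERRED_BY_HOST = {
    "Ile": ("AUU", "AUU", "AUU"),
    "Ala": ("GCC", "GCC", "GCC"),
    "Gln": ("CAG", "CAG", "CAG"),
    "Asn": ("AAC", "AAC", "AAC"),
    "Lys": ("AAG", "AAG", "AAG"),
    "Asp": ("GAU", "GAU", "GAU"),
    "Glu": ("GAG", "GAG", "GAG"),
    "Cys": ("UGC", "UGC", "UGC"),
    "Arg": ("CGC", "CGC", "CGC"),
    "Gly": ("GGC", "GGC", "GGC"),
    "Ser": ("UCG", "AGC", "UCU"),
    "Phe": ("UUC", "UUC", "UUU"),
    "Leu": ("UUG", "CUG", "UUG"),
    "Val": ("GUG", "GUG", "GUC"),
    "Pro": ("CCC", "CCU", "CCC"),
    "Thr": ("ACG", "ACG", "ACC"),
    "Tyr": ("UAU", "UAC", "UAC"),
}


def agreement(codons):
    """Label an amino acid by how many distinct preferred codons the hosts use."""
    distinct = len(set(codons))
    if distinct == 1:
        return "coincident"          # same preferred codon in all three hosts
    if distinct == 2:
        return "one host differs"    # two hosts agree, the third does not
    return "all hosts differ"


groups = {}
for amino_acid, codons in PREFERRED_BY_HOST.items():
    groups.setdefault(agreement(codons), []).append(amino_acid)

for label, members in groups.items():
    print(f"{label} ({len(members)}): {', '.join(members)}")
```

With the codons as reported, ten amino acids fall in the coincident group, six show one differing host, and Ser is the only case in which all three hosts differ.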
Moreover, this phenomenon was also observed in specific hosts, i.e., preferred codons encoding amino acid were different in specific host in comparison with other two host organisms, such as HEV-hosts (human and rat) shared evidence of preferred codon for UUC encoding Phe, except ferret, which preferred UUU over UUC; hosts human and ferret shared evidence of preferred codon for UUG encoding Leu, except rat, which preferred CUG over UUG; human and rat shared evidence of preferred codon for GUG encoding Val, except ferret, which preferred GUC over GUG; hosts human and ferret shared evidence of preferred codon for CCC encoding Pro, except rat, which preferred CCU over CCC; human and rat shared evidence of preferred codon for ACG encoding Thr, except ferret, which preferred ACC over ACG; rat and ferret shared evidence of preferred codon for UAC encoding Tyr, except human, which preferred UAU over UAC. Our results clearly indicated that codon usage patterns in ORF4 gene sequences showed a mixture of coincidence and antagonism among HEV-hosts. Moreover, the top most frequent used codons, least frequent used codons and unused codons also showed common attributes and differences in codon usage patterns among HEV-hosts as represented in Table 4 . These observations further emphasized occurrence of mutual codon preference and lack of shared codon preference among host-pathogens. It has been suggested that the frequencies of nucleotides A and U/T should be equal to that of C and G at the third position of the codon if mutational pressure affects the synonymous codon usage bias [17] . However, we observed huge variations in the nucleotide composition in the overall ORF4 gene sequences as observed in Table 1 . This indicated that other mechanisms including natural selection influenced the codon usage bias in HEV. 
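The imbalance referred to above can be made concrete with the average third-position frequencies reported for each host in the composition results. The sketch below only sums those reported percentages; it is simple arithmetic on published averages rather than a statistical test, and the names `THIRD_POSITION` and `gc3_and_au3` are hypothetical.

```python
# Average third-position nucleotide frequencies (%) as reported in the text.
THIRD_POSITION = {
    "human": {"G": 39.245, "C": 31.194, "A": 16.352, "U": 13.207},
    "rat": {"G": 34.782, "C": 31.754, "A": 16.459, "U": 17.003},
    "ferret": {"G": 32.717, "C": 30.597, "A": 15.978, "U": 20.706},
}


def gc3_and_au3(freqs):
    """Return the summed G+C and A+U percentages at the third codon position."""
    return freqs["G"] + freqs["C"], freqs["A"] + freqs["U"]


summary = {host: gc3_and_au3(freqs) for host, freqs in THIRD_POSITION.items()}

for host, (gc3, au3) in summary.items():
    print(f"{host}: G3+C3 = {gc3:.3f}%, A3+U3 = {au3:.3f}%, "
          f"difference = {gc3 - au3:.3f}")
```

The G3 + C3 sums reproduce the GC3 averages given in the text (70.439%, 66.536% and 63.314%), and in every host they are roughly twice the A3 + U3 sums, so the third-position frequencies are far from equal.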
Thus, these findings concluded that compositional constraints under mutational bias in combination with natural selection shaped up the codon usage patterns in ORF4 coding sequences across all hosts. As HEV exhibits enormously high genetic diversity in addition to lack of appropriate culture system for its propagation, these factors pose a major challenge in the improvement of treatment methods. HEV has been identified with multiple genotypes and subtypes via nucleotide sequence analysis [39, 40] . Characterizing genetic properties to figure out common regions and possible differences between genotypes is expected to assist and contribute to the process of a development of effective preventive measures against HEV infection. Our previous investigations have elucidated the ORF4 protein structure in different host organisms [41] in addition to its role as a probable drug target [42] . In this context, we conducted bioinformatics study of different ORF4 sequences of HEV by analyzing its codon usage patterns in different host organisms to provide insights into common attributes and differences among usage of amino acid in virus's structure. Using these findings, it is hoped that more efficient and precise approaches could be identified and selected for treatment protocols. The genetic code encompasses 64 codons, separated into 20 distinguishable groups. Each individual group consists of one to six codons and encodes the same amino acid. Thus, each standard amino acid is often encoded by alternative codons belonging to the same group. These alternative codons are termed as 'synonymous' codons. CUB is a phenomenon wherein one codon (over its synonymous partners) is preferred [15, 43] . CUB is considered as a distinctive property and appreciably differs among genes as well as genomes [36, 44] . Investigations have reported that codon usage patterns in organisms assist in the understanding of molecular organization of genomes. 
Owing to improvements in sequencing technologies, CUB has gained more attention, and codon usage patterns have been studied in several prokaryotic and eukaryotic organisms [45]. As viruses are obligate parasites, they require a set of proteins and enzymes to colonize the host by counteracting the host's defense mechanisms [46]. The establishment of an association between a host and a virus depends on translational accuracy [47], which is largely affected by synonymous codon usage patterns [45, 48]. Mutational bias and natural selection are the two major forces that govern the overall codon usage variation in genomes. It has been reported that mutation pressure, rather than translational selection, is the primary determinant of codon bias in human RNA viruses [49]. Considering these forces together helps to determine whether the selection of preferred codons has been influenced by mutational pressure or by natural selection. Thus, in the present study, we performed a systematic survey of the evolutionary pressures (i.e., mutational bias and natural selection) acting on ORF4 to gain insights into its codon usage patterns. The codon usage patterns of the other reading frames, i.e., the ORF1, ORF2 and ORF3 protein genes, have been elucidated [34]; however, the codon usage patterns of ORF4 had not been determined. This study is the first of its kind to describe the codon usage patterns of the ORF4 region of HEV in three different host organisms (human, rat and ferret).

Nucleotide composition constraints affect codon usage patterns, and thus we performed a nucleotide composition analysis of the HEV ORF4 protein genes. The analysis revealed an overrepresentation of the C nucleotide and an underrepresentation of the A nucleotide in the overall nucleotide composition. This is in agreement with the previous investigation carried out by Baha and colleagues on HEV isolates encompassing different genotypes and hosts [34].
That investigation revealed C as the most-represented nucleotide and A as the least-represented nucleotide [34]. As in that report, our nucleotide analysis also showed a random distribution of G and T (U) nucleotides [34]. Our analysis revealed that ORF4 genes were rich in GC content, which is again in agreement with the previous report that all the ORF coding sequences of HEV had a high overall GC content (exceeding 50%) [34]. Our compositional analysis revealed a C/G-rich nucleotide pattern in human, while the rat and ferret hosts showed C/T(U) richness. These results are consistent with the earlier report, in which ORF1 and ORF3 were C/G-rich, while ORF2 showed a prevalence of C/T(U) nucleotides [34]. However, the pattern observed in ORF4 differs from the pattern observed in most RNA viruses (HIV, hepatitis C and rubella viruses), which show a high prevalence of A rather than C [50]. This opposite nucleotide bias could be due to the adaptation of a common ancestor of modern HEV strains to its host (in terms of nucleotide composition) during evolution [51]. This departure from the majority of RNA viruses is also consistent with the earlier report on the other reading frames (ORF1, ORF2 and ORF3) [34]. Thus, our findings from the initial compositional analysis are consistent with the previous report on the codon usage patterns of HEV ORFs [34].

Next, we examined the role of selection forces in determining the codon usage patterns of ORF4 genes. In viruses, it has been suggested that an AU-rich or GC-rich composition correlates with the RSCU pattern, such that AU-rich or GC-rich genomes prefer codons ending with A/U or G/C, respectively. This trend supports the influence of mutational pressure [49].
As the nucleotide compositional bias of ORF4 was in line with its RSCU pattern in the case of human, mutation pressure appears to be a major driving factor in shaping its codon usage pattern. However, in the case of rat and ferret, although these regions had higher percentages of C and U nucleotides, their RSCU patterns showed a preference for C- and G-ended codons, i.e., the RSCU results were not consistent with the initial nucleotide composition. This suggested the involvement of other factors besides nucleotide composition in shaping the synonymous codon usage patterns in these two host organisms (rat and ferret). In this context, we observed large variations in the nucleotide composition of the overall ORF4 gene sequences, which indicated that other mechanisms, including natural selection, influenced the codon usage bias in HEV. Thus, it could be interpreted that both mutation and natural selection shaped the codon usage patterns of ORF4 coding sequences. Our findings are consistent with the previous codon usage analyses carried out in HEV, which demonstrated the predominance of mutation pressure [51] and of natural selection [34], respectively.

We next analyzed the relationship between the codon usage patterns of ORF4 in its natural hosts. The common attributes and differences among HEV-hosts were examined by computing the frequency of amino acids using their RSCU values. The number of preferred codons varied among the natural hosts, being highest in rat and lowest in human. Additionally, the numbers of overrepresented and underrepresented codons also varied between host organisms. Thus, the noteworthy variation in the usage of preferred codons among HEV-hosts implied that the codon usage patterns of ORF4 in different host organisms were subjected to different selection pressures.
Furthermore, we observed that the frequencies of the most used and least used codons also showed similarities and differences between hosts. Thus, HEV ORF4 showed a mixture of two codon usage patterns: coincidence and antagonism. This is similar to previous studies carried out in other viruses, such as HCV [52] and enterovirus [53]. A recent investigation on HEV has also shown both similarities and discrepancies in the codon usage patterns of the ORF1 Y-domain region, which further substantiates our present findings [54]. It has been proposed that the coincident portions of codon usage assist in the efficient translation of the corresponding amino acids between viruses and their respective hosts [55, 56], whereas the antagonistic portions promote the correct folding of viral proteins, even though the translation efficiency of the corresponding amino acids decreases [57–59]. Taken together, our findings revealed that none of the hosts showed complete resemblance to, or complete discrepancy from, the other HEV-hosts. The findings from such bioinformatics codon usage studies can be validated experimentally and could further be used in clinical trials to improve our understanding of HEV biology. Similar investigations on other viruses could shed new light on their biology.

The present study documents the codon usage analysis of HEV ORF4 for the first time. This bioinformatics approach is expected to strengthen our understanding of the common attributes and differences in the codon usage patterns among ORF4 protein genes. The nucleotide compositional analysis showed overrepresentation of the C nucleotide, while A was the least-represented nucleotide. The synonymous codon usage analysis revealed that the preferred codons mostly ended with C and G nucleotides. Moreover, it was observed that the codon usage pattern among HEV-hosts was a mixture of coincidence and antagonism.
The study reveals that synonymous codon usage in ORF4 is shaped by an evolutionary process, perhaps reflecting a dynamic interplay of mutation and selection forces that adjusts its codon usage to different hosts and conditions. Investigation of codon usage patterns is essential for understanding the evolution and efficient expression of viral proteins, so that they can generate an efficient immune response. Such codon optimization strategies based on preferred codon usage are very useful in vaccine development. The present study is anticipated to increase our knowledge of the mechanisms influencing the codon usage and evolution of ORF4.

References
1. Hepatitis E pathogenesis. Viruses
2. Hepatitis E virus infection
3. Occupational exposure to hepatitis E virus (HEV) in swine workers
4. Prevalence of hepatitis E virus antibodies in workers occupationally exposed to swine in Portugal
5. Hepatitis E virus (HEV): molecular cloning and sequencing of the full-length viral genome
6. Cloning, sequencing, and expression of the hepatitis E virus (HEV) nonstructural open reading frame 1 (ORF1)
7. Molecular characterization of hepatitis E virus ORF1 gene supports a papain-like cysteine protease (PCP)-domain activity
8. Structure of hepatitis E viral particle
9. Hepatitis E virus ORF3 is a functional ion channel required for release of infectious particles
10. The ORF3 protein of genotype 1 hepatitis E virus suppresses TLR3-induced NF-κB signaling via TRADD and RIP1
11. Endoplasmic reticulum stress induced synthesis of a novel viral factor mediates efficient replication of genotype-1 hepatitis E virus
12. hepatitis E virus
13. Codon catalog usage and the genome hypothesis
14. Variation in G+C-content and codon choice: differences among synonymous codon groups in vertebrate genes
15. Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales
16. Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons
17. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis
18. Correlation between codon usage and thermostability
19. Analysis of codon usage in bovine viral diarrhea virus
20. Genomewide analysis of codon usage bias in four sequenced cotton species
21. Codon usage bias and the evolution of influenza A viruses. Codon usage biases of influenza virus
22. A comparison of synonymous codon usage bias patterns in DNA and RNA virus genomes: quantifying the relative importance of mutational pressure and natural selection
23. Selective pressure dominates the synonymous codon usage in parvoviridae
24. Analysis of synonymous codon usage patterns in torque teno sus virus 1 (TTSuV1)
25. Synonymous codon usage in TTSuV2: analysis and comparison with TTSuV1
26. Plant-microbe interactions facing environmental challenge
27. Microbial invasions in terrestrial ecosystems
28. Codon usage bias analysis of Citrus tristeza virus: higher codon adaptation to citrus reticulata host
29. Von Der Haar T (2020) Hidden patterns of codon usage bias across kingdoms
30. A detailed comparative analysis on the overall codon usage patterns in West Nile virus
31. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses
32. Physicochemical attributes of hepatitis E virus ORF4: a general perspective
33. Understanding Hepatitis E viruses by exploring the structural and functional properties of ORF4
34. Comprehensive analysis of genetic and evolutionary features of the hepatitis E virus
35. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
36. Codon usage pattern of genes involved in central nervous system
37. Compositional properties and codon usage pattern of mitochondrial ATP gene in different classes of Arthropoda
38. Codon usage and tRNA content in unicellular and multicellular organisms
39. Genetic variability and evolution of hepatitis E virus
40. Genetic variability of HEV isolates: inconsistencies of current classification
41. Role of ORF4 in Hepatitis E virus regulation: analysis of intrinsically disordered regions
42. Sequence to structure analysis of the ORF4 protein from Hepatitis E virus
43. Codon usage is an important determinant of gene expression levels largely through its effects on transcription
44. Codon usage differences among genes expressed in different tissues of Drosophila melanogaster
45. Comprehensive profiling of codon usage signatures and codon context variations in the genus Ustilago
46. Codon optimization underpins generalist parasitism in fungi
47. Mutational drift prevails over translational efficiency in Frankia nif operons
48. Codon usage pattern and predicted gene expression in Arabidopsis thaliana
49. The extent of codon usage bias in human RNA viruses and its evolutionary origin
50. Composition bias and genome polarity of RNA viruses
51. Genetic characterization and codon usage bias of full-length hepatitis E virus sequences shed new lights on genotypic distribution, host restriction and genome evolution
52. The characteristic of codon usage pattern and its evolution of hepatitis C virus
53. The characteristics of the synonymous codon usage in enterovirus 71 virus and the effects of host on the virus in codon usage pattern
54. Decoding the codon usage patterns in Y-domain region of Hepatitis E viruses
55. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition
56. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications
57. Synonymous codons direct cotranslational folding toward different protein conformations
58. Evidence of evolutionary selection for cotranslational folding
59. Genetic code optimization for cotranslational protein folding: codon directional asymmetry correlates with antiparallel beta-sheets, tRNA synthetase classes

Abbreviations
HEV: Hepatitis E virus; ORF4: Open reading frame 4; RSCU: Relative synonymous codon usage.

Supplementary information
The online version contains supplementary material available at https://doi.org/10.1186/s43088-022-00244-w.
Additional file 1: Table S1. Nucleotide composition analysis of HEV host Human in ORF4 coding sequences.
Additional file 2: Table S2. Nucleotide composition analysis of HEV host Rat in ORF4 coding sequences.
Additional file 3: Table S3. Nucleotide composition analysis of HEV host Ferret in ORF4 coding sequences.
Additional file 4: Table S4. RSCU values of the HEV host Human in ORF4 coding sequences.
Additional file 5: Table S5. RSCU values of the HEV host Rat in ORF4 coding sequences.
Additional file 6: Table S6. RSCU values of the HEV host Ferret in ORF4 coding sequences.

Author contributions
SP conceptualized the research. SP and ZS designed the manuscript. ZS was a major contributor in writing the manuscript and performed the biocomputational analysis of the protein. KP and AA proofread the manuscript. All the authors read and approved the final manuscript.

Ethics approval and consent to participate
Not applicable.

Competing interests
The authors declare that they have no competing interests.