key: cord-0687442-w46gv23d authors: Chen, Ye; Xu, Quanming; Tan, Chen; Li, Xinxin; Chi, Xiaojuan; Cai, Binxiang; Yu, Ziding; Ma, Yanmei; Chen, Ji-Long title: Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Senecavirus A evolution date: 2017-11-30 journal: Microbial Pathogenesis DOI: 10.1016/j.micpath.2017.09.040 sha: e8d10b88b19b3c3868df62947e2a71aa6378aee0 doc_id: 687442 cord_uid: w46gv23d

Abstract Senecavirus A (SVA) infection was recently confirmed in pigs in Brazil, the United States of America and Canada. To better understand the molecular characteristics of isolated SVA genomes, we report the first genome-wide comprehensive analysis of codon usage and of the factors that have contributed to the molecular evolution of SVA. The effective number of codons (ENC) ranged from 54.51 to 55.54 with an average of 54.87 ± 0.285, which reveals a relatively stable nucleotide composition. We found that the codon usage bias of SVA was low. Mutational pressure acted as an increasingly dominant factor in the evolution of the virus compared with natural selection. Notably, codon usage bias was also affected by geographic distribution and time of isolation. This first systematic analysis of the codon usage bias of SVA provides important information for understanding the evolution of SVA and has fundamental and theoretical benefits.

Senecavirus A (SVA), commonly known as Seneca Valley virus (SVV), is the only member of the genus Senecavirus within the family Picornaviridae. SVA is a single-stranded, positive-sense, non-enveloped RNA virus with a genome of approximately 7.2 kb [1]. The virus was first discovered in 2001 as a contaminant (and named Seneca Valley virus 001 [SVV-001]) during the cultivation of viral vectors in the PER.C6 cell line [2, 3]. However, SVA had already been isolated from cases collected on various pig farms in the USA between 1988 and 2001 [4].
The single polyprotein of SVA is post-translationally processed by virus-encoded proteases into 4 structural (VP1-4) and 7 non-structural (2A-2C, 3A-3D) proteins [2, 5], among which VP1 is considered to be the most immunogenic protein in the family Picornaviridae [6, 7]. The main clinical signs in animals infected with SVA are vesicles on the coronary bands or the snout, sometimes accompanied by acute lameness, anorexia, lethargy, and transient fever. Infected breeding herds showed an increase in neonatal morbidity and mortality ranging from 30% to 70%, mainly in piglets less than 7 days old [8, 9]. The clinical signs of SVA infection resemble those of four other vesicular diseases: foot-and-mouth disease, swine vesicular disease, vesicular exanthema of swine, and vesicular stomatitis [10]. SVA was also found in lesions of pigs suffering from porcine idiopathic vesicular disease in Canada and the USA in 2008 and 2012, respectively [11]. In 2014 and 2015, SVA infection was associated with outbreaks of vesicular disease in sows as well as neonatal pig mortality in Brazil and the USA [12]. In China, SVA (SVV CH-01-2015) was first isolated on a pig farm in 2015 [13].

Except for methionine and tryptophan, every amino acid can be encoded by more than one codon because of the redundancy of the genetic code; such codons are called synonymous codons. However, the synonymous codons for an amino acid are not used at random, and some are used more often than others, which is known as codon usage bias [14]. Codon usage bias has been reported for many viruses, but its strength varies with the virus. For instance, rubella virus and rotavirus have strong codon usage bias, whereas porcine circovirus type 2 (PCV2) and porcine epidemic diarrhea virus (PEDV) have weak codon usage bias [15, 16]. Natural selection, gene length, mutation pressure, tRNA abundance and RNA structure all affect codon usage bias.
The relationship between the codon usage of viruses and that of their hosts is expected to affect viral survival, fitness, evasion of the host immune system and evolution [17]. Considering the recent increase in the worldwide prevalence of SVA and its potential risk to the pig industry, we report here the first genome-wide comprehensive analysis of codon usage and of the factors that have contributed to the molecular evolution of SVA.

In this study, 23 complete genome and complete coding sequences of SVA isolates were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database (http://www.ncbi.nlm.nih.gov). To maintain the statistical significance of the codon usage bias analysis, artificial sequences were not included. Detailed information on the 23 strains, including the accession number and the location and date of isolation, is listed in the supplemental materials (Table S1). The data set comprised 14 complete genome sequences and 9 complete coding sequences.

The nucleotide content (A%, U%, G%, C%) of each SVA strain was calculated using BioEdit (version 7.0.9.0). The frequency of each nucleotide at the third position of the synonymous codons (A3%, U3%, G3%, C3%) was analyzed using the online CodonW program (http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::CodonW). The G + C content at the first (GC1s), second (GC2s) and third codon positions (GC3s) of each isolate was calculated using the online cusp program (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp).

The relative synonymous codon usage (RSCU), proposed by Sharp and Li in 1986, is widely used to evaluate the codon usage bias between genes or sets of genes that differ in size and amino acid composition [15]. RSCU values are not confounded by amino acid composition: the RSCU of a codon is the ratio of its observed frequency to the frequency expected if all synonymous codons for that amino acid were used equally [18].
The RSCU was calculated using the following equation:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{j=1}^{n_i} g_{ij}} \, n_i$$

where $g_{ij}$ is the observed number of the $j$th codon for the $i$th amino acid, which has $n_i$ kinds of synonymous codons. A higher RSCU value indicates a stronger preference for that codon. A codon is considered to be used without bias when its RSCU value is 1.0; if the RSCU is more than 1.0, the codon usage bias is positive; if it is less than 1.0, the codon usage bias is negative. In addition, codons with RSCU values ≥ 1.6 are over-represented, and codons with RSCU values ≤ 0.6 are under-represented [19]. EMBOSS cusp (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp) was used for the RSCU analysis.

To quantify the magnitude of the codon usage bias, the ENC value of each strain was calculated. The ENC is the best estimator of absolute synonymous codon usage bias [20] and was calculated using the following formula:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_i$ ($i = 2, 3, 4, 6$) is the mean of $F_i$ for the $i$-fold degenerate amino acids. The $F_i$ values were calculated using the formula below:

$$F_i = \frac{n \sum_{j=1}^{i} \left( \frac{n_j}{n} \right)^2 - 1}{n - 1}$$

where $n$ is the total number of occurrences of the codons for that amino acid and $n_j$ is the number of occurrences of the $j$th codon for that amino acid. In contrast to the RSCU, a lower ENC value indicates a higher codon usage bias. Normally, ENC values range from 20 to 61 [21]. If only one of the possible synonymous codons is used for each amino acid, the ENC is 20; if there is no codon usage bias, the ENC is 61 [21]. By convention, if the ENC is equal to or less than 35, the codon usage bias is considered extremely strong [20, 21].

To determine the major factors that affect the codon usage bias, an ENC-plot was generated in which the ENC was plotted against the GC3s. When codon usage is constrained only by the GC3s, the observed ENC lies on or close to the null model (standard curve). If other factors such as natural selection also play a role in the codon usage pattern, the observed values lie far below the standard curve [22].
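The RSCU and ENC definitions above can be illustrated with a minimal Python sketch. It is not the EMBOSS cusp or CodonW implementation used in the study; the function names (`count_codons`, `rscu`, `enc`) are hypothetical, the standard genetic code is assumed, and two simplifications are made: the result is capped at 61, and the mean of the 2-fold and 4-fold classes is substituted when isoleucine is absent.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (assumed here).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families: the 59 codons left after excluding Met, Trp and stops.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    if aa not in "MW*":
        FAMILIES[aa].append(codon)


def count_codons(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    rna = cds.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    return Counter(c for c in codons if c in CODON_TO_AA)


def rscu(counts):
    """RSCU_ij = g_ij * n_i / sum_j(g_ij) for every observed amino acid."""
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total
    return values


def enc(counts):
    """Effective number of codons: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6."""
    f_by_class = defaultdict(list)  # degeneracy class -> F values
    for codons in FAMILIES.values():
        n = sum(counts.get(c, 0) for c in codons)
        if n < 2:
            continue  # F is undefined for fewer than two codons
        homozygosity = sum((counts.get(c, 0) / n) ** 2 for c in codons)
        f_by_class[len(codons)].append((n * homozygosity - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # fallback when Ile is absent
    weights = {2: 9, 3: 1, 4: 5, 6: 3}  # amino acids per degeneracy class
    if any(mean_f.get(k, 0) <= 0 for k in weights):
        raise ValueError("too few codons to estimate ENC")
    return min(2 + sum(w / mean_f[k] for k, w in weights.items()), 61.0)
```

For example, `rscu(count_codons("GACGACGACGAT"))` gives 1.5 for GAC and 0.5 for GAU, because Asp has two synonymous codons and GAC accounts for three of its four occurrences.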
The expected ENC was calculated using the equation:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where $s$ is the frequency of G + C at the third codon position of synonymous codons.

To improve the understanding of the influence of natural selection on shaping the codon usage bias, the Gravy and Aroma scores were determined. These indices, obtained from CodonW 1.4.4 (http://codonW.sourceforge.net//), reflect the frequencies of hydrophobic and aromatic amino acids, so variation in the two indices indicates variation in amino acid usage. A higher Gravy or Aroma value suggests a more hydrophobic or more aromatic amino acid product, respectively.

Principal component analysis (PCA), a multivariate statistical approach widely used in codon usage analysis, was used to analyze the major trends in codon usage patterns among the different SVA strains [23]. In the PCA, each strain is represented by a 59-dimensional vector of RSCU values (excluding the codons AUG and UGG and the termination codons), and these values are transformed into uncorrelated variables. The PCA combined with the correlation analysis was used to identify the factors influencing the codon usage bias.

To investigate the relative roles of mutational pressure and natural selection in shaping the codon usage bias of porcine SVA, a neutrality plot was drawn using GC12s as the ordinate and GC3s as the abscissa [24]. In the neutrality plot, each dot represents an independent SVA isolate. In general, if the linear regression has a slope of 1, the effect is completely neutral, whereas a slope of 0 indicates complete selective constraint [24]. In addition, the GC12s and GC3s values were plotted against the date of isolation to estimate the evolutionary pattern of porcine SVA. The slopes of the simple regression lines for GC12s and GC3s describe the evolutionary speed of natural selection pressure and of mutational pressure, respectively.
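The standard curve of the ENC-plot and the regression behind the neutrality plot can be sketched as follows. This is an illustration under stated assumptions, not the SPSS or GraphPad Prism procedure used in the study: `expected_enc` and `linear_fit` are hypothetical helper names, GC3s is assumed to be given as a fraction between 0 and 1, and the regression is plain ordinary least squares.

```python
def expected_enc(gc3s):
    """ENC expected when codon usage is constrained only by GC3s.

    gc3s is the G + C frequency at synonymous third codon positions,
    expressed as a fraction (0.566 rather than 56.6).
    """
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def neutrality_slope(gc3s_values, gc12s_values):
    """Slope of GC12s (ordinate) regressed on GC3s (abscissa).

    A slope near 1 suggests neutrality (mutation pressure dominates);
    a slope near 0 suggests strong selective constraint.
    """
    slope, _ = linear_fit(gc3s_values, gc12s_values)
    return slope
```

As a rough check, a GC3s of 0.566 gives an expected ENC of about 59.6, so an observed ENC lower than that value would fall below the standard curve.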
Correlation analysis and one-way analysis of variance (ANOVA) were performed using SPSS (version 22.0) and GraphPad Prism 6.0. The figures related to this analysis were drawn with GraphPad Prism 6.0.

In the present study, 23 sequences of porcine SVA, including 14 complete genome sequences and 9 complete coding sequences, were analyzed. The compositional properties of the SVA strains are shown in Table 1. The mean C% was 28.59% and the mean G% was 23.07%, which were the highest and lowest values, respectively. The mean GC% was 51.66%. Thus, C nucleotides may be used preferentially among SVA codons. To gain further insight into the influence of nucleotide content on the SVA codon usage pattern, the nucleotide composition at the third codon position (A3, U3, G3, C3, GC3) was calculated and is listed in Table 2, which shows that the mean C3% (41.06%) is the highest. The GC3 fluctuated from 55.41% to 57.15% with a mean of 56.6%, indicating that G/C-ended codons may be preferred over A/U-ended codons in SVA sequences.

The RSCU values of the 59 sense codons were calculated to explore the extent of the preference for G/C-ended codons and are listed in Table 3. Overall, the most frequently used codon is GAC (Asp, 1799 times), and the least frequently used codon is AGG (Arg, 186 times); both encode hydrophilic amino acids. Among the hydrophobic amino acids, the most frequently used codon is GCC (Ala, 1627 times), and the least frequently used codon is AUA (Ile, 186 times). Among the 18 most abundantly used codons, 14 were G/C-ended and 4 were A/U-ended, which is consistent with a preference for G/C-ended codons in SVA sequences.
Furthermore, it is interesting to note that most over-represented codons are C-ended and most under-represented codons are A/U-ended, which also indicates that the preferred codons are influenced by compositional constraints. Additionally, the ENC was used to estimate the degree of codon usage bias of the SVA strains [20]. The ENC ranged from 54.51 to 55.54 with an average of 54.87 ± 0.285, which reveals a relatively stable nucleotide composition. The high ENC values (ENC > 40) of all 23 SVA strains indicate a low codon usage bias.

ENC-plot. To further investigate the pattern of synonymous codon usage, a plot of the ENC against the GC3s was generated (Fig. 1). All the points clustered together with little variation among them, indicating that the variation of the ENC was slight, in accordance with the small ENC SD (0.285). Moreover, the ENC values all lie under the standard curve, which implies that mutational pressure combined with other factors contributed to the codon usage bias in SVA [24].

The correlation analysis between the nucleotide contents (A%, U%, G%, C%, GC%) and the codon compositions (A3s, U3s, C3s, G3s, GC3s) showed that each had a significant correlation with the others. Moreover, the ENC correlated with the nucleotide contents with p values lower than 0.01, indicating that mutational pressure affects the codon usage pattern of porcine SVA.

To investigate the trends in the codon usage pattern of porcine SVA, a PCA was performed, which reveals the distribution of synonymous codon usage among strains [15]. The first principal axis accounted for 48.71% of the total variation and thus had a major impact on codon usage bias [25]. The second, third and fourth axes accounted for 17.6%, 10.79% and 7.64% of the total variation, respectively. The first two axes are therefore considered to play the major role in the variation of the RSCU.
Therefore, the isolated strains were plotted on the 1st and 2nd axes according to the date and country of isolation (Fig. 2). SVA strains isolated from different countries were dispersed, whereas strains isolated from the same country clustered together, except for the USA strains isolated from 2008, indicating that location can strongly influence the codon usage bias. The correlation analysis between the codon compositions and the 1st and 2nd axes showed that the codon compositions were significantly correlated with the 1st axis (Table 4), thus confirming that mutational pressure contributed to the codon usage bias of SVA.

Correlation analysis was also used to estimate the relationship between the codon usage bias and the Gravy and Aroma scores (Table 4) in order to assess the influence of natural selection. The Gravy and Aroma scores showed no correlation with the codon compositions; only the Aroma score was slightly correlated with the ENC, indicating that natural selection plays a role in shaping the codon usage bias of SVA.

The preferentially used codons and RSCU values for the SVA are in bold.

The neutrality plot of the GC12s against the GC3s (Fig. 3A) was used to investigate the mutation-selection equilibrium that shapes codon usage in SVA. The analysis showed a weak positive correlation between the GC3s and the GC12s that was not statistically significant (r = 0.2239, p = 0.3045). The slope of the neutrality plot was 0.03608, meaning that mutation pressure accounted for 3.608% of the codon usage pattern and other factors accounted for 96.392% [26]. Additionally, the mean GC12s and GC3s were plotted against the isolation date (Fig. 3B). Both the GC3s and the GC12s were positively correlated with the date (r = 0.8662 and r = 0.7493, respectively), with slopes of 0.001474 ± 6.01E-04 and 0.0003158 ± 1.97E-04, respectively.
This suggests that the GC3s changes far faster than the GC12s, indicating that the role of mutational pressure in shaping the SVA codon usage bias is increasing relative to that of natural selection [15].

In this study, the possible roles of geographic distribution and time of isolation in shaping the codon usage bias of SVA were also investigated. The average ENC values for the different isolation locations and dates are shown in Fig. 4A and B, respectively. The differences in ENC by isolation location and date were analyzed by one-way ANOVA, which showed that the mean ENC differed significantly with the location and date of isolation (p < 0.01). Therefore, the geographic distribution and the isolation date contributed to the codon usage pattern of SVA.

Compared with DNA viruses, RNA viruses such as influenza viruses and coronaviruses have a high evolution rate [27-30]. Previous studies revealed that the prototype SVA isolate, SVV-001, was isolated in 2001 [31], and because of its selective tropism for human tumor cells, the virus was developed as an oncolytic agent [32]. However, the basic pathogenesis of the virus has not been well characterized. In this study, the synonymous codon usage of SVA was analyzed to better understand SVA evolution, especially the interplay between the virus and the immune response [33]. In the analysis, the ENC and RSCU were calculated. The ENC ranged from 54.51 to 55.54 with a mean of 54.87, which indicates a low codon usage bias. This result is in accordance with other viruses of the family Picornaviridae, such as poliovirus type A9 (ENC = 54.2), foot-and-mouth disease virus (mean ENC = 51.53), rhinovirus type 89 (ENC = 45.9), and enterovirus 71 (ENC = 56.6) [34].
The overall RSCU values revealed that most of the preferentially used codons are G/C-ended, which, combined with the mean GC3s (56.6%), suggests that the G + C composition constrains the codon usage bias of SVA. This finding agrees with previous reports that G + C content is the dominant factor influencing codon usage in virus genomes [35, 36].

Although the codon usage bias of SVA is low, the factors that shape it were analyzed. Normally, mutational pressure and natural selection are considered to be the main factors influencing codon usage variation among genes. In this study, as shown in Fig. 1, all the strains clustered together and the observed ENC values lie just below the standard curve, revealing that mutational pressure affects the codon usage bias. Additionally, the PCA suggested that the different isolated strains diverged from each other and were not located on or around the origin of the coordinates, which also indicates that mutational pressure shapes the codon usage pattern. Furthermore, these results were confirmed by the correlation analysis between the nucleotide contents (A%, U%, G%, C%, GC%) and the codon compositions (A3s, U3s, C3s, G3s, GC3s), which were significantly correlated with one another. For natural selection, only the Aroma score was slightly correlated with the ENC, and the other indices showed no correlation, demonstrating that natural selection plays a role in the codon usage bias of SVA. In fact, mutational pressure is generally considered the major factor shaping codon usage variation compared with natural selection [37]. Thus, to determine which factor was responsible for the codon usage bias in SVA, the GC12s and GC3s were plotted against evolutionary time in an evolutionary neutrality analysis, and the results revealed that mutational pressure plays an increasing role in SVA evolution compared with natural selection.
In addition, the neutrality plot showed that the relationship between GC3s and GC12s was not significant, and the correlation coefficient indicated that, compared with mutation pressure, natural selection is more important in the codon usage bias of SVA.

As shown in Fig. 2, the 23 strains isolated from different locations or on different dates were separated, suggesting that the geographic distribution and date of isolation may contribute to the codon usage pattern of SVA. To test this hypothesis, the relationships between the ENC and the date and location of isolation were analyzed. The significant differences in ENC among countries and years of isolation indicated that the geographic distribution and year of isolation are important to some extent.

In general, this study is the first to identify a low codon usage bias in SVA and to show that mutational pressure is an increasingly dominant factor in the evolution of this virus compared with natural selection. Moreover, codon usage bias was affected by the geographic distribution and date of isolation. This systematic analysis of the codon usage bias of SVA is helpful for understanding the evolution of SVA and provides a scientific basis for the relevant authorities to develop preventive measures against SVA.

The datasets and materials used and/or analyzed during the current study are available from GenBank or from the corresponding author on reasonable request.
Clinical manifestations of Senecavirus A infection in neonatal pigs Complete genome sequence analysis of Seneca Valley virus-001, a novel oncolytic picornavirus EPIDEMIOLOGY of Seneca Valley virus (SVV-001), a novel oncolytic picornavirus for the systemic treatment of patients with solid cancers with neuroendocrine features Epidemiology of Seneca Valley Virus: Identification and Characterization of Isolates from Pigs in the United States Structure of Seneca Valley Virus-001: an oncolytic picornavirus representing a new genus Immune and antibody responses to an isolated capsid protein of foot-and-mouth disease virus The Picornaviruses Identification and complete genome of Seneca Valley virus in vesicular fluid and sera of pigs affected with idiopathic vesicular disease, Brazil Senecavirus A: an emerging vesicular infection in brazilian pig herds Yoon Identification of novel Senecavirus A from pigs with vesicular disease in the US Senecavirus A: an emerging pathogen causing vesicular disease and mortality in pigs? Veterinary pathol Neonatal mortality, vesicular lesions and lameness associated with Senecavirus A in a U.S. 
Sow farm The first identification and complete genome of Senecavirus A affecting pig with idiopathic vesicular disease in China Variation in G þ C-content and codon choice: differences among synonymous codon groups in vertebrate genes Characterization of the porcine epidemic diarrhea virus codon usage bias First analysis of synonymous codon usage in porcine circovirus A detailed comparative analysis of codon usage bias in Zika virus Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons The effect of multiple evolutionary selections on synonymous codon usage of genes in the Mycoplasma bovis genome An evaluation of measures of synonymous codon usage bias The 'effective number of codons' used in a gene Revelation of influencing factors in overall codon usage bias of equine influenza viruses Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa Directional mutation pressure and neutral molecular evolution The strength of translational selection for codon usage varies in the three replicons of Sinorhizobium meliloti Genome-wide analysis of codon usage bias in epichloe festucae The epidemiology, evolution and recent outbreaks of avian influenza viruses in China: a review Epidemiology, genetic recombination, and pathogenesis of coronaviruses Evolutionary and genetic analysis of the VP2 gene of canine parvovirus Epidemiology, evolution, and pathogenesis of H7N9 influenza viruses in five epidemic waves since 2013 in China Real-time reverse transcription PCR assay for detection of Senecavirus A in swine vesicular diagnostic specimens Construction and characterization of a fulllength cDNA infectious clone of emerging porcine Senecavirus A Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses Analysis of synonymous codon usage in hepatitis A virus Analysis of synonymous codon usage in 11 human bocavirus isolates Synonymous codon usage 
in adenoviruses: influence of mutation, selection and protein hydropathy The extent of codon usage bias in human RNA viruses and its evolutionary origin