key: cord-1000720-ryt81ky7
authors: Mateu, E.; Díaz, I.; Darwich, L.; Casal, J.; Martín, M.; Pujols, J.
title: Evolution of ORF5 of Spanish porcine reproductive and respiratory syndrome virus strains from 1991 to 2005
date: 2006-02-28
journal: Virus Research
DOI: 10.1016/j.virusres.2005.09.008
sha: 95bf976733f9c57f311d139e310c5c223bbc4d43
doc_id: 1000720
cord_uid: ryt81ky7

Abstract ORF5 sequences of porcine reproductive and respiratory syndrome virus (PRRSV) were analysed to determine genetic diversity, codon usage, positive and negative selection sites and potential changes in the predicted glycoprotein 5 (GP5). A hypothetical GP5 containing all selected sites was constructed to determine its characteristics. The sequences corresponded to two sets of isolates obtained about 10 years apart: 18 strains from 1991–1995 and 46 strains from 2000–2005. Similarity to Lelystad virus (LV) decreased from 95.5% in 1991–1995 to 89.5% in 2000–2005. Three highly variable regions were found in ORF5. Codon usage differed between the two sets for leucine, glutamine, serine and proline: the 2000–2005 sequences used codons more similar to those present in highly expressed pig genes than the 1991–1995 set did. Twenty-four sites of positive selection and 20 sites of negative selection were found in GP5, most of them in transmembrane regions. Additional glycosylation at N37 of GP5 was common in 2000–2005, but some sequences lacked the glycosylation site at N46. The hypothetical GP5 was only 88.1% similar to LV and was less hydrophobic. Taken together, these results suggest that PRRSV is still adapting to pig cells.

Porcine reproductive and respiratory syndrome virus (PRRSV) belongs to the genus Arterivirus. Two different genotypes are currently known, the European one and the American one. Genetic similarity between the two genotypes ranges from 50 to 70% depending on the open reading frame (ORF) examined (Meng et al., 1994). The PRRSV genome is composed of nine ORFs, of which ORF1a and 1b encode the viral polymerase, ORFs 2-6 encode envelope and membrane proteins (called GP2a, GP2b, GP3, GP4, GP5 and M, respectively) and ORF7 encodes the non-glycosylated viral nucleocapsid protein (Snijder and Meulenberg, 1998; Wu et al., 2005). Among the envelope proteins, GP5 seems to be one of the key viral structures. It is thought that attachment and entry to the target cells is mediated by GP5 or GP5-M heterodimers (Snijder et al., 2003). In addition, the neutralisation epitope of PRRSV is located in the middle of the GP5 ectodomain (Gonin et al., 1999; Ostrowski et al., 2002; Plagemann, 2004a,b).

From early studies, it became evident that ORF5 was one of the most variable regions in the PRRSV genome, although other parts also show a considerable degree of variability, as occurs in ORF3 and in non-structural protein 2 (nsp2) (Fang et al., 2004; Oleksiewicz et al., 2000). ORF5 heterogeneity was initially reported for American strains, but it is now evident that European isolates are diverse as well (Forsberg et al., 2002; Mateu et al., 2003; Pesch et al., 2005; Pirzadeh et al., 1998). The genetic variability observed in ORF5 is consistent with the well-known fact that the RNA polymerases of RNA viruses have relatively poor fidelity (Castro et al., 2005) and with the notion that selective pressures favour viral variants better fitted for spread and persistence in the target hosts. As GP5 is exposed in the viral envelope, participates in viral attachment to cells and contains a neutralisation epitope, it is a potential target for these selective pressures. Interestingly, the adaptive sites in GP5 seem not to be restricted to the known B-epitopes but are also found in other regions (Hanada et al., 2005).

In the present study, a large set of ORF5 sequences from Spanish PRRSV strains obtained 10 years apart were analysed to determine the changes in ORF5 and GP5 and to assess the potential impact of those changes.

ORF5 sequences of Spanish PRRSV strains were randomly selected from isolates available in our laboratory, obtained from sera of pigs on epidemiologically unrelated Spanish commercial farms, or were retrieved from Spanish sequences deposited in GenBank. For PRRSV isolates, viral RNA was extracted, amplified by PCR and sequenced as described before (Mateu et al., 2003). The final set of sequences included the Lelystad virus (LV) (GenBank accession number M96262), the first Spanish isolate from 1991 (GenBank accession number X92942), a second Spanish isolate from 1991 (strain CReSA-VP21, GenBank accession number DQ009647), 15 isolates from 1991 to 1995 (Suárez et al., 1996) and 46 unrelated isolates retrieved in our laboratory from 2000 to 2005 (GenBank accession numbers AF495499-AF495502, AF495504-AF5521, DQ009625-DQ009646). For comparative purposes, one set of 16 non-Spanish European-type ORF5 sequences representing strains isolated in the period 1991-1995 in several European countries was also analysed. This set comprised sequences from Belgium (n = 2), Denmark (n = 2), France (n = 2), Germany (n = 2), The Netherlands (n = 3), Poland (n = 2) and United Kingdom (n = 1) (GenBank accession numbers: AY035900-AY035903, AY035918, AY035919, AY035921, AY035922, AY035926, AY035927, AF378799, U40696, U40696 and M96262).

ORF5 sequences were initially aligned using ClustalW (Thompson et al., 1994) and a similarity matrix was constructed. Alignments were retrieved using Bio-edit software v.7.0.5 (available at http://www.mb.mahidol.ac.th/Downloads/Mol-Bio/Bioedit/Bioedit.htm) and entropy plots were constructed to determine genetic variability in ORF5. Entropy was calculated as $H(l) = -\sum_{b} f(b,l)\,\ln f(b,l)$, where $f(b,l)$ is the frequency at which a given residue $b$ is found at a given position $l$; that is, $H(l)$ is the negative sum, over all residues observed at position $l$, of each frequency multiplied by its natural logarithm. With this formula, the higher the value of $H(l)$, the higher the variability at a given position. Lelystad virus was used as a reference for alignments.
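
The entropy measure described above can be illustrated with a minimal Python sketch. It is not the Bio-edit implementation; the function names and the toy alignment are hypothetical, and gap characters, if present, are simply counted as one more symbol, which is an assumption.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = -sum over residues b of f(b) * ln f(b)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_profile(aligned_sequences):
    """Entropy H(l) for every position l of a set of equal-length aligned sequences."""
    lengths = {len(s) for s in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column
    return [column_entropy(col) for col in zip(*aligned_sequences)]


# Toy alignment (made-up sequences, not data from the study)
toy_alignment = ["ATGA", "ATGC", "ATCG", "ATCT"]
profile = entropy_profile(toy_alignment)
```

A fully conserved column gives an entropy of zero, while a column in which four residues are equally frequent gives ln 4, the maximum for nucleotides.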

Codon usage for each set of sequences was analysed using GCUA software v.1.0 (available at http://bioinf.may.ie). In a first step, relative codon usage was calculated for each set of sequences by means of the relative synonymous codon usage (RSCU) measure, taking into account the effective number of codons (ENC) in the gene. Then, a correspondence analysis (CA) was done in order to determine trends in the variation of codon usage. A linear regression analysis was used to evaluate the correlation between codon usage bias and nucleotide composition. p-values lower than 0.05 were considered to be significant. To ascertain the possible significance of changes in codon usage, 10 sequences of genes highly expressed in pigs were also analysed. The genes included were: creatine kinase (GenBank accession number AY754869), interferon-beta (NM_001003923), pyruvate dehydrogenase (X52990), myosin heavy chain (NM_214136), haptoglobin (AF492467), hemoglobin epsilon (NM_214447), plasminogen activator (AF364605), albumin (NM_00100528), alpha amylase (AF064742) and interleukin-1 beta (M86725).
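
As an illustration of the RSCU measure, the following Python sketch computes it with the usual definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. This is not the GCUA code; the function name `rscu` and the toy sequence are hypothetical, the standard genetic code is assumed, and ENC and the correspondence analysis are not covered.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Relative synonymous codon usage pooled over in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with gaps or ambiguous bases
                counts[codon] += 1
    values = {}
    for aa in sorted(set(AMINO) - {"*"}):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(family)  # count expected under uniform usage
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy sequence (hypothetical): three CAA and one CAG codon for glutamine
toy = rscu(["CAACAACAACAG"])
```

An RSCU of 1 means a codon is used as often as expected under uniform usage of its synonymous codons; values above 1 indicate preferred codons.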

In a subsequent step, aligned sequences were examined to determine the codons corresponding to each amino acid in the GP5 protein. For each nucleotide position, the rate of mutation (percentage of strains having a different nucleotide) was calculated compared to LV. Also, for each codon the ratio between synonymous and non-synonymous mutations (S/NS) was determined. As an initial criterion, codons where at least 25% of the examined strains had a mutation and that had an S/NS higher than 3 were considered as potentially negatively selected, while ratios below 0.3 indicated potential points of positive selection. These thresholds were set arbitrarily. In the third step, the probability that n strains from a set of N sequences shared the same mutation was calculated. For this calculation it was assumed that each sequence was epidemiologically unrelated to the others. Considering the degeneracy of the genetic code, the probability of a synonymous mutation (Ps) for a given amino acid at a given point was calculated as Ps = (number of codons coding the same amino acid − 1)/60. The probability of a non-synonymous mutation was 1 − Ps. The probability that n unrelated sequences shared a mutated codon encoding the same amino acid was calculated according to a binomial distribution. Positive or negative selection at a given point was arbitrarily considered to occur when the probability was ≤1 × 10^−9.
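
The screening criteria and the probability calculation described above might be sketched as follows. The function names are hypothetical, and the use of the binomial point probability (exactly n of N sequences) is an assumption, since the text does not state whether a point or a tail probability was used.

```python
from math import comb


def p_synonymous(n_codons_for_amino_acid):
    """Ps as defined in the text: (synonymous codons - 1) / 60."""
    return (n_codons_for_amino_acid - 1) / 60


def binomial_pmf(n, N, p):
    """Probability that exactly n of N independent sequences show an event of probability p."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)


def classify_site(fraction_mutated, n_syn, n_nonsyn):
    """Initial screen: at least 25% mutated strains, then S/NS > 3 or S/NS < 0.3."""
    if fraction_mutated < 0.25:
        return "not screened"
    ratio = float("inf") if n_nonsyn == 0 else n_syn / n_nonsyn
    if ratio > 3:
        return "potential negative selection"
    if ratio < 0.3:
        return "potential positive selection"
    return "undetermined"


# Hypothetical example: 30 of 46 sequences share a synonymous change at a leucine codon
# (leucine has six codons, so Ps = 5/60)
prob = binomial_pmf(30, 46, p_synonymous(6))
is_selected = prob <= 1e-9
```

Under these assumptions, a mutation shared by most of the 46 sequences is far less probable than the 1 × 10^−9 threshold if the sequences are truly unrelated, which is why the independence assumption matters for the interpretation.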

Predicted GP5 amino acid sequences were aligned, similarities to LV were calculated and a bootstrapped phylogenetic tree was constructed by the neighbor-joining method (1000 iterations) with LV as the outgroup. An entropy plot of predicted GP5 was constructed to determine conserved and highly variable regions of the protein.
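
A simple way to express similarity to a reference such as LV is the percentage of identical positions in a pairwise alignment. The sketch below uses plain identity as a stand-in; whether the similarity matrix in the study was based on identity or on a substitution matrix is not stated, and the function name and toy sequences are hypothetical.

```python
def percent_identity(sequence, reference):
    """Percentage of aligned positions at which two aligned sequences are identical."""
    if len(sequence) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(sequence.upper(), reference.upper()):
        if a == "-" and b == "-":
            continue  # ignore columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example (made-up fragments, not GP5 sequences)
identity = percent_identity("MRCSHKLG", "MRCSHKLE")
```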

The sequence of a hypothetical strain containing all positively selected mutations was assembled and analysed using Bio-edit. The hypothetical GP5 was compared with other available European GP5 sequences using the Blastp utility (http://www.ncbi.nlm.nih.gov/BLAST). Finally, for this hypothetical strain, transmembrane regions and N-glycosylation sites were evaluated using the TMPred and NetNGlyc utilities at the Expasy server (http://www.expasy.org) and the hydrophobic profile (Kyte and Doolittle method) was determined using Bio-edit.
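
A Kyte and Doolittle hydropathy profile is a sliding-window mean of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window size of 9 and the function names are assumptions, since the settings used in Bio-edit are not given.

```python
# Kyte-Doolittle hydropathy scale (positive values are hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each window of `window` consecutive residues."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def mean_hydropathy(protein):
    """Average hydropathy over the whole sequence, a single-number summary."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein.upper()) / len(protein)


# Toy peptide (hypothetical, not a GP5 sequence)
profile = hydropathy_profile("MLVLIAAGKDE", window=9)
```

Comparing two proteins then amounts to comparing their profiles position by position, or their whole-sequence means.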

For the Spanish 1991-1995 set of nucleotide sequences, the percentage of similarity to LV ranged from 99.1 to 87.2% (average, 95.5 ± 3.6%). For the 2000-2005 set, similarity to LV ranged from 94.9 to 81.7% (average, 89.5 ± 2.8%). Entropy analysis showed three highly variable regions. The first one was located between nucleotide residues 165 and 189, the second one between residues 315 and 339 and the third one between residues 360 and 369. Other points of high variability were found at residues 36-39, 429-432 and 480-489. The regions located between nucleotides 99-120 and 123-165 were the least variable.

As expected, the CA globally showed that codon usage was not significantly different between 1991-1995 and 2000-2005. However, when specific amino acids were examined, codon usage differed between the 1991-1995 sequences and those of 2000-2005 for leucine, glutamine, serine and proline (Table 1). Thus, the predominant codon for leucine in the 1991-1995 set was CTC (RSCU = 2.08), while in 2000-2005 it was TTG (RSCU = 1.99). For glutamine, the only codon present in the older set of sequences was CAA (RSCU = 2.0), while the codon CAG appeared in the 2000-2005 set (RSCU = 0.28). For serine, older strains preferentially carried AGC (RSCU = 2.16) while newer strains preferentially used TCC (RSCU = 1.46), and for proline, older strains preferentially had CCG (RSCU = 1.41) instead of the CCC codon of the newer strains (RSCU = 1.30). Results for 1991-1995 non-Spanish European-type sequences were similar. The most frequent codons used for leucine were TTG (RSCU = 2.09) and CTC (RSCU = 1.97); for glutamine, CAA (RSCU = 1.81) and CAG (RSCU = 0.19); for serine, AGC (RSCU = 1.87) and TCC (RSCU = 1.41); and for proline, CCG (RSCU = 1.84) and CCA (RSCU = 1.05). Some differences related to the country of origin were observed. Polish strains and the British strain preferentially coded leucine with the codon CTC. Coding of glutamine with the codon CAG was only found in Belgian strains.

Table 1. Codon usage in the ORF5 gene of Spanish sequences of porcine reproductive and respiratory syndrome strains (1991-1995 and 2000-2005). AA: amino acid; N: number of effective codons; RSCU: synonymous codon usage measure. The most frequent codon for each amino acid is shown in bold. Boxes show the main changes between 1991-1995 and 2000-2005 sequences.

Ten highly expressed pig genes were also analysed for codon usage. In those genes, the most common codons for leucine, glutamine, serine and proline were TTG (RSCU = 0.81), CAG (RSCU = 1.45), TCC (RSCU = 1.36) and CCC (RSCU = 1.26). (Table 2). Interestingly, 12 strains changed Asn-46 to Asp-46, thus losing one glycosylation site. Also, 11 of those 12 strains gained a glycosylation site by changing Asp-37 to Asn-37. For the 1991-1995 set, the number of positively selected sites was seven. Another 20 codons showed the characteristics of negative selection and were mainly distributed in three segments of the predicted protein (residues 73-89, 108-113 and 153-169) (Table 3). Codon usage changed significantly (p < 0.05) in 14 of these negatively selected positions between strains of 1991-1995 and those of 2000-2005 (Table 4).

Average similarity of predicted GP5 proteins to LV was 83.8% for the 2000-2005 set and 94.4% for the 1991-1995 set. The bootstrapped tree of the predicted amino acid sequences showed a high diversity in GP5 (Fig. 1). Although, in general, strains from 2000 to 2005 tended to cluster together while 1991-1995 sequences were scattered along the tree, bootstrap values only supported small clusters and did not provide evidence for a clear evolutionary line between modern and older strains.

The entropy analysis showed three highly variable regions located between aminoacids 56-63; 105-113 and 120-130. These segments corresponded to the parts with higher entropy values for the nucleotide sequence. The most conserved region was found between residues 38-55.

A hypothetical GP5 containing all positively and negatively selected sites was analysed to predict its characteristics. This hypothetical GP5 had a similarity of 88.1% to LV. As shown by BLAST comparison, the 10 sequences closest to the hypothetical GP5 (besides those included in the study) had a similarity ranging from 91 to 88% (average 89.4%). The oldest strain included in this set of 10 was a Spanish sequence from 1991. Interestingly, of the 24 predicted positive selection sites, 16 were present in sequences from other European countries and eight were predominant (>70% frequency) regardless of the country of origin of the PRRSV strain. Regarding negative selection sites, 19 out of 20 were present in all sequences (Fig. 2).

For the hypothetical strain, the signal peptide comprised residues 1-34 (1-32 in LV); transmembrane regions were predicted to exist at the following segments: 69-90 and 108-127 (same segments in LV). Potential N-glycosylation sites were predicted at residues 37 and 53 (46 and 53 in LV). Comparison of the hydrophobicity profiles of LV and the hypothetical GP5 showed that the latter was less hydrophobic (not shown).
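
Candidate N-glycosylation sites can be located by scanning for the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below only finds this motif; NetNGlyc additionally scores each sequon with a trained predictor, which is not reproduced here. The function name and the toy peptide are hypothetical.

```python
import re

# N-X-[S/T] with X != P; the lookahead allows overlapping sequons to be reported
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X is not proline)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy peptide (made up): sequons at N3 (NGS) and N11 (NAT); N8 is followed by proline
sites = find_sequons("MANGSAANPTNAT")
```

Applied to aligned GP5 sequences, such a scan would show whether a substitution such as Asn to Asp at a given position removes a sequon, or whether a change to Asn creates one.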

The study of the evolution and adaptation of viruses to their hosts is relevant because it provides insight into the mechanisms by which a viral variant gains prevalence in a population. A large endemic population with a high replacement rate is a suitable setting in which to study such phenomena. This is the case of Spain for PRRSV. The Spanish pig population is the second largest in Europe, with some 24 million pigs, and, according to FAO statistics, Spain imports about 1.2 million live pigs every year (http://faostat.fao.org).

The present study was conducted with two sets of PRRSV sequences, one corresponding to the period 1991-1995 and the other to 2000-2005. Over this period, average similarity to LV changed from above 95% in 1991-1995 to below 90% in 2000-2005. These values suggest an increase in divergence of about 0.5% per year. If divergence increased at a constant rate and sequences from 1991 to 1995 shared an average similarity of 95% to LV, the original PRRSV strains in pigs could have originated some 10 years earlier, namely about 1981-1985. This matches the date at which PRRSV is thought to have entered the domestic pig population (Forsberg et al., 2001; Hanada et al., 2005; Plagemann, 2003).
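
The back-of-the-envelope dating above can be reproduced with a short calculation. The sketch assumes a constant divergence rate and a spacing of 10 years between the two sets; the function names are hypothetical. With the reported averages (95.5% and 89.5%) the rate comes out at 0.6% per year, which the text rounds to about 0.5%.

```python
def divergence_rate(similarity_early, similarity_late, years_apart):
    """Average loss of similarity to the reference, in percentage points per year."""
    return (similarity_early - similarity_late) / years_apart


def years_since_identity(similarity, rate_per_year):
    """Years needed to go from 100% similarity to `similarity` at a constant rate."""
    return (100.0 - similarity) / rate_per_year


# Reported average similarities to LV, assuming the two sets are 10 years apart
rate = divergence_rate(95.5, 89.5, 10)

# Using the rounded figures quoted in the text: 95% similarity, 0.5% per year
years_back = years_since_identity(95.0, 0.5)
```

With the rounded figures, the 1991-1995 sequences would have diverged from an LV-like ancestor over about 10 years, which gives the 1981-1985 estimate.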

The entropy analysis showed that this divergence arises from mutations scattered throughout ORF5, although hypervariable regions could be recognised. This has been described before (Pesch et al., 2005; Pirzadeh et al., 1998) and it is thought that these hypervariable regions may correspond to potentially immunogenic sites. Indeed, the neutralisation epitope of GP5 is located in the middle of the ectodomain (Plagemann, 2004b) and the first hypervariable region flanked this epitope.

Codon usage differed between the two sets of Spanish sequences for leucine, glutamine, serine and proline. When compared with the codon usage of other early European PRRSV strains, results were similar for glutamine, serine and proline, while leucine was preferentially coded with TTG, as in the most recent Spanish strains. The most frequent codons for these amino acids in 2000-2005 sequences were similar to the codon usage in some highly expressed swine genes. These results can be interpreted as a sign of either selection or adaptation of PRRSV to the codon usage most adequate for efficient replication in the pig host. This adaptation can also have other implications. Several authors (Cook et al., 2005; Kheyar et al., 2005) have shown that optimising the codon usage of arterivirus genes to that of mammalian cells increases the levels of expression of viral genes as well as the immunogenicity of viral proteins. In the present case, our results suggest that PRRSV is still adapting to the swine host. This should be taken into account when designing attenuated vaccines because adequate

The codon analysis revealed an average S/NS of 1.41. This ratio was similar to that determined by Hanada et al. (2005) for Coronaviruses but lower than that reported by others (Pesch et al., 2005) for PRRSV. This high rate of non-synonymous mutations may have important implications for the design of vaccines, since these variable points may constitute inefficient targets for the immune system.

Table 4. Significant changes (p < 0.05) in codon usage for negatively selected sites of ORF5 of porcine reproductive and respiratory syndrome virus strains isolated in 1991-1995 or 2000-2005.

The examination of positively and negatively selected sites showed 24 potential sites for positive selection and 20 for negative selection. Ten of the 24 sites for positive selection were located in transmembrane sections of the predicted GP5, which suggests that many of these adaptations were not selected under pressure from neutralising antibodies. In contrast, the negatively selected sites were concentrated in the last 100 amino acids of GP5 (14/20 sites), for which no neutralising antibodies have been detected so far (Plagemann, 2004a,b). These negatively selected sites clustered in three segments of the predicted GP5. The first two of these clusters (residues 73-89 and 108-113) corresponded to predicted transmembrane regions, while the function of the third segment is still unknown. Since amino acid variability at those points is restricted in spite of the presence of several possible codons, it is reasonable to think that these sites are crucial for virus integrity or functionality.

The phylogenetic analysis of GP5 sequences did not support a clear line of evolution from older to newer Spanish strains. However, clustering was evident for some newer sequences. With the available data it is impossible to know whether the newer strains represent a PRRSV type that slowly evolved from older strains or an old type that gained predominance. Unfortunately, we were not able to obtain Spanish sequences from 1996 to 1999 and therefore could not fill this gap; however, the analysis of 153 GP5 European-type sequences from different countries and periods yielded similar results (not shown). The closest matches to the hypothetical GP5 included strains from different countries and periods, but the oldest known match was a Spanish strain from 1991 not included in this study. This fact would support the hypothesis that similarity to the hypothetical GP5 confers some benefit, because this profile has become predominant in Spain over the years.

The hypothetical strain was similar to LV but had two characteristics that LV lacked. The first was a change in the sites of glycosylation compared to LV. One site was located at the start of the ectodomain (Asn-37) and the second at the known neutralisation epitope (Asn-53). This hypothetical GP5 would thus lose the glycosylation site at Asn-46. In contrast, positive selection at position 37 introduces a new glycosylation site. Previous studies suggested that the lack of glycosylation at Asn-46 reduces virus infectivity and can be a marker of attenuation (Wissink et al., 2004). However, according to Pesch et al. (2005), glycosylation at position 46 can be found in three European-type attenuated vaccines, of which only one also has a glycosylation at position 37. Several authors have claimed that these additional glycosylation sites may serve to mask the key B-epitopes (Chen et al., 1998), although this has not yet been proven for PRRSV. The second feature that differentiates the hypothetical GP5 from that of LV was a less hydrophobic profile. The consequence would be a more exposed GP5 that could better interact with the receptors in target cells.

The present study shows that ORF5 of PRRSV has increased its genetic diversity over time. This evolution included positive and negative selection of given amino acids at specific sites of the PRRSV genome, mainly in transmembrane segments of GP5. Also, codon usage for leucine, glutamine, serine and proline changed, and in more recent sequences it more closely resembles codon usage in highly expressed pig genes, suggesting that a process of adaptation to the pig is taking place. These data, if confirmed by other studies with PRRSV isolates from other countries, may be useful to understand the evolution of PRRSV and may be relevant for the design of new and more efficacious vaccines.

Incorporation fidelity of the viral RNA-dependent RNA polymerase: a kinetic, thermodynamic and structural perspective

Neuropathogenicity and susceptibility to immune response are interdependent properties of lactate dehydrogenase-elevating virus (LDV) and correlate with the number of N-linked polylactosaminoglycan chains on the ectodomain of the primary envelope glycoprotein

Genetic immunization with codon-optimized equine infectious anemia virus (EIAV) surface unit (SU) envelope protein gene sequences stimulates immune responses in ponies

Heterogeneity in Nsp2 of European-like porcine reproductive and respiratory syndrome viruses isolated in the United States

A molecular clock dates the common ancestor of European-type porcine reproductive and respiratory syndrome virus at more than 10 years before the emergence of disease

The genetic diversity of European type PRRSV is similar to that of the North American type but is geographically skewed within Europe

Seroneutralization of porcine reproductive and respiratory syndrome virus correlates with antibody response to the GP5 major envelope glycoprotein

The origin and evolution of porcine reproductive and respiratory syndrome viruses

Alternative codon usage of PRRS virus ORF5 gene increases eucaryotic expression of GP(5) glycoprotein and improves immune response in challenged pigs

Genetic diversity and phylogenetic analysis of glycoprotein 5 of European-type porcine reproductive and respiratory virus strains in Spain

Molecular cloning and nucleotide sequencing of the 3′-terminal genomic RNA of the porcine reproductive and respiratory syndrome virus

Emergence of porcine reproductive and respiratory syndrome virus deletion mutants: correlation with the porcine antibody response to a hypervariable site in the ORF 3 structural glycoprotein

Identification of neutralizing and nonneutralizing epitopes in the porcine reproductive and respiratory syndrome virus GP5 ectodomain

New insights into the genetic diversity of European porcine reproductive and respiratory syndrome virus (PRRSV)

Genomic and antigenic variations of porcine reproductive and respiratory syndrome virus major envelope GP5 glycoprotein

Porcine reproductive and respiratory virus: origin hypothesis

GP5 ectodomain epitope of porcine reproductive and respiratory syndrome virus, strain Lelystad virus

The primary GP5 neutralization epitope of North American isolates of porcine reproductive and respiratory syndrome virus

Phylogenetic relationships of European strains of porcine reproductive and respiratory syndrome virus (PRRSV) inferred from DNA sequences of putative ORF-5 and ORF-7 genes

Heterodimerization of the two major envelope proteins is essential for arterivirus infectivity

The molecular biology of arteriviruses

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

Significance of the oligosaccharides of the porcine reproductive and respiratory syndrome virus glycoproteins GP2a and GP5 for infectious virus production

The 2b protein as a minor structural component of PRRSV

Our grateful thanks to Dr. Anna Barceló for her assistance in sequencing and to Dr. Dieter Klein for providing data for Austrian strains.