key: cord-301347-22lt6h40 authors: Jarvis, Matthew C.; Lam, Ham Ching; Zhang, Yan; Wang, Leyi; Hesse, Richard A.; Hause, Ben M.; Vlasova, Anastasia; Wang, Qiuhong; Zhang, Jianqiang; Nelson, Martha I.; Murtaugh, Michael P.; Marthaler, Douglas title: Genomic and evolutionary inferences between American and global strains of porcine epidemic diarrhea virus date: 2016-01-01 journal: Prev Vet Med DOI: 10.1016/j.prevetmed.2015.10.020 sha: doc_id: 301347 cord_uid: 22lt6h40 Porcine epidemic diarrhea virus (PEDV) has caused severe economic losses both recently in the United States (US) and historically throughout Europe and Asia. Traditionally, analysis of the spike gene has been used to determine phylogenetic relationships between PEDV strains. We determined the complete genomes of 93 PEDV field samples from US swine and analyzed the data in conjunction with complete genome sequences available from GenBank (n = 126) to determine the most variable genomic areas. Our results indicate high levels of variation within the ORF1 and spike regions while the C-terminal domains of structural genes were highly conserved. Analysis of the Receptor Binding Domains in the spike gene revealed a limited number of amino acid substitutions in US strains compared to Asian strains. Phylogenetic analysis of the complete genome sequence data revealed high rates of recombination, resulting in differing evolutionary patterns in phylogenies inferred for the spike region versus whole genomes. These finding suggest that significant genetic events outside of the spike region have contributed to the evolution of PEDV. Porcine epidemic diarrhea virus (PEDV) causes diarrhea, vomiting, and dehydration, leading to high mortality (up to 100%) in suckling piglets. PEDV was first discovered in the United Kingdom in 1971, and later was found in Belgium, Hungary, France, Italy, and the Czech Republic (Chasey and Cartwright, 1978; Fan et al., 2012) . In 1986, PEDV was first reported in China, and proceeded to spread throughout Asia (Cui, 1990; Song and Park, 2012) . In late 2010, a "variant" PEDV strain with increased pathogenesis compared to the PEDV is a single-stranded, positive sense RNA virus belonging to the family Coronaviridae, genus Alphacoronavirus. The PEDV genome is approximately 28 kb in length and roughly two-thirds of the genome consists of open reading frame (ORF) 1, which encodes 16 non-structural proteins (nsps) (Lai et al., 2013) . These nsps play important roles in viral replication, post-translational processing, and immune evasion (Lai et al., 2013) . The virus produces various structural proteins, including spike, membrane, and nucleocapsid (Lai et al., 2013) . The spike protein is crucial to cell attachment and infection, and the envelope is an integral membrane protein, aiding in membrane fusion while the nucleocapsid protein is necessary for genomic packaging (Hagemeijer and de Haan, 2015) . In addition, the PEDV genome includes ORF3, located between the spike and membrane genes, that encodes an ion channel protein possibly associated with PEDV pathogenesis (Park et al., 2008; Wang et al., 2012) . Researchers have explored various regions of the coronavirus (CoV) genome to link specific areas with virulence and host cell attachment. For example, the spike gene codes for a viral attachment protein that can be divided into the S1 (1-789 aa) and S2 (790-1383 aa) regions (Song and Park, 2012) . Comparative analysis of transmissible gastroenteritis virus (TGEV), porcine respiratory coronavirus (PRCV), and murine hepatitis virus (MHV) revealed two main antigenic sites in the S1 region: the N-terminal domain (NTD) and the C-terminal receptor binding domain (RBD) (Li et al., 2007) . While both domains can influence virus infectivity, such as in TGEV, one domain tends to be central to a CoV's tropism: the NTD is important for MHV tropism, and the RBD is central to PEDV infectivity and virulence (Reguera et al., 2012) . The NTD can bind to various sialic acids on the host cell surface (Reguera et al., 2012) . The RBD contains residues that bind to the porcine aminopeptidase-N (pAPN), the host receptor utilized by TGEV and PEDV (Delmas et al., 1992) . Since the last large-scale North American PEDV outbreak ended in the spring of 2014, the complete genomes of 93 PEDV strains from the US were sequenced and analyzed to further understand the origin and phylogenetic relationships among the American and global PEDV strains. In-depth nucleotide and amino acid analysis was conducted to identify genes of high diversity. Bayesian analysis was performed to understand the evolution of PEDV and the emergence of different clades within US strains. In addition, the RBD was modeled to visualize the differences between American and Asian strains to better understand how changes in the RBD might affect vaccine efficacy and development. Samples were routinely submitted to the University of Minnesota Veterinary Diagnostic Laboratory (UMVDL) for pathogen detection. Between January 2014 and December 2014 samples were screened for PEDV by real time RT-PCR . Samples for complete genome sequencing were selected based on the criteria of a high viral concentration from the RT-PCR results and geographical diversity within the US. A total of 83 samples, including fecal (n = 38), intestinal homogenate (n = 21), fecal swab (n = 10), oral fluid (n = 5), feedback (n = 4), and environmental (n = 5) samples were selected for complete genome sequencing using next generation sequencing (NGS) techniques as previously described (GenBank numbers KR265759-KR265834, KR265840-KR265846) (Marthaler et al., 2013; Marthaler et al., 2014) . Whole genomic PEDV sequences obtained using NGS techniques were also generously supplied from Iowa State University (n = 7, GenBank numbers KM975735-KM975741) and the Ohio Department of Agriculture (n = 3, GenBank numbers KP641661-KP641663), using previously described methods Chen et al., 2014) . Using the complete PEDV genome sequences from this study (n = 93) and the available PEDV sequences from GenBank (n = 126), two nucleotide alignments were created and analyzed to determine the phylogenetic relationships between American and global PEDV sequences: the concatenation of all ORFs (ORF1, S, ORF3, envelope, membrane, and nucleocapsid), and a S1 alignment. Vaccine and cell-passaged strains were excluded from the analysis (Table S1 ). Nucleotide and amino acid entropy analyses were performed using the MATLAB software (MATLAB v8.0 and Statistics Toolbox v8.1, The MathWorks, Inc., Natick, MA, USA). Threshold values were determined using previously published methods (Shannon, 1948; Litwin and Jores, 1992) . Recombination analysis was performed using the Recombination Detection Program (RDP) v4, which uses multiple detection algorithms, including the RDP Method, GENECOV, and MAXCHI, to check for the presence of recombinant sequences in the sequence dataset (Martin et al., 2015) . Window size was set to 100 bp. Breakpoints, the presence of major/minor donor sequences, and confidence intervals were used to determine regions that required excision from the alignment, or if entire sequences needed to be removed from the analysis due to multiple recombination events within the sequence. Recombinant sequences were removed only prior to the Bayesian analysis, but remained in the alignments for all entropy analysis and molecular modeling. Bayesian Markov Chain Monte Carlo (MCMC) approach using BEAST v1.8.1, with a relaxed molecular clock and Bayesian skyline population (BSP) prior, with a general-time reversible nucleotide substitution and gamma distributed among-site rate variation was used to infer time-scaled phylogeny (Drummond et al., 2002 , 2006 , 2012Drummond and Rambaut, 2007 Minin et al., 2008; Drummond and Suchard, 2010) . The MCMC chain was run for 800 million generations, with sub-sampling every 80,000 iterations. A Maximum clade credibility (MCC) tree was created by discarding the initial 10% of the chains and summarized in TreeAnnotator (v.1.8.0). Key nodes were identified using FigTree (v.1.4 .2) to determine time to most recent common ancestor (tMRCA). The putative pAPN receptor-binding residues were analyzed to determine residue trends between classical and pandemic strains (Reguera et al., 2012) . The C-terminal RBD within the S1 region of the spike gene was modeled using the open-source modeling server SWISS-MODEL provided by the Swiss Institute of Bioinformatics (Biasini et al., 2014) . Predicted tertiary structure of the PEDV pAPN RBD was modeled using PRCV as a template since a PEDV template was not available. Spike monomer and trimer models were developed using a theoretical SARS-CoV model as a template (Bernini et al., 2004) . Illustrations were created using the open-source Javabased molecular viewer Jmol (Herraez, 2006) and the Python-based molecular viewer PyMOL (The PyMOL Molecular Graphics System, Version 1.7.4 Schrödinger, LLC.). 1932, 1951, 1976, 2467, 2517, 2652, 2658, 2664, 2667, 2679 2720, 2809, 2856, 2883, 2940, 3003, 3021, 3025, 3078, 3105, 3110, 3171, 3516, 3708, 3876 The PEDV nucleotide sequences ranged from 27,529 to 28,061 bases in length. Two PEDV genomes from our study had insertions or deletions. Ohio249 had a 3-nt insertion between positions 22,039 and 22,041 in the spike gene while Minnesota309 with a 4-nt deletion from positions 27,768 to 27,771 in the 3 UTR compared to the original US strain, USA/Colorado/2013. Entropy analysis was conducted with 219 whole nucleotide and amino acid sequences, containing the concatenated ORFs excluding 5 and 3 UTRs. Entropy values greater than 0.8 and 0.6 were considered highly variable for the nucleotide and amino acid alignments, respectively, based on the level of diversity in the dataset and previously determined entropy values (Litwin and Jores, 1992) . Within the nucleotide alignment, 15 of the 20 PEDV regions lacked positions with entropy levels above 0.8 (nsp1, nsp5-nsp10, nsp13, nsp15, nsp16, S2, ORF3, envelope, membrane, and nucleocapsid) while 5 regions had entropy levels above 0.8 (nsp2, nsp3, nsp12, nsp14, and S1) (Fig. 1) . The nsp2 and 3 were the most divergent regions containing 10 and 16 diverse nucleotide positions, respectively (Table 1) . Interestingly, the nsp12 gene contained 4 diverse nucleotide positions, which were absent in the amino acid sequence (Fig. 1A) . Inversely, high amino acid diversity was observed in the nsp4, nsp13, nsp16, and S2 genes, which were absent in the nucleotide alignment (Fig. 1B) . Higher entropy levels were present in the nsp2, nsp3, and S1 regions in both the nucleotide and amino acid alignments. Overall, the ORF1a entropy levels were higher compared to ORF1b in the amino acid analysis. Of the structural genes, the S gene had the highest entropy levels compared to the envelope, membrane, and nucleocapsid genes. Recombination was detected in 7 main areas of the concatenated full genome, including the nsp2, nsp3, nsp14-16, S1 domain, and nucleocapsid gene ( Fig. 2A) . In these areas, recombination was present in the majority of the sequences, so the entire region was excised from the alignment prior to Bayesian analysis. In addition, 35 sequences (23 from Asia, 12 from the Americas) were omitted from the Bayesian analysis due to evidence of widespread recombination throughout the genome (Table S1 ). For example, the pandemic sequence Minnesota211 contained a recombinant region with the characteristic S-INDEL deletions and insertions in the S1 domain, indicating a recombinant event occurred between an S-INDEL strain and a non S-INDEL pathogenic strain in the US (Fig. 2B) . A maximum clade credibility (MCC) phylogeny was inferred for both the concatenated genomic sequences excluding the recombinant regions (12,999 nt) and the spike S1 gene (2142 nt). The analysis was run independently twice until convergence was reached, with high agreement between the two runs. In the concatenated alignment tree, the classical and pandemic Asian strains were positioned as basal to the US strains, consistent with an Asian origin for the US outbreak (Fig. 3) . Importantly, the concatenated alignment tree suggests that the US epidemic may have resulted from two independent PEDV introductions into the US, including minor and major clade of viruses. The minor clade contained the American and European S-INDELs, and a small subclade of non S-INDEL sequences from the Ohio, including Ohio249/2014, PC21A/2013, and OH15962/2013. The major clade of US PEDV strains was supported by high posterior probability (100%) and appears to have diverged further into two highly supported sublineages (99% and 100% posterior probability). The phylogeny is consistent with multiple incursions of the major clade of US PEDV viruses into Mexico, Canada, and South Korea. The minor clade includes sequences from late 2013 to early 2014 that are localized to the midwestern and eastern US regions. The estimated tMRCA of the minor clade of US PEDV strains is July 2009-2011, and the estimated tMRCA of the major clade of US PEDV strains is September 2010-August 2012. The estimated evolutionary rate for the complete genome (excluding recombinant regions and sequences) is 6.2 × 10 −4 substitutions/site/year (4.8 × 10 −4 -7.6 × 10 −4 , 95% highest posterior density (HPD)). The rate estimate for the US strains is slightly higher, but not significantly: 5.5 × 10 −3 substitutions/site/year (4.6 × 10 −3 -6.5 × 10 −3 , 95% HPD). The spike tree illustrates the evolutionary relationship between the classical strains and the S-INDELs, which suggests a classical origin for the S-INDEL genotype (Fig. 4) . The pathogenic strains form a highly diverged major clade (Fig. 4B) , which braches into 2 large American clades. In addition, the Bayesian analysis of the spike gene might suggest 2 separate introductions of PEDV into the US. The evolutionary rate for the S1 gene is 1.5 × 10 −3 substitutions/site/year (1.1 × 10 −3 -1.9 × 10 −3 , 95% HPD). Considering the high entropy levels in the spike gene and the evolutionary rate determined from the S1 Bayesian spike tree, the RBD within the S1 was further examined. The S-INDEL and classical PEDV strains shared similar amino acid substitutions, specifically in the NTD of the S1 region (Fig. S1) . Furthermore, the pandemic PEDV strains from China had an increased number of substitutions within the S1 domain when compared to the American strains due to the longer circulation time in China. Compared to the attenuated vaccine strain DR13, 29 of 185 (16%) American strains and 19 of 34 (56%) Asian strains had at least one amino acid substitution in the pAPN RBD ( Table 2 ). The majority of the American strains (n = 156) did not represent any amino acid differences in the pAPN RBD. In this region, 8 positions in the American strains had amino acid differences compared to the vaccine strain DR13 (Fig. 5A) . The most common substitutions were in the fourth region of the pAPN RBD at positions E594D (n = 10) and G598D (n = 9), which were substituted with aspartic acid. More substitutions (n = 13) occurred in the pAPN RBD of the Asian strains (Table 2) , with the most common substitutions at position H515L (n = 5). The RBD regions were three-dimensionally modeled to illustrate the 8 and 13 amino acid positions at which substitutions occurred in the North American and Asian strains, respectively. The modeling of the spike protein suggests that the pAPN RBD residues cluster around the inner pore created by the trimer molecule, while the NTD is oriented around the outer surface of the S1 domain ( Fig. 5B through 5F). Genomic analysis depends critically on complete sequence data to conduct accurate research on phylogeny, evolution, and gene regulation. In the past, it was more economically and time effective to sequence smaller pieces of a genome and develop evolutionary conclusions from these relatively small genomic pieces. However, without full genomic sequences, it is impossible to compare variations within a genome to determine selective pressures on specific genes or regions. Because of NGS technology, tools like site-specific entropy analysis can be used to examine variability throughout the genome of many PEDV sequences. Sun and collaborators reported four regions of diversity within the PEDV genome, including V1 in the nsp2 and nsp3, V2 in the S1, V3 in the S2 and ORF3, and V4 in the nucleocapsid (Sun et al., 2015) . While our results support the nucleotide variance in the nsp2, nsp3, and spike genes, high levels of diversity were not present in the S2, ORF3, and the nucleocapsid. This could be due to the omission of PEDV isolates in our analysis, as well as the comparatively large number of new US sequences in our dataset. Our variance results may more accurately represent variance within American PEDV strains, and underrepresent variance within Chinese strains. The diversity in the S1 region is comprehensible since it is under strong immunological pressure while the S2 region is more conserved throughout CoVs (Aydin et al., 2014) . The functions of nsp2 and nsp3 remain relatively ambiguous. Despite being involved in viral growth and propagation, nsp2 is dispensable for viral replication because CoV strains can replicate in absence of the nsp2 (Graham et al., 2006) . The function of nsp3 may be related to innate immune evasion since it encodes proteases that facilitate proteasome degradation, changes in intracellular destination, signaling, protein interactivity, and host type I interferon (IFN) antagonist activities (Xing et al., 2013) . Due to the multifaceted nature of nsp3, other nsp regions could produce proteins with novel effects not yet understood that mediate the virulence of CoV species. Acquired nucleotide differences throughout the nsp2 and nsp3 regions could contribute to the evasion of host immunity. Thus, future research should focus on the functionality and importance of all PEDV genes to further understand CoV pathogenesis. Recombination plays a pivotal role in the evolution of CoVs by creating new strains with altered virulence. The Minnesota211 strain originated from a recombination even between an S-INDEL and a US pandemic strain, which has been associated with altered pathogenesis. While recombination may occur more often during an epidemic, recombination events occurred in most of the Asian strains. Recombination events can affect the phylogenetic analy-sis because different regions of the genome may have different evolutionary histories (Spade et al., 2015) . Our recombination analysis resulted in a significant portion of the complete genome being removed prior to more detailed phylogenetic analysis. At this time, the BEAST program cannot accommodate genetic data that includes recombined regions. Our analysis supported an Asian origin for the US outbreak while the inference is biased by the lack of background sequences from other regions. Although over 100 genomes were from the US, interpretation of the evolutionary and spatial his- tory of this data is limited by the lack of genomic PEDV data from other regions, including Europe and Asia. US strains had a higher evolutionary rate compared to the global strains, but the Bayesian Skyline plot did not show any significant increase in evolutionary rate, possibly due to the lack of temporal sampling in the US dataset. The evolutionary rate for the spike gene was higher compared to the rest of the genome, reflecting greater selective pressure. The overall evolutionary rate of PEDV (6.2 × 10 −4 substitutions/site/year) is similar to that of TGEV and wild animal CoVs (6.08 × 10 −3 substitutions/site/year, 1.3 × 10 −5 -1.4 × 10 −2 95% HPD), but lower than that of SARS-CoV (2.3 × 10 −3 nucleotide substitutions/site/year), except during the time of the US PEDV 2013 epidemic (5.5 × 10 −3 sub-stitutions/site/year) (Song et al., 2005; Vijaykrishna et al., 2007) . Surprisingly, 3 pandemic strains were positioned within the S-INDEL clade. Possibly, a pandemic and S-INDEL strain were introduced into the Americas, and a recombination event occurred in the NTD that removed the characteristic insertions and deletions of an S-INDEL strain, as indicated in the Minnesota211 sequence. The relationship between the US minor clade and the recent PEDV strains from Europe is less clear. While these viral populations are closely related (posterior probability of 100%), the direction of transmission is unclear at this time. Additional sequences from Europe might help to resolve the origins of these recent European PEDV cases. (E) A monomer model of the PEDV spike protein, with the C-terminal RBD represented in green, dark blue represents the S2 region, light blue represents the S1 region, and yellow represents the N-terminal RBD. (F) A theoretical tertiary structure model of the PEDV spike protein. Blue represents the S1 region, with the specific N-and C-terminal RBDs highlighted in yellow and green, respectively. The pAPN-RBD is shown in violet.(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Examining the spike gene can reveal interesting conclusions about the pAPN RBD. While the NTD spans a larger region of the S1 domain, it has not been directly linked to PEDV tropism and functionality, as in TGEV and MHV (Reguera et al., 2012) . The Korean attenuated vaccine strain and the US strains share similar residues in the pAPN RBD, with numerous differences compared to the older classical strains, suggesting this vaccine may protect against the American strains. However, developing a consistently and longitudinally efficacious vaccine may prove challenging, considering the high evolutionary rate of the S1 region. Failures in the development of an efficacious vaccine have been reported, further supporting the difficulty in generating vaccines for PEDV . Despite some uncertainties in vaccine efficacy, a recent study demonstrated that prior exposure of sows to the S-INDEL strain provided a level of protective immunity when their piglets were challenged with the more virulent original US PEDV strain, which is probably due to conservation within the C-terminal region of the viral genome (Goede et al., 2015) . While the exact functionality of all the genes of PEDV and other CoVs is unknown, adding the complete genomes of diverse strains to the global database promotes better understanding of evolutionary and phylogenetic relationships. Multiple regions within the genome are variable, and recombination is common between PEDV strains. Despite excising a large portion of the genome prior to analysis, the Bayesian trees illustrate two distinct entries of PEDV into the US and characterize the evolution of PEDV compared to other CoVs. Modeling of the pAPN RBD region has revealed that Asian strains have increasing diversity compared to previously developed vaccines, and the variability in both the American and Asian strains needs to be considered for future vaccine development. As the US swine industry recovers from the PEDV epidemic of 2013-2014, research is maturing to understand the regions of diversity, evolution, and the RBD of PEDV to prevent future outbreaks and foster vaccine development. Influence of hydrophobic and electrostatic residues on SARS-coronavirus S2 protein stability: insights into mechanisms of general viral fusion and inhibitor design Prediction of quaternary assembly of SARS coronavirus peplomer SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information Virus-like particles associated with porcine epidemic diarrhoea Isolation and characterization of porcine epidemic diarrhea viruses associated with the 2013 disease outbreak among swine in the United States Studies on the detection of porcine epidemic diarrhea virus by immunofluorescent techniques Aminopeptidase N is a major receptor for the entero-pathogenic coronavirus TGEV Relaxed phylogenetics and dating with confidence Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data BEAST: Bayesian evolutionary analysis by sampling trees Bayesian coalescent inference of past population dynamics from molecular sequences Bayesian random local clocks, or one rate to rule them all Bayesian phylogenetics with BEAUti and the BEAST 1.7 Scientific opinion on porcine epidemic diarrhoea and emerginf pig deltacoronavirus Complete genome sequence of a novel porcine epidemic diarrhea virus in south China Previous infection of sows with a mild strain of porcine epidemic diarrhea virus confers protection against infection with a severe strain The nsp2 proteins of mouse hepatitis virus and SARS coronavirus are dispensable for viral replication Comparison of porcine epidemic diarrhea viruses from Germany and the United States Biomolecules in the computer: Jmol to the rescue Nidovirales: coronoviridae and ateriviridae Outbreak-related porcine epidemic diarrhea virus strains similar to US strains, South Korea Porcine aminopeptidase N is a functional receptor for the PEDV coronavirus Phylogenetic analysis of porcine epidemic diarrhea virus (PEDV) field strains in central China based on the ORF3 gene and the main neutralization epitopes In theoretical and experimental insights into immunology Complete genome sequence of strain SDCV/USA/Illinois121/2014, a porcine deltacoronavirus from the United States Complete genome sequence of porcine epidemic diarrhea virus strain USA/Colorado/2013 from the United States RDP4: detection and analysis of recombination patterns in virus genomes Outbreak of porcine epidemic diarrhea virus in Portugal Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics Complete genome sequence of the porcine epidemic diarrhea virus variant tottori2/JPN/2014 The first case of porcine epidemic diarrhea in Canada Cell culture isolation and sequence analysis of genetically diverse US porcine epidemic diarrhea virus strains including a novel strain with a large deletion in the spike gene Structural bases of coronavirus attachment to host aminopeptidase N and its inhibition by neutralizing antibodies The mathematical theory of communication The Bell system Porcine epidemic diarrhoea virus: a comprehensive review of molecular epidemiology, diagnosis, and vaccines Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human Geometric ergodicity of a hybrid sampler for Bayesian inference of phylogenetic branch lengths Genomic and epidemiological characteristics provide new insights into the phylogeographical and spatiotemporal spread of porcine epidemic diarrhea virus in Asia Outbreak of porcine epidemic diarrhea in suckling piglets Evolutionary insights into the ecology of coronaviruses Distinct characteristics and complex evolution of PEDV strains New variant of porcine epidemic diarrhea virus The papain-like protease of porcine epidemic diarrhea virus negatively regulates type I interferon pathway by acting as a viral deubiquitinase This study was supported partially by the Rapid Agricultural Response Fund, established by the Minnesota legislature and administered by the University of Minnesota Agricultural Experiment Station, and by Boehringer Ingelheim Vetmedica, Inc.The authors thank the faculty and personal at the UMVDL for their technical services. Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.prevetmed.2015. 10.020.