key: cord-0796942-1sq2uvur authors: Lau, Susanna K. P.; Wong, Antonio C. P.; Lau, Terrence C. K.; Woo, Patrick C. Y. title: Molecular Evolution of MERS Coronavirus: Dromedaries as a Recent Intermediate Host or Long-Time Animal Reservoir? date: 2017-10-16 journal: Int J Mol Sci DOI: 10.3390/ijms18102138 sha: b1f385c02a27a04b941f53de42b53c1ecc1a7c94 doc_id: 796942 cord_uid: 1sq2uvur While dromedary camels are the immediate animal source of MERS coronavirus (MERS-CoV) infection, the evolutionary origin of MERS-CoV remains obscure. We analyzed 219 camel and human MERS-CoV genome sequences available in GenBank. Phylogenetic analysis showed that 5 and 214 strains belong to clade A and B, respectively, with clade A further divided into lineage A1 (3 human strains) and lineage A2 (2 camel strains), and clade B divided into B1 to B6 (each containing both human and camel strains). Recombination analysis showed potential recombination events in five strains from dromedaries in Saudi Arabia, with recombination between lineage B5 and B3 in four strains, and between lineage B3 and B4 in one strain. The spike protein showed the highest number of amino acid substitutions, especially between A2 and other lineages, and contained positively selected codons. Notably, codon 1020 was positively selected among B and B5 strains, and can distinguish between clade A (Q1020) and B (R1020/H1020) strains, suggesting that this residue may play a role in the evolution of S protein during divergence of different lineages. The time of the most recent common ancestor of all MERS-CoV was dated to approximately 2010. The implications on the role of camels in the evolution of MERS-CoV are discussed. Since its first appearance in 2012, the Middle East Respiratory Syndrome (MERS) has affected more than 25 countries in four continents with more than 1900 cases and a frightening fatality rate of more than 30% [1] . A novel lineage C betacoronavirus, MERS coronavirus (MERS-CoV), has been confirmed to be the etiological agent of MERS [2, 3] . Subsequent detection of MERS-CoV and its antibodies in dromedaries in various countries in the Middle East and North Africa have implied that these animals are probably the reservoir for MERS-CoV [4] [5] [6] . In addition, the discovery of other closely related lineage C betacoronaviruses both before and after the MERS epidemic in various bat species and hedgehogs [7] [8] [9] [10] , as well as the demonstration of the ability of the spike protein of Tylonecteris bat CoV HKU4 to bind dipeptidyl peptidase 4 [11, 12] , the receptor of MERS-CoV, have also suggested that these animals could be the hosts for the ancestor of MERS-CoV. However, it is still uncertain how and when the ancestor of MERS-CoV may have jumped from bats/hedgehogs to camels and humans. Up to March 2017, a total of more than 200 complete genome sequences of MERS-CoV are available in GenBank, with around half of them from MERS-CoVs found in human and the other half from MERS-CoVs in dromedaries. Based on our previous studies on genome sequencing and phylogenetic analysis of MERS-CoVs from dromedaries of the United Arab Emirates (UAE), we observed that diverse MERS-CoV strains were circulating in these dromedaries, supporting a polyphyletic origin [13] . In this study, we analyzed the diversity and phylogeny of all the 219 human and dromedary MERS-CoV genome sequences in GenBank. Possible recombination events among the MERS-CoV strains were also detected. The implications of the results on the origin and control of MERS, and the role of dromedary camels in the evolution of MERS-CoV, are discussed. Among the 219 complete genomes of MERS-CoV, 151 were from Saudi Arabia (human, n = 73; dromedary, n = 78), 29 were from the UAE (human, n = 9; dromedary, n = 20), 4 were from Jordan (human, n = 4), 1 was from Egypt (dromedary, n = 1), 5 were from the United Kingdom (human, n = 5), 3 were from Qatar (human, n = 2; dromedary, n = 1), 3 were from the United States (human, n = 3), 2 were from France (human, n = 2), 2 were from Oman (human, n = 2), 1 was from Thailand (human, n = 1), 1 was from China (human, n = 1), and 17 were from South Korea (human, n = 17) (Figure 1 ). also suggested that these animals could be the hosts for the ancestor of MERS-CoV. However, it is still uncertain how and when the ancestor of MERS-CoV may have jumped from bats/hedgehogs to camels and humans. Up to March 2017, a total of more than 200 complete genome sequences of MERS-CoV are available in GenBank, with around half of them from MERS-CoVs found in human and the other half from MERS-CoVs in dromedaries. Based on our previous studies on genome sequencing and phylogenetic analysis of MERS-CoVs from dromedaries of the United Arab Emirates (UAE), we observed that diverse MERS-CoV strains were circulating in these dromedaries, supporting a polyphyletic origin [13] . In this study, we analyzed the diversity and phylogeny of all the 219 human and dromedary MERS-CoV genome sequences in GenBank. Possible recombination events among the MERS-CoV strains were also detected. The implications of the results on the origin and control of MERS, and the role of dromedary camels in the evolution of MERS-CoV, are discussed. Among the 219 complete genomes of MERS-CoV, 151 were from Saudi Arabia (human, n = 73; dromedary, n = 78), 29 were from the UAE (human, n = 9; dromedary, n = 20), 4 were from Jordan (human, n = 4), 1 was from Egypt (dromedary, n = 1), 5 were from the United Kingdom (human, n = 5), 3 were from Qatar (human, n = 2; dromedary, n = 1), 3 were from the United States (human, n = 3), 2 were from France (human, n = 2), 2 were from Oman (human, n = 2), 1 was from Thailand (human, n = 1), 1 was from China (human, n = 1), and 17 were from South Korea (human, n = 17) (Figure 1 ). The 219 complete genomes of MERS-CoV were clustered into two major clades, A and B ( Figure 2 ). Most MERS-CoV strains belong to clade B, while clade A only contains five strains. Clade A is further divided into two lineages: lineage A1, with three human strains, one being the first MERS-CoV strain isolated from the index 60-year-old male patient from Bisha, Saudi Arabia in September 2012, and the other two from Jordan; and lineage A2, with two camel strains, one from the UAE and the other from Egypt. Clade B is further divided into six lineages. Both human and camel strains were present in most lineages. Lineage B1 is dominated by MERS-CoV strains from the UAE, whereas lineages B2 to B6 are mainly dominated by strains from Saudi Arabia. Lineage 3 consists of the largest number of strains, including the human strains from an outbreak in South Korea, which occurred as a result of acquisition of MERS-CoV by a Korean who visited multiple Middle East countries, including Bahrain, Qatar, Saudi Arabia, and the UAE from 24 April to 4 May 2015 and subsequent The 219 complete genomes of MERS-CoV were clustered into two major clades, A and B ( Figure 2 ). Most MERS-CoV strains belong to clade B, while clade A only contains five strains. Clade A is further divided into two lineages: lineage A1, with three human strains, one being the first MERS-CoV strain isolated from the index 60-year-old male patient from Bisha, Saudi Arabia in September 2012, and the other two from Jordan; and lineage A2, with two camel strains, one from the UAE and the other from Egypt. Clade B is further divided into six lineages. Both human and camel strains were present in most lineages. Lineage B1 is dominated by MERS-CoV strains from the UAE, whereas lineages B2 to B6 are mainly dominated by strains from Saudi Arabia. Lineage 3 consists of the largest number of strains, including the human strains from an outbreak in South Korea, which occurred as a result of acquisition of MERS-CoV by a Korean who visited multiple Middle East countries, including Bahrain, Qatar, Saudi Arabia, and the UAE from 24 April to 4 May 2015 and subsequent spread the infection after returning to South Korea [14] . The same strain of MERS-CoV was also carried to China by a 44-year-old male Korean [15] . This patient visited his father and sister, who were infected with MERS-CoV in South Korea, before traveling to Hong Kong on 26 May and subsequently to Huizhou in the Guangdong province through Shenzhen City. spread the infection after returning to South Korea [14] . The same strain of MERS-CoV was also carried to China by a 44-year-old male Korean [15] . This patient visited his father and sister, who were infected with MERS-CoV in South Korea, before traveling to Hong Kong on 26 May and subsequently to Huizhou in the Guangdong province through Shenzhen City. Fifteen MERS-CoV strains, including six from human and nine from dromedaries, fell into different lineages in the phylogenetic trees constructed using ORF1ab and S (Table 1 and Figure 3 ), suggesting possible recombination events. For example, four MERS-CoV strains from dromedaries (accession numbers KT368829, KT368830, KT368831, KT368832) belong to lineage B5 in the ORF1ab tree but lineage B3 in the S tree; and three MERS-CoV strains from human (accession numbers KT861627, KJ156910, KJ156874) belong to lineage B2 in the ORF1ab tree but lineage B4 in the S tree. Fifteen MERS-CoV strains, including six from human and nine from dromedaries, fell into different lineages in the phylogenetic trees constructed using ORF1ab and S (Table 1, Figure S1 and Figure 3 ), suggesting possible recombination events. For example, four MERS-CoV strains from dromedaries (accession numbers KT368829, KT368830, KT368831, KT368832) belong to lineage B5 in the ORF1ab tree but lineage B3 in the S tree; and three MERS-CoV strains from human (accession numbers KT861627, KJ156910, KJ156874) belong to lineage B2 in the ORF1ab tree but lineage B4 in the S tree. Comparison of the amino acid sequences of the various lineages of MERS-CoV revealed that the number of changes in the S proteins is higher than in other proteins. Notably, MERS-CoV of lineage A2 showed 14 amino acid changes compared to other lineages, while most other lineages (A1, B2-6) only showed one amino acid change compared to other lineages ( Figure 4 ). These amino acid changes are not concentrated in particular motifs ( Figure 4 ). Comparison of the amino acid sequences of the various lineages of MERS-CoV revealed that the number of changes in the S proteins is higher than in other proteins. Notably, MERS-CoV of lineage A2 showed 14 amino acid changes compared to other lineages, while most other lineages (A1, B2-6) only showed one amino acid change compared to other lineages ( Figure 4 ). These amino acid changes are not concentrated in particular motifs (Figure 4 ). For the 15 MERS-CoV genomes that showed different clustering in the ORF1ab and S phylogenetic trees, they were subject to recombination analysis. Among these 15 strains, bootscan analysis revealed possible recombination sites in five ( Figure 5A ,B). These five strains were all from dromedaries in Saudi Arabia. They were separated into two groups, with one group containing four strains from Jeddah of almost identical sequences (accession Nos. KT368829, KT368830. KT368831, KT368832) and the remaining one strain from Taif (accession No. KT368887). For the other 10 strains, bootscan analysis did not reveal any obvious pattern that suggested possible recombination ( Figure 5C For the 15 MERS-CoV genomes that showed different clustering in the ORF1ab and S phylogenetic trees, they were subject to recombination analysis. Among these 15 strains, bootscan analysis revealed possible recombination sites in five ( Figure 5A ,B). These five strains were all from dromedaries in Saudi Arabia. They were separated into two groups, with one group containing four strains from Jeddah of almost identical sequences (accession Nos. KT368829, KT368830. KT368831, KT368832) and the remaining one strain from Taif (accession No. KT368887). For the other 10 strains, bootscan analysis did not reveal any obvious pattern that suggested possible recombination ( Figure 5C,D) . For the 15 MERS-CoV genomes that showed different clustering in the ORF1ab and S phylogenetic trees, they were subject to recombination analysis. Among these 15 strains, bootscan analysis revealed possible recombination sites in five ( Figure 5A ,B). These five strains were all from dromedaries in Saudi Arabia. They were separated into two groups, with one group containing four strains from Jeddah of almost identical sequences (accession Nos. KT368829, KT368830. KT368831, KT368832) and the remaining one strain from Taif (accession No. KT368887). For the other 10 strains, bootscan analysis did not reveal any obvious pattern that suggested possible recombination ( Figure 5C When the 15 MERS-CoV genomes that showed different clustering in the ORF1ab and S phylogenetic trees were subject to multiple alignment analysis, it was shown that for the five strains with possible recombination revealed by bootscan analysis, the possible recombination sites were confirmed by multiple alignments. For four strains (accession No. KT368829, KT368830, KT368831 and KT368832), sequences upstream to the possible recombination site were similar to that of KT368875 from lineage B5, while those downstream to the possible recombination site were similar to that of KT368860 from lineage B3 ( Figure S1A ). As for the remaining strain (accession No. KT368887), sequence upstream to the possible recombination site was similar to that of KT368878 from lineage B3, while sequence downstream to the possible recombination site was similar to that of KT368890 from lineage B4 ( Figure S1B ). To assess adaptive evolution of MERS-CoV, dN/dS ratios (ω) in S gene across the 219 strains were calculated on codon-by-codon basis. The overall ω was 0.424737 (0.392404 and 0.439492 for clade A and B strains, respectively), with most codons having ω < 1, indicating purifying selection (Table 2) . Nevertheless, four codons were predicted to have ω > 1 by different methods with statistical significance, including codons 26 and 312 by fixed effects likelihood (FEL) method, codon 1250 by mixed-effects model of evolution (MEME) method, and codon 1224 by both FEL and MEME methods, indicating possible functional constraints at these positions during evolution. Among them, codons When the 15 MERS-CoV genomes that showed different clustering in the ORF1ab and S phylogenetic trees were subject to multiple alignment analysis, it was shown that for the five strains with possible recombination revealed by bootscan analysis, the possible recombination sites were confirmed by multiple alignments. For four strains (accession No. KT368829, KT368830, KT368831 and KT368832), sequences upstream to the possible recombination site were similar to that of KT368875 from lineage B5, while those downstream to the possible recombination site were similar to that of KT368860 from lineage B3 ( Figure S2A ). As for the remaining strain (accession No. KT368887), sequence upstream to the possible recombination site was similar to that of KT368878 from lineage B3, while sequence downstream to the possible recombination site was similar to that of KT368890 from lineage B4 ( Figure S2B ). To assess adaptive evolution of MERS-CoV, dN/dS ratios (ω) in S gene across the 219 strains were calculated on codon-by-codon basis. The overall ω was 0.424737 (0.392404 and 0.439492 for clade A and B strains, respectively), with most codons having ω < 1, indicating purifying selection (Table 2) . Nevertheless, four codons were predicted to have ω > 1 by different methods with statistical significance, including codons 26 and 312 by fixed effects likelihood (FEL) method, codon 1250 by mixed-effects model of evolution (MEME) method, and codon 1224 by both FEL and MEME methods, indicating possible functional constraints at these positions during evolution. Among them, codons 1224 and 1250 were also predicted to be positively selected sites among clade B strains by at least one method (Table 2 ). In addition, codon 879 was predicted to be positively selected among clade A strains, whereas codons 312 and 1020 were predicted to be positively selected among clade B strains by at least one method (Table 2) . Furthermore, codons 1250 and 1020 were predicted to be positively selected among clade B3 and B5 strains, respectively, by MEME method (Table 2 ). However, none of these amino acid residues were within the receptor binding domain of MERS-CoV, suggesting that there has not been strong selective pressure at the receptor-binding interphase during virus evolution. The models of spike glycoprotein from each clade were built. It was observed that the amino acid substitutions observed by multiple alignments appeared to be randomly distributed throughout the structure (Figure 6 ). Based on the model, we did not observe specific evolutionary patterns of the MERS-CoV spike protein. However, intriguingly, the residue at position 1020 can distinguish between clade A and B (Figure 4 ). The glutamine (Q1020) and arginine residues (R1020) at that position are highly conserved in clade A and clade B, respectively, except for a histidine residue (H1020) found in 12 lineage B5 strains. Together with results from selective pressure analysis, this suggests that this amino acid residue may play a role during the molecular evolution of S protein of MERS-CoV and divergence of different lineages. 1224 and 1250 were also predicted to be positively selected sites among clade B strains by at least one method ( Table 2 ). In addition, codon 879 was predicted to be positively selected among clade A strains, whereas codons 312 and 1020 were predicted to be positively selected among clade B strains by at least one method (Table 2) . Furthermore, codons 1250 and 1020 were predicted to be positively selected among clade B3 and B5 strains, respectively, by MEME method (Table 2 ). However, none of these amino acid residues were within the receptor binding domain of MERS-CoV, suggesting that there has not been strong selective pressure at the receptor-binding interphase during virus evolution. The models of spike glycoprotein from each clade were built. It was observed that the amino acid substitutions observed by multiple alignments appeared to be randomly distributed throughout the structure ( Figure 6 ). Based on the model, we did not observe specific evolutionary patterns of the MERS-CoV spike protein. However, intriguingly, the residue at position 1020 can distinguish between clade A and B (Figure 4 ). The glutamine (Q1020) and arginine residues (R1020) at that position are highly conserved in clade A and clade B, respectively, except for a histidine residue (H1020) found in 12 lineage B5 strains. Together with results from selective pressure analysis, this suggests that this amino acid residue may play a role during the molecular evolution of S protein of MERS-CoV and divergence of different lineages. The divergence time of 138 selected MERS-CoV strains based on S gene analysis is shown in Figure 7 . Strains with similar sequences or collection dates were excluded. For example, only 9 out of 17 strains from South Korea were included. On the basis of time-resolved phylogeny, the root of the tree was dated to The divergence time of 138 selected MERS-CoV strains based on S gene analysis is shown in Figure 7 . Strains with similar sequences or collection dates were excluded. For example, only 9 out of 17 strains from South Korea were included. On the basis of time-resolved phylogeny, the root of the tree was dated to Among all known CoVs, the largest numbers of complete/near-complete genomes are available for SARS-CoV and MERS-CoV, of which >300 and >200 genomes, respectively, are available for analysis. Such large number of genomes has given us a good opportunity to improve our Among all known CoVs, the largest numbers of complete/near-complete genomes are available for SARS-CoV and MERS-CoV, of which >300 and >200 genomes, respectively, are available for analysis. Such large number of genomes has given us a good opportunity to improve our understanding on the origin and evolution of the corresponding viruses. In contrast to the major SARS outbreak in 2003 which is of monophyletic origin, MERS-CoVs circulating in human and camels since 2012 originated from multiple sources in the Middle East and North Africa. For example, strains from a single country such as the UAE are present in lineages A2, B1, 3, 4, and 5, and strains from Jordan are also present in lineages A1, B4, and B5. This indicated that different MERS-CoV strains have resided in dromedaries in different countries with sporadic transmission to humans, sometimes causing fatal diseases. Overall, some minor recombination events between different sublineages (between B5 and B3, and between B3 and B4) might have occurred (Figures 5 and 6 ), but it seems that none of these events has resulted in significant changes in the genome. This is in contrast to some CoVs such as human CoV HKU1, human CoV OC43, or canine CoVs where recombination events are responsible for the generation of novel genotypes [16] [17] [18] . While there is evidence suggesting that MERS-CoV may have infected camels for at least a few decades, it is intriguing that the present molecular clock analysis using available dated MERS-CoV genome sequences suggests a relatively recent common ancestor dated to approximately 2010. Based on existing evidence, there are two possibilities in the evolutionary pathway of MERS-CoV. The first possibility is that MERS-CoV has resided in camels for long time before infecting humans to cause the epidemic in 2012. The alternative hypothesis is that MERS-CoV has only emerged in camels recently from an unknown animal reservoir, and was spilled over to humans because of efficient receptor binding to human dipeptidyl peptidase 4 when a sufficient proportion of camel population was infected. The first hypothesis is supported by two previous serological studies showing that MERS-CoV neutralizing antibodies in archived sera samples of dromedaries in Eastern Africa and Kenya can be detected as early as 1980s and 1990s [19, 20] . Moreover, the lack of positive selection at the receptor binding domain of spike protein of MERS-CoV may suggest the virus has been well adapted in camels and humans for a period of time. However, several questions were not readily answered by this hypothesis. First, since genomes of MERS-CoVs from humans and camels were intermingled and closely related without major adaptation before transmission to humans, one would expect similar camel-to-human transmission efficiency before the epidemic in 2012. However, no human cases before 2012 were reported so far, and more importantly, MERS-CoV has not been detected in any archived human or camel samples before 2012. Nevertheless, there is a possibility that early human cases may have been missed if they were not investigated for coronaviruses. Yet, if MERS-CoV has been circulating in camels for long time, one would expect a much earlier estimated tMRCA by molecular clock analysis using available MERS-CoV genome sequences. Although the present study may be limited by the lack of "historical" strains of MERS-CoV for analysis, the diverse circulating strains detected since 2012 should have reflected the history of evolution of MERS-CoV and allow a fair estimation of tMRCA. Nevertheless, isolation of MERS-CoV or detection of viral RNA from archived samples will be important to confirm if camels are indeed long-time reservoir of MERS-CoV. The evolutions of clade A and clade B MERS-CoV are quite different from each other. Intriguingly, clade B MERS-CoV out-numbered clade A MERS-CoV strains (Figure 2 ). This may be a result of sampling bias or a genuine survival or transmission advantage of clade B MERS-CoV. As for the specific lineages, the S protein of lineage A2 MERS-CoV showed higher number of amino acid changes as compared to other lineages. After the virus has evolved into clades A and B, we speculate that lineage B5 diverged and evolved separately from the other clade B lineages, as demonstrated by its deep branch in the phylogenetic tree as well as the unique amino acid change at position 1020 of the S2 subunit in S protein (Figures 4 and 6) . Moreover, this residue at 1020 also distinguishes between clade A and B strains, and was positively selected among B and B5 strains, suggesting a role in the divergence of different lineages along the evolution of spike protein. Interestingly, this residue is located at the HR region of S2 which is responsible for virus-host fusion. Further studies have to be performed to understand whether clade B MERS-CoV have better survival advantages over clade A MERS-CoV, and the potential effect of amino acid substitution at codon 1020 on viral fusion. In addition to understanding the evolutionary histories of the corresponding viruses, genome analysis has provided important insights to the approach of controlling the clinical diseases of SARS and MERS. The major SARS outbreak in humans in early 2003 originated from a single or very few civet-to-human interspecies jumping events in Southern China, which explains the monophyletic evolution of strains involved in this outbreak. As a result, the epidemic was successfully controlled when the animal source was segregated from humans after closure of wild life wet markets in Southern China. On the other hand, MERS-CoV had been endemic in dromedary population in the Middle East during the outbreak in 2012 with many independent, sporadic camel-to-human transmission events in Saudi Arabia and the neighboring countries. Since MERS-CoV is found in dromedary calves in most countries in the Middle East and Africa, it would be impossible to control the disease by closure of a limited number of facilities in a restricted geographical region. For visitors to the Middle East, it is important to improve education so that they will avoid contacts with dromedaries, particularly dromedary calves. For residents of the Middle East, more viable options may be through the development of rapid laboratory diagnostics, antivirals, and vaccines for treatment and prevention of MERS. Up to 10 March 2017, a total of 223 "complete" genome sequences of MERS-CoV were found in GenBank. Four of these 223 MERS-CoV "complete" genome sequences have an incomplete 5 genomic region or are redundant sequences of a single strain from GenBank, and therefore were not included in the analysis. As a result, 219 complete genome sequences of MERS-CoV, including 119 from MERS-CoV found in humans and 100 from MERS-CoV in dromedaries, were included in the analysis. The 219 MERS-CoV genomes were aligned by multiple alignments using fast fourier transform. Maximum-likelihood phylogenetic trees with 1000 bootstrap replicates were constructed using PhyML v3.0 (The French Institute of Bioinformatics & France Genomique, Montpellier, France) on the basis of their complete genomes as well as ORF1ab and S genes. The best-fit substitution model was selected using PhyML with Smart Model Selection and used in the maximum-likelihood analysis. To detect possible recombination, bootscan analysis with 1000 bootstrap replicates was performed by using the nucleotide alignment of the genome sequences of MERS-CoV and Simplot version 3.5.1 (SCRoftware, USA), as previously described [16, 17] . The analysis was conducted using a sliding window of 1000 nucleotides moving in 200 nucleotide steps with genome sequences obtained in the present study as the query. Possible recombination sites suggested by the bootscan analysis were confirmed through multiple sequence alignments. For genomes that showed possible recombination in bootscan analysis, the nucleotide sequences of the corresponding genomes were analyzed by multiple alignment using ClustalW available in BioEdit Sequence Alignment Editor Version 7.2.5 (Ibis Therapeutics, Carlsbad, CA, USA). Nucleotide mutations in the genomes were revealed. The number of synonymous substitutions per synonymous site, dS, and non-synonymous substitutions per non-synonymous site, dN, in S were calculated. Sites under positive selection were inferred using different methods as described in previous publications [21] [22] [23] [24] . First, single-likelihood ancestor counting (SLAC) and fixed effects likelihood (FEL) methods were used as implemented in DataMonkey server (Available online: http://www.datamonkey.org). The overall ω (dN/dS) value was calculated according to NJ trees under the custom (010121 with AIC of 16604.47594182065) substitution model. The ω (dN/dS) values for different clades and lineages were calculated according to NJ trees under either the F81 model or the custom substitution model (010121, 010011, 001020, or 000010 with various AIC). Positive selection for a site was considered to be statistically significant if p-value was <0.1. A mixed-effects model of evolution (MEME) was further used to identify positively selected sites under episodic diversifying selection in particular positions among different clades within a phylogenetic tree, even when positive selection was not evident across the entire tree. Models of representative sequences of the S proteins from each clade were built with the structure of MERS-CoV S protein (5X59) using homology modeling (SWISS-MODEL) with default parameters. The models were then superimposed and analyzed using Discovery Studio visualizer (Accelrys) (BIOVIA, San Diego, CA, USA). Divergence times for the MERS-CoV strains were calculated using a Bayesian Markov chain Monte Carlo (MCMC) approach implemented in BEAST (version 1.7.4) (BEAST Developers, Open Source, available online: beast.community/programs), as described previously [25, 26] . Representative strains were selected for MERS-CoV strains with close sequence similarity and obtained from the same outbreak. Analyses were performed under the SRD06 substitution models for the spike coding regions of the genome (S), with an uncorrelated exponential relaxed clock and a constant size coalescent model. The MCMC run was 5 × 10 7 steps long with sampling every 1000 steps. Convergence was assessed on the basis of the effective sampling size after a 10% burn-in using Tracer software, version 1.6.0 (Available online: http://tree.bio.ed.ac.uk/software/tracer/). The mean time to tMRCA and the highest posterior density regions at 95% (HPDs) were calculated. The trees were summarized in a target tree by using the Tree Annotator program included in the BEAST package by choosing the tree with the maximum sum of posterior probabilities (maximum clade credibility) after a 10% burn-in. Middle east respiratory syndrome coronavirus: Another zoonotic betacoronavirus causing sars-like disease Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia Genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans Seroepidemiology for MERS coronavirus using microneutralisation and pseudoparticle virus neutralisation assays reveal a high prevalence of antibody in dromedary camels in Egypt Novel betacoronavirus in dromedaries of the middle east Middle east respiratory syndrome coronavirus neutralising serum antibodies in dromedary camels: A comparative serological study Characterization of a novel betacoronavirus related to middle east respiratory syndrome coronavirus in european hedgehogs Comparative analysis of twelve genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features Genetic characterization of betacoronavirus lineage C viruses in bats reveals marked sequence divergence in the spike protein of pipistrellus bat coronavirus HKU5 in Japanese pipistrelle: Implications for the origin of the novel middle east respiratory syndrome coronavirus Middle east respiratory syndrome coronavirus in bats, Saudi Arabia Receptor usage and cell entry of bat coronavirus HKU4 provide insight into bat-to-human transmission of MERS coronavirus Bat origins of MERS-CoV supported by bat coronavirus HKU4 usage of human receptor CD26 Polyphyletic origin of MERS coronaviruses and isolation of a novel clade a strain from dromedary camels in the United Arab Emirates Preliminary epidemiological assessment of MERS-CoV outbreak in South Korea Origin and possible genetic recombination of the middle east respiratory syndrome coronavirus from the first imported case in China: Phylogenetics and coalescence analysis Comparative analysis of 22 coronavirus HKU1 genomes reveals a novel genotype and evidence of natural recombination in coronavirus HKU1 Molecular epidemiology of human coronavirus OC43 reveals evolution of different genotypes over time and recent emergence of a novel genotype due to natural recombination Discovery of a novel canine respiratory coronavirus support genetic recombination among betacoronavirus 1 Enterovirus D68 infections associated with severe respiratory illness in elderly patients and emergence of a novel clade in Hong Kong Detecting individual sites subject to episodic diversifying selection Not so different after all: A comparison of methods for detecting amino acid sites under selection Datamonkey: Rapid detection of selective pressure on individual sites of codon alignments Beast: Bayesian evolutionary analysis by sampling trees Discovery of a novel coronavirus, china rattus coronavirus HKU24, from Norway rats supports the murine origin of betacoronavirus 1 and has implications for the ancestor of betacoronavirus lineage A The authors declare no conflict of interest.