key: cord-0010233-9lliq4nu
authors: Li, Kemang; Yan, Shiyu; Wang, Ningning; He, Wanting; Guan, Haifei; He, Chengxi; Wang, Zhixue; Lu, Meng; He, Wei; Ye, Rui; Veit, Michael; Su, Shuo
title: Emergence and adaptive evolution of Nipah virus
date: 2019-09-10
journal: Transbound Emerg Dis
DOI: 10.1111/tbed.13330
sha: aa8e48f1afd82bbfd319e3847721a6f5a6a4f16e
doc_id: 10233
cord_uid: 9lliq4nu

Since its first emergence in 1998 in Malaysia, Nipah virus (NiV) has become a serious threat to domestic animals and humans. Sporadic outbreaks associated with human‐to‐human transmission have caused hundreds of human fatalities. Here, we collected all available NiV sequences and combined phylogenetics, molecular selection analysis, structural biology and receptor analysis to study the emergence and adaptive evolution of NiV. NiV can be divided into two main lineages, the Bangladesh and Malaysia lineages. We formally confirmed a significant association between NiV and geography, which is probably the result of long‐term evolution of NiV in local bat populations. The two NiV lineages differ in many amino acids; one change in the fusion protein might be involved in its activation via binding to the G protein. We also identified adaptive and positively selected sites in many viral proteins. In the receptor‐binding G protein, we found that sites 384, 386 and especially 498 might modulate receptor‐binding affinity and thus contribute to the host jump from bats to humans through adaptation to the human ephrin‐B2 receptor. We also found that site 1645 in the connector domain of L was positively selected and involved in adaptive evolution; this site lies close to the methyltransferase domain, which adds methyl groups to the cap structure at the 5′‐end of the RNA, and might thus modulate its activity. This study provides insights that may assist the design of early detection methods for NiV and the assessment of its epidemic potential in humans.

India, Bangladesh and the Philippines adding up to 666 human infections, 388 deaths and a mortality rate of around 60% according to the World Health Organization (WHO) and other recent reports (Arunkumar et al., 2018; Ching et al., 2015; Chua, 2003; Sourimant & Plemper, 2016). Of note, in February 2018, NiV infection was listed by the WHO as one of the priority diseases posing a public health risk. In May 2018, a NiV outbreak was reported in Kerala, India, with 23 identified human cases (18 laboratory-confirmed) and 21 deaths, the third known NiV outbreak in India (Arunkumar et al., 2018). Unlike the initial outbreaks in Malaysia and Singapore, human-to-human transmission played an important role in the spread of NiV during the outbreaks in India, Bangladesh and the Philippines (Arankalle et al., 2011; Arunkumar et al., 2018; Chadha et al., 2006; Ching et al., 2015; Gurley et al., 2007; Luby et al., 2009). An epidemiological investigation of human cases of Nipah virus infection in Bangladesh during 2001-2007 showed that more than half of human infections were caused by human-to-human transmission (Luby et al., 2009). In addition to bats of the genus Pteropus, NiV also naturally infects animals more closely related to humans, such as pigs, goats, horses, dogs and cats (AbuBakar et al., 2004; Ching et al., 2015; Chua, 2003; Chua et al., 2000). This wide host range may be due to the two cell receptors used by NiV, ephrin-B2 and ephrin-B3, which are highly conserved across many species (Bonaparte et al., 2005; Negrete et al., 2005, 2006; Xu, Broder, & Nikolov, 2012). Given its high lethality, the lack of effective vaccines or treatments and the re-emergence of deadly zoonotic NiV in South and Southeast Asia with evidence of human-to-human transmission, larger outbreaks of NiV might occur in the future.

Given the ongoing infections of humans, NiV is considered to have pandemic potential. When establishing itself in a new host, NiV has to adapt to novel conditions, a process that exerts strong selection pressure. Little is known about the NiV genomic changes required for its transmission to humans, in line with the general lack of knowledge about common genetic 'host jump' rules from bats to humans or to other mammals. Here, we combined phylogenetic and selection analyses with structural biology to understand the role of different NiV lineages in interspecies transmission and the role of adaptive evolution during transmission from bats to new hosts in relation to structural and functional changes. Moreover, we investigated a possible increase in pathogenicity and in the capacity for human transmission, as well as the genetic and evolutionary dynamics of NiV from the Bangladesh and Malaysia lineages.

All NiV sequences available in the National Center for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) in December 2018 were included in the analysis. After removal of sequences of unknown source or insufficient length, the dataset comprised 17 full-genome sequences, 113 nucleocapsid (N), 20 phosphoprotein (P), 23 matrix protein (M), 19 fusion protein (F), 21 glycoprotein (G) and 16 polymerase (L) coding sequences (Table S1). Sampling dates ranged from 1999 to 2018.

Sequences were aligned using MUSCLE and manually adjusted in MEGA (version 7) (Edgar, 2004; Kumar, Stecher, & Tamura, 2016). The best-fit nucleotide substitution models were selected with IQ-TREE (version 1.6.5) according to the Bayesian information criterion (BIC) (Lam-Tung, Schmidt, Arndt, & Bui Quang, 2015). TempEst (version 1.5.1) was used to analyse root-to-tip distances against sampling time (Rambaut, Lam, Max Carvalho, & Pybus, 2016).
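The root-to-tip analysis regresses the genetic distance from the tree root to each tip against that tip's sampling date: the slope approximates the substitution rate and the x-intercept the date of the root. A minimal sketch of this least-squares regression is given below, assuming root-to-tip distances have already been extracted from an ML tree; the function name and toy values are hypothetical, and TempEst's additional features (such as choosing the root position that best fits the regression) are not reproduced.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, x_intercept, r_squared). Under a clock-like
    model the slope approximates substitutions/site/year and the x-intercept
    approximates the date of the root.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, x_intercept, r_squared


# Hypothetical tips sampled 2000-2015 with perfectly clock-like distances.
slope, intercept, root_date, r2 = root_to_tip_regression(
    [2000, 2005, 2010, 2015], [0.010, 0.015, 0.020, 0.025]
)
print(f"rate ~ {slope:.1e} subs/site/year, root ~ {root_date:.0f}, R^2 = {r2:.2f}")
```

A clearly positive slope with a reasonably high R² is usually taken as evidence of sufficient temporal signal for molecular-clock analyses.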

To detect potential recombination events, the 17 aligned genomic sequences and all N, P, M, F, G and L coding sequences were analysed with the Recombination Detection Program 4 (RDP4) (D. P. Martin, Murrell, Golden, Khoosal, & Muhire, 2015). Seven methods, RDP (Martin & Rybicki, 2000), GENECONV (Padidam, Sawyer, & Fauquet, 1999), Chimaera (Posada & Crandall, 2001), MaxChi (Smith, 1992), BootScan (Martin, Posada, Crandall, & Williamson, 2005), SiScan (Gibbs, Armstrong, & Gibbs, 2000) and 3Seq (Boni, Posada, & Feldman, 2007), were used with default settings to detect recombination signals. The highest acceptable p-value was set to 0.05, and only recombination events confirmed by four or more methods are reported. Candidate recombination events were further examined using SimPlot (version 3.5.1) (Lole et al., 1999).
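The consensus rule used to filter the RDP4 results (p ≤ 0.05, support from at least four of the seven methods) can be expressed as a small filter. The sketch below assumes per-method p-values have been tabulated for each candidate event; the dictionary layout and event IDs are hypothetical and do not reflect RDP4's export format.

```python
ALPHA = 0.05     # highest acceptable p-value
MIN_METHODS = 4  # minimum number of methods that must support an event
METHODS = ("RDP", "GENECONV", "Chimaera", "MaxChi", "BootScan", "SiScan", "3Seq")


def supported_events(events, alpha=ALPHA, min_methods=MIN_METHODS):
    """Return IDs of candidate events supported by at least `min_methods` methods.

    `events` maps an event ID to {method name: p-value}; a method that did not
    flag the event is simply absent from the inner dictionary.
    """
    kept = []
    for event_id, pvalues in events.items():
        n_supporting = sum(
            1 for method, p in pvalues.items() if method in METHODS and p <= alpha
        )
        if n_supporting >= min_methods:
            kept.append(event_id)
    return kept


# Hypothetical candidate events.
events = {
    "event_1": {"RDP": 0.01, "GENECONV": 0.03, "MaxChi": 0.002, "BootScan": 0.04},
    "event_2": {"RDP": 0.01, "SiScan": 0.20, "3Seq": 0.04},
}
print(supported_events(events))  # ['event_1']
```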

Importance

NiV was identified by the World Health Organization (WHO) as a likely cause of a future pandemic. In South and Southeast Asia, it has already been transmitted several times from bats to humans, and the resulting outbreaks have been associated with human-to-human transmission and a high mortality rate. Using all available sequence data, we performed a combined bioinformatics study to analyse its adaptive evolution.

We also identified amino acids in many viral proteins that might be associated with the host jump from bats to humans.

The results obtained can assist the implementation of surveillance systems in the affected countries.

Maximum likelihood (ML) trees were constructed in RAxML (version 8.4.10) (Stamatakis, 2014) using either the general time-reversible model with gamma-distributed rate heterogeneity (GTR + G) or the Hasegawa-Kishino-Yano model with gamma-distributed rate heterogeneity (HKY + G), with 1,000 bootstrap replicates. In addition, maximum clade credibility (MCC) trees were reconstructed using BEAST (version 1.8.4) (Drummond & Andrew, 2007) with the GTR + G substitution model, an uncorrelated lognormal relaxed clock and the Bayesian SkyGrid coalescent model, chosen by comparing marginal likelihoods with Bayes factors (Li et al., 2018). Tip dates were assigned from the time of virus isolation or sequencing in year-month format. Markov chain Monte Carlo (MCMC) sampling was run for 1 × 10^8 generations, with trees and parameters sampled every 1 × 10^4 steps. Two independent runs were combined using LogCombiner (He, Auclert, et al., 2019). The final tree was summarized using TreeAnnotator and displayed using FigTree (version 1.4.7).
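BEAST treats tip dates as decimal years, so a year-month sampling date has to be placed at some point within its month. The sketch below assumes the middle of the month (an assumption; the convention actually applied by BEAUti to partially specified dates may differ) and also maps a decimal-year estimate, such as the tMRCA reported in the Results, back to an approximate calendar month. With the MCMC settings above, each run yields 1 × 10^8 / 1 × 10^4 = 10,000 sampled states.

```python
import math


def year_month_to_decimal(year: int, month: int) -> float:
    """Place a year-month sampling date at the middle of the month, in decimal years."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year + (month - 0.5) / 12


def decimal_to_year_month(value: float) -> tuple:
    """Return the (year, month) in which a decimal-year date falls."""
    year = math.floor(value)
    month = min(12, int((value - year) * 12) + 1)
    return year, month


# Hypothetical tip: a sequence sampled in May 2018.
print(year_month_to_decimal(2018, 5))  # 2018.375
# The tMRCA point estimate reported in this study.
print(decimal_to_year_month(1992.63))  # (1992, 8)
```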

The Bayesian Tip-association Significance testing software (BaTS) was used to analyse the association between NiV phylogeny and geographic location (Parker, Rambaut, & Pybus, 2008). Geographic locations were defined by country: Malaysia, Bangladesh, India, Cambodia and Thailand. The association index (AI) and parsimony score (PS) statistics were calculated from the MCC trees of the NiV N gene. When the p-values of both AI and PS were less than .05, the association between NiV and geographic distribution was considered significant.
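The PS statistic is the parsimony score of the location trait on the tree, i.e. the minimum number of location changes needed to explain the tip labels, and its significance is judged against a null distribution obtained by randomizing tip labels. The sketch below computes the Fitch parsimony score on a single binary tree written as nested tuples, together with a simple label-permutation p-value; BaTS itself integrates over the posterior sample of trees and also computes AI and MC, which are not shown. The trees, labels and function names are hypothetical.

```python
import random


def fitch_parsimony_score(tree):
    """Minimum number of trait changes (Fitch algorithm) on a rooted binary tree.

    A tree is either a leaf label (str) or a 2-tuple (left_subtree, right_subtree).
    """
    def visit(node):
        if isinstance(node, str):
            return {node}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        shared = left_states & right_states
        if shared:
            return shared, left_cost + right_cost
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def tip_labels(tree):
    """Leaf labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tip_labels(tree[0]) + tip_labels(tree[1])


def relabel(tree, labels):
    """Copy of `tree` whose leaves are replaced, left to right, by `labels`."""
    it = iter(labels)

    def build(node):
        if isinstance(node, str):
            return next(it)
        return (build(node[0]), build(node[1]))

    return build(tree)


def ps_permutation_pvalue(tree, n_perm=999, seed=1):
    """Observed PS and how often shuffled tip labels score <= the observed PS."""
    observed = fitch_parsimony_score(tree)
    labels = tip_labels(tree)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if fitch_parsimony_score(relabel(tree, shuffled)) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical trees with Malaysian (MY) and Bangladeshi (BD) tips.
clustered = ((("MY", "MY"), "MY"), (("BD", "BD"), "BD"))
mixed = ((("MY", "BD"), "MY"), (("BD", "MY"), "BD"))
print(fitch_parsimony_score(clustered))  # 1
print(fitch_parsimony_score(mixed))      # 3
```

A score lower than expected under label randomization indicates that viruses from the same country cluster together in the tree.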

To locate the positively selected and adaptive sites in the L, G and F proteins, and all amino acid differences between the two virus lineages in the G and F proteins, we created figures with PyMOL (The PyMOL Molecular Graphics System, version 2.0, Schrödinger, LLC, https://pymol.org/2/). For G, we used the PDB entry 3D12, which contains the structure of the ectodomain of NiV G (residues 71-602) bound to mouse ephrin-B3 (residues 30-170) (Xu et al., 2008), and the PDB entry 2VSM, which is the structure of G (residues 188-606) bound to the human ephrin-B2 receptor (residues 31-170). The prefusion structure of the F protein was visualized using the PDB entry 5EVM (Xu et al., 2015).

Since no post-fusion structure of NiV F is available, we used the structure of F from the related paramyxovirus Newcastle disease virus (PDB entry 3MAW) (Swanson et al., 2010), which has ~50% amino acid similarity with NiV F. Likewise, since no structure of L from NiV (or from any other paramyxovirus) is available, we used the structure of L from vesicular stomatitis virus (PDB entry 5A22) (Liang et al., 2015) and identified the positively selected and adaptive sites by sequence alignment. Salt-bridge distances were measured with the Measurement Wizard in PyMOL.
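A distance measured between two picked atoms is their Euclidean distance. The sketch below reproduces that calculation for a residue pair and returns the shortest distance between the charged side-chain atoms of an acidic residue (e.g., the carboxylate oxygens of Glu) and a basic residue (e.g., NZ of Lys). The coordinates and the function name are hypothetical and are not taken from 3D12 or 2VSM.

```python
import math
from itertools import product


def closest_contact(acidic_atoms, basic_atoms):
    """Shortest inter-atomic distance (in Å) between two residues' side-chain atoms.

    Each argument maps an atom name to its (x, y, z) coordinates in Å.
    Returns (distance, acidic_atom_name, basic_atom_name).
    """
    best = None
    for (a_name, a_xyz), (b_name, b_xyz) in product(acidic_atoms.items(), basic_atoms.items()):
        d = math.dist(a_xyz, b_xyz)  # Euclidean distance between the two atoms
        if best is None or d < best[0]:
            best = (d, a_name, b_name)
    return best


# Hypothetical coordinates for a Glu carboxylate and a Lys amino group.
glu = {"OE1": (0.0, 0.0, 0.0), "OE2": (1.2, 0.8, 0.0)}
lys = {"NZ": (1.2, 0.8, 2.8)}
d, acid_atom, base_atom = closest_contact(glu, lys)
print(f"{acid_atom}-{base_atom}: {d:.1f} Å")  # OE2-NZ: 2.8 Å
```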

Selection analysis was performed by uploading the ML trees and sequences to the Datamonkey server (www.datamonkey.org). Four algorithms were used to identify sites under selection: fixed effects likelihood (FEL), single-likelihood ancestor counting (SLAC), fast unconstrained Bayesian approximation (FUBAR) and the mixed effects model of evolution (MEME). The significance thresholds were a p-value of 0.1 for FEL, SLAC and MEME and a posterior probability of 0.9 for FUBAR. A site detected by more than two algorithms was considered to be under selection. The adaptive branch-site random effects likelihood test for episodic diversification (aBSREL) was used to detect positively selected branches (Kosakovsky Pond & Frost, 2005; Murrell et al., 2013; Murrell et al., 2012; Smith et al., 2015). We also split the N, P, M, F, G and L datasets into the Bangladesh and Malaysia lineages and reconstructed the ancestral amino acid sequence of each lineage independently, using the ML method implemented in CODEML of PAML (version 4.8).
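The consensus rule for calling sites under selection can be sketched as follows. The input layout, the handling of values exactly at the threshold (p ≤ 0.1, posterior > 0.9) and the per-site values are assumptions for illustration; they are not Datamonkey's output format and not the values in Table 3.

```python
P_VALUE_METHODS = ("FEL", "SLAC", "MEME")
P_THRESHOLD = 0.1          # p-value threshold for FEL, SLAC and MEME
POSTERIOR_THRESHOLD = 0.9  # posterior-probability threshold for FUBAR


def is_significant(method, value):
    """Apply the method-specific threshold (boundary handling is an assumption)."""
    if method in P_VALUE_METHODS:
        return value <= P_THRESHOLD
    if method == "FUBAR":
        return value > POSTERIOR_THRESHOLD
    raise ValueError(f"unknown method: {method}")


def sites_under_selection(results, min_methods=3):
    """Sites detected by at least `min_methods` algorithms ("more than two" -> 3)."""
    return sorted(
        site
        for site, by_method in results.items()
        if sum(is_significant(m, v) for m, v in by_method.items()) >= min_methods
    )


# Hypothetical per-site results.
results = {
    101: {"FEL": 0.30, "SLAC": 0.50, "MEME": 0.04, "FUBAR": 0.93},
    202: {"FEL": 0.05, "SLAC": 0.08, "MEME": 0.02, "FUBAR": 0.97},
}
print(sites_under_selection(results))                 # [202]
print(sites_under_selection(results, min_methods=2))  # [101, 202]
```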

Potential adaptive sites were defined as amino acid changes that dominated in another (non-bat) host, that is, positions at which these sequences differed from the other dominant amino acids in pigs and from the ancestral amino acids. The association between potential adaptive sites and phenotype (host jump) was assessed with a chi-square test on counts of the dominant amino acids in sequences from pigs and other hosts. Statistical significance was tested using the method described by He et al. (article in press).
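For a single site, the association between host (bat vs. non-bat) and amino acid state can be summarized in a 2 × 2 table of sequence counts and tested with Pearson's chi-square test. The sketch below computes the statistic without continuity correction and its p-value for one degree of freedom; the counts are hypothetical, and the procedure of He et al. cited above may differ.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 contingency table.

    table = [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = host (bat, non-bat); columns = residue (ancestral, variant).
table = [[10, 0], [2, 12]]
stat, p = chi_square_2x2(table)
print(f"chi2 = {stat:.2f}, p = {p:.1e}")
```

When expected cell counts are small (commonly below 5), the chi-square approximation becomes unreliable and Fisher's exact test is often preferred.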

Because the ephrin-B2 sequences of two Pteropus species from which NiV has been isolated (P. alecto and P. vampyrus) show no amino acid differences, and their ephrin-B3 sequences show none in the region present in the crystal structure, we assumed that both receptors are highly conserved among bat species.

Ephrin-B2 and ephrin-B3 sequences from the Pteropus vampyrus bat (NP_001292125.1 and ABV44497.1) were aligned with the human ephrin-B2 and the mouse ephrin-B3 sequences, respectively. The few amino acid differences found were labelled on the structures of the mammalian orthologs.
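Listing the amino acid differences between an aligned bat and mammalian ephrin sequence amounts to walking the alignment column by column while numbering positions on the reference (mammalian) sequence. A minimal sketch follows; the toy fragments and the function name are hypothetical (not the actual ephrin-B2/B3 sequences), and columns containing a gap are skipped rather than reported.

```python
def aa_differences(reference, query, start=1):
    """Substitutions between two aligned protein sequences, numbered on `reference`.

    Returns labels such as 'S75N' (reference residue, position, query residue).
    Columns with a gap in either sequence are skipped (indels are not reported).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must come from the same alignment")
    differences = []
    position = start - 1
    for ref_aa, query_aa in zip(reference, query):
        if ref_aa != "-":
            position += 1
        if "-" in (ref_aa, query_aa):
            continue
        if ref_aa != query_aa:
            differences.append(f"{ref_aa}{position}{query_aa}")
    return differences


# Hypothetical aligned fragments; numbering of the reference starts at residue 74.
print(aa_differences("ASTEK", "ANTGK", start=74))  # ['S75N', 'E77G']
```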

RDP4 detected only one possible recombination event in the NiV genomic sequences, and further analysis with SimPlot 3.5.1 indicated that this event was a false positive (data not shown). Since no recombination event interfered with the construction of phylogenetic trees, all full-genome sequences and the sequences of each gene were used to reconstruct ML trees. The N gene tree has a more complex structure than the full-genome tree because more N sequences were included in the analysis. In both ML trees, all NiV sequences could be divided into two main lineages, the Bangladesh and Malaysia lineages; only NiV from the Bangladesh lineage circulates in India and Bangladesh, whereas only NiV from the Malaysia lineage circulates in Malaysia (Figure 1), as reported previously (Lo et al., 2012; Lo Presti et al., 2016; Rahman et al., 2010). Analysis based on a large number of N coding sequences (Figure 1b) revealed human- and bat-derived NiV in the Bangladesh lineage and a more complex structure in the Malaysia lineage, including viruses derived from multiple hosts (bat, swine and human), which corresponds to the different transmission modes of the two lineages (Av et al., 2018). Of note, NiV belonging to both lineages was observed in the local bat populations of Thailand and Cambodia. This had been reported previously for Thailand, whereas NiV from Cambodian bats was formerly thought to belong only to the Malaysia lineage; the new finding could result from the analysis of additional Cambodian bat sequences obtained in 2013 (Lo et al., 2012; Reynes et al., 2005; Wacharapluesadee et al., 2005, 2010, 2016). In addition, NiV could also be divided into these two lineages in ML trees reconstructed from the other coding sequences (Figure S1), which is similar to the results of previous studies (Lo et al., 2012).
BaTS analysis showed that the p-values of both AI and PS were less than .05 (Table 1), and the maximum monophyletic clade (MC) p-values of all countries except Cambodia (p = 1) were also less than .05. This is consistent with the structure of the phylogenetic trees and indicates a significant geographic association.

Next, we reconstructed the evolutionary dynamics of NiV based on the N gene. The MCC tree indicates that the two NiV lineages were associated with independent epidemics (Figure 2).

The time to the most recent common ancestor (tMRCA) of NiV was estimated to be 1992.63 (95% HPD: 1985 (95% HPD: .55-1997 HPD: 1995 HPD: .60-2002 ). Based on the N gene, the NiV evolutionary rate was 1.10 × 10^-3 substitutions/site/year (95% HPD: 7.34 × 10^-4 to 1.50 × 10^-3 substitutions/site/year). In particular, the Bangladesh lineage had a mean rate of 6.50 × 10^-3 substitutions/site/year (95% HPD: 6.03 × 10^-9 to 1.60 × 10^-2 substitutions/site/year), while the Malaysia lineage had a mean rate of 1.43 × 10^-2 substitutions/site/year (95% HPD: 4.98 × 10^-8 to 6.40 × 10^-2 substitutions/site/year). To infer changes in the NiV population size, we reconstructed a Bayesian SkyGrid coalescent plot. The population size of NiV fluctuated over the past 20 years but remained at roughly the same level overall.
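To put the estimated rate in perspective, under a strict clock (a simplification; the analysis above used an uncorrelated lognormal relaxed clock) the expected divergence between two viruses sampled in 2018 is the rate multiplied by twice the time elapsed since the tMRCA. The sketch below performs this arithmetic with the reported mean rate and tMRCA; the N gene length of 1,599 nt used to convert to nucleotides is an assumption, as are the function names.

```python
RATE = 1.10e-3        # mean substitutions/site/year for the N gene (reported above)
T_MRCA = 1992.63      # mean tMRCA in decimal years (reported above)
N_GENE_LENGTH = 1599  # nt; assumed length of the NiV N coding sequence


def expected_root_to_tip(rate, t_tip, t_mrca):
    """Expected substitutions/site accumulated from the MRCA to a tip (strict clock)."""
    if t_tip < t_mrca:
        raise ValueError("a tip cannot be older than the MRCA")
    return rate * (t_tip - t_mrca)


def expected_pairwise_distance(rate, t_tip1, t_tip2, t_mrca):
    """Expected distance between two tips whose lineages split at the MRCA."""
    return expected_root_to_tip(rate, t_tip1, t_mrca) + expected_root_to_tip(rate, t_tip2, t_mrca)


per_gene_per_year = RATE * N_GENE_LENGTH                              # ~1.76 per N gene per year
pairwise = expected_pairwise_distance(RATE, 2018.0, 2018.0, T_MRCA)   # ~0.056 substitutions/site
print(f"{per_gene_per_year:.2f} substitutions per N gene per year")
print(f"{pairwise:.4f} substitutions/site, about {pairwise * N_GENE_LENGTH:.0f} nt")
```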

The G protein is a type II membrane protein consisting of an N-terminal intraviral domain (~50 residues), one transmembrane region (~20 residues), a helical stalk region (~100 residues) and a head domain (residues 176-603) that folds into a β-propeller with six blades surrounding a central cavity (Bowden, Crispin, et al., 2008; Xu et al., 2008). The head region binds to the cellular receptor ephrin-B2 or ephrin-B3 (Bonaparte et al., 2005; Bowden, Crispin, et al., 2008; Negrete et al., 2005, 2006; Xu et al., 2008). It is currently believed that receptor binding transduces a signal to the stalk region, which then activates the viral F protein, leading to conformational changes that result in membrane fusion.

Figure 2. Maximum clade credibility (MCC) tree and SkyGrid plot based on the N gene. (a) The MCC tree was reconstructed using BEAST (version 1.8.4) with the GTR + G substitution model and the Bayesian SkyGrid coalescent model, a total chain length of 1 × 10^9 and sampling every 1 × 10^4 steps. Host and country of NiV isolates are indicated with inner and outer coloured rectangular boxes, respectively.

In virus particles, G forms a tetrameric spike; during intracellular transport and at the cell surface, it might interact with the F protein (Bose, Jardetzky, & Lamb, 2015). Amino acid positions that differ between the Bangladesh and Malaysia lineages of bat-derived viruses but are neither adaptive nor positively selected sites are listed in Table S2 and shown in Figure 3a as magenta sticks within the head domain of G (blue cartoon) bound to ephrin-B3 (green cartoon).

Most of these positions are located at the proposed interaction surface between G monomers and might thus contribute to oligomerization of G (Bowden, Crispin, et al., 2008). N481 is part of the utilized glycosylation site N481NT (Bowden, Crispin, et al., 2008); it is replaced by D in all strains of the Bangladesh lineage and in three of 13 strains of the Malaysia lineage, which therefore lack a carbohydrate at this site.

The F protein, which forms a trimer, is a typical type I transmembrane protein that is proteolytically cleaved by cathepsin L into the N-terminal F2 subunit and the larger F1 subunit, which carries the fusion peptide, the transmembrane region and a short C-terminal cytoplasmic tail (Xu et al., 2015). To analyse where these amino acids are located in the post-fusion structure of F, we labelled the analogous residues in the F protein of 

Residues 437 in P, 241 in M, 207 in F, 20 in G and 1645 in L were identified as being under positive selection by MEME and FUBAR (Table 3). However, FEL detected positive selection only at residue 1645 in L. Next, we identified the adaptive sites for transmission of NiV from bats to humans independently in the Bangladesh and Malaysia lineages. Of note, we found only one adaptive site, 436, in the N protein. Although seven sites in G (288, 344, 376, 384, 386, 427 and 498) and site 1645 in L were not significantly associated with cross-species adaptation, we nevertheless considered them adaptive sites because the Bangladesh lineage included at most one sequence from an infected bat (Table 4).

In the G protein, the positively selected site 20 is located in the cytoplasmic tail. The amino acid at this site varies among virus strains and is even deleted in some of them. In general, amino acids in the cytoplasmic tail affect intracellular transport of G and its support of membrane fusion, and are believed to be involved in interactions with the peripheral matrix protein that are required for virus assembly (Sawatsky, Bente, Czub, & von Messling, 2016). All of them are located on one side of the molecule's surface, opposite the proposed interaction surface between G monomers (Sawatsky et al., 2016). N288 (which is not a carbohydrate attachment site), R344, K376 and V427 are located too far away from the receptor-binding site on top of the molecule and are thus unlikely to affect binding to ephrin-B3. K376 might instead modulate the proposed signal that is transmitted from the head to the stalk domain upon receptor binding and that activates the fusion activity of the F protein (Wong et al., 2017). Residue 289 has been shown to change upon selection of G mutants with a monoclonal antibody, so N288 might be part of this antibody epitope.

Likewise, residues 384 and 386 are also part of an antibody epitope (White et al., 2005) . To our knowledge, no functions have been associated with the other amino acids.

The adaptive sites I384, K386 and T498 do not directly interact with ephrin-B3 but are located close to the receptor-binding site, which is shown at higher magnification in Figure 3b. A loop of ephrin forms a shallow but extensive protein-protein interaction surface that is buried deeply in a hydrophobic pocket on the G surface. These hydrophobic interactions are assisted by amino acids (shown as red sticks) that form salt bridges or hydrogen bonds with ephrin-B3. Especially interesting in this regard are E501, which forms a salt bridge with R106 in ephrin-B3, and Q388 and Y389, which form hydrogen bonds with D108 (Xu et al., 2008).

The same network of interactions is also involved in binding of ephrin-B2, except that R106 is replaced by K. Thus, mutations at residues 384, 386 and 498, which are located in close proximity to these sites, might affect the strength of the G-receptor interaction. Two sites that differ between the two virus lineages (T385 and I502, which are A and V, respectively, in the Bangladesh lineage) are also located in this region, but they were not identified as adaptive sites by our analysis. The positively selected site 207 of the F protein is located in the interior of the head domain, within an α-helix of the DIII domain. This region of the molecule does not refold, and no function has been associated with this residue (Figure 4).

Based on previous structural analyses of the G protein and its interaction with ephrin-B2 and ephrin-B3 (Bowden, Crispin, et al., 2008; Xu et al., 2008), we then asked whether amino acid differences between bat and other mammalian ephrin-B2 and ephrin-B3 receptors might require changes in G for adaptation from bat to human receptors. Alignment of the other mammalian and bat ephrin-B3 sequences revealed two amino acid changes: S75 is replaced by N and E85 by G in the bat receptor (labelled as red sticks in Figure 3b).

Although these are non-conservative changes, they are too far away from the part of ephrin-B3 that contacts G and are thus unlikely to affect virus binding. In ephrin-B2, three conservative amino acid differences became apparent: T93 is replaced by S, I111 by V and K106 by R in the bat receptor (labelled as red sticks in Figure 3c, except K106, which is shown as a purple stick). Especially interesting is residue 106, which forms a salt bridge with E501 in G. This salt bridge is also present in the structure of G bound to ephrin-B3, but there an R instead of a K is present. Both of these basic, positively charged amino acids are apparently capable of forming a salt bridge with E501, but since the side chain of K is longer than that of R, the salt-bridge distance is shorter in G bound to mammalian ephrin-B2 (2.8 Å) than in G bound to mammalian ephrin-B3 (4.3 Å).

In the bat ephrin-B2, the shorter R is present, and thus the adaptive sites I384, K386 and especially T498 might modulate the interaction

Table 3. Positively selected sites in P, M, F, G and L coding sequences of NiV

Since its emergence in 1998-1999 in Malaysia, NiV has reappeared in several South and Southeast Asian countries (Figure S2), includ- (Breed et al., 2013; Ching et al., 2015; Sendow et al., 2013; Figure S2).

The estimated tMRCA is later than previously reported, probably because a larger number of sequences was analysed (Sun, Jia, Liang, Chen, & Liu, 2018). The NiV evolutionary rate was similar to that of other important zoonotic RNA viruses such as Ebola virus (Yi-Gang et al., 2015). NiV is thus highly variable, which makes disease prevention, control and vaccine development difficult. Of note, we found a significant association between NiV and geography, except for Cambodia, which may be due to the limited number of sequences available from this region. This indicates that once NiV becomes epidemic in an area, it differentiates into a new lineage that adapts to the local background. This adaptation also causes a fast rate of evolution.

Given that different NiV lineages differ in their capacity for human-to-human transmission, we analysed mutations, selection and adaptive evolution and related the changes to structural and functional modifications, in particular for the G protein and its receptors, ephrin-B2 and ephrin-B3. None of the adaptive sites in G is in direct contact with amino acids of ephrin-B2 or ephrin-B3. However, adaptive amino acids were identified near the second interaction site, which comprises a hydrogen-bonding network between Q388 and Y389 in G and the negatively charged D108 in both ephrin-B2 and ephrin-B3, and a salt bridge between the negatively charged E501 in G and a basic residue at position 106 in ephrin. Interestingly, the identity of this basic amino acid varies: in bat and other mammalian ephrin-B3 and in bat ephrin-B2 it is an arginine, whereas mammalian ephrin-B2 carries a lysine. Since the side chain of lysine is longer, the distance between E501 in G and the basic residue in ephrin becomes shorter, and the interaction is consequently probably stronger. We speculate that, for bat-derived NiV to adapt to the human ephrin-B2 receptor, the adaptive sites at positions 384, 386 and especially 498 might modulate receptor-binding affinity and might thus contribute to a host jump. Interestingly, the G protein of a henipavirus recently isolated from a bat in Africa lacks this second interaction site, because the amino acids at positions equivalent to Q388 and Y389 do not form hydrogen bonds with D108 in ephrin.

As a consequence, the G protein of the African bat henipavirus shows decreased ephrin-B2 binding relative to the G protein of NiV (Lee et al., 2015). Thus, hydrogen bonds at the secondary binding site impart affinity and stabilize the receptor-bound complex, and even small differences in receptor binding can translate into significant differences in the efficiency of infection.

On the other hand, the function of the positively selected site 207 in the F protein and of amino acid 273, which varies between the two lineages, is hard to predict. However, the F protein contains another interesting amino acid difference between the two NiV lineages.

Residue 42 is located within the so-called strap region, which is thought to be the binding site for the receptor-bound G protein, and is also very close to the fusion peptide in the trimeric structure of F (Figure 3). Residue 42 is a valine in the Malaysia lineage of the bat-derived virus but an isoleucine, which is also hydrophobic but has a larger side chain, in the Bangladesh lineage. It is thus tempting to speculate that residue 42 affects the exposure of the fusion peptide after activation of F by the G protein, and hence virus entry into cells.

The positively selected and adaptive site in the L protein is located in the connector domain, which consists of a bundle of eight helices. No specific function has been assigned to this domain; however, it seems to play an organizational role in positioning or spacing the catalytic domains. Both sides of the connector domain contain unstructured linkers, which are thought to bind the P protein, and the adaptive site might modulate this binding. Alternatively, since position 1,471 is located close to the methyltransferase domain, which adds methyl groups to the cap structure at the 5′-end of the RNA, residue 1,471 might modulate this activity.

The genetic polymorphisms of NiV may be associated with virus circulation, infectivity and antigenic variability. When NiV jumps from bats to humans, it faces new selection pressures from its new host environment. In particular, antigenic variability is critical for escaping the host immune response. Three of the adaptive sites we found in the G protein, residues 288, 384 and 386, are part of a known antibody epitope (White et al., 2005). NiV adaptation to humans probably depends on the stepwise accumulation of potentiating mutations that favour the emergence of a particular adaptive mutation, similar to the mutations described for the adaptation of avian influenza virus to humans (Imai et al., 2018; Su et al., 2017).

The mutational panel provided here might therefore be useful as an early detection system for transitional stages in NiV evolution before the virus acquires full pandemic potential. We also identified several amino acid changes between the two virus lineages that may affect receptor binding and hence transmission. These findings are relevant not only to veterinarians and virologists but also to public health officers, as the threat of a more serious NiV pandemic is real.

This work was financially supported by the National Key Research 

The authors declare no conflict of interest.

This is an evolutionary and bioinformatic analysis in which all data were obtained from GenBank; therefore, no ethics statement regarding samples is required.

https://orcid.org/0000-0003-0187-1185

Isolation and molecular identification of Nipah virus from pigs

Genomic characterization of Nipah virus

Outbreak investigation of Nipah Virus Disease in Kerala

Nipah Virus Infection

Ephrin-B2 ligand is a functional receptor for Hendra virus and Nipah virus

An exact nonparametric method for inferring mosaic structure in sequence triplets

Timing is everything: Fine-tuned molecular machines orchestrate paramyxovirus entry

Structural basis of Nipah and Hendra virus attachment to their cell-surface receptor ephrin-B2

Crystal structure and carbohydrate analysis of Nipah virus attachment glycoprotein: A template for antiviral and vaccine design

The distribution of henipaviruses in Southeast Asia and Australasia: Is Wallace's line a barrier to Nipah virus?

Nipah virus-associated encephalitis outbreak

Outbreak of henipavirus infection

Nipah virus outbreak in Malaysia

Nipah virus: A recently emergent deadly paramyxovirus

BEAST: Bayesian evolutionary analysis by sampling trees

MUSCLE: A multiple sequence alignment method with reduced time and space complexity

Sister-scanning: A Monte Carlo procedure for assessing signals in recombinant sequences

Person-to-person transmission of Nipah virus in a Bangladeshi community

Interspecies transmission, genetic diversity, and evolutionary dynamics of pseudorabies virus

Emergence and adaptation of H3N2 canine influenza virus from avian influenza virus: An overlooked role of dogs in interspecies transmission

Genetic analysis and evolutionary changes of Porcine circovirus 2

Diversity of influenza A(H5N1) viruses in Infected humans

Not so different after all: A comparison of methods for detecting amino acid sites under selection

MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets

IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies

Molecular recognition of human ephrinB2 cell surface receptor by an emergent African henipavirus

Origin, Genetic Diversity, and Evolutionary Dynamics of Novel Porcine Circovirus 3

Structure of the L protein of vesicular stomatitis virus from electron cryomicroscopy

Characterization of Nipah virus from outbreaks in Bangladesh

Origin and evolution of Nipah virus

Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination

Recurrent Zoonotic Transmission of Nipah Virus into Humans

RDP4: Detection and analysis of recombination patterns in virus genomes

A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints

RDP: Detection of recombination amongst aligned sequences

FUBAR: A fast, unconstrained Bayesian AppRoximation for inferring selection

Detecting individual sites subject to episodic diversifying selection

EphrinB2 is the entry receptor for Nipah virus, an emergent deadly paramyxovirus

Two key residues in ephrinB3 are critical for its use as an alternative receptor for Nipah virus

Possible emergence of new geminiviruses by frequent recombination

Correlating viral phenotypes with phylogeny: Accounting for phylogenetic uncertainty

Evaluation of methods for detecting recombination from DNA sequences: Computer simulations

Characterization of Nipah virus from naturally infected Pteropus vampyrus bats

Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)

Nipah Virus in Lyle's Flying Foxes

Morbillivirus and henipavirus attachment protein cytoplasmic domains differently affect protein expression, fusion support and particle assembly

Nipah Virus in the Fruit Bat Pteropus vampyrus in Sumatera

Emerging trends of Nipah virus: A review

Analyzing the mosaic structure of genes

Less is more: An adaptive branch-site random effects model for efficient detection of episodic diversifying selection

Organization, function, and therapeutic targeting of the morbillivirus RNA-dependent RNA polymerase complex

RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies

Epidemiology, evolution, and pathogenesis of H7N9 influenza viruses in five epidemic waves since 2013 in China

Epidemiology, genetic recombination, and pathogenesis of coronaviruses

Phylogeography, transmission, and viral proteins of Nipah virus

Structure of the Newcastle disease virus F protein in the post-fusion conformation

A longitudinal study of the prevalence of Nipah virus in Pteropus lylei bats in Thailand: Evidence for seasonal preference in disease transmission. Vector-Borne and Zoonotic Diseases

Bat Nipah virus

Molecular characterization of Nipah virus from Pteropus hypomelanus in Southern Thailand

Location of, immunogenicity of and relationships between neutralization epitopes on the attachment protein (G) of Hendra virus

Monomeric ephrinB2 binding induces allosteric changes in Nipah virus G that precede its full activation

Ephrin-B2 and ephrin-B3 as functional henipavirus receptors

Crystal structure of the pre-fusion Nipah virus fusion glycoprotein reveals a novel hexamer-of-trimers assembly

Host cell recognition by the henipaviruses: Crystal structures of the Nipah G attachment glycoprotein and its complex with ephrin-B3

Genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone

Additional supporting information may be found online in the Supporting Information section at the end of the article.