key: cord-0971828-h8ldbzip authors: Lin, Serena Y. C.; Magalis, Brittany Rife; Salemi, Marco; Liu, Hsin‐Fu title: Origin and dissemination of hepatitis B virus genotype C in East Asia revealed by phylodynamic analysis and historical correlates date: 2018-10-17 journal: J Viral Hepat DOI: 10.1111/jvh.13006 sha: 1e7440ce38d40e5f12edf201728f3742f9c082bc doc_id: 971828 cord_uid: h8ldbzip

Hepatitis B virus disease progression in East Asia is most frequently associated with genotype C (HBV/C). The increasing availability of HBV/C genetic sequences and detailed annotations provides an opportunity to investigate the epidemiological factors underlying its evolutionary history. In this study, the Bayesian phylogeography framework was used to investigate the origins and patterns in spatial dissemination of HBV/C by analyzing East Asian sequences obtained from 1992 to 2010. The most recent common ancestor of HBV/C was traced back to the early 1900s in China, where it eventually diverged into two major lineages during the 1930s-1960s that gave rise to distinct epidemic waves spreading exponentially to other East Asian countries and the USA. Demographic inference of viral effective population size over time indicated similar dynamics for both lineages, characterized by exponential growth since the early 1980s, followed by a significant bottleneck in 2003 and another increase after 2004. Although additional factors cannot be ruled out, we provide evidence to suggest this bottleneck was the result of limited human movement from/to China during the SARS outbreak in 2003. This is the first extensive evolutionary study of HBV/C in East Asia as well as the first to assess more realistic spatial ecological influences between co-circulating infectious diseases.

East Asia, HBV genotype C, human mobility, phylogeography, population bottleneck, SARS

the different genotypes, except for genotypes D and G, which are scattered worldwide. 4 Genotype A is prevalent in Africa, North America and Europe; genotypes B and C are the major genotypes circulating in Asia 4 and, even in the USA, are the most common among Asian patients 5 ; genotype E is prevalent in Africa; F/H in Central and South America; I in Taiwan; and J in Japan.
4, 6, 7 HBV genotype C (HBV/C), in particular, is the most prevalent genotype in almost every East Asian country, and it also accounts for a large number of infections in the USA (prevalence of 41% and 23% along the Western and Eastern coasts, respectively 5 ). Genotype C constitutes almost 100% of the infections in Korea, 8 approximately 50% in Hong Kong, 9 and 85% in Japan. 10 In Mainland China, genotype C is predominant in the Northern part of the country (Beijing, Xinjiang and Gansu), while genotype B is prevalent in the Central and Eastern part (Hunan and Fujian), with an overall prevalence of 41% for genotype B and 53% for genotype C. 11 Similarly in Taiwan, where HBV genotype B is the most prevalent (68%), genotype C still accounts for almost one-third of the infections (32%). 12 and C4 in the Aborigines from Australia. 14

Chronic infection with HBV/C has been associated with a significantly higher risk of progression to LC and HCC than infection with other genotypes. 15, 16 In Taiwan, in particular, chronic hepatitis and HCC still rank as the ninth leading cause of death. 16 Indeed, several studies have suggested that disease outcomes are related to specific genetic variants. [17] [18] [19] Due to the disease activity and the risk of HCC development with which HBV/C is associated, it is important to investigate the molecular evolution and demographic history of this genotype in highly endemic countries notably the high prevalence of genotype C in the USA and the prevalence (14.8%) of chronic HBV infections among Asian immigrants who likely acquired infections in their country of origin. 20 We consider this an Asian-related infection network, and therefore the HBV/C sequences from the USA were included in the study.
Previous studies based on Bayesian coalescent analysis estimated the HBV evolutionary rate to be approximately 10^-4 to 10^-5 nucleotide substitutions/site/year 21, 22 and traced back the time of the most recent common ancestor (tMRCA) of the currently circulating human genotypes to ~1500 years ago, which in turn separated from the avian HBV lineage ~6000 years ago. 21 However, specific genotypes, or lineages within genotypes responsible for current outbreaks, may have a more recent origin, and their successful spread could be the result of specific historical or geopolitical correlates during the past decades, potentially related to an unprecedented increase in human mobility. Therefore, our main objective was to infer the origin and epidemic history of HBV/C in East Asia and investigate ecological factors affecting dissemination and epidemic outbreaks of the virus.

We mined the GenBank database (https://www.ncbi.nlm.nih.gov) to compile a comprehensive data set of all currently available HBV/C sequences. The gold standard method for HBV genotyping is whole-genome sequencing followed by phylogenetic analysis. 23 Therefore, to infer a reliable demographic history of HBV/C, we focused on full-genome sequences with known sampling time and country of origin. Sequences included in the final data set were required to satisfy the following criteria: nonrecombinant sequences with no uncertainty concerning genotype assignment; sequences isolated only from human serum or plasma (sequences from liver tumour were excluded to avoid the potential confounding factor of tissue-specific convergent evolution of sequences sampled from different patients); sequences not epidemiologically linked (ie, not linked through a direct transmission chain); and when multiple sequences from the same subject were available, only one sequence was randomly selected.
Genotyping classification was confirmed by phylogenetic analysis using Neighbor Joining (NJ) tree reconstruction, with the GTR nucleotide substitution model, gamma-distributed rate heterogeneity among sites (GTR + G), and 1000 bootstrap replicates, from an alignment including the full-genome sequences obtained from GenBank and well-established genotype (A to J) reference strains (accession numbers shown in Table S1 ). Calculations were carried out within MEGA6 software. 24

Data collection times of HBV sequences included in the full-genome data set (n = 429) spanned from 1992 to 2010. To ensure that no country was over-represented in the alignment, a sampling ratio was calculated by multiplying the proportion of genotype C in chronic HBV cases (China 53%, Korea 98%, Japan 85%, Taiwan 32%, USA 41%) by the HBV prevalence in the general population of each country (China 10%, 25 Korea 5.9%, 26 Japan 4%, 27 Taiwan 15%, 28 USA 15% 20, 29 ), which resulted in a sampling ratio China:Korea:Japan:Taiwan:USA of 1.6:1.7:1:1.4:1.8. Sequences were then randomly selected from each country according to this ratio to generate a final alignment of 120 strains, representative of the virus prevalence in each country and spanning from 1992 to 2010 (see Table S2 ). This final alignment was used to infer maximum likelihood (ML) and NJ trees and for Bayesian coalescent inference. The best-fitting nucleotide substitution model (GTR + G) was selected using a hierarchical likelihood ratio test within PAUP* v4.0. 30 NJ and ML trees were then inferred according to the best-fitting model using MEGA6 and PhyML 3.0 (http://www.atgc-montpellier. To assess the molecular clock signal carried in the temporally sampled viral sequences, the cross-platform software TempEst (formerly known as Path-O-Gen; http://tree.bio.ed.ac.uk/software/tempest/) was used to explore the association between genetic divergence and sampling dates.
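The sampling ratio quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the ratio is the product of the genotype C proportion and the HBV prevalence for each country, rescaled so that the smallest value (Japan) equals 1, which is the reading that matches the published figures. The function and variable names are hypothetical and do not come from the original study.

```python
# Genotype C proportion among chronic HBV cases and HBV prevalence in the
# general population, as quoted in the text.
genotype_c_share = {"China": 0.53, "Korea": 0.98, "Japan": 0.85, "Taiwan": 0.32, "USA": 0.41}
hbv_prevalence = {"China": 0.10, "Korea": 0.059, "Japan": 0.04, "Taiwan": 0.15, "USA": 0.15}


def sampling_ratio(share, prevalence):
    """Return per-country weights rescaled so the smallest weight is 1.

    Assumption: the weight is share * prevalence (hypothetical helper,
    not the authors' code).
    """
    weights = {country: share[country] * prevalence[country] for country in share}
    baseline = min(weights.values())
    return {country: round(weight / baseline, 1) for country, weight in weights.items()}


ratio = sampling_ratio(genotype_c_share, hbv_prevalence)
print(ratio)
```

Under this assumption the result matches the China:Korea:Japan:Taiwan:USA ratio of 1.6:1.7:1:1.4:1.8 reported in the text.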
Time-scaled phylogenetic trees, evolutionary rates and demographic histories of HBV/C strains were evaluated using the Bayesian coalescent framework implemented in BEAST v1.8.2 (http://beast.community/index.html), which uses a Markov Chain Monte Carlo (MCMC) sampling method to obtain posterior distributions of tree topologies and parameter estimates. Bifurcating nodes with posterior probability greater than 0.95 were considered statistically well supported. Six different evolutionary models were tested: strict vs relaxed molecular clock, each one with a constant size, exponential growth or Bayesian Skygrid (nonparametric) demographic prior. 32 Depending on the model, MCMCs were run for 500 million to 1.5 billion generations (sampling every 0.01% of the run) until the effective sample size of each parameter estimate (after burn-in of 10%-25%, depending on the model) was >200 to ensure proper mixing of the Markov chain. For each run, the marginal likelihood was estimated via path sampling (PS) and stepping stone (SS) methods 33 and the resulting Bayes Factors (BF) (ratio of marginal likelihoods) were used to select the best-fitting clock/demographic model. 34 In practice, following the original work of Kass and Raftery, 35

In ML and NJ trees, the putative location of each ancestral lineage (internal branch) was inferred by assigning a discrete character to each sequence corresponding to the country of origin and reconstructing ancestral states by maximum parsimony. A more in-depth phylogeographic analysis, incorporating both spatial and temporal information, was also performed with BEAST 36 using a discrete trait, symmetric substitution model with Bayesian stochastic search variable selection (BSSVS).
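The model comparison by Bayes Factors described above can be illustrated with a short sketch. Because PS and SS estimators report marginal likelihoods on the natural-log scale, the log BF is simply the difference between two log marginal likelihoods. The interpretation bands below follow the 2 ln(BF) scale proposed by Kass and Raftery; the threshold actually applied in the study is cut off in the text and is not assumed here. The two log marginal likelihood values are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
def log_bayes_factor(log_ml_model_a, log_ml_model_b):
    """Natural-log Bayes Factor of model A over model B."""
    return log_ml_model_a - log_ml_model_b


def kass_raftery_category(log_bf):
    """Map ln(BF) onto the Kass and Raftery evidence scale for 2 ln(BF)."""
    two_ln_bf = 2.0 * log_bf
    if two_ln_bf < 0:
        return "favours the second model"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods (NOT values from the study).
log_ml_relaxed_skygrid = -45210.0
log_ml_strict_constant = -45225.5

log_bf = log_bayes_factor(log_ml_relaxed_skygrid, log_ml_strict_constant)
print(log_bf, kass_raftery_category(log_bf))
```

Working on the log scale avoids numerical underflow, since the marginal likelihoods themselves are far too small to represent as ordinary floating-point numbers.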
The MCC tree was converted to a keyhole markup language (KML) file using SPREAD 37 software and projected onto a geographical map using Google Earth (available online: http://www.google.com/earth) to produce a graphical animation of the estimated spatiotemporal patterns of HBV/C evolution. The longitude and latitude of the centre of each city or country were listed in order in a delimited text file for SPREAD.

Arrival counts recorded at customs entry points of East Asian countries, as well as departure trends, were collected from a number of national databases (see below) to evaluate potential correlations between viral demographic history and human mobility during the past two to three decades (depending on available data for each country).

Human mobility data in Taiwan

The NJ tree inferred from genotype C full-genome sequences was rooted with genotype B strains (see Figure 1 )

In order to investigate the timeframe of HBV/C spatial dispersion patterns in East Asia within the Bayesian coalescent framework, we selected from all available full-genome sequences a random sub-sample according to the relative ratio of HBV/C prevalence in each country ( site/year), in agreement with previous estimates. 21 The overall topology of the Bayesian maximum clade credibility (MCC) tree ( Figure 3A ) inferred from the data set is almost the same as the ML and NJ phylogenies inferred from the full data set (Figure 1 and Figure S1 ). The majority of the available strains (99.5%) clustered within

Similar to the data set including all lineages, the relaxed molecular clock and nonparametric Skygrid demographic prior were determined to be the best-fitting models when sequences of each of the three major lineages described in the previous section were analysed separately (Table 1)

The prevalence and dynamics of HBV/C infection are a major public health concern in East Asia. Our study resulted in two significant findings.
First, we showed that during the past three decades

Since 1995, when successful vaccination programs started to curtail the number of "mother-to-infant" transmissions, overall HBV prevalence has been decreasing, especially in Japan. 48

The authors declare no conflict of interest.

Hsin-Fu Liu http://orcid.org/0000-0003-0082-2269

World Health Organization. WHO guidance on development of influenza vaccine reference viruses by reverse genetics
Typing hepatitis B virus by homology in nucleotide sequence: comparison of surface antigen subtypes
Molecular epidemiological study of hepatitis B virus infection in two different ethnic populations from the Solomon Islands
Hepatitis B virus genotypes
Hepatitis B virus genotypes in the United States: results of a nationwide study
A genetic variant of hepatitis B virus divergent from known human and ape genotypes isolated from a Japanese patient and provisionally assigned to new genotype
Molecular characterization of hepatitis B virus and a 9-year clinical profile in a patient infected with genotype I
Distribution of hepatitis B virus genotypes in Korea
Hepatitis B virus genotype C is associated with more severe liver fibrosis than genotype B
Geographic distribution of hepatitis B virus (HBV) genotype in patients with chronic HBV infection in Japan
Geographic distribution, virologic and clinical characteristics of hepatitis B virus genotypes in China
Genotypes and clinical phenotypes of hepatitis B virus in patients with chronic hepatitis B virus infection
Genotype C of hepatitis B virus can be classified into at least two subgroups
A novel variant genotype C of hepatitis B virus identified in isolates from Australian Aborigines: complete genome sequence and phylogenetic relatedness
Hepatitis B virus genotype C takes a more aggressive disease course than hepatitis B virus genotype B in hepatitis B e antigen-positive patients
Viral genotype and hepatitis B virus DNA levels are correlated with histological liver damage in HBeAg-negative chronic hepatitis B virus infection
Hepatitis B virus genotype A is more often associated with severe liver disease in northern India than is genotype D
Core promoter mutations and genotypes in relation to viral replication and liver damage in East Asian hepatitis B virus carriers
Screening for chronic hepatitis B among Asian/Pacific Islander populations
Bayesian estimates of the evolutionary rate and age of hepatitis B virus
Molecular evolution of hepatitis B virus over 25 years
Hepatitis B virus genotyping: current methods and clinical implications
MEGA6: molecular evolutionary genetics analysis version 6.0
Epidemiological serosurvey of hepatitis B in China-declining HBV prevalence due to hepatitis B vaccination
Hepatitis B vaccinations among Koreans: Results from 2005 Korea National Cancer Screening Survey
Epidemiology of Hepatitis B Virus in Japan, especially in Nagasaki
Epidemiology of hepatitis B virus infection in the Asia-Pacific region
Epidemiology of hepatitis B in the United States
Estimation of levels of gene flow from DNA sequence data
Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci
Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty
Make the most of your samples: Bayes factor estimators for high-dimensional models of sequence evolution
Bayesian phylogenetics with BEAUti and the BEAST 1.7
SPREAD: spatial phylogenetic reconstruction of evolutionary dynamics
Viral phylodynamics and the search for an 'effective number of infections'
SARS epidemic area
Severe acute respiratory syndrome coronavirus sequence characteristics and evolutionary rate estimate from maximum likelihood analysis
Cultural Revolution memoirs written and read in English: Image formation, reception and counternarrative: Thesis(Ph.D.)-University of Minnesota
Life and Death in Shanghai
World Health Organization changes Hong Kong, Guangdong travel recommendations
Summary of probable SARS cases with onset of illness from 1
SARS in healthcare facilities
Control measures for severe acute respiratory syndrome (SARS) in Taiwan
Molecular evolution and phylodynamics of acute hepatitis B virus in Japan
Characterising two-pathogen competition in spatially structured environments