key: cord-0053416-4sordqxu authors: Kamau, Everlyn; Otieno, James R.; Lewa, Clement S.; Mwema, Anthony; Murunga, Nickson; Nokes, D. James; Agoti, Charles N. title: Evolution of respiratory syncytial virus genotype BA in Kilifi, Kenya, 15 years on date: 2020-12-03 journal: Sci Rep DOI: 10.1038/s41598-020-78234-0 sha: 1c449d54059ba4e2ced88d9ec65b0c5dcc812176 doc_id: 53416 cord_uid: 4sordqxu Respiratory syncytial virus (RSV) is recognised as a leading cause of severe acute respiratory disease and deaths among infants and vulnerable adults. Clinical RSV isolates can be divided into several known genotypes. RSV genotype BA, characterised by a 60-nucleotide duplication in the G glycoprotein gene, emerged in 1999 and quickly disseminated globally replacing other RSV group B genotypes. Continual molecular epidemiology is critical to understand the evolutionary processes maintaining the success of the BA viruses. We analysed 735 G gene sequences from samples collected from paediatric patients in Kilifi, Kenya, between 2003 and 2017. The virus population comprised of several genetically distinct variants (n = 56) co-circulating within and between epidemics. In addition, there was consistent seasonal fluctuations in relative genetic diversity. Amino acid changes increasingly accumulated over the surveillance period including two residues (N178S and Q180R) that mapped to monoclonal antibody 2D10 epitopes, as well as addition of putative N-glycosylation sequons. Further, switching and toggling of amino acids within and between epidemics was observed. On a global phylogeny, the BA viruses from different countries form geographically isolated clusters suggesting substantial localized variants. This study offers insights into longitudinal population dynamics of a globally endemic RSV genotype within a discrete location. Over 15 RSV epidemics (2002 to 2017), a total of 903/12203 (7.4%) respiratory samples tested positive for RSVB. G gene sequencing was attempted for all RSVB positive samples and was successful for 857 (95%) RSVB positive samples. The proportion of RSVA and RSVB samples in each epidemic ranged from 3.8 to 77.0% (Table 1) (Table 1 ). The annual distribution of RSV cases by age from 2002 to 2017 are shown in Supplementary Table 1 . Although the hospital-based RSV surveillance in Kilifi was established in 2002, the genotype BA was not detected until January 2003. From the RSVB positive samples, RSVB BA viruses were confirmed during sequence assembly and analysis by the presence of a 60-nucleotide duplication in the C-terminal of the G gene. Overall, a total of 735 genotype BA G gene sequences spanning nucleotides 214 to 981 of the RSVB strain B1 (DQ227363) were obtained. Genetic diversity of the genotype BA. The sequenced G gene region showed 3% nucleotide divergence (overall mean P distance) over the entire period and varied by a maximum of 38 nucleotides within an epidemic. Genetic divergence increased proportionally with time, Fig. 1A , indicating clock-like evolution. Molecular clock phylogenies showed a well-ordered diversification of the genotype BA since its introduction in Kilifi (Fig. 1B) . Viruses from the same epidemic formed multiple phylogenetic clusters and often, sequences from the preceding epidemic were often positioned at the basal nodes of those in the successive epidemic, suggesting sequential virus circulation and persistence between epidemics. A total of 56 distinct variants were identified, some were singletons suggesting either low level transmission or unsampled genetic diversity. The 2007/8 and 2009/10 epidemics were the most heterogeneous, with 12 and 11 variants, respectively. Up to four variants circulated over multiple epidemics (Fig. 1C) , and some were observed in non-consecutive epidemics likely indicating reintroduction of variants that had been previously observed locally. Evolutionary history of the genotype BA in Kilifi. Our previous study reported the number of virus introductions into Kilifi from 2002/3 to 2011/12 epidemics 24 . To place the Kilifi genotype BA viruses in the context of global RSVB diversity, we analyzed the 2012/13 to 2016/17 Kilifi sequences (n = 233) and 583 global sequences sampled from 16 countries between 2012 and 2017. In the current study, distinct geographical clusters were evident with viruses clustering primarily by country of origin. The 2012/13 to 2016/17 RSVB epidemics in Kilifi were seeded by at least 15 virus introductions (Fig. 2) . For many countries, viruses circulating during the same year were not placed into single monophyletic groups but in multiple clusters of assorted sizes (Fig. 2) , indicative of multiple virus entries followed by local spread and genetic expansion. Between 2012 and 2017, Figure S1 ). This study documents the natural history of the RSVB genotype BA viruses in Kilifi, Kenya, over 15 epidemics. Our analysis found substantial genetic diversity exhibited by multiple variants co-circulating within and between epidemics. Shifts in the predominating strains have been suggested to confer a selective advantage by variations 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 2002/3 T E R D T S T P Q S T V L D T T T S The 60-nucleotide duplication region showed further accumulation of nucleotide substitutions further than previously reported 15 , which likely points to enhanced fitness and molecular adaptation providing evolutionary advantage of the BA genotype 41 compared to other RSVB genotypes. Accumulation of nucleotide changes in RSV G gene was also attributed to antibody selection to some extent, as well as selective constraints other than immune selection, for instance, high mutation rate, protein malleability and bottleneck effects 42 . We observed amino acid reversion patterns in the Kilifi genotype BA viruses. Evolutionary reversals provide a means of escape by generating antigenic novelty and reflect fluctuating dynamics in immune responses, in which cross-immunity wanes, thereby controlling pathogen replication and facilitating virus transmission 30 . Escape mutations within or flanking functionally conserved epitopes come under selective pressure to revert to the wild type in hosts that do not mount an immune response against the epitope 43 . Amino acid reversions could also be a necessary consequence of a limited set of possible replacements at epitopes, a constraint on the repertoire of functionally viable amino acids 30 . The Arg-180 substitution may have altered the positive charge of the G protein cysteine noose. The role of electrostatic charge in pathogenesis has been appreciated for HIV rapid progression to AIDS by the presence of charged residues at two amino acid positions on the gp120 subunit 44 . In Influenza A, several immune escape mutants showed an increased positive charge in HA and virions with a higher net positive charge in HA may promote favourable electrostatic interactions with target cells since host cell receptors possess a strong negative charge 45 . A change in surface charge (Glu to Lys) of the envelope glycoprotein was also implicated in the emergence of enzootic and epizootic viral phenotypes 46 . In addition, a potentially significant element of the immune escape repertoire is the selection of 'adsorptive mutants' , showing enhanced binding to target cells 47 . An important aspect of receptor-binding avidity is that of electrostatic charge 45 . The Arginine replacement at position 180 may have resulted in increased net positive surface charge at the cysteine noose consequently promoting electrostatic interactions with target host cells. Future therapeutic design efforts will need to assess for possible loss of human G-directed mAbs cross-reactivity to epitopes containing the N178S and Q180R mutations. Both pervasive positive selection and episodic diversifying selection were detected in amino acid positions within the sequenced G gene segment, six of which (227, 270, 287, 305, 306 and 313) were in agreement with previous studies 30, 48 . Similarly, four putative N-glycan binding sites reported in this study (86, 144, 296, and 310) were previously reported for RSV G 49 . Amino acid position 144 was also identified in our study as being under positive selection by at least two methods. In summary, our analyses show that the genotype BA dynamics in Kilifi have been marked by multiple cocirculating variants even within an epidemic, which are serially replaced over time perhaps by virus importations or evolution in situ. There has been gradual diversification specially in the C-terminal of the G protein possibly due to antibody-driven selection or functional constraints. In later epidemics (from 2007/8) the genotype was characterized by marked evolutionary reversions that reflect fluctuating immunological dynamics-either loss of pre-existing immunity or positive selection in a newly susceptible populations 30 -in the local communities. www.nature.com/scientificreports/ Contemporary sampling and sequencing of RSVB particularly in Africa is still insufficient, compared to HIV-1 viruses, for which the regional spatial ecology has been extensively characterized 50, 51 . Paucity of sequence data from neighboring countries limited our inferences on virus importations and movement within the sub-Saharan region. Future studies on local epidemics, transmission and spatial dynamics of the genotype BA could greatly benefit from improved sampling and sequencing in the region. Nonetheless, this study highlights the value of longitudinal sampling within a discrete location and of sequencing sufficiently over a long time to characterize RSV population dynamics. Further studies are required to determine the importance of variant-specific genetic differences in immune response and virus transmission. While the G gene is the fastest evolving RSV gene and has been extensively useful in RSV molecular epidemiology, studying the diversity of the rest of the RSV genome could allow for a deeper understanding of the long-term evolutionary dynamics of the RSV genotype BA. Study samples and epidemic patterns. This study was part of longitudinal surveillance study established to understand the epidemiology and disease burden of RSV associated pneumonia 21 . The respiratory samples were collected into the universal transport media (UTM; Copan Diagnostics, USA). We used sequence data from RSV positive samples (nasal washes, aspirates and swabs) collected between January 2003 and August 2017. A majority of the samples were from children < 5 years admitted to Kilifi County Hospital (KCH) (2004-2017) with lower respiratory tract illness 20, 21 . The 2003 samples were from the Kilifi RSV birth cohort acute respiratory infection cases 52 . All sample sets arise from the same catchment population and were processed similarly in the laboratory as detailed below. Informed consent was sought from parents or guardians, and the study protocols were reviewed and approved by the Scientific and Ethical Review Unit, KEMRI, Kenya. All methods were carried out in accordance with relevant guidelines and regulations. RSV epidemics in Kilifi typically start in November and run through to May with sporadic (inter-epidemic) infections occurring in June to October 21 . However, there is year-to-year variation and therefore we took a pragmatic approach and defined an epidemic as running from October of one year through to September of the following year. The dominant RSV group within a given epidemic was associated with ≥ 60% of the cases; otherwise, the groups were considered co-dominant. Viral detection, amplification and sequencing. RSV detection was performed by immunofluorescence antibody test (IFAT; RSV DFA kit, Light Diagnostics, UK). Beginning 2007, a real-time RT-PCR assay with primer/probe sets targeting the highly conserved nucleoprotein (N) gene was used to discriminate RSVA and RSVB for all respiratory samples collected during the surveillance period as previously described 24, 53, 54 . In 2015, the real-time RT-PCR assay for RSVB was updated following failed detection of drifted variants 55 . For genotyping purposes, a 900 base pair (bp) fragment of the G gene was amplified from purified viral nucleic acids of RSVB positive samples and sequenced in a nested PCR reaction as previously described 18, 53 . Sequence assembly was done using Sequencher v5.0 (Gene Codes Corp., USA). The RSVB genotype BA viruses were identified following sequence alignments and phylogenetic analysis of the assembled sequences. Global data set. In addition to the Kilifi sequences, we retrieved RSVB G gene sequences collected between 2012 and 2017 from GenBank. We excluded sequences were shorter than 700 nucleotides, non-BA genotype sequences and those without sampling date or location information. Sequences were aligned using MAFFT 7.222 and alignments manually edited in Aliview 56 . Phylogenetic analyses. Model selection (ModelFinder) and Maximum likelihood (ML) phylogenetic trees were estimated with IQTREE 1.6 57, 58 . Node support values were estimated using bootstrap resampling (1000 replicates). Temporal signal in the data was examined using TempEst 59 . Phylogenetic relationships and viral demographic histories were inferred using BEAST 1.10 60 . We used a GTR substitution model with discrete gamma distributed rate variation among sites and uncorrelated relaxed molecular clock with branch rates drawn form a lognormal distribution to account for evolutionary rate variation among lineages. A Skyride demographic prior with time-aware smoothing 61 was selected and a CTMC rate reference prior was specified for the mean clock rate. Chain length of MCMC sampling was 300 million generations, sampling every 10,000. Stationarity and mixing were examined using Tracer 1.7 and maximum clade credibility (MCC) trees summarized using TreeAnnotator. A variant was defined as a virus or group of viruses with 4 nucleotide differences compared to other viruses and/or falling into a distinct cluster with at least 60% bootstrap support 24 . Selection analysis. We tested the G gene data for evidence of natural selection using methods implemented in the Datamonkey 2.0 webserver 62 . Mean non-synonymous to synonymous rate ratio (d N /d S ) was estimated using the SLAC method. Pervasive selection analyses were performed using SLAC and FEL methods. Episodic (frequently transient) selection was evaluated using MEME, and the FUBAR method was used to evaluate the difference between nonsynonymous and synonymous rates per codon site. Significance of SLAC, FEL and MEME results used a P-value cutoff of 0.05, and FUBAR results were assessed posterior probability of 0.9. N-linked glycosylation sites. The N-Glycosite web tool (http://www.hiv.lanl.gov/conte nt/seque nce/ GLYCO SITE/glyco site.html) was used to identify putative N-glycosylation sites (amino acid configuration N-x-S/T, where x is not Proline (P)). Global burden of acute lower respiratory infections due to respiratory syncytial virus in young children: a systematic review and meta-analysis Global respiratory syncytial virus-associated mortality in young children (RSV GOLD): a retrospective case series Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in young children in 2015: a systematic review and modelling study Surveillance of respiratory viruses in the outpatient setting in rural coastal Kenya: baseline epidemiological observations Respiratory syncytial virus infection in adults Respiratory syncytial virus seasonality: a global overview Progress toward a respiratory syncytial virus vaccine Advances in RSV vaccine research and development-a global agenda Structure and function of respiratory syncytial virus surface glycoproteins Molecular epidemiology of respiratory syncytial virus The G glycoprotein of human respiratory syncytial viruses of subgroups A and B: extensive sequence divergence between antigenically related proteins Phylodynamics and dispersal of HRSV entails its permanence in the general population in between yearly outbreaks in children Genetic variability and molecular evolution of the human respiratory syncytial virus subgroup B attachment G protein Major changes in the G protein of human respiratory syncytial virus isolates introduced by a duplication of 60 nucleotides Ten years of global evolution of the human respiratory syncytial virus BA genotype with a 60-nucleotide duplication in the G protein gene Frequent asymptomatic respiratory syncytial virus infections during an epidemic in a rural Kenyan household cohort Severe lower respiratory tract infection in early infancy and pneumonia hospitalizations among children Kenya Genetic relatedness of infecting and reinfecting respiratory syncytial virus strains identified in a birth cohort from rural Kenya Viral etiology of severe pneumonia among Kenyan infants and children A preliminary study of pneumonia etiology among hospitalized children in Kenya Incidence and severity of respiratory syncytial virus pneumonia in rural Kenyan children identified through hospital surveillance Respiratory syncytial virus infection and disease in infants and young children observed from birth in Kilifi District Kenya The incidence and clinical burden of respiratory syncytial virus disease identified through hospital outpatient presentations in Kenyan children Successive respiratory syncytial virus epidemics in local populations arise from multiple variant introductions, providing insights into virus persistence The source of respiratory syncytial virus infection in infants: a household cohort study in rural Kenya Local evolutionary patterns of human respiratory syncytial virus derived from whole-genome sequencing Tracking virus outbreaks in the twenty-first century Genomic analysis of viral outbreaks Epidemic dynamics revealed in dengue evolution Positive selection results in frequent reversible amino acid replacements in the G protein gene of human respiratory syncytial virus Structural, antigenic and immunogenic features of respiratory syncytial virus glycoproteins relevant for vaccine development Proposed three-dimensional model for the attachment protein G of respiratory syncytial virus Structures of respiratory syncytial virus G antigen bound to broadly neutralizing antibodies Potent high-affinity antibodies for treatment and prophylaxis of respiratory syncytial virus derived from B cells of infected patients Structural basis for recognition of the central conserved region of RSV G by neutralizing human antibodies The effect of genetic structure on molecular dating and tests for temporal signal Influence of respiratory syncytial virus strain differences on pathogenesis and immunity Circulation patterns of group A and B human respiratory syncytial virus genotypes in 5 communities in North America Molecular characterization of respiratory syncytial viruses circulating in a paediatric cohort in Amman Jordan Molecular epidemiology and evolution of human respiratory syncytial virus and human metapneumovirus Functional analysis of the 60-nucleotide duplication in the respiratory syncytial virus buenos aires strain attachment glycoprotein Conservation of G-protein epitopes in respiratory syncytial virus (Group A) despite broad genetic diversity: is antibody selection involved in virus evolution Frequent toggling between alternative amino acids is driven by selection in HIV-1 Maintenance of syncytium-inducing phenotype of HIV type 1 is associated with positively charged residues in the HIV type 1 gp120 V2 domain without fixed positions, elongation, or relocated N-linked glycosylation sites Differences in the biological phenotype of low-yielding (L) and high-yielding (H) variants of swine influenza virus A/NJ/11/76 are associated with their different receptor-binding activity Positively charged amino acid substitutions in the e2 envelope glycoprotein are associated with the emergence of venezuelan equine encephalitis virus Hemagglutinin receptor binding avidity drives influenza A virus antigenic drift Characteristics and their clinical relevance of respiratory syncytial virus types and genotypes circulating in Northern Italy in five consecutive winter seasons The comparative genomics of human respiratory syncytial virus subgroups A and B: genetic variability and molecular evolutionary dynamics Origin, imports and exports of HIV-1 subtype C in South Africa: a historical perspective Evolution of HIV-1 within untreated individuals and at the population scale in Uganda Respiratory syncytial virus epidemiology in a birth cohort from Kilifi district, Kenya: infection during the first year of life Molecular evolutionary dynamics of respiratory syncytial virus group A in recurrent epidemics in coastal Kenya Added value of an oropharyngeal swab in detection of viruses in children hospitalized with lower respiratory tract infection Recent sequence variation in probe binding site affected detection of respiratory syncytial virus group B by realtime RT-PCR AliView: a fast and lightweight alignment viewer and editor for large datasets ModelFinder: fast model selection for accurate phylogenetic estimates IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology A. conceived and designed the research. All authors read and reviewed the manuscript We are thankful to the laboratory, field, and clinical staff for sample and clinical data collection and laboratory testing. We are also thankful to the guardians and parents of the children who participated in this study. This paper is published with the permission of the Director of KEMRI. This study was funded by The Wellcome Trust E.K. performed sequence analysis and wrote the manuscript drafts, J.R.O. was involved in data collection and sequence analysis, C.L. and A.M. prepared samples for sequencing, N.M. was involved in patient data collection The authors declare no competing interests. Supplementary information is available for this paper at https ://doi.org/10.1038/s4159 8-020-78234 -0.Correspondence and requests for materials should be addressed to E.K.Reprints and permissions information is available at www.nature.com/reprints.Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.