key: cord-0003448-81znlofa
authors: Domingo-Calap, Pilar; Schubert, Benjamin; Joly, Mélanie; Solis, Morgane; Untrau, Meiggie; Carapito, Raphael; Georgel, Philippe; Caillard, Sophie; Fafi-Kremer, Samira; Paul, Nicodème; Kohlbacher, Oliver; González-Candelas, Fernando; Bahram, Seiamak
title: An unusually high substitution rate in transplant-associated BK polyomavirus in vivo is further concentrated in HLA-C-bound viral peptides
date: 2018-10-18
journal: PLoS Pathog
DOI: 10.1371/journal.ppat.1007368
sha: 806a1df203eb5126eec25561bd4cb1b10df18c53
doc_id: 3448
cord_uid: 81znlofa

Infection with human BK polyomavirus, a small double-stranded DNA virus, potentially results in severe complications in immunocompromised patients. Here, we describe the in vivo variability and evolution of the BK polyomavirus by deep sequencing. Our data reveal the highest genomic evolutionary rate described in double-stranded DNA viruses, i.e., 10(−3)–10(−5) substitutions per nucleotide site per year. High mutation rates in viruses allow their escape from immune surveillance and adaptation to new hosts. By combining mutational landscapes across viral genomes with in silico prediction of viral peptides, we demonstrate the presence of significantly more coding substitutions within predicted cognate HLA-C-bound viral peptides than outside. This finding suggests a role for HLA-C in antiviral immunity, perhaps through the action of killer cell immunoglobulin-like receptors. The present study provides a comprehensive view of viral evolution and immune escape in a DNA virus.

Introduction Viral evolutionary rates can vary strongly depending on the method used to estimate them [1, 2] . Among Baltimore groups, the fastest evolving entities are single-stranded (ss) RNA and reverse-transcribing (RT) viruses, with rates ranging between 10 −2 and 10 −5 substitutions per site per year (s/s/y). The rates of double-stranded (ds) RNA and ssDNA viruses range between 10 −3 and 10 −6 s/s/y, whereas dsDNA viruses evolve more slowly (10 −3 and 10 −8 s/s/y) [3, 4] . It is important to note that only few estimates on dsDNA viruses are published. In fact, higher estimates are based on specific genes, as estimated for human papillomavirus 16 (E6 and E7), human adenovirus (hexon), or JC virus (VP1), which are in the order of 10 −3 s/s/y [4, 5] . Regarding estimates based on dsDNA complete genomes, all of them range between 10 −5 and 10 −7 s/s/y [3, 5] . This finding confirms that viruses are fast evolving entities whereas humans have much lower evolutionary rates (10 −8 -10 −9 s/s/y). However, the well-established co-divergence of viral populations with their hosts suggests the possibility of low evolutionary rates in viruses as well. For example, polyomaviruses were historically considered to be examples of human-virus co-divergence, and have been used as markers for human migration patterns, with proposed estimates ranging from 1.41 × 10 −7 to 4 × 10 −8 s/s/y [6, 7] . Detailed studies are needed to better understand dsDNA virus evolution in vivo, especially in viruses that can be considered as potential pathogens.

In vertebrates, the major driving force in anti-viral immunity is the high level of polymorphism in human leukocyte antigen (HLA) genes. Despite a few recent reports [8, 9] , limited information is presently available on the extent of viral variability in vivo, especially at the whole viral genome level, and only a few studies have tackled this variability in conjunction with the HLA genotype of infected individuals. Consequently, viral escape mutants-i.e., viruses that produce mutated peptides that are no longer able to bind to cognate HLA molecules-have been mainly studied for limited model epitopes in in vitro systems and in highly relevant RNA viruses such as HIV, HCV, influenza or dengue (see the following historical references [10, 11] ; for a recent review and full bibliography on the subject see [12] ). It is not surprising that RNA viruses can adapt to circumvent the immune responses [4] , but little is known about viral escape in DNA viruses.

A better understanding of the epitopes involved in viral escape from the immune system could be useful for the development of vaccines and specific treatments. Here, we initiate a dual approach using the BK virus (BKV) as a model. BKV, which was detected for the first time in 1971, is a 5.1 kb dsDNA virus of the Polyomaviridae family that harbors six genes (Agnogene, VP1 to VP3, large T antigen "LTA" and small t antigen "stA") [13] . The primary infection occurs essentially in childhood and the virus infects up to 90% of the human population. The virus remains persistent throughout life, primarily in the urinary tract [14] . High-level replication mainly occurs in immunocompromised hosts and, more specifically in those receiving investissementsdavenir/). Additional support was received from the Strasbourg High Throughput Next Generation Sequencing facility (GENOMAX) (SB)(no URL available), the Institut National de la Santé et de la Recherche Médicale (INSERM)(SB) (www.inserm.fr), Initiative d'Excellence (IDEX) fund of the University of Strasbourg (UNISTRA)(SB) (www.unistra.fr), the Institut Universitaire de France (IUF)(SB)(http://www.iufrance.fr), projects BFU2014-58656R and BFU2017-89594R from Ministry of Economy and Competitiveness (MINECO; Spanish Government) (http://www.idi. mineco.gob.es)(FGC) and the project PROMETEO/ 2016/122 from the Generalitat Valenciana (FGC) (www.gva.es/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. modern immunosuppressive regimens, notably post-kidney transplantation. BKV-associated diseases, especially BKV-associated nephropathy, affect 1-10% of transplant recipients [15, 16] and may lead to loss of the allograft and even death [17] . There are no specific prophylactic or curative treatments, and early diagnosis, as well as quick restoration of immunity (through dampening of immunosuppression), remain the most effective strategies to control the disease.

High level of variability in BKV as detected by NGS Access to the virus in the bloodstream and/or urine within a transplant setting, where HLA alleles of both donors and recipients are known, provides a unique opportunity to study viral evolution in vivo in the context of the individual's (both recipient and donor) HLA class I genotype. A retrospective cohort of 96 patients-225 samples-that underwent solid organ (N = 83) or hematopoietic cell transplantations (N = 13), harboring a minimum of 10 4 viral copies/mL in whole blood or urine, was selected. Quantitative real-time PCR showed that the viral titers in blood (8.98 × 10 4 ± 2.47 × 10 4 copies/mL) were significantly lower than those in urine (2.16 ×10 9 ± 3.94 × 10 8 copies/mL) (Mann-Whitney U = 315.0, two-tailed, P < 0.0001). After complete deep genome sequencing of all 225 samples and alignment to the BKV Dunlop reference strain (GenBank accession number NC001538), an average of 110 ± 3 polymorphisms per sample was observed with an average median coverage of 3043 ± 78 reads/position (S1 Table, Gen-Bank accession numbers KT896230-KT896454; see Methods). In total, 37.88% of all amino acid positions in the Agnoprotein, 12.43% in VP1, 9.97% in VP2, 11.21% in VP3, 8.20% in LTA and 8.72% in stA, were found to be polymorphic (S2 Table) . Agnogene is the only gene that is not under apparent selective constraints (Nei-Gojobori test, P = 0.8663), while the others are under purifying selection (Nei-Gojobori test, P < 0.0001, Fig 1, see Methods) . Only a few single nucleotide insertions or deletions were detected in the viral genes (S3 Table) . Due to methodological limitations (short reads) the non-coding control region was not included in the analyses.

The occurrence of mutations is the main process generating genetic variability, but other processes, such as genetic drift, gene flow, selection and recombination, are responsible for shaping the genetic structure and variation of viral populations. Here, we present evidence that BKV is under strong purifying selection even in the immunocompromised host. Several specific features of the Polyomaviridae (e.g., limited size of the genome, small number of genes and overlapping transcription units) likely account for this outcome. In addition, the prevalence of purifying selection in essential genes is anticipated in all viruses as there is a requirement to complete the viral cycle, even in immunocompromised hosts. Most mutations in coding regions must be deleterious, and a high substitution rate implies the accumulation of mutations with deleterious effects [18] . This phenomenon is well known in RNA viruses, which have high mutation rates and short replication times. Similar results have been shown comparing mutational fitness effects and evolution in ssRNA and ssDNA viruses [19, 20] . Our study supports the hypothesis, in concordance with other recent findings [21] , that the evolutionary rate gap between small dsDNA and RNA viruses might not be as wide as previously thought. A recent study in lentiviruses has revealed that the combined effects of sequence saturation and purifying selection can explain the time-dependent pattern of rate variation. Purifying selection acts on the genetic diversity over long timeframes by removing a large number of transient deleterious mutations that are still present within short timeframes [4] .

Phylogenetic analysis with all BKV complete genomes available from GenBank (Fig 2A) suggested the existence of three large groups or genotypes represented by serotypes I, II/III, and IV, with subtypes within genotypes. Limited differences (short branch lengths) between the previously designated genotypes II and III suggested the existence of only one genotype II/III with two subtypes (in contrast to more pronounced differences between serotypes II and III). A similar phylogenetic classification was observed by analyzing only the VP1 gene ( Fig 2B) . Incidentally, this finding indicated that the current BKV classification should be revised due to inconsistencies between serotyping and genotyping. Next, to establish the genotype of our samples, one reference strain of each genotype and subtype was used for the phylogenetic analysis ( Fig 2C) . Most of our samples (80.88%) belonged to genotype I, whereas genotypes IV and II/III were less represented (13.78% and 5.3% respectively). The clustering was patient-dependent but independent of the sample origin (urine or blood) and suggested that some samples likely contained a mixture of genotypes. This mixture might be due to multiple lifelong infections or the replication of viruses from the recipient and/or the donor.

Intra-and inter-patient evolutionary rates were estimated. BKV sequences from samples with possible recombination or a mixture of genotypes according to the RDP output [22] were removed from the analysis (see Methods). We estimated an intra-patient substitution rate for BKV in transplanted patients in the range of 4.90 × 10 −4 -1.22 × 10 −3 substitutions per nucleotide site per year (s/s/y). No differences between substitution rates in solid organ and hematopoietic cell transplant recipients were found (t-test, P = 0.2581). To estimate the inter-patient evolutionary rate, the best substitution (molecular clock) and demographic model according to marginal likelihood analyses was the relaxed log-normal uncorrelated clock with Bayesian skyline demographic prior. The estimated inter-patient evolutionary rate ranged from 1.00 × 10 −5 -2.15 × 10 −4 (95% HDI) for a maximum sampling interval of 568 days. The estimate was quite robust to different demographic and molecular clock models (S4 Table) .

The evolutionary rates based on the maximum likelihood and least-squares methods implemented in treedater were similar when applied to the whole data set (4.30 × 10 −3 s/s/y) but with large parametric bootstrap confidence intervals (in the 10 −20 to 10 14 range), thus preventing their consideration as reasonable estimates. However, when the dataset was reduced to the sequences of genotype I (n = 56) the average evolutionary rate was estimated at 1.33 × 10 −4 (95% CI = 3.13 ×10 −6 -5.59 × 10 −3 ). These values were close to those obtained with the Bayesian approach described previously. It is usually assumed that RNA viruses evolve at a rate of 10 −4 s/s/y, while dsDNA can be close to 10 −8 s/s/y [23] . ssDNA viruses with small genomes evolving at rates similar to those of RNA viruses have been reported previously [24, 25] , as illustrated by the canine parvovirus, with a substitution rate of 1.7 × 10 −4 s/s/y [26] . In the case of dsDNA, many evolutionary rates have been calculated under the assumption of co-divergence between viral and human populations, as observed for polyomaviruses. Recently, the substitution rate for JC polyomavirus was evaluated at 1.7 × 10 −5 s/s/y [27] . Based on this result, Bayesian analyses suggested the substitution rate of BKV to be on the order of 10 −5 s/s/y [5, 28] , while another study found only minor nucleotide substitutions in the genes encoding late proteins [29] .

Here we estimated a substitution rate for BKV on the order of 10 −3 -10 −5 s/s/y (Fig 3) . Our experimental results show, for the first time using whole-genome sequencing of in vivo viral populations (in a large monocentric cohort), that the genomic evolutionary rate of a dsDNA virus can be as high as that of RNA viruses. It is important to note that the sampling window of sequences may affect the estimates of evolutionary rates, because very short timescales can inflate them. A recent study has shown that estimates of evolutionary rates were lower for broader sampling levels and longer timeframes for both, DNA and RNA viruses, suggesting that the time dependence of substitution rates is ubiquitous among all viruses [4] . For example, lentivirus evolutionary rates from serial samples over a few years within a single patient or host are in the order of 10 −3 s/s/y [30], reflecting those observed in this study in a small dsDNA virus.

In addition, a previous study comparing the evolution of ssRNA and ssDNA viruses has shown that small genomes (< 5 kb) can evolve rapidly [24] regardless of their encoding material, and that the well-known correlation between genome size and mutation rate [70] can also hold for evolutionary rates. Here, we show that small dsDNA genomes can also evolve as fast as single-stranded ones. Although BKV uses the host DNA polymerase for its replication, the virally-encoded Agnoprotein inhibits dsDNA break repair activity, thereby potentially increasing the error rate during BKV DNA replication [71] . Interestingly, cell tropism of RNA viruses was recently suggested as a key factor in their capacity to evolve, since viruses replicating in epithelial cells (as BKV) are characterized by rapid replication and higher substitution rates [72] .

To investigate the relationship between the evolutionary rate of the virus and the immunosuppressive drug regimen-hence the strength of the immune system-we analyzed such information in our kidney transplant recipient cohort (the largest subgroup in our cohort). Kidney transplant patients were given either anti-thymocyte globulin (ATG) (immunological high-risk patients) or anti-Interleukin-2 receptor (anti-IL-2R) (immunological low-risk patients) as induction treatments, and tacrolimus (immunological high-risk patients) or cyclosporine (immunological low-risk patients) as maintenance therapy. Mycophenolate mofetil and steroids were also part of both drug regimens (for high-and low-risk patients). Evolutionary analysis of the different subgroups showed no significant differences in the mutational load (full negative binomial mixed model regression with random effect intercept to account for repeated measures) nor in inter-patient substitution rates where ranges overlapped between treatments (ATG 6.12 × 10−4-1.03 × 10 −5 s/s/y, Anti-IL-2R 8.60 × 10 −4 -1.36 × 10 −5 s/s/y, tacrolimus 4.64 × 10 −4 -9.31 × 10 −6 s/s/y, and cyclosporine 1.72 × 10 −3 -1.11 × 10 −5 s/s/y).

To investigate the genetic immune escape mechanism of BKV, potential T-cell epitopes presented by HLA class I were predicted using both donor and recipient HLA alleles, combined with the viral substitutions found herein (S1 Fig, S2 , S5 and S6 Tables, see Methods). In this way, we determined the putative HLA ligandome of the virus as linked to the individual's cognate HLA genotype. Interestingly, the two codons in VP2 that appeared to be under positive selection corresponded to codons within predicted epitopes. The VP2 103 codon, the one with the highest level of significant difference, was found in three predicted HLA-C epitopes (KFFDDWDHKVSTV, FFDDWDHKV and FFDDWDHKVSTV), and codon 340 was located within two HLA-A predicted epitopes (TTNKRRSR and TTNKRRSRSSR).

We also found a higher fraction of observed amino acid substitutions within HLA-C epitopes compared with the fraction of amino acid substitutions outside of HLA-C epitopes (onesided Wilcoxon signed test, P = 3.71 × 10 −10 ). The opposite behavior was observed for HLA-A [69] (TSS 54 y)). Each point represents the value of a previously published genomic evolutionary rate (note that for some references, more than one substitution rate is represented in the caption). Red circles represent short time span estimates (< 5 years) and blue squares represent long-time span estimates (> 5 years). Medians with interquartile ranges are indicated. In the case of the inter-and intra-host genomic evolutionary rates of BKV, the values are represented as a range of values obtained in this study. (Fig 4) . This difference in contribution of HLA loci was independent of the transplantation type (solid organ or hematopoietic) or the origin of the HLA loci (whether from the donor or the recipient) as assessed by a three-way ANOVA (P = 0.7947). Therefore, our results suggest that HLA-C might be specifically involved in the immune response against BKV through its peptide selection capacity for viral peptides. A possible mechanistic explanation for this finding stems from the amply documented interaction of HLA-C with natural killer (NK) and T cells expressing the killer cell immunoglobulin-like receptors (KIR). Notably, the relevance of KIR and HLA-C interactions has been described for viral infections [73, 74] , and the involvement of NK cells in the immune response against BKV has also been reported [75, 76] , although further investigations should be done to confirm this hypothesis.

High evolutionary rates in RNA viruses allow them to escape immune pressures. Interactions between HLA epitopes and viruses have been described for a variety of RNA viruses, such as HIV, HCV, influenza or dengue, while little is known about immune escape in DNA viruses. A few studies in HPV-16 or herpes simplex virus have been done to improve vaccine design and drug development, but those studies have only examined a fraction of the proteins and not at whole-genome sequencing data [77] [78] [79] . This work, to our knowledge, is the first in which predicted epitopes from whole genome sequencing have been studied in an in vivo cohort, in conjunction with cognate HLA alleles, to understand the mechanism involved in immune escape in a DNA virus. Our results of viral escape combined with the high evolutionary rate described herein suggest that a combination of drugs should be used as potential treatment against BKV, as commonly used in highly variable viruses such as HIV and HCV, due to the variable viral populations present in a single patient as observed in our study.

The present work describes an unusually fast evolutionary rate for BKV in vivo and charts its interaction with the immune system-through the analysis of cognate HLA alleles-whilst considering the whole viral genome and not only candidate epitopes. It further offers a blueprint for similar analyses in other viruses and helps to better rationalize anti-viral therapy and candidate vaccine development. Our results suggest that small dsDNA viruses should be treated as RNA viruses due to their similarities in evolution and immune escape. Thus, a combination of drugs might be necessary for the treatment of BKV, as used for fast evolving RNA viruses. It is important to note that new analytic methods for the study of the evolutionary rates are needed to better understand the effect of time spans and improve the comparison between estimates.

Ninety-six transplanted patients between 2012 and 2013 from the Strasbourg University Hospitals (France) with high levels of post-transplant BKV viruria-as detected by routine BKV testing at the hospital's clinical virology laboratory-were enrolled in this study. Sixty-eight patients underwent kidney transplantation, 12 were lung recipients, 3 received double (kidney-heart; heart-lung or kidney-pancreas) transplants and 13 hematopoietic stem cell transplantation. A total of 225 samples, including 197 urine (from 94 patients) and 28 whole blood (from 13 patients) were included. Urine samples were collected longitudinally for 36 patients.

All patients were enrolled in the study following the Helsinki guidelines. Written informed consent for genetic testing was obtained from all patients and the study was approved by the Strasbourg University Hospitals institutional review board (RNI DC-2013-1990).

Urine and whole blood samples were collected, and DNA was purified using the QIAxtractor instrument (Qiagen, Hilden, Germany), following the DX protocol. Extracted DNA was stored at -80˚C until analysis. Blood and urine specimens were assessed using the BK virus R-gene quantification kit (Biomérieux, Lyon, France) following the manufacturer's recommendations. DNA was amplified by Phusion Polymerase (New England Biolabs, MA, USA) using specific overlapping primers. Nested PCR was performed for samples with a low BKV DNA load (usually blood samples). PCR products were purified using the GeneJET DNA purification Kit (ThermoFisher Scientific, Waltham, MA, USA) and quantified with Qubit (ThermoFisher Scientific, Waltham, MA, USA). Twenty-one urine-blood paired samples were used for sequencing by the Sanger method using an ABI Prism 3130 Genetic Analyzer (ThermoFisher Scientific, Waltham, MA, USA). Bi-directional sequencing was performed with the Big Dye Terminator v3.1 kit (ThermoFisher Scientific, Waltham, MA, USA) following the manufacturer's recommendations. Chromatograms were analyzed with the Staden package (24) to obtain the consensus sequence for each sample. These consensuses were obtained to compare with the results after the next-generation sequencing assembly to validate our pipeline.

All 225 urine and blood samples were sequenced by NGS. PCR products from the same samples were pooled in equimolar amounts and library construction with barcodes was performed according to the Fragment Library Preparation protocol using the AB Library Builder System (ThermoFisher Scientific, Waltham, MA, USA). Libraries were quantified by Qubit (Thermo-Fisher Scientific, Waltham, MA, USA) and then pooled in equimolar amounts for Template beads preparation using the SOLiD EZ beads System (ThermoFisher Scientific, Waltham, MA, USA). Template beads were subjected to sequencing using SOLiD 5500 (ThermoFisher Scientific, Waltham, MA, USA) with the paired-end 75 bp / 35 bp workflow. Sequences were assembled against the Dunlop reference strain (GenBank accession number NC001538) using LifeScope software (ThermoFisher Scientific, Waltham, MA, USA). Comparison with Sanger sequencing was performed to ascertain the correct assemblies. To quantify the variability per sample, mutations were analyzed with SeqMan software (DNASTAR, Madison, Wisconsin, USA). For each sample, we obtained a list of variants with their genomic location, coverage, and quality metrics, among others. To establish a cutoff for variant calling, we introduced internal controls including (a) a clone from the Dunlop reference strain, pBK (BKV34-2) plasmid (ATCC 45025) prepared by minipreparation (ThermoFisher Scientific, Waltham, MA, USA); (b) PCR amplicons from the same clone; and (c) PCR amplicons in duplicate from three of the samples. These controls were processed using the same sequencing methodology to establish the rate of sequencing and PCR errors. The final list of variants was selected by means of a Fisher's exact one-sided test comparing evidence obtained from the data for every potential polymorphism to the estimated error rate using our internal controls. Based on this analysis, BKV sequence variants found in less than 0.5% of reads were removed from the analysis.

Sequences were aligned and assembled against the Dunlop strain by Muscle implemented in MEGA version 6 [80] with default parameters in order to compare and determine point mutations, insertions, deletions, and other sequence variations. For better analysis of coding regions, individual datasets per gene were obtained. Further analysis of synonymous and nonsynonymous substitutions and the Nei-Gojobori test of neutrality were performed with MEGA version 6 [80] .

Phylogenetic analyses of the whole genome consensus sequences obtained from all samples, and for each gene separately, were performed using MEGA version 6 [80] . Maximum likelihood phylogenetic trees were constructed with the general time reversible model (GTR) of nucleotide substitution with gamma distribution to account for rate heterogeneity among sites, as this model achieved the lowest AIC score. Similar analyses were performed for 309 BKV complete genome sequences collected from GenBank (all items found by searching the NCBI nucleotide database for "BK polyomavirus complete genome").

To genotype the populations in the different samples, two approaches were performed. First, phylogenetic trees with all our samples and one of the reference strains for each genotype and subtype were obtained following the methodology explained previously. We determined the genotype as the shortest branch distance to one reference. The second approach was based on the methodology proposed by Luo and colleagues, in which point mutations specifically reported in particular genotypes are described [81] .

To estimate the evolutionary rates of BKV, intra-and inter-patient analyses were performed. Upon multiple alignment, consensus sequences were tested using RDP software [22] for potential recombination, and those with positive results using at least two different methods implemented in the RDP package were removed from the ensuing analyses. Samples showing mixtures of genotypes were also excluded since they could interfere with the calculation of the substitution rate. To estimate the intra-patient substitution rate, we used urine samples from twenty-five patients collected at different times (the first positive samples and after 6 months). To calculate substitutions per site per year, we considered all the different genomic positions between two different times that were fixed in the populations. All the substitutions that reverted to the reference base were not included since the possibility of them already being present in the ancestral population at a low frequency could not be ruled out. Thereby only substitutions appearing de novo and exhibiting a high proportion in the population (fixed substitutions, more than 80% of the reads) were included in this approach. With this methodology, we obtained conservative estimates.

To estimate the inter-patient substitution rate, the consensus sequence for the first available urine sample of each patient with a known date of sampling was selected. After being tested by RDP, a dataset of 79 BKV sequences was used to estimate the inter-patient evolutionary rate (sequences from 15 patients were potential recombinants). A maximum likelihood phylogenetic tree was obtained using Phyml [82] with the GTR model with gamma distribution and invariant sites to account for heterogeneity among sites. This model was determined to be the most appropriate for this dataset with jModeltest [83] . TempEst analysis was conducted to detect a correlation between genetic divergence and sampling time, and it assured a temporal signal in our inter-patient dataset (S2 Fig) [84] . We used Bayesian estimates of the evolutionary rate with dated tips as implemented in BEAST [85] . Based on previous results by Firth et al. [5] , we considered three molecular clock models (strict, relaxed log-normal uncorrelated, and relaxed exponential uncorrelated) and two demographic models (constant population size and Bayesian skyline). The GTR model with a gamma distribution and invariant sites was used as the nucleotide substitution model in all combinations. Model selection was performed through computation of the marginal likelihood using path sampling and stepping stone sampling analyses [86] . A lognormalPrior with a mean of 1 × 10 −6 and a standard deviation of 1.0 was used for the substitution rate. Two independent runs of 30 million steps with 10% burn-in were used to obtain the median and 95% high probability density intervals for the relevant parameters in each model. In all cases, the effective sample size was > 200, as checked with Tracer v. 1.5 (available from http://beast.bio.ed.ac.uk).

In addition, we used the recently developed method of Volz and Frost which uses maximum likelihood and least squares to estimate evolutionary rates and dates based on relaxed molecular clocks. The method is implemented in the R package treedater [87] .

To predict BKV-encoded T-cell epitopes that can be presented by HLA alleles, HLA high-resolution typing (2 fields) was done at the Etablissement Français du Sang Grand Est (Strasbourg) using a sequence-specific oligonucleotide technology. High-resolution typing data of HLA-A, -B and -C of 75 available donor / recipient pairs were used in each analysis, using the recipient's viral populations in each case (S5 Table) .

NetMHC 3.4 [88] was used to predict the peptide binding affinities of potential HLA class I epitopes occurring in BKV Dunlop reference proteins to HLA class I alleles of the patients and donors. Peptides eliciting a predicted IC 50 of less than 50 nM were considered epitopes. IC 50 values represent the concentration of the peptide that will displace 50% of a standard peptide from the HLA molecule. The lower the IC 50 value, the stronger is the affinity of the peptide for the tested HLA molecule. According to the NetMHC parameters, peptides with IC 50 < 50 nM were considered high-affinity binders. IC 50 values of 5 nM and 500 nM were also tested, but a cutoff of 50 nM was chosen as the best indicator (at a 5 nM threshold not enough peptides were predicted to bind; at 500 nM all possible peptides within a given proteins were predicted to bind). Furthermore, all predicted epitopes were tested with NetChop 3.1 [89] to predict whether the epitopes could have been produced by the human proteasome using default parameters. All strong binding peptides with a high likelihood of being correctly cleaved (score prediction higher than the default threshold of 0.5) were included in further analyses.

To calculate the fraction of substituted amino acids within and outside of HLA epitopes, the substitutions detected in the specific viral populations of each patient were mapped onto viral reference proteins, and the number of substitutions that occurred within and outside of the predicted epitopes were calculated for each protein and HLA allele of each patient and donor respectively. The counts were normalized to the number of potentially mutable amino acids per category (i.e., within or outside of epitopes), to make them comparable across proteins of varying length.

Statistical comparison of the internal and external fractions was performed with a onesided Wilcoxon signed test for each HLA allele to identify the direction of the difference. The P-values were Bonferroni corrected to account for multiple testing. Table. Inter-patient substitution rates. Summary of interpatient evolutionary rate estimates (substitutions/site/year, s/s/y) of BKV using different molecular clock (strict, relaxed log-normal uncorrelated and relaxed exponential uncorrelated) and demography (constant size and Bayesian skyline) models. Median and 95% high-density interval (HDI) intervals are shown. Estimates were obtained after two independent runs of 30 million generations each with a 10% burn-in. Convergence of the runs (ESS > 200) was checked with Tracer. (PDF) S5 Table. Allele frequencies of MHC class I in our cohort. Alleles are shown for HLA-A, -B and -C at the 2nd field of resolution for donors and recipients. (PDF)

Agnoprotein, VP1-3, large T antigen "LTA" and small t antigen "stA" predicted peptides presented by HLA-A, -B and -C from the BK polyomavirus Dunlop reference strain are listed. The starting and ending amino acid of the protein, length of the peptide, peptide sequence, and HLA allele that can present peptide are shown. The IC 50 for each peptide and specific HLA allele are also included. (PDF) 

Molecular clocks and the puzzle of RNA virus origins

Time-dependent rates of molecular evolution

From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses

Analyses of evolutionary dynamics in viruses are hindered by a timedependent bias in rate estimates

Using time-structured data to estimate evolutionary rates of double-stranded DNA viruses

Evolutionary changes of nucleotide sequences of papova viruses BKV and SV40: they are possibly hybrids

Evolution of four BK virus subtypes. Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases

Zika virus in the Americas: Early epidemiological and genetic findings

Temporal and spatial analysis of the 2014-2015 Ebola virus outbreak in West Africa

Human immunodeficiency virus genetic variation that can escape cytotoxic T cell recognition

How viruses escape from cytotoxic T lymphocytes: molecular parameters and players

Role of HLA Adaptation in HIV Evolution

New human papovavirus (B.K.) isolated from urine after renal transplantation

The Lancet Infectious diseases

Prospective study of polyomavirus type BK replication and nephropathy in renal-transplant recipients. The New England journal of medicine

BK virus as cause of haemorrhagic cystitis after bone marrow transplantation

Polyomavirus infection of renal allograft recipients: from latent infection to manifest disease

Muller's ratchet, epistasis and mutation effects

The evolutionary history and dynamics of bat rabies virus. Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases

Complete genome analysis of 33 ecologically and biologically diverse Rift Valley fever virus strains reveals widespread virus movement and low genetic diversity due to recent common ancestry

Attenuated live vaccine usage affects accurate measures of virus diversity and mutation rates in avian coronavirus infectious bronchitis virus. Virus research

Dynamics of molecular evolution and phylogeography of Barley yellow dwarf virus-PAV

The molecular epidemiology of dengue virus serotype 4 in

Phylogenomics and molecular evolution of foot-and-mouth disease virus

Full length genomes of genotype IIIA Hepatitis A Virus strains (1995-2008) from India and estimates of the evolutionary rates and ages. Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases

The mode and tempo of hepatitis C virus evolution within and among hosts

Molecular phylogenetic and evolutionary analyses of Muar strain of Japanese encephalitis virus reveal it is the missing fifth genotype. Infection, genetics and evolution: journal of molecular epidemiology and evolutionary genetics in infectious diseases

Spread, circulation, and evolution of the Middle East respiratory syndrome coronavirus

Complete genome sequences of porcine reproductive and respiratory syndrome viruses: perspectives on their temporal and spatial dynamics

Molecular evolutionary and epidemiological dynamics of genotypes 1G and 2B of rubella virus

Moderate mutation rate in the SARS coronavirus genome and its implications

Phylogenetic and evolutionary analyses of St. Louis encephalitis virus genomes. Molecular phylogenetics and evolution

Isolation and phylogenetic analysis of Mucambo virus (Venezuelan equine encephalitis complex subtype IIIA) in Trinidad

A constant rate of spontaneous mutation in DNA-based microbes

Role of JC virus agnoprotein in DNA repair

Cell tropism predicts long-term nucleotide substitution rates of mammalian RNA viruses

Natural killer cell receptor repertoire and their ligands, and the risk of CMV infection after kidney transplantation. American journal of transplantation: official journal of the American Society of Transplantation and the American Society of Transplant Surgeons

HLA and NK cell inhibitory receptor genes in resolving hepatitis C virus infection

BK polyomavirus infection and nephropathy: the virus-immune system interplay

The genetic predisposition of natural killer cell to BK virus-associated nephropathy in renal transplant patients. Kidney international

Identification of immunotherapeutic epitope of E5 protein of human papillomavirus-16: An in silico approach

Identification of human papillomavirus-16 E6 variation in cervical cancer and their impact on T and B cell epitopes

Knowledge-based virtual screening of HLA-A*0201-restricted CD8

cell epitope peptides from herpes simplex virus genome

MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular biology and evolution

Genotyping schemes for polyomavirus BK, using gene-specific phylogenetic trees and single nucleotide polymorphism analysis

A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood

jModelTest 2: more models, new heuristics and parallel computing

Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus evolution

Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution

Accurate model selection of relaxed molecular clocks in bayesian phylogenetics

Scalable relaxed clock phylogenetic dating. Virus evolution

Reliable prediction of T-cell epitopes using neural networks with novel sequence representations

The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage

We thank Drs M. Javad Aman (Integrated BioTherapeutics, Gaithersburg, MD), Marco Colonna and David Wang (both at Washington University, St. Louis, MO), Rafael Sanjuán (University of Valencia, Spain), and Cathal Seoighe (National University of Ireland, Galway) for critical reading of this manuscript. We thank Dr Darren Martin for his help with the RDP package analysis and Dr Santiago F. Elena for earlier discussions. We thank Marion Le Gentil, Sandra Michel and Dr Clotilde Muller for technical assistance and/or help with patient recruitment.

The authors have declared that no competing interests exist.Fafi-Kremer, Nicodème Paul, Oliver Kohlbacher, Fernando González-Candelas, Seiamak Bahram.