key: cord-0320520-aqa2fuq4 authors: Angst, Pascal; Ebert, Dieter; Fields, Peter D. title: Demographic history shapes genomic variation in an intracellular parasite with a wide geographic distribution date: 2021-11-04 journal: bioRxiv DOI: 10.1101/2021.11.02.466881 sha: b0ab803fffec8116d187686e5d1fc00293ca8755 doc_id: 320520 cord_uid: aqa2fuq4 Analyzing variation in a species’ genomic diversity can provide insights into its historical demography, biogeography and population structure, and thus, its ecology and evolution. Although such studies are rarely undertaken for parasites, they can be highly revealing because of the parasite’s coevolutionary relationships with hosts. Modes of reproduction and transmission are thought to be strong determinants of genomic diversity for parasites and vary widely among microsporidia (fungal-related intracellular parasites), which are known to have high intraspecific genetic diversity and interspecific variation in genome architecture. Here we explore genomic variation in the microsporidium Hamiltosporidium, a parasite of the freshwater crustacean Daphnia magna, looking especially at which factors contribute to nucleotide variation. Genomic samples from 18 Eurasian populations and a new, long-read based reference genome were used to determine the roles that reproduction mode, transmission mode and geography play in determining population structure and demographic history. We demonstrate two main H. tvaerminnensis lineages and a pattern of isolation-by-distance, but note an absence of congruence between these two parasite lineages and the two Eurasian host lineages. We suggest a comparatively recent parasite spread through Northern Eurasian host populations after a change from vertical to mixed-mode transmission and the loss of sexual reproduction. While gaining knowledge about the ecology and evolution of this focal parasite, we also identify common features that shape variation in genomic diversity for many parasites, e.g., distinct modes of reproduction and the intertwining of host–parasite demographies. Uncovering the ecological and evolutionary forces that drive variation in genetic diversity within and 47 among populations across a species' range is important for understanding biological processes of 48 basic and applied interest. Factors such as historical conditions (e.g., phylogeography and population 49 structure), environmental conditions (e.g., adaptation to local climate), and stochastic processes 50 (e.g., genetic drift and founder events) may contribute to patterns of genetic variation in a species 51 Gagneux & Small, 2007) . For the Gram-negative 62 bacterium Helicobacter pylori, a known risk factor for stomach cancer in humans, the link between 63 host and parasite biogeography and the parasite's mode of transmission was shown to be an 64 important factor in the evolution of the system (Montano et al., 2015) : similarly, H. pylori, like its 65 human host, shows a continuous loss of genetic diversity with increasing geographic distance from 66 East Africa. Its vertical mode of transmission most likely contributed to the geographical distribution 67 of genetic diversity shared between the parasite and its host. The interplay between co-68 phylogeography and mode of transmission here highlights the importance of studying interactions 69 between factors that drive genetic diversity when trying to understand host and parasite evolution. 70 base stats function prcomp. The RDA was done in R using the package vegan v.2.5-7 (Oksanen et al., 252 2020) with significance assessed with 1,000 permutations. In addition to looking at whole-genome 253 diversity, we also examined the phylogenetic signal in protein-coding regions of the genome directly. 254 As the aforementioned pseudo-haplotypes are inappropriate for a number of phylogenetic methods 255 that rely on haplotype information, we used ambiguity codes, an alternative method for unknown 256 phase. We used the GATK-functions SelectVariants and FastaAlternateReferenceMaker to generate 257 variant-imputed versions of the H. tvaerminnensis genome with ambiguity codes. Single copy 258 ortholog genes were extracted from the alternative references using gff2fasta.pl and concatenated 259 into a single sequence. The larger alignment of these individual samples was filtered for four-fold 260 degenerate sites with MEGA v.7.16.0617 (Kumar et al., 2016 We assessed the amount of genomic variation that differed or was shared between samples by 341 applying a PCA to the whole-genome SNP data (Table S2 ). The first two eigenvectors of the PCA ( analysis applied to the whole-genome SNP data resulted in the same groupings (Fig. S2 ), as did a 347 Bayesian phylogenetic tree estimation based on four-fold degenerate sites of the 2,229 single copy 348 orthologs ( Fig. 1) . After finding these groupings consistently in all of our population structure 349 analyses, we refer to them as Northern and Southern H. tvaerminnensis lineages. 350 We inferred the demographic history for the two H. tvaerminnensis lineages separately. The SNP 352 density and the π value showed that the Northern lineage (11 SNPs per Kbp) was considerably less 353 SNP-dense and less diverse (π = 0.005) than the Southern lineage (19 SNPs per Kbp; π = 0.008; 354 Wilcoxon signed rank test: p < 0.001). We found a positive correlation between pairwise 355 differentiation and geographical distance for the overall H. tvaerminnensis data and the species' 356 lineages using distance-based Moran's eigenvector maps (dbMEM) analysis by redundancy analysis 357 (RDA) (overall: R 2 = 0.15, p = 0.197; Northern: R 2 = 0.43, p = 0.001; Southern: R 2 = 0.82, p = 0.003), 358 which has been suggested as preferable over the formerly used Mantel tests (Legendre et al., 2015) . 359 The much higher correlation coefficients within the lineages, as compared to the total dataset, 360 strongly underline the importance of considering the phylogeography within the lineages. To further understand the evolution of these lineages, we analyzed the changes of Ne across time 375 using software based on a Pairwise Sequentially Markovian Coalescent (PSMC) model to infer 376 population history. Additionally, we did a very rough time calibration using publicly available 377 estimates of mutation rates from the well-studied 'red yeast' fungi Rhodotorula toruloides (Long et 378 al., 2016) and previously reported generation times of microsporidia (Kramer, 1965) . PSMC revealed 379 a relatively consistent pattern across samples from the same species and lineage (Fig. 3) . Ancient Ne, 380 in a loss of sexual reproduction (Fig. 4) , which has been suggested to take place in the second host 456 (Mangin et al., 1995) , with profound consequences for the evolution of genetic diversity of H. 457 Because microsporidian parasites and their hosts interact closely, it might be assumed that parasite 461 and host phylogeography should be congruent-the within species equivalent of Fahrenholz rule 462 (Fahrenholz, 1913) , which states that host and parasite phylogenies are expected to be congruent to 463 each other, i.e., show a pattern of co-cladogenesis. It is now known, however, that this is rarely the 464 case (Page, 2003 Eurasia/Eastern Asia split, we found a North-South split, a signal consistent across all the 479 phylogeographic analyses we conducted (Fig. 1, Fig. S2) To test this theory, we used our whole-genome data to determine the relative strength of 521 selection in Hamiltosporidium. Of the different methods used to estimate α in our study, dfe-alpha 522 relies on a specific model of demography, any deviation from which might confound this method. 523 The SFS, as well as historical changes in Ne were inferred (see section below; Fig. 2 Table 2) , suggesting that (a) positive selection is either weak or constrained, or (b) the 528 underlying likelihood model of dfe-alpha was not appropriate for this data. Test and reference 529 regions were similarly affected by demography, as the synonymous and nonsynonymous SFS have 530 the same shape (Fig. 2) . To ensure a demographically robust analysis for our estimation of α, we 531 conducted both a standard MKT and a more recent approach called asymptotic MKT (Haller & 532 Messer, 2017). It has been suggested that this latter approach can accommodate a wider range of 533 demographic histories by accounting for the effects of low frequency, deleterious variants. α values 534 estimated using MKTs were similar to those using dfe-alpha. Also, the overall dataset always had the 535 highest estimate, while the Northern lineage always had the lowest. Furthermore, among all the 536 species for which α and the rate of adaptive divergence relative to neutral divergence (ωA), has been 537 estimated, H. tvaerminnensis is on the lower end of the selection efficacy distribution (e.g., Galtier Historical processes are a major contributor to present-day distribution of genetic variation, 544 influencing both similarities and differences in the co-phylogeography that a parasite might share 545 with its host. The effective population size history of H. tvaerminnensis constructed with the PSMC 546 method showed a rather consistent pattern across samples from the same lineage but revealed 547 differences among lineages (Fig. 3A) . Since the mutation rates and generation times of H. 548 tvaerminnensis and H. magnivora are unknown, we used available estimates for related species to 549 estimate time intervals of interest. The more ancient decrease in Ne, which is observable in many 550 species (Hewitt, 1996; Holder et al., 1999; Stewart et al., 2010) , is thought to stem from the species' 551 range contraction, which happened when the glaciers expanded at the beginning of the last 552 glaciation about 110,000 years ago. It is followed by an increase in Ne, presumably after the glaciers 553 retreated about 10,000 years ago, as also described for the host D. magna . 554 Samples that originated from hosts in the Middle East, a region where glaciers had no direct impact 555 and where the presumed glacial refugia of the Western Eurasian D. magna clade was located, 556 consistently seem to be more stable during the whole period of glacial expansion and recession. 557 Numerous other species have shown patterns consistent with southern refugia during this period 558 (Stewart et al., 2010) . Importantly, our Spanish samples of the Southern lineage show an Ne pattern 559 of change that is more similar to non-Middle Eastern samples (Fig. 3A) , coinciding with the absence 560 of a glacial refugium for hosts in southwestern Eurasia . Thus, Daphnia hosts and 561 their parasites presumably recolonized Western Europe from the same refugium, including the 562 Hispanic peninsula after the last glaciation. 563 564 A second decrease where Ne dips to very low estimates is observed in more recent times for all H. (Fig. 4) . The loss of sexual 581 reproduction in the absence of a second host, which is believed to be necessary for sexual 582 reproduction in H. magnivora (Mangin et al., 1995) , reduced Ne. However, the transition from sexual 583 to exclusively asexual reproduction would leave a very distinct historical pattern of change in Ne as 584 inferred via PSMC. Specifically, because individual homologous chromosomes can no longer 585 recombine, one would expect to see an upward, nearly vertical trajectory in Ne as mutations 586 accumulate on separate chromosomal copies, which our reconstructed history of H. tvaerminnensis' 587 population size does not show, or at least not deep enough in time when such a transition would be 588 easier to pin down with the PSMC method. This implies that exclusive asexuality is probably not 589 ancient, coinciding with the SFS of the Northern lineage, which shows a distinct peak at 0.5 caused 590 by fixed heterozygosity. Being exclusive asexual for a long period of time would provide sufficient 591 time for individual haplotypes to accumulate their own distinct mutations, which is not what the SFS 592 shows (Fig. 2) The evidence presented here of a non-overlapping phylogeography between two partners in an 630 intimate host-parasite system with a wide geographic distribution has several implications. It 631 suggests that the parasite, H. tvaerminnensis, might be a rather young parasite of its host, D. magna. 632 Else, it could not be simultaneously wide-spread and have a phylogeography distinct from its host. 633 Also, it implies that co-dispersal of host and parasite, while common, is not the only form of 634 dispersal. Thus our research underscores the importance and potential of using samples from the 635 whole-species range in population genomic studies. Regarding microsporidia with long genomes, we 636 could strengthen the assumption of a reduced selection efficacy, likely as a result of high levels of 637 genetic drift that ultimately caused genome expansion . By quantifying genomic 638 variation and selection efficacy in other microsporidia, more evidence for this hypothesis and for the 639 evolution of different genome architectures in this taxon could be provided. However, while our 640 study reveals some general principles on the evolution of genomic variation in a parasite, it also 641 shows how specific factors shape genomic variation, some of these (or their combination) being 642 possibly unique to this system (e.g., loss of the second host and the shift to exclusive asexuality in 643 the absence of the second host). It therefore provides a case study for the genomic evolution of a 644 microsporidium at a fine scale, which awaits comparison to other systems that will help paint a 645 bigger picture of the evolution of specific obligate parasites. RaGOO: Fast and accurate reference-guided scaffolding of draft 656 genomes Genetic diversity of Daphnia magna populations enhances 658 resistance to parasites Spatial population genetic structure of a bacterial 661 parasite in close coevolution with its host Life history determines genetic 664 structure and evolutionary potential of host-parasite interactions Genomics of sorghum local 669 adaptation to a parasitic plant Trimmomatic: A flexible trimmer for Illumina 672 sequence data BEAST 2.5: An advanced software platform for Bayesian 678 evolutionary analysis Broad Institute, GitHub repository Parasite-mediated selection 683 in a natural metapopulation of Daphnia magna Inferring species divergence times 686 using pairwise sequential Markovian coalescent modelling and low-coverage genomic data Coalescence times and the Meselson effect in asexual eukaryotes SeqinR 1.0-2: A Contributed Package to the R Project for Statistical 695 Computing Devoted to Biological Sequences Retrieval and Analysis Structural Approaches to Sequence Evolution: 697 Molecules, Networks, Populations Nonhybrid, finished 701 microbial genome assemblies from long-read SMRT sequencing data Microsporidia: Eukaryotic Intracellular Parasites Shaped by Gene Loss and 704 Draft genome sequence of 707 the Daphnia pathogen Octosporea bayeri: Insights into the gene content of a large 708 microsporidian genome and a model for host-parasite interactions Assessing the Accuracy and Power of Population Genetic 711 Inference from Low-Pass Next-Generation Sequencing Data The variant call 715 format and VCFtools Transposable element abundance 718 correlates with mode of transmission in microsporidian parasites Ecological implications of 721 parasites in natural Daphnia populations A Phylodynamic Workflow to Rapidly Gain Insights into the Dispersal History and Dynamics 727 of SARS-CoV-2 Lineages adespatial: Multivariate Multiscale Spatial 731 Analysis Bayesian Coalescent Inference 734 of Past Population Dynamics from Molecular Sequences Ecology, epidemiology, and evolution of parasitism in Daphnia Host-parasite coevolution: Insights from the Daphnia-parasite model system Host-parasite co-evolution and its genomic signature Temporal and Spatial Dynamics of Parasite 743 Richness in a Daphnia Metapopulation Estimating the rate of adaptive molecular evolution in 746 the presence of slightly deleterious mutations and population size change Ectoparasiten und Abstammungslehre Mitogenome phylogeographic analysis of a planktonic crustacean Genes mirror geography in Global phylogeography of Mycobacterium tuberculosis and 755 implications for tuberculosis product development Adaptive Protein Evolution in Animals and the Effective Population Size 758 Recent worldwide expansion of Nosema ceranae (Microsporidia) in Apis mellifera 762 populations inferred from multilocus patterns of genetic variation. Infection, Genetics and 763 Evolution Ecological correlates between cladocerans and their endoparasites 765 from permanent and rain pools: Patterns in community composition and diversity Cytological and molecular description 768 of Hamiltosporidium tvaerminnensis gen. Et sp. Nov., a microsporidian parasite of Daphnia 769 magna, and establishment of Hamiltosporidium magnivora comb Microsporidia with Vertical Transmission Were Likely Shaped by 773 Nonadaptive Processes Microsatellite and single-776 nucleotide polymorphisms indicate recurrent transitions to asexuality in a microsporidian 777 parasite Single-nucleotide polymorphisms of two closely 780 related microsporidian parasites suggest a clonal population expansion after the last 781 glaciation asymptoticMK: A Web-Based Tool for the Asymptotic 783 Comparative demography elucidates 786 the longevity of parasitic and symbiotic relationships Some genetic consequences of ice ages, and their role in divergence and 789 speciation A Test of the Glacial Refugium Hypothesis 792 Using Patterns of Mitochondrial and Nuclear Dna Sequence Variation in Rock Ptarmigan 793 (Lagopus mutus) Shared 796 signatures of parasitism and phylogenomics unite Cryptomycota and microsporidia Plasmodium interspersed 799 repeats: The major multigene superfamily of malaria parasites The Impact of 802 Purifying and Background Selection on the Inference of Population History: Problems and 803 Prospects adegenet: A R package for the multivariate analysis of genetic markers MAFFT: A novel method for rapid multiple 808 sequence alignment based on fast Fourier transform MAFFT Multiple Sequence Alignment Software Version 7: 811 Improvements in Performance and Usability Joint inference of the distribution of fitness effects of 814 deleterious mutations and population demography based on nucleotide polymorphism 815 frequencies Canu: 817 Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation pixy: Unbiased estimation of nucleotide diversity and 820 divergence in the presence of missing data Generation time of the microsporidian Octosporea muscaedomesticae flu in 823 adult Phormia regina (Meigen) (Diptera, Calliphoridae) MEGA7: Molecular Evolutionary Genetics Analysis 826 Version 7.0 for Bigger Datasets Genetic, ecological and geographic covariables 829 explaining host range and specificity of a microsporidian parasite Coevolution Should the Mantel test be used in spatial analysis? 834 Plotrix: A package in the red light Phylogeography and genetic diversity 838 of the microbivalve Kidderia subquadrata , reveals new data from West Antarctic Peninsula A statistical framework for SNP calling, mutation discovery, association mapping and 841 population genetical parameter estimation from sequencing data Aligning sequence reads, clone sequences and assembly contigs with Inference of Human Population History From Whole Genome Sequence 846 of A Single Individual The Sequence Alignment/Map format and SAMtools OrthoMCL: Identification of Ortholog Groups for 851 PSMC (pairwise sequentially Markovian coalescent) analysis of 854 RAD (restriction site associated DNA) sequencing data Similar Mutation Rates but Highly Diverse Mutation Spectra in Ascomycete and Basidiomycete Yeasts Phylogeny-aware alignment with PRANK Virulence and transmission modes of two 863 microsporidia in Daphnia magna WhatsHap: Fast and accurate read-based phasing Genomes reveal marked differences in the adaptive evolution between orangutan 871 species Adaptive protein evolution at the Adh locus in Drosophila The Genome Analysis 876 Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data Worldwide Population Structure, Long-Term Demography, and Local Adaptation 880 of Helicobacter pylori The peopling of the Pacific from a bacterial perspective Microsporidia Species Attribute Database and Analysis of the Extensive Ecological and 888 Phenotypic Diversity of Microsporidia iMKT: 891 The integrative McDonald and Kreitman test Mathematical model for studying genetic variation in terms of 894 restriction endonucleases Geography and 898 host biogeography matter for understanding the phylogeography of a parasite. Molecular 899 Phylogenetics and Evolution vegan: Community Ecology Package Genetic resistance and specificity in sister taxa of Daphnia: 905 Insights from the range of host susceptibilities The role of selection in driving 908 landscape genomic structure of the waterflea Daphnia magna Genomic signature of natural and anthropogenic 911 stress in wild populations of the waterflea Daphnia magna: Validation in space, time and 912 experimental evolution geodist: Fast, Dependency-Free Geodesic Distance 915 Calculations Tangled Trees: Phylogeny, Cospeciation, and Coevolution Microsporidian genomes harbor a diverse array of transposable elements that demonstrate an 921 ancestry of horizontal exchange with metazoans A population genomic approach to map recent positive 924 selection in model species Genome analyses suggest 927 the presence of polyploidy and recent human-driven expansions in eight global populations of 928 the honeybee pathogen Nosema ceranae Efficient Swiss Army Knife for Population Genomic Analyses in R The Ordospora colligata Genome: Evolution of Extreme Reduction in Microsporidia and Host-To-Parasite Horizontal 935 Complete Genome Sequences from Three Genetically Distinct Strains Reveal 938 High Intraspecies Genetic Diversity in the Microsporidian Encephalitozoon cuniculi BEDTools: The Swiss-army tool for genome feature analysis R: The R Project for Statistical Computing. R Foundation for Statistical 944 Computing MACSE: Multiple Alignment of 946 Coding SEquences Accounting for Frameshifts and Stop Codons phytools: An R package for phylogenetic comparative biology Purge Haplotigs: Allelic contig 952 reassignment for third-gen diploid genome assemblies Is adaptation 955 limited by mutation? A timescale-dependent effect of genetic diversity on the adaptive 956 substitution rate in animals Evolution of reproductive mode variation 959 and host associations in a sexual-asexual complex of aphid parasitoids Temperature-versus precipitation-limitation shape local temperature 962 tolerance in a Holarctic freshwater crustacean An island-hopping bird 965 reveals how founder events shape genome-wide divergence BUSCO: Assessing Genome Assembly and 968 Annotation Completeness Microsporidia: Nosematidae) and Bees (Hymenoptera: Apidae) Suggests Both Cospeciation and a Host-switch Refugia revisited: Individualistic 975 responses of species in space and time The water flea Daphnia-A "new" model system for ecology and evolution? 978 From FastQ data to high confidence variant calls: The 982 Genome Analysis Toolkit best practices pipeline Evolution of microsporidia: An extremely successful group of 985 eukaryotic intracellular parasites Pilon: An Integrated Tool for 989 Comprehensive Microbial Variant Detection and Genome Assembly Improvement Prevalence and 993 Population Genetics Analysis of Enterocytozoon bieneusi in Dairy Cattle in China Population genomics reveals the origin and asexual 998 evolution of human infective trypanosomes. ELife, 5, e11473 Wolbachia: Master manipulators of invertebrate 1001 biology Genome 1003 sequence surveys of Brachiola algerae and Edhazardia aedis reveal microsporidia with low 1004 gene densities A high-1006 performance computing toolset for relatedness and principal component analysis of SNP data All analysis scripts as well as raw and processed data are available at Zenodo XXXX (see git repo for all accessions) Author contributions All authors designed the study. PA analysed the data and wrote the manuscript. All authors reviewed 1016 the manuscript