key: cord-0597518-sbr69tv3 authors: Berdan, Emma L.; Blanckaert, Alexandre; Slotte, Tanja; Suh, Alexander; Westram, Anja M.; Fragata, Ines title: Unboxing mutations: Connecting mutation types with evolutionary consequences date: 2020-09-29 journal: nan DOI: nan sha: 79141ae7c2b5a68783b01ecfffebba90adedfc62 doc_id: 597518 cord_uid: sbr69tv3 Mutations are typically classified by their effects on the nucleotide sequence and by their size. Here, we argue that if our main aim is to understand the effect of mutations on evolutionary outcomes (such as adaptation or speciation), we need to instead consider their population genetic and genomic effects, from altering recombination rate to modifying chromatin. We start by reviewing known population genetic and genomic effects of different mutation types and connect these to the major evolutionary processes of drift and selection. We illustrate how mutation type can thus be linked with evolutionary outcomes and provide suggestions for further exploring and quantifying these relationships. This reframing lays a foundation for determining the evolutionary significance of different mutation types. Evolution depends on two major processes: (1) The random generation of genetic variation through mutation and its shuffling via recombination, and (2) subsequent changes in allele frequency through gene flow, genetic drift, and natural selection [1] . There are a wide variety of mutation types, which are often grouped by the change to the DNA sequence and size, ranging from point mutations to large structural variants [2] . However, from an evolutionary viewpoint, the most important characteristics of a mutation are its population genetic and genomic effects and how these may influence downstream evolutionary outcomes. Genetic studies have revealed that various types of mutations have drastically different population genetic and genomic effects ranging from changes in recombination rate to modifications of chromatin state. 
Furthermore, the population genetic effects of larger structural mutations must be considered at two levels: the effects of the mutated region as a whole and the effects of the loci within it. To understand the evolutionary importance of various mutation types we must examine these effects. We propose that to understand the contributions of different types of mutations to evolution, we must merge the rich history of population, quantitative and evolutionary genetics [1, 3] with molecular genetics and genomics to connect mutations with evolutionary outcomes (Figure 1).
Figure 1. A. From mutation type to evolutionary outcome. Colors match mutation type here and in Figure 2. B. Interactions between mutation type and population genetic and genomic effects. Up and down arrows indicate an increase (up) or decrease (down) while a dash indicates no effect (known or unknown). Grey indicates that the relationship is variable and the mutation may have different effects depending on other factors. Indels and inversions are assumed to be large enough to affect pairing at meiosis. Smaller indels and inversions are expected to behave similarly to SNPs.
The DNA methylation and chromatin state of a region may have strong implications for the regulatory environment of the genes present as well as for the rate of recombination. In general, increasing DNA methylation or heterochromatin will decrease recombination (crossovers are less likely in highly heterochromatic regions [31]) and gene expression, although this is not a hard-and-fast rule. Transposable elements (TEs) are in a constant arms race with the host, and genomes have evolved a multitude of sequence-specific mechanisms for TE silencing via DNA methylation and repressive histone marks (e.g., H3K9me2 and H3K9me3) [32, 33].
Notably, this change in methylation and chromatin state need not be restricted to a new TE insertion itself but can also spread into adjacent genomic regions, e.g., up to 20 kb away from TE insertions in Drosophila melanogaster [34]. Centromeres contain both centromeric chromatin (characterized by the CENP-A histone, which is the foundation for the kinetochore) and repressive histone marks [35]. We thus expect that a centromere shift along a chromosome will not only reduce the recombination rate in the new pericentromeric region, as discussed below, but also spread DNA methylation and repressive chromatin marks through increased accumulation of TEs due to the reduced recombination. This generates a positive feedback loop between recombination rate, new TE insertions, and chromatin state, as previously proposed for regions of low recombination in general [36]. Translocations and fusions may also change methylation and chromatin state. For example, a study on humans found multiple differentially methylated positions with respect to a translocation, 93% of which mapped to the translocation breakpoints [37]. In mice with chromosomal fusions, the area of the chromosome with repressive histone marks was greatly expanded [38]. Research on large structural variants and methylation/chromatin state is in its infancy, and more studies looking at non-model organisms outside the context of disease are sorely needed.
Local recombination rate can be affected by mutations in different ways. First, the actual rate of recombination may change. Second, recombination may proceed normally but crossovers and segregation problems may lead to the creation of unbalanced gametes that lead to inviable offspring, reducing the effective recombination rate. In the following, we review how different mutations alter recombination rate. We also address this question more quantitatively in Box 1.
While the molecular processes underlying recombination are not understood in detail, and may vary between taxa, Box 1 represents a first attempt towards a more quantitative comparison between different mutation types. In eukaryotes, double-strand breaks form during the pairing of the homologs in meiosis and are repaired via two pathways: (1) as a crossover (CO) event, the outcome of which is visible as a chiasma later in meiosis, or (2) as a non-crossover (NCO) event. Gene conversion (GC) can occur in both pathways [39]. Different mutation types change the rate or distribution of recombination events, or the pathway taken. Several types of mutations can affect the alignment and pairing of homologs at the beginning of the recombination process. In inversion heterozygotes, proper synapsis in the inverted region and subsequent crossing over are slightly reduced [40]. A large heterozygous indel will generally form "unpaired DNA loops", preventing COs [41]. CNVs can also affect recombination in heterozygotes due to differences in chromosome length, effectively reducing recombination by inhibiting proper pairing [42]. Recombination may even be affected outside of the mutated area. For example, COs were suppressed in the regions around large artificial insertions in C. elegans [43]. Other mutations change the distribution of COs. The presence of fusions changes the rates and distribution of chiasmata in both homo- and heterozygotes in a range of mammals [44]. For example, in mice (Mus musculus domesticus), the number of chiasmata correlates negatively with the number of fusions, but the distribution of the chiasmata along the chromosomal arm varies between homozygotes and heterozygotes [38, 45]. Centromeres and their surrounding pericentromeric regions generally reduce recombination [46]. As a result, centromere shifts reduce recombination in a new genomic region.
Conversely, the region of the former, inactivated centromere would then be free of centromere-associated recombination reduction. To our knowledge, these aspects of centromere evolution have yet to be appreciated in an evolutionary context. Other mutations change the outcome of double-strand breaks and/or the GC rate. In inverted regions, double-strand breaks are more likely to be resolved as NCOs; however, the GC rate is unchanged [47, 48]. Simple sequence repeats (SSRs) can act as recombination hotspots [50, 51], especially tri- and penta-nucleotide microsatellites [50]. This could be linked to instability during crossing over due to length variation between sister chromatids [52]. Some studies also suggest that different microsatellite motifs have varying effects on recombination rate [50, 51, 53]. Indels may change chromosome length and therefore may alter the frequency of recombination between loci located on either side of the indel [54]. This effect may become non-negligible if indels are large or numerous in a given genomic region.
Recombination produces unbalanced gametes
Recombination may occur normally but lead to the creation of unbalanced gametes. When inversions are heterozygous, the inverted region can pair either homosynaptically or heterosynaptically [55]. Crossing over can only occur in homosynaptically paired regions, and single crossovers in the inverted region will lead to gametes with unbalanced chromosomes (with potentially large duplications and deletions) in inversion heterozygotes [56]. Both types of pairing will decrease recombination overall, but heterosynaptic pairing reduces the production of unbalanced gametes [55]. Alternatively, recombination can proceed normally and create balanced products, but these products may fail to segregate properly. In translocation heterozygotes, the four involved chromosomes form a quadrivalent structure during meiosis.
Segregation from this structure can lead to the creation of aneuploid gametes at a rate of 18% to more than 80% [57, 58]. Similarly, nondisjunction rates in fusion heterozygotes may be elevated, ranging from 1.2% to 30% depending on the system [44].
Effective population size ($N_e$) usually reflects the process of drift, which can be affected indirectly by mutations through their effect on recombination rate and their selection coefficient. Changes in recombination and selection (leading to selective sweeps or background selection) may affect local $N_e$ and sometimes the global $N_e$ as well [52, 53].
Box 1.
Different types of mutations can affect the transfer of genetic material between homologs. To quantify this effect, we derive the probability, $p_{l_1,l_2}$, that two loci at positions $l_1$ and $l_2$ (with $l_1 < l_2$), initially on the same homolog, are separated during meiosis. We present here only approximations obtained when the rate of double-strand breaks (DSBs) is sufficiently small (see Supplement for detailed expressions).
In the absence of a structural variant, the probability that two loci are separated by recombination is given by [equation not recovered], with parameters for the probabilities that a DSB leads to gene conversion (GC) and to a crossover, respectively, and for the length of a GC tract. The first term corresponds to one locus being transferred by GC and the second to a crossover between the two focal loci.
Insertion/Deletion: Recombination only happens in the ancestral (deletion) or derived (insertion) homozygote, so the expression depends on the frequency of that homozygote. The two loci are separated with probability [equation not recovered].
Single crossovers occurring within the inversion breakpoints in heterozygotes form gametes with unbalanced chromosomes, leading to inviable zygotes. Therefore, heterozygotes are underdominant and recombination only happens through GC or double crossovers. The probability that two loci in the inverted region are separated is given by [equation and its stated assumption not recovered]. The first term corresponds to a recombination event happening between the focal loci in the homozygotes, whose frequency is increased due to underdominant heterozygotes. The second term corresponds to GC and remains unaffected. Double crossovers do not play a significant role under those conditions.
For chromosomal fusions, homologs in heterozygotes may fail to segregate properly with some probability, producing unbalanced gametes and reducing the contribution of heterozygotes to the next generation. In addition, the chance of a crossover decreases if at least one fused chromosome is involved. The two loci are separated with probability [equation not recovered]. The contribution of crossovers is reduced by a factor that depends on the genotype frequencies and captures both selection against the heterozygote and the reduced crossover probability when at least one fused chromosome is involved.
Similarly, homologs in heterozygotes may fail to segregate properly, producing unbalanced gametes and reducing the contribution of heterozygotes to the next generation. The GC rate in heterozygotes is also reduced. The two loci are separated with probability [equation not recovered]. The contribution of gene conversion is reduced by a factor that depends on the frequency of the heterozygotes and captures both the effect of selection against, and the reduction of gene conversion within, heterozygotes.
Figure Box 1. Probability that two loci on the same structural variant are separated due to recombination as a function of the distance between the two loci. Parameters: the two reduction factors are calculated using the expressions given in the Supplement [parameter values not recovered].
Recombination, and therefore $N_e$, can be reduced by a variety of structural variants. For example, loci within an indel will experience this reduction in $N_e$ twice, once due to the reduction in recombination (see Box 1) and once due to the lower number of copies of the indel region.
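The Box 1 argument that underdominant heterozygotes shift genotype frequencies towards homozygotes can be illustrated with textbook one-locus viability selection. The following is a minimal sketch and not the Box 1 model: it assumes random mating, discrete generations, homozygote fitness 1 and heterozygote fitness 1 - s, where s is a hypothetical fitness cost (for example, the fraction of unbalanced gametes produced by heterozygotes). The function names and the value s = 0.18 (the lower bound quoted above for aneuploid gametes) are illustrative assumptions.

```python
def next_allele_freq(p, s):
    """One generation of selection against heterozygotes.

    Assumed fitnesses: both homozygotes 1, heterozygote 1 - s.
    p is the frequency of the derived arrangement before selection.
    """
    q = 1.0 - p
    mean_fitness = 1.0 - 2.0 * p * q * s          # p^2 + 2pq(1 - s) + q^2
    return (p * p + p * q * (1.0 - s)) / mean_fitness


def heterozygote_freq(p):
    """Hardy-Weinberg heterozygote frequency at allele frequency p."""
    return 2.0 * p * (1.0 - p)


def iterate(p0, s, generations):
    """Apply selection repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_freq(p, s)
    return p


if __name__ == "__main__":
    s = 0.18  # illustrative cost only
    for p0 in (0.1, 0.5, 0.9):
        p_end = iterate(p0, s, 500)
        print(p0, p_end, heterozygote_freq(p_end))
```

Under these assumptions p = 0.5 is an unstable equilibrium: a rare arrangement is pushed towards loss and a common one towards fixation, so heterozygotes, the only genotype in which recombination is impaired, become rarer over time.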
The indel itself, treated as a single neutral locus, is not affected by either the reduced recombination or the lower copy number that affect the loci within it. Similarly, recombination between different arrangements of a polymorphic inversion is lowered and the arrangements can be viewed as two smaller and partially isolated populations [55, 56]. Translocations and fusions will experience a similar effect. Changes in fitness due to mutations can also lead to a reduction in $N_e$. For example, transposable elements are often considered weakly deleterious [42, 54]. Translocations or chromosome fusions lead to high rates of non-disjunction and subsequent negative selection against heterozygotes. Alternatively, centromere shifts may be under positive selection if they distort their segregation during female meiosis of heterozygotes, a process known as centromere drive [59].
All of the population genetic and genomic effects summarized above will affect the evolutionary processes of selection and drift (summarized in Table 1). This is quantified in the selection coefficient, which measures differences in relative fitness and encompasses many population genetic effects. The selection coefficient thus bridges mutation and evolutionary process. While the selection coefficient is essential to determine or predict evolutionary outcomes, its estimation is highly complex because it can be affected at many levels. All mutation types can be neutral or under positive or negative selection. The selection coefficient of a mutation depends on a multitude of factors including (1) the genomic context, i.e. whether it alters coding, regulatory, or intergenic regions, (2) whether it leads to sequence changes or positional shifts, and (3) the selective environment (both extrinsic and intrinsic) where the change occurs [8, 20, 32, 36, 46-52, 60, 61].
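How the selection coefficient and $N_e$ jointly shape the fate of a new mutation can be sketched with Kimura's diffusion approximation for the fixation probability. This is a standard population genetic result rather than anything derived in this paper, and it assumes a single new copy (initial frequency 1/(2N)), additive effects (fitnesses 1, 1 + s, 1 + 2s) and a constant population. The function and parameter names are illustrative.

```python
import math


def fixation_probability(s, n_e, n):
    """Kimura's diffusion approximation for a new additive mutation.

    s   : selection coefficient of the heterozygote (assumed additive)
    n_e : effective population size
    n   : census population size (initial frequency is 1 / (2 n))
    """
    if s == 0:
        return 1.0 / (2.0 * n)                   # neutral limit
    numerator = -math.expm1(-2.0 * s * n_e / n)  # 1 - exp(-2 s Ne / N)
    denominator = -math.expm1(-4.0 * n_e * s)    # 1 - exp(-4 Ne s)
    return numerator / denominator


if __name__ == "__main__":
    n = 1000
    for n_e in (1000, 100):
        for s in (-0.01, 0.0, 0.01):
            print(n_e, s, fixation_probability(s, n_e, n))
```

In this sketch, lowering `n_e` while keeping `s` fixed moves the fixation probability of a beneficial mutation closer to the neutral value, which is one way to picture how a locally reduced $N_e$ weakens selection relative to drift. Very large values of |s| times `n_e` can overflow the exponential for deleterious mutations, so the function is only meant for moderate parameter values.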
Duplicated regions, such as CNVs, have additional effects as they may free up selective constraints and can lead to the emergence of new [...] [73] showed a long flat deleterious tail and a high peak surrounding the neutral region for TE mutations. Most CNVs are expected to be found at the extremes of the distribution with either largely beneficial or deleterious effects [28]. Estimating the DFE for more mutation types and in different environments is a crucial step moving forward.
↑ In the case of overdominance via beneficial alleles, an increase in h leads to selection being more efficient, with an increase in the frequency of the heterozygote.
↑ In the case of frequency-dependent selection, an increase in h will help spread the rare derived allele due to penetrance in the heterozygote. Similarly, a decrease in h will help spread the rare ancestral allele.
↓ An increase in h leads to selection being more efficient. Therefore the relative contribution of drift compared to selection is reduced.
↑ An increase in the magnitude of the selection coefficient increases the effect of directional or negative selection.
↑ An increase in the magnitude of the selection coefficient in heterozygotes will speed up the decrease in homozygote frequency under overdominance.
↑ Under negative frequency-dependent selection, larger selection coefficients will lead to a faster increase of the rare allele (assuming it has an increased selection coefficient).
↓ An increase in the magnitude of the selection coefficient may make it less likely that the mutation is fixed or lost due to genetic drift.
A variety of bottom-up and top-down approaches, including DFE estimation, can be used to bridge the gap between mutation type and evolutionary processes. Starting from the bottom, more empirical studies examining the population genetic effects of mutations are sorely needed for better characterization of mutation effects and to determine how these effects vary across taxa.
The information from these studies can also be incorporated into theoretical models (see Box 1 for an example) to generate new predictions. From the top down, population genomic studies can be strengthened by examining multiple types of mutations together. This will require collecting different types of genomic data sets (e.g. short- and long-read re-sequencing and mapping crosses) from the same population and developing detection pipelines targeted at different mutation types. Feeding this information to population genetic models would allow for indirect estimation of DFEs for all mutational types. Along with better estimates of mutation rates, these approaches will enable the quantification of the evolutionary significance of different mutation types.
Quantifying the relationships between mutation type, population genetic effects and the major evolutionary processes, selection and genetic drift (summarized in Table 1), allows us to draw connections to evolutionary outcomes. For example, speciation requires the build-up of linkage disequilibrium both within and between different reproductive isolation barriers [74]. Mutations that reduce recombination should aid speciation with gene flow by protecting this nascent linkage disequilibrium. We thus predict that mutations such as inversions, indels, TEs, and centromere shifts might be major drivers of speciation events [75]. A critical next step would be testing some of these hypotheses, for example using a meta-analysis (see Outstanding Questions).
Our framework highlights the fact that mutations may affect evolution in several ways and that many different mutation types have similar population genetic effects (Box 1, Figure 1). We suggest that shifting the focus away from the structure or size of a mutation to its effects facilitates directly linking mutations with evolutionary outcomes.
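The prediction that mutations reducing recombination protect linkage disequilibrium (LD) can be made concrete with the textbook decay of LD between two neutral loci, D_t = D_0 (1 - r)^t. The sketch below assumes random mating and no selection, drift or gene flow, so it only illustrates the direction of the effect. The function names and the recombination rates used are hypothetical choices for illustration.

```python
import math


def ld_after(d0, r, generations):
    """LD coefficient D after a number of generations of random mating.

    d0 : initial LD coefficient
    r  : recombination rate between the two loci per generation (0 < r <= 0.5)
    """
    return d0 * (1.0 - r) ** generations


def ld_half_life(r):
    """Generations needed for D to halve at recombination rate r."""
    return math.log(0.5) / math.log(1.0 - r)


if __name__ == "__main__":
    # Hypothetical rates: a freely recombining region versus the same
    # region with recombination reduced tenfold by a structural variant.
    for r in (0.01, 0.001):
        print(r, ld_half_life(r), ld_after(0.25, r, 100))
```

With these assumptions, a tenfold reduction in r lengthens the half-life of LD roughly tenfold, which is the sense in which recombination-suppressing mutations can shelter associations between barrier loci.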
To better understand the link between mutation types and evolutionary outcomes, comparable measurements of mutation rates as well as the population genetic and genomic effects of different mutation types are urgently needed. In particular, much of the current data comes from either model systems or research on diseases. Characterizing different mutation types in non-model organisms and natural populations is critical for the future. Understanding the evolutionary significance of different mutations will require unboxing them and viewing their effects in a larger population genetic context.
What are the rates of occurrence for different mutation types and does this vary within or between species? Comprehensive studies across taxa using both natural populations and traditional mutation [...]
Genomic studies that analyze a variety of mutation types rather than 1-2 will provide key insights into the relative roles of different mutation types in different outcomes (e.g. adaptation). This type of data will be key for allowing more targeted analyses to be done in the future. When such studies have accumulated, a meta-analysis testing for a relationship between mutation type and evolutionary outcome is the next step towards answering this question more generally.
Single Nucleotide Polymorphism (SNP) - A single base pair substitution at a specific position in the genome.
Centromere shift - Repositioning of the centromere along the chromosome.
Distribution of fitness effects (DFE) - Describes the proportion of new mutations that are beneficial, deleterious or neutral in a specific environment.
Transposable Element (TE) - A selfish genetic element propagating via copy-and-paste or cut-and-paste.
Hill-Robertson effect - Describes selection having a reduced effect when selected sites are in tight linkage with other selected sites.
Translocation (also called balanced or reciprocal translocation) - Two pieces of non-homologous chromosomes that have broken off and been switched.
Homosynaptic pairing - When the two homologs correctly synapse during prophase I.
Heterosynaptic pairing - When non-homologous (heterologous) synapsis occurs during prophase I.
Effective population size, $N_e$ - The equivalent population size of a Wright-Fisher population that will generate population genetic statistics closest to the ones of the focal population.
[...] Fisher population that will generate the same linkage disequilibrium patterns as those found in the focal population.
Indel - Small genetic variant from 10 to 10 000 bp that can be either inserted or deleted from the genome.
Inversion - A segment of the genome that is rotated 180 degrees.
Reflections on reflections: ecology and evolutionary biology
A Roadmap for Understanding the Evolutionary Significance of Structural Genomic Variation
Elements of Evolutionary Genetics
Estimating mutation rate: how to count mutations?
Structural variation in the sequencing era
Polyploidy: A biological force from cells to ecosystems
A cytogenetic survey of 14,835 consecutive liveborns
Chromosomal control of pig populations in France: 2002-2006 survey
Variation of spontaneous occurrence rates of chromosomal aberrations in the second chromosomes of Drosophila melanogaster
A benchmark of transposon insertion detection tools using real data
Computational tools to unmask transposable elements
Pedigree-based estimation of human mobile element retrotransposition rates
Evolution of the insertion-deletion mutation rate across the tree of life
DeNovoGear: de novo indel and point mutation discovery and
We thank Roger Butlin, Kerstin Johannesson, Valentina Peona, Rike Stelkens, Julie Blommaert, Nick Barton, and João Alpedrinha for helpful comments that improved the manuscript. The authors acknowledge funding from the Swedish Research Council Formas