key: cord-0254787-g9yz6l0g authors: Peyrégne, Stéphane; Kelso, Janet; Peter, Benjamin M.; Pääbo, Svante title: The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals date: 2021-11-29 journal: bioRxiv DOI: 10.1101/2021.11.29.470407 sha: a15711ba861c9820a9d6180907c2947a9e7a93d2 doc_id: 254787 cord_uid: g9yz6l0g Proteins associated with the spindle apparatus, a cytoskeletal structure that ensures the proper segregation of chromosomes during cell division, experienced an unusual number of amino acid substitutions in modern humans after the split from the ancestors of Neandertals and Denisovans. Here, we analyze the history of these substitutions and show that some of the genes in which they occur may have been targets of positive selection. We also find that the two changes in the kinetochore scaffold 1 (KNL1) protein, previously believed to be specific to modern humans, were present in some Neandertals. We show that the KNL1 gene of these Neandertals shared a common ancestor with present-day Africans about 200,000 years ago due to gene flow from the ancestors (or relatives) of modern humans into Neandertals. Subsequently, some non-Africans inherited this modern human-like gene variant from Neandertals, but none inherited the ancestral gene variants. These results add to the growing evidence of early contacts between modern humans and archaic groups in Eurasia and illustrate the intricate relationships among these groups. The ancestors of Neandertals and Denisovans diverged from those of modern humans between 522 and 35 634 thousand years ago (kya) (1). Differences between modern humans and Neandertals, Denisovans 36 and other hominins can therefore reveal traits that changed in modern humans over the past half-37 million years and are specific to modern humans. The paleontological and archaeological records 38 provide information about morphological and cultural traits (2, 3), yet other traits remain inaccessible by 39 such approaches. The retrieval of DNA from archaic human remains and the reconstruction of 40 Neandertal and Denisovan genomes (4-6) open an additional approach to understanding what sets 41 modern and archaic humans apart. 42 Because large numbers of present-day human genomes have been sequenced while few archaic 43 genomes are available, the comparisons of Neandertal and Denisovan genomes are particularly useful 44 for identifying those nucleotide changes that occurred on the modern human lineage and reached 45 fixation (or high frequencies) in present-day people. According to one estimate (6), there are 31,389 46 such differences of which only 96 change the amino acid sequence of 87 proteins. Three of these 47 missense changes occur in a single gene, sperm associated antigen 5 (SPAG5), which encodes a protein 48 associated with the spindle apparatus (7, 8). The spindle is a cytoskeletal structure composed of 49 microtubules and associated proteins that attach to the chromosomes and ensures their proper 50 segregation during cell division (9). SPAG5 is the only gene in the genome with three missense changes, 51 but four of the 87 genes carry two missense changes. One of these is kinetochore scaffold 1 (KNL1, 52 previously CASC5) which encodes a protein in the kinetochore (10), a protein structure at the 53 centromere of chromosomes to which the spindle attaches during mitosis and meiosis. The occurrence 54 of three missense changes in SPAG5 and two in KNL1, as well as one in the gene KIF18A, which encodes 55 a protein involved in the movement of chromosomes along microtubules (11), is intriguing as it suggests 56 that components of the spindle may have been subject to natural selection during recent human 57 evolution (12, 13). 58 We therefore set out to study genes with missense changes that are associated with the spindle 59 apparatus and to reconstruct their evolutionary history in modern and archaic humans. We show that 60 multiple spindle genes may have experienced positive selection on the modern human lineage. We also 61 show that the variant of KNL1 that carries two derived missense mutations was transferred by gene flow 62 from modern humans to early Neandertals and then again from late Neandertals to modern humans. 63 64 Spindle-associated genes with missense changes in modern humans. 67 According to the classification of genes in the Gene Ontology database (14, 15) that represents current 68 knowledge about the function of genes, there are eight spindle-associated genes among the 87 genes 69 that carry missense changes that are fixed or almost fixed among present-day humans. These eight 70 genes are SPAG5 and KNL1, with three and two changes, respectively, and ATRX, KATNA1, KIF18A, NEK6, 71 RSPH1 and STARD9, which each carry one missense change. A total of eleven missense changes 72 accumulated in these eight genes. Given the length of spindle-associated genes, six missense changes 73 are expected making this a significant enrichment of amino acid changes compared to random 74 expectation (permutation test p = 0.045; see Methods). Multiple missense changes in a single gene are 75 also more than expected by chance (p = 0.044 for two changes; p < 0.001 for three changes). The genes 76 with single changes have a range of functions associated with cell division. ATRX is a chromatin 77 remodeler required for normal chromosome alignment, cohesion and segregation during meiosis and 78 mitosis (16, 17). KATNA1 and KIF18A are depolymerases that disassemble microtubules in the spindle 79 apparatus and are essential for proper chromosome alignment at the equator of dividing cells (11, 18). 80 NEK6 is a kinase that is required for cell cycle progression through mitosis (19) and interacts with 81 proteins of the centrosome (20), an organelle that serves as a microtubule-organizing center and is 82 crucial for the spindle apparatus. RSPH1 localizes to the spindle and chromosomes during meiotic 83 metaphase (21), when the chromosomes align in the dividing cell. Finally, STARD9 localizes to the 84 centrosome during cell division and is required for spindle assembly by maintaining centrosome integrity 85 (22) . We describe the location and predicted effects of the missense changes in Appendix 1 -Tables 1 86 and 2. 87 88 Evolutionary history of spindle proteins in modern humans. The enrichment of missense changes in genes associated with the spindle apparatus suggests that some 90 of these changes may have been subject to positive selection in modern humans (23). To test whether 91 the eleven changes in the eight genes may have been beneficial, we applied a method to detect positive 92 selection that occurred in the common ancestral population of modern humans after their split from 93 archaic humans (24). This method relies on a hidden Markov model to identify genomic segments where 94 archaic human genomes fall outside of modern human variation, i.e., where all modern humans share a 95 common ancestor more recently than any common ancestor shared with archaic humans. Applying this 96 approach to the genomes of a Neandertal (6), a Denisovan (5) and 504 Africans (25), we identified such 97 segments spanning 132 to 547,448 base pairs (bp) around the missense substitutions of each of the 98 eight spindle genes (Figure 1 ; Supplement File 1). The genetic lengths of these segments are informative 99 about how fast they spread in modern humans since the common ancestor shared with archaic humans. 100 Segments longer than 0.025cM were not observed in neutral simulations (i.e., false positive rate <0.1%, 101 (24)) and therefore represent evidence for positive selection. We account for uncertainty in local 102 recombination rates by using two recombination maps (African-American and deCODE maps, (26, 27)). 103 One gene, SPAG5, carries a segment around the missense substitutions longer than 0.025cM in both 104 maps ( Figure 2 ). The segments in two genes, ATRX and KIF18A, exceed this cutoff only for the African-105 American and deCODE recombination maps, respectively ( Figure 2A) . Thus, there is consistent evidence 106 for positive selection on SPAG5, while evidence for positive selection affecting ATRX and KIF18A is 107 dependent on the recombination maps used. 108 To gain further insights into the history of the missense substitutions in the spindle genes, we 109 estimated the time when each of these mutations occurred. Note that this time differs from that of 110 fixation, which may have happened much later, particularly in the absence of positive selection. As 111 fixation events since the split with the ancestors of Neandertals and Denisovans are rare (24), it is most 112 parsimonious to assume that the mutations that reached fixation in the region of each spindle gene 113 represent a single event of fixation (i.e., were linked to the missense substitution(s) as they spread in 114 modern humans). Thus, estimates of the coalescence time of the haplotypes in which the substitutions 115 are found today provides a most recent bound for when the mutations occurred. By contrast, the most 116 recent common ancestors shared between modern and archaic humans (who carried the ancestral 117 variants) will predate the occurrence of the mutations and therefore provide older bounds for the time 118 of origin of the missense variants. 119 By computing pairwise differences among the high-quality genome sequences of 104 individuals 120 from Africa (Human Genome Diversity Panel, (28); including two non-African individuals with the 121 ancestral version of KATNA1 and KIF18A, respectively; Methods), and four archaic human genomes (1, 5, 122 6, 29), we estimated the ages of the missense substitutions of KATNA1, KIF18A, KNL1, SPAG5 and 123 STARD9 ( Figure 2B and 2C, Appendix 2 - Table 1 ). We excluded NEK6 and RSPH1 as the regions identified 124 around the missense substitutions were too short to estimate their age (4,104bp and 132bp, 125 respectively). We also excluded ATRX, which is located on the X chromosome. The relative difference in 126 age estimates suggests that the mutations in KATNA1, KNL1 and STARD9 occurred much earlier than the 127 mutations in SPAG5 and KIF18A and may be more than a million years old. By contrast, the mutations in 128 in SPAG5 and KIF18A are more recent. This is consistent with that the latter two genes have some 129 evidence of positive selection around their missense substitutions whereas the former genes lack 130 evidence for selection. 131 A modern human-like KNL1 haplotype in Neandertals. The identification of modern human-specific missense changes was based on the high-coverage 153 genomes of one Neandertal and one Denisovan (6). With the availability of additional archaic human 154 genome sequences, it is now possible to explore whether the derived states may have occurred in some 155 archaic humans. For five of the spindle genes, sequence information from seven to ten archaic humans 156 is available at the relevant positions (1, 6, 29-32). In none of them, there is evidence for the presence of 157 the derived missense variants (Appendix 3 - Africans carry fixed derived alleles and where information is available for this Neandertal carries any 166 modern human-like allele (Appendix 3 - Table 2 ). This could therefore represent present-day human 167 DNA contamination, which amounts to 2-3% in the DNA libraries sequenced from this individual (1). 168 In contrast, between one and 27 DNA fragments that cover the two missense substitutions in 169 KNL1 in four of the twelve Neandertals available carried only derived alleles (Figure 3 ). This includes the 170 Chagyrskaya 8 genome, which is sequenced to 27-fold average genomic coverage. Several of the 171 fragments carrying derived alleles also carry cytosine-to-thymine substitutions near their ends, which is 172 typical of ancient DNA molecules and suggests that the fragments do not represent present-day human 173 DNA contamination (33). 174 Further evidence that the modern human-like alleles in KNL1 in the four Neandertals are not 175 due to present-day DNA contamination comes from the observation that they carry modern human-like 176 derived alleles that occur in all present-day Africans at 21 other positions in a 276kb-long region around 177 the missense variants, whereas the other Neandertal and Denisovan individuals who carry the ancestral 178 alleles do not ( Figure 3 ). In addition, as there is no evidence of ancestral alleles at any of these 179 informative positions, the four Neandertals probably all carried this "modern human-like" haplotype in 180 homozygous form. 181 To investigate how the divergence among Neandertals of the haplotypes with and without the 182 derived missense substitutions in the KNL1 gene compares to other regions of Neandertal genomes, we 183 calculated the divergence in non-overlapping 276kb-segments between the Altai Neandertal (Denisova 184 5), who carries ancestral KNL1 substitutions, and Chagyrskaya 8, who carries the derived substitutions 185 ( Figure 4A ). The number of differences in the KNL1 region is about one order of magnitude higher than 186 the average across the genome and in the top 0.27% of all regions in the genome. Thus, the KNL1 region 187 stands out as unusually diverged between these two Neandertals. Introgression of the KNL1 haplotype into Neandertals. 213 It is intriguing that the Neandertals who carry the two missense changes in KNL1 also carry modern 214 human-like alleles shared by all (or nearly all) present-day Africans at multiple positions in the region of 215 KNL1 and exhibit unusually high divergence to other Neandertal haplotypes. This raises the question if 216 Neandertals may have inherited this haplotype from ancestors or relatives of modern humans. 217 However, the age of the KNL1 missense mutations predates the divergence of Neandertals and modern 218 humans ( Figure 2C ) and must thus have been segregating in the ancestral populations of the two 219 groups. This may have resulted in the presence of the derived variants in both Neandertals and modern 220 humans even in the absence of any gene flow ("incomplete lineage sorting"). However, in this case, the 221 segment carrying similarity between Neandertal and modern human genomes is expected to be shorter 222 than if it entered the Neandertal population by gene flow because the number of generations over 223 which recombination would have had the opportunity to shorten the haplotype would be larger. 224 We used local estimates of the recombination rate (Methods) and a published method (36) to 225 infer the expected length distribution for haplotypes inherited from the population ancestral to 226 Neandertals and modern humans. The 276kb haplotype is longer than expected ( Figure 4B , p ≤ 0.006) 227 and it is therefore unlikely to be inherited from the common ancestral population. Thus, although the 228 missense variants were segregating in the common ancestral population of Neandertals and modern 229 humans, Neandertals did not inherit this haplotype from that population. Rather, the haplotype in 230 Neandertals is likely the result of gene flow between Neandertals and modern humans. Since all 231 present-day humans carry the derived KNL1 variants whereas only some Neandertals do, and since the 232 modern human-like KNL1 haplotype is unusually diverged to other Neandertal haplotypes, gene flow 233 was likely from modern humans (or their relatives) to Neandertals. 234 Age of KNL1 gene flow into Neandertals. 236 Using the length of the haplotype, the local recombination rate (averaged over the African-American 237 and deCODE recombination maps), the age of the most recent Neandertal carrying the haplotype 238 (~40kya (30)), and given the limitation that recombination among the fixed alleles in modern humans 239 cannot be detected, we estimate an older age limit for the haplotypes seen in Neandertals and modern 240 humans of 265kya (Methods). This means that gene flow occurred after this time. 241 It is possible to further refine the age estimate of this gene flow if some present-day humans are 242 closer to the Neandertals carrying the haplotype than other present-day humans. By comparing the 243 high-quality genomes of Neandertals with and without the modern-like KNL1 haplotype, we identified 244 206 positions that are derived on the modern-like KNL1 haplotype in Neandertals. We then computed 245 the frequency of these alleles among 104 present-day African genomes and identified 19 positions 246 where 32-39% of present-day people share the derived allele (Appendix 4 - Table 1 ). These derived 247 alleles are physically linked (r 2 >0.58) and tag a 78kb-long haplotype (102kb in some individuals, r 2 >0.39) 248 where some present-day individuals are closer to the Chagyrskaya 8 Neandertal than other present-day 249 individuals. The individuals with the 102kb-long haplotype carry seven differences to the Chagyrskaya 8 250 Neandertal (among the 36,106 bases called). This number of differences yields an estimate for the most 251 recent common ancestor with Neandertals carrying the modern-like KNL1 haplotype of 251kya (95% CI: 252 131-421kya; Figure 4C ). Note that this haplotype is also too long to be inherited from the common 253 ancestral population of Neandertals and modern humans half a million years ago (95% CI for the age of 254 the last common ancestor: 49-304kya). Reintroduction of the modern-like KNL1 haplotype into non-Africans. As several late Neandertals carried a modern human-like KNL1 haplotype, it is possible that it was 282 reintroduced into modern humans when they met and mixed outside Africa approximately 44-54kya 283 (37). We would then expect that some non-Africans would carry a haplotype that is more closely related 284 to that of Chagyrskaya 8 than the present-day humans analyzed above (those highlighted by a solid box 285 in Figure 4C ). Among 825 non-African individuals from around the world, we identified 12 individuals 286 from several populations that carry one chromosome that differs at just 7 to 13 positions from the 287 Chagyrskaya 8 genome in the KNL1 region (out of ~140kb with genotype calls in the 276kb region). By 288 comparison, other non-African individuals carry 54 to 179 differences to the Chagyrskaya 8 genome in 289 this region ( Figure 4C ). We estimated that the 12 individuals share a most recent common ancestor with 290 Chagyrskaya 8 about 96 to 139kya in this region of KNL1 (95% CI: 81-145kya to 95-198kya for the 291 individuals with 7 and 13 differences to Chagyrskaya 8, respectively; Methods). These estimates are 292 similar to those computed in Neandertal haplotypes previously described in present-day individuals 293 ( Figure 4C, right panel (34, 35) ). Adjoining the 3'-end of this KNL1 region, there are seven positions 294 where these 12 individuals share alleles with archaic genomes (including 4 alleles shared with 295 Chagyrskaya 8; Supplement File 2) while no other present-day humans in the dataset do so. These 296 observations suggest that these 12 present-day individuals carry a KNL1 haplotype inherited from 297 Neandertals. 298 Although the missense alleles in KNL1 are fixed among the genomes of 2,504 present-day 299 humans (25), some rare ancestral alleles may exist among present-day human genomes, e.g., due to 300 interbreeding with archaic humans or due to back mutations. We therefore looked at the exome and 301 whole-genome sequences from the gnomAD database (v2. 1.1, (38) ). There are no ancestral alleles 302 among the exome sequences (out of 227,420 alleles called). One of the 15,684 whole genomes carries 303 one ancestral allele at one of the two positions carrying missense variants. This may be a back mutation 304 (rs755472529; Appendix 5 - Spindle genes experienced an unusually large number of missense changes in modern humans since the 313 split from a common ancestor with archaic humans. This is intriguing, as mitotic metaphase has been 314 shown to be prolonged in apical progenitors during human brain organoid development when 315 compared to apes (39, 40). Some of the missense changes in spindle genes that occurred since the 316 separation of modern and archaic humans may be involved in such differences. 317 One of the spindle genes carrying missense changes, KNL1, has a particularly interesting 318 evolutionary history in that some Neandertals carried a haplotype sharing two missense variants with 319 present-day humans that were hitherto believed to be specific to modern humans. Both the length of 320 the modern-like KNL1 haplotype in Neandertals and the small number of differences between present-321 day humans and Neandertals in this region suggest that they shared a common ancestor more recently 322 than the estimated split time between these two populations. While only a few Neandertals carry this 323 KNL1 haplotype and its divergence to other Neandertal haplotypes is unusually high ( Figure 4A ), many 324 variants on the haplotype are fixed or common among present-day humans. Therefore, we infer that 325 ancestors (or relatives) of modern humans contributed this haplotype to Neandertals. Figure Identifying genes associated with the spindle apparatus 365 The human Gene Ontology Annotation database was downloaded on June 8 th 2021 (goa_human.gaf.gz, 366 from http://current.geneontology.org/products/pages/downloads.html), together with the basic Gene 367 Ontology terms (http://purl.obolibrary.org/obo/go/go-basic.obo). We selected all the Gene Ontology 368 terms that include the keyword "spindle" (65 terms), identified all the human genes that are associated 369 with at least one of these terms (562 genes out of 19,719) and overlapped them with the list of 87 genes 370 that exhibit fixed missense changes in modern humans (6, 12). 371 Testing for enrichment of missense changes in spindle genes 372 The 27)). The latter was lifted over to hg19 (originally in hg38 coordinates) with the liftover tool 415 and the chain file hg38ToHg19 from UCSC (using a minimum match rate of 0.9 between bases of both 416 assemblies). This resulted in both gaps and overlaps between windows of the recombination map. 417 Therefore, we assumed that the recombination rate in a gap was the average of the two directly 418 adjacent windows, and we truncated the windows that overlap with a previous window (or removed the 419 window if it overlapped completely with the previous one). San and Yoruban individuals from (6)) were identical but differed from four great ape reference genome 429 assemblies (panTro4, panPan1.1, gorGor3 and ponAbe2). We then extracted the genotypes of the Altai 430 Neandertal (Denisova 5) or Denisovan (Denisova 3) genomes at these positions, and only considered 431 those that pass the minimal filters for each genome, respectively (1). 432 We tested whether the missense substitutions in the spindle genes overlap regions displaying 433 signatures of ancient selective sweeps using the hidden Markov model described in (24). We executed 434 that method independently for each chromosome and for both the Altai Neandertal and Denisova 3 435 genomes, using genetic distances computed from the African-American or deCODE recombination maps 436 (26, 27). We then identified regions around the missense substitutions of spindle genes where the 437 archaic genome fall outside the human variation ("external" regions) with a posterior probability above 438 0.2. We applied this cutoff on the sum of the posterior probabilities of both "external" states (i.e., 439 generated either from drift or from a selective sweep, (24)). We further intersected the regions called 440 with the Altai Neandertal and Denisova 3 genomes and measured the genetic length of the overlap to 441 determine whether there is evidence for selection. 442 Reconstructing the chronology of the missense substitutions 444 To get an upper age estimate of the missense substitutions, we estimated the TMRCA between Africans 445 and archaic humans in the regions around the missense substitutions where archaic humans fall outside 446 the human variation. We computed the number of pairwise differences between each African 447 chromosome and each high-coverage archaic human genome (1, 5, 6, 28, 29) sampling a random allele 448 at heterozygous positions in the archaic genomes. The age of the common ancestor of two individuals 449 conditional on the number D of pairwise differences follows a gamma distribution with shape parameter 450 α=D+1 and rate parameter β=θN+1 (55), where θ is the population scaled mutation rate (i.e., 4Neµ with 451 µ and Ne denoting the mutation rate and the effective population size, respectively) and N is the number 452 of bases with genotype calls in both individuals. This model assumes that the prior probability of 453 coalescence follows an exponential distribution, with rate equal to 1 when time is scaled in units of the 454 diploid effective population size (2Ne), and that both individuals are sampled in the present. To account 455 for a branch shortening S associated with the age of ancient individuals, we truncated the posterior 456 distribution so that the age of the common ancestor cannot be more recent than S and we added 2 as a 457 correction for this branch shortening. The expected TMRCA is then: 458 ) + 2 (Equation 1, Appendix 6), with (α, β 2 ) denoting the upper 459 incomplete gamma function with lower limit β 2 , which we computed with the gammainc() function from 460 the R library expint v0.1-6. We assumed µ is 1.45 x 10 -8 mutations per base pair per generation 461 (generation time of 29 years, (56)) and a branch shortening of 50ky for Vindija 33.19 (1), 70ky for 462 Denisova 3 (1), 80ky for Chagyrkaya 8 (29), and 120ky for the Altai Neandertal (Denisova 5) (1). We 463 computed the 95% confidence intervals with the qtgamma() function of the R library 464 TruncatedDistributions v1.0. Note that we also estimated the TMRCA in these genomic regions between 465 Africans and a Papuan (HGDP00548), a Sindhi (HGDP00163) or a Biaka (HGDP00461) because they carry 466 the ancestral versions of the missense substitutions in KATNA1, KIF18A and SPAG5, respectively, and 467 constrain further the upper age estimates of the missense substitutions. 468 To get a lower age estimate of the missense substitutions, we computed the number of pairwise 469 differences among African chromosomes carrying the derived versions of the missense substitutions 470 from the HGDP dataset (all African individuals, except HGDP00461 for SPAG5). For each region, we 471 identified the two chromosomes that are the most distantly related (with the highest number of 472 differences) and classified all the other African chromosomes into two groups depending on which of 473 the two former chromosomes they are the most closely related in this genomic region. A chromosome 474 was assigned to a group if it shared at least two derived alleles with the African chromosome used for 475 defining this group but no more than one shared derived allele with the African chromosome defining 476 the other group. This removes potential recombinants that could bias downward the estimate of the 477 TMRCA. For each individual, we then counted the number of pairwise differences with individuals from 478 the other group, restricting this analysis to positions that are genotyped in the high-coverage archaic 479 genomes. Finally, we converted these counts into estimates of the TMRCA using the same equation as 480 above, albeit with no branch shortening (i.e., ( ) = To test whether the KNL1 haplotype identified in Neandertals could be shared with modern humans 484 because of incomplete lineage sorting, we computed the probability that a haplotype shared since the 485 common ancestral population is as long as, or longer than, the haplotype identified in KNL1. This 486 approach was previously applied to the EPAS1 haplotype shared between Tibetans and Denisovans (36), 487 but we briefly describe it here for completeness. The expected length L for a shared sequence is 1/(r x 488 T), denoting r and T as the recombination rate and the length of the Neandertal and modern human 489 branches since divergence (in generations of 29 years), respectively. As the ancestors of Neandertals 490 and modern humans split from each other around 550kya (1), the most recent common ancestor shared 491 between modern humans and Neandertals cannot be younger than this split time, in the absence of 492 gene flow. As the modern-like KNL1 haplotype is present in Neandertals that lived about 40kya (30), we 493 set T = 550,000 -40,000 = 510,000 years, which corresponds to 17,586 generations of 29 years. Note 494 that we do not include here the length of the modern human branch to be conservative, as we do not 495 know when the substitutions that define the haplotype reached fixation in modern humans. Relying on 496 local recombination rate estimates from the African-American map (0.148cM/Mb, (26)) and the deCODE 497 map (0.191cM/Mb, (27)), the expected length L of a haplotype shared through ILS (i.e., 1/(r x T)) is 498 29.8kb or 38.5kb depending on the recombination map used. Assuming that the length of a shared 499 sequence follows a Gamma distribution with shape parameter 2 and rate parameter 1/L, the probability 500 that such a sequence is as long as or longer than 276kb is then 1 − GammaCDF(276000, shape = 2, rate = 501 1/L). This probability is ≤0.006 and hence a model without gene flow is rejected. 502 Estimating the time of gene flow 504 To estimate the time of gene flow, we used an analogous model to that described above ((55), adapted 505 here to account for the branch shortening associated with the age of ancient individuals). In this model, 506 the age of the common ancestor of two individuals conditional on the length L of their shared haplotype 507 follows a gamma distribution with shape parameter α=3 and rate parameter β=2ρL+1, where ρ is the 508 population scaled recombination rate (i.e., 4Ner with r and Ne denoting the recombination rate and the 509 effective population size, respectively). To account for a branch shortening S, we applied again Equation 510 1, assuming that S is 40,000 years. In the case of the 276kb haplotype in Neandertals that carries alleles 511 that are fixed in present-day humans, we made the conservative thetassumption that recombinations 512 could not be observed in modern humans (i.e., the alleles were already fixed at the time of 513 introgression) and, therefore, multiplied the age estimates by 2. We did not apply this correction for the 514 estimate based on the 102kb haplotype shared between some present-day humans and Neandertals, as 515 the length of this haplotype depends on the number of recombination events in both Neandertals and 516 modern humans. Using the average recombination rate over the African-American and deCODE 517 recombination maps (0.169 cM/Mb), the expected age is 131ky based on the 276kb haplotype in 518 Neandertals and 138ky based on the 102kb shared haplotype. We computed the 95% confidence 519 intervals (83-265ky for the 276kb haplotype and 49-304ky for the 102kb haplotype) with the qtgamma() 520 function of the R library TruncatedDistributions v1.0. 521 As another estimate for the time of gene flow, we look at the genetic distance between modern 522 humans and Neandertals. For this purpose, we estimated the TMRCAs between Chagyrskaya 8 (the only 523 high coverage genome with this haplotype) and each present-day human genome from the HGDP 524 dataset in the region chr15:40,885,107-40,963,160. We randomly sampled one allele at heterozygous 525 positions where the phase is unknown in these genomes (i.e., every heterozygous position of 526 Chagyrskaya 8). The TMRCA conditional on the number D of pairwise differences follows a gamma 527 distribution with parameters α=D+1 and β=θN+1 (see dating analysis of the missense substitutions). 528 However, conditional on both the observed divergence and the length of the shared haplotype, the 529 TMRCA follows a gamma distribution with shape parameter α=D+3 and rate parameter β=θN+2ρL+1. In 530 both cases, we accounted for the branch shortening of Chagyrskaya 8 as described above (Equation 1, 531 S=80,000 years) and assumed 1.45 x 10 -8 mutations per base pair per generation (generation of 29 532 years). The 95% confidence intervals were computed as described above. 533 As these analyses might be sensitive to the value of µ, we tested whether µ in the region of 534 KNL1 may differ from the genome-wide average by computing local estimates of the mutation rate in 535 family trios from Iceland (57 above, only one random allele was considered for the low coverage genomes, in contrast to two for the 548 high-coverage genomes. Positions with more than one missing allele among early Neandertals (out of 4 549 alleles) or 3 missing alleles (out of 6) among the late Neandertals were filtered out. We further removed 550 transitions and positions less than 50kb away from the previously ascertained position. We then 551 computed the proportion of those positions where at least three of the later individuals (i.e., those that 552 lived 40-50kya) carried the derived allele. 553 554 18 Appendix 1 555 556 Table 1 : Location and predicted effects of the studied amino acid changes in spindle proteins, as reported in dbNSFP version 4.2 (48). The predictions are for the ancestral 557 variants. We put "damaging" in between quotation marks as the ancestral versions of ATRX and KATNA1 are unlikely to be damaging (as the ancestral amino acid residues are 558 found in the proteins of many species), but that prediction rather supports a function for these amino acid changes. indicates that the ancestral variant is likely to be deleterious (58-60) and a high conservation score means that the nucleotide position is highly conserved across species (100 562 vertebrates for phyloP and phastCons (61), and 34 mammals for GERP++ RS (62)). In contrast to the other scores that correspond to a single position, phastCons is a measure of 563 the conservation in the region around the position. In dbNSFP, the scores range from -6.458163 to 18.301497 for CADD, from -12.3 to 6.17 for GERP++ RS, from -20.0 to 10.003 564 for phyloP and from 0 to 1 for phastCons. Bases in uppercase were sequenced in the forward orientation, whereas those in lowercase were sequenced in the reverse 580 orientation. Bases that are modern human-like (derived) are highlighted in red and may represent present-day human DNA 581 contamination or an allele shared with modern humans. In contrast, the bases that are compatible with a damage-induced 582 substitution (from the ancestral allele) are highlighted in blue (i.e., C-to-T and G-to-A in the forward and reverse orientation, 583 respectively 585 the Chagyrskaya 8 genome differs from other high-quality archaic genomes without the modern human-like haplotype but 594 some African genomes from the HGDP dataset carry the same allele as Chagyrskaya 8. Note that the modern human-like haplo-595 type in Neandertals is longer and defined by alleles that are shared with all modern humans (Figure 3) andertal genomes with the modern-human like KNL1 haplotype and the 12 present-day non-African ge-667 nomes that inherited one copy of KNL1 from archaic humans. The lower panel ("Other haplotypes") 668 shows the alleles carried by the other chromosomes (that did not inherit a copy ok KNL1 from archaic 669 humans) of these 12 individuals. For the archaic human genomes, one allele was sampled randomly at 670 heterozygous positions. 671 672 673 A high-coverage Neandertal genome from Vindija Cave in Croatia Modern Human versus Neandertal Evolutionary Distinctiveness. Current 678 Anthropology Neandertals revised A draft sequence of the Neandertal genome A high-coverage genome sequence from an archaic Denisovan individual The complete genome sequence of a Neanderthal from the Altai Mountains. 684 Analysis of mitotic microtubule-associated proteins using mass 686 spectrometry identifies astrin, a spindle-associated protein Cloning and characterization of hMAP126, a new member of mitotic spindle-689 associated proteins Mitotic spindle assembly in animal cells: a fine balancing act KNL1 and the CENP-H/I/K complex coordinately 693 direct kinetochore assembly in vertebrates The human kinesin Kif18A is a motile microtubule depolymerase essential for 695 chromosome congression The human condition-a molecular approach A catalog of single nucleotide changes distinguishing modern humans 698 from archaic hominins Gene ontology: tool for the unification of biology. The Gene Ontology 700 Consortium The Gene Ontology resource: enriching a GOld mine ATRX is required for maintenance of 704 the neuroprogenitor cell pool in the embryonic mouse brain Loss of ATRX Suppresses Resolution of Telomere Cohesion to 706 Control Recombination in ALT Cancer Cells Microtubule-severing enzymes The serine/threonine kinase Nek6 is required 710 for cell cycle progression through mitosis Characterization of hNek6 interactome reveals an important role for its 712 short N-terminal domain and colocalization with proteins at the centrosome Molecular cloning and characterization of meichroacidin (male meiotic 715 metaphase chromosome-associated acidic protein) The STARD9/Kif16a kinesin associates with mitotic microtubules and regulates 717 spindle pole assembly A test of neutral molecular evolution based on 719 nucleotide data Detecting ancient positive selection in 721 humans using extended lineage sorting A global reference for human genetic variation The landscape of recombination in African Americans Characterizing mutagenic effects of recombination through a sequence-727 level genetic map Insights into human genetic variation and population history from 929 729 diverse genomes A high-coverage Neandertal genome from Chagyrskaya Cave Reconstructing the genetic history of late Neanderthals Nuclear DNA from two early Neandertals reveals 80,000 years of genetic 735 continuity in The genome of the offspring of a Neanderthal mother and a Denisovan father Patterns of damage in genomic DNA sequences from a Neandertal The major genetic risk factor for severe COVID-19 is inherited from 741 Neanderthals Introgression of Neandertal-and Denisovan-like 743 Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors Altitude adaptation in Tibetans caused by introgression of Denisovan-746 like DNA An Extended Admixture Pulse Model Reveals the 748 Limitations to Human-Neandertal Introgression Dating The mutational constraint spectrum quantified from variation in 141,456 751 humans Differences and similarities between human and chimpanzee neural 753 progenitors during cerebral cortex development From stem and progenitor cells to neurons in the 755 developing neocortex: key differences among hominids Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos 757 hominins Deeply divergent archaic mitochondrial genome provides lower time boundary 759 for African gene flow into Neanderthals The evolutionary history of Neanderthal and Denisovan Y chromosomes Ancient gene flow from early modern humans into Eastern Neanderthals. 763 Mapping gene flow between ancient hominins through 765 demography-aware inference of the ancestral recombination graph Apidima Cave fossils provide earliest evidence of Homo sapiens in Eurasia. 768 The earliest modern humans outside Africa dbNSFP v4: a comprehensive database of transcript-specific 771 functional predictions and annotations for human nonsynonymous and splice-site SNVs Twelve years of SAMtools and BCFtools Analysis, Initial sequence of the chimpanzee genome and comparison with the 775 human genome The bonobo genome compared with the chimpanzee and human genomes. 777 Insights into hominid evolution from the gorilla genome sequence Comparative and demographic analysis of orang-utan genomes Improved pairwise alignment of genomic DNA (The Pennsylvania State University Dating genomic variants and shared ancestry in population-scale 785 sequencing data Revising the human mutation rate: implications for understanding human 787 evolution Parental influence on human germline de novo mutations in 1,548 trios from 789 Iceland A general framework for estimating the relative pathogenicity of human 791 genetic variants CADD: predicting the 793 deleteriousness of variants throughout the human genome Detection of nonneutral substitution 796 rates on mammalian phylogenies Evolutionarily conserved elements in vertebrate, insect, worm, and yeast 798 genomes Identifying a high fraction of the human genome to be under selective 800 constraint using GERP++ Appendix 2 567 568 Table 1 : Age estimates of the missense substitutions in spindle genes. The ages were estimated in the regions where the Altai 569 Neandertal and Denisova 3 genomes fall outside the human variation (intersection of the regions identified with the African-570 American and deCODE recombination maps). The lower age corresponds to the mean age of the ancestor of multiple present-571 day African chromosome pairs. The upper age corresponds to the mean age of the common ancestor shared between each 572 present-day African chromosome and either the archaic genome with the least number of differences (excluding Chagyrskaya 8 573 for KNL1) or a present-day human with an ancestral version of the missense variant(s). (TMRCA) for pairs of modern and archaic 606 humans, we used a published model (55) that we adapted to account for the branch shortening associ-607 ated with the age of the archaic individual. We simply truncated the posterior distribution of the TMRCA 608 obtained with this model so that the TMRCA cannot be more recent than the age of the archaic individ-609 ual, and added a correction to account for missing mutations on the archaic branch. Here, we describe 610 how we derived Equation 1 to compute the expected TMRCA with these modifications from the original 611 model. 612The expectation of the truncated distribution is: 613 As this model assumes that the two lineages diverge for the same amount of time (i.e., for a given coa-623 lescent time T the total branch length is 2T), we therefore set = 2 , with S denoting the age of the an-624 cient specimen, and added 2 as a correction for the branch shortening (i.e., adds S to the total branch 625 length). This leads to Equation 1: 626Appendix 7: Local mutation rate in the region of KNL1 629 630For estimating the age of common ancestors shared between modern humans and Neandertals in the 631 region of KNL1, we assumed the genome-wide average of 1.45 x 10 -8 mutations per base pair per 632 generation (i.e., the mutation rate used to estimate the population split times between the ancestors of 633 modern humans and Neandertals (1, 6); estimated from (56)). As the age estimates may be sensitive to 634 the mutation rate used, we also estimated the local mutation rate among family trios from Iceland (57). 635 We counted the number of de novo mutations in the genomes of the probands and divided this number 636 by twice the length of the region and the number of trios in this dataset (1,548) Other haplotypes 4 1 0 9 5 0 5 1 4 1 0 9 6 0 9 7 4 1 0 9 6 1 4 4 4 1 0 9 8 3 3 9 4 1 0 9 9 0 3 2 4 1 0 9 9 0 5 9 4 1 0 9 9 1 6 3 4 1 0 9 9 9 6 2 4 1 1 0 0 3 4 1 4 1 1 0 1 8 1 6 4 1 1 0 2 6 6 7 4 1 1 0 2 8 2 9 4 1 1 0 3 2 9 3 4 1 1 0 3 7 4 0 4 1 1 0 3 8 6 2 4 1 1 0 4 8 1 7 4 1 1 0 4 8 7 4 4 1 1 0 4 9 4 6 4 1 1 0 5 1 8 8 4 1 1 0 5 6 5 8 4 1 1 0 5 9 2 6 4 1 1 0 6 4 8 5 4 1 1 0 6 7 1 0 4 1 1 0 6 7 2 3 4 1 1 0 8 2 4 0 4 1 1 1 1 0 7 1 4 1 1 1 1 1 5 9 4 1 1 1 8 5 4 5 4 1 1 1 9 1 8 5 4 1 1 2 2 9 0 6 4 1 1 2 3 1 7 2 4 1 1 3 1 0 9 6 4 1 1 3 2 3 1 0 4 1 1 3 2 7 7 8