key: cord-0962803-8h25qj9y authors: Khan, Naazneen; de Manuel, Marc; Peyregne, Stephane; Do, Raymond; Prufer, Kay; Marques-Bonet, Tomas; Varki, Nissi; Gagneux, Pascal; Varki, Ajit title: Multiple Genomic Events Altering Hominin SIGLEC Biology and Innate Immunity Predated the Common Ancestor of Humans and Archaic Hominins date: 2020-06-18 journal: Genome Biol Evol DOI: 10.1093/gbe/evaa125 sha: 3876b836907a851203e1918c686154e3ad9fda07 doc_id: 962803 cord_uid: 8h25qj9y Human-specific pseudogenization of the CMAH gene eliminated the mammalian sialic acid (Sia) Neu5Gc (generating an excess of its precursor Neu5Ac), thus changing ubiquitous cell surface “self-associated molecular patterns” that modulate innate immunity via engagement of CD33-related-Siglec receptors. The Alu-fusion-mediated loss-of-function of CMAH fixed ∼2–3 Ma, possibly contributing to the origins of the genus Homo. The mutation likely altered human self-associated molecular patterns, triggering multiple events, including emergence of human-adapted pathogens with strong preference for Neu5Ac recognition and/or presenting Neu5Ac-containing molecular mimics of human glycans, which can suppress immune responses via CD33-related-Siglec engagement. Human-specific alterations reported in some gene-encoding Sia-sensing proteins suggested a “hotspot” in hominin evolution. The availability of more hominid genomes including those of two extinct hominins now allows full reanalysis and evolutionary timing. Functional changes occur in 8/13 members of the human genomic cluster encoding CD33-related Siglecs, all predating the human common ancestor. Comparisons with great ape genomes indicate that these changes are unique to hominins. We found no evidence for strong selection after the Human–Neanderthal/Denisovan common ancestor, and these extinct hominin genomes include almost all major changes found in humans, indicating that these changes in hominin sialobiology predate the Neanderthal–human divergence ∼0.6 Ma. 
Multiple changes in this genomic cluster may also explain human-specific expression of CD33rSiglecs in unexpected locations such as amnion, placental trophoblast, pancreatic islets, ovarian fibroblasts, microglia, natural killer (NK) cells, and epithelia. Taken together, our data suggest that innate immune interactions with pathogens markedly altered hominin Siglec biology between 0.6 and 2 Ma, potentially affecting human evolution.

Prior to the mid-1990s, the extreme similarity of human and chimpanzee protein sequences suggested that phenotypic differences were primarily due to differences in gene regulation (King and Wilson 1975). The first definitive exception was a fixed loss-of-function genomic mutation unique to the human lineage, in the gene encoding CMP-Neu5Ac hydroxylase (CMAH) (Chou et al. 1998), an event mediated by an Alu-Alu fusion (Hayakawa et al. 2001). This mutation eliminated biosynthesis of the common mammalian sialic acid (Sia) N-glycolylneuraminic acid (Neu5Gc) and caused accumulation of its precursor N-acetylneuraminic acid (Neu5Ac), radically changing cell surface extracellular glycosylation throughout the body in the hominin lineage. The mutation was dated to >2 Ma by multiple methods (Chou et al. 2002; Hayakawa et al. 2006). Additional examples emerged a few years later, including the forkhead box protein P2 (FOXP2) (Enard et al. 2002, 2009), a myosin heavy chain family member (MYH16) (Currie 2004; Stedman et al. 2004), and examples of gene duplication followed by adaptive evolution (Johnson et al. 2001). Along with biomedical considerations (Varki 2000), such discoveries led to sequencing of the chimpanzee genome (Chimpanzee Sequencing and Analysis Consortium 2005). There are now many more defined genetic differences between humans and our closest evolutionary cousins, involving not only gene expression (Enard et al. 2002; Fujiyama et al. 2002; Watanabe et al. 2004; Calarco et al.
2007; Kehrer-Sawatzki and Cooper 2007; Cruz-Gordillo et al. 2010; Otto et al. 2014; Atkinson et al. 2018) but also other distinct genomic changes ranging from massive (megabase) structural variation to differences in gene copy numbers, de novo genes, pseudogenization (O'Bleness et al. 2012; Ruiz-Orera et al. 2015), human-accelerated noncoding regions, and microRNA genes (Eddy 2001; Mello and Conte 2004; Kim 2005; Ruiz-Orera et al. 2015). This study focuses on human genes involved in the biology of sialic acids. All living cells are covered with complex arrays of glycoconjugates, with sialic acids occupying the majority of terminal positions on such glycan chains in animals of the Deuterostome lineage (including echinoderms and vertebrates) (Gagneux et al. 2015). These acidic, nine-carbon backbone amino-sugars play important roles in cell-cell and cell-matrix interactions, as well as in host-pathogen interactions. The hominin-specific loss-of-function mutation in CMAH mentioned above eliminated biosynthesis of Neu5Gc, which was a target for selective cell recognition by various nonhuman pathogens and toxins (Kyogashima et al. 1989; Martin et al. 2005; Campanero-Rhodes et al. 2007; Deng et al. 2014; Alisson-Silva et al. 2018). It is reasonable to speculate that exogenous pathogen recognition could have initially driven the CMAH loss-of-function, generating a polymorphism as the recessive loss-of-function mutant allele rose in frequency. However, this deletion was then fixed in an ancestral hominin population ∼2–3 Ma (Chou et al. 2002), possibly by directional selection against Neu5Gc via female immunity during reproduction, a mechanism demonstrated in vivo using transgenic mice that carry the same mutation in their Cmah gene as humans (Ghaderi et al. 2011).
Given the likely timing of these events, we speculated that anti-Neu5Gc immunity of hominin females immunized against Neu5Gc by increased contact with Neu5Gc-rich vertebrate animal prey might have contributed to the origins of the genus Homo ∼2 Ma (Wood and Boyle 2016; Bergfeld et al. 2017). Early discovery of some additional human-specific changes affecting sialic acid biology (Brinkman-Van der Linden et al. 2000; Gagneux et al. 2003; Sonnenburg et al. 2004; Hayakawa et al. 2005; Nguyen et al. 2006) and system-wide genomic and biochemical comparisons of sialic acid biology among primates and rodents (Altheide et al. 2006) suggested a possible "hotspot" in human sialic acid evolution (Varki 2009). With the availability of more human genomes (1000 Genomes Project Consortium et al. 2015), a recent study showed that the patterns of genetic variation of 55 sialic acid biology-related genes in modern human populations did not significantly deviate from neutral expectations and were in fact not significantly different among genes belonging to different functional categories (Moon et al. 2018).

Sialic acid-binding Ig-like lectins (Siglecs) are type I transmembrane proteins with an N-terminal immunoglobulin (Ig)-like V-set domain that mediates sialic acid recognition, and a variable number of Ig-like C2-type domains (Angata, Hingorani, et al. 2001). Siglecs often have a cytoplasmic tail with one or more immunoreceptor tyrosine-based inhibitory motifs that can suppress immune cell activation. Alternatively, they can recruit adaptor proteins with immunoreceptor tyrosine-based activating motifs. Although Siglecs likely have multiple functions, one prominent role appears to be recognition of endogenous sialylated glycans as self-associated molecular patterns, suppressing reactions of innate immune cells against self (Varki 2011).
Activating Siglecs have a positively charged arginine or lysine in their transmembrane domains that can recruit DAP12 and activate cellular immune responses against pathogens mimicking endogenous sialic acids (Schwarz et al. 2017). CD33-related Siglecs are rapidly evolving, and among this family, nine inhibitory (hSiglec-3 and hSiglec-5 to hSiglec-12) and two activating members (hSiglec-14 and hSiglec-16) have been characterized in humans. A recent article from our group showed variable presence or absence of functional changes in the CD33rSIGLEC cluster in 26 mammalian species including great apes (Khan et al. 2020). With the availability of genomes from both living great apes (Prado-Martinez et al. 2013; Xue et al. 2015; Kronenberg et al. 2018) and extinct archaic hominins (Reich et al. 2010; Meyer et al. 2012; Prufer et al. 2014), we now systematically reassess the previous discoveries from multiple groups and also report on several new findings. Overall, we find that multiple complex changes in genes involving sialic acid biology beyond the initial CMAH mutation did indeed occur but are mostly confined to the CD33rSIGLEC gene cluster on chromosome 19 (Angata, Hingorani, et al. 2001), encoding CD33rSiglecs, prominent on innate immune cells (Varki and Angata 2006). In particular, we found that multiple changes in the CD33rSIGLEC gene cluster are common in all human populations, postdate the common ancestor with the chimpanzee/bonobo lineage, but predate the common ancestor with Neanderthals and Denisovans. Such multiple complex changes in this gene cluster appear to be associated with altered expression of these genes, not just confined to innate immune cells, but also in other unexpected human cell types, some associated with diseases that appear to be uniquely human.

The great ape genome data (total number 147) for CMAH and CD33rSIGLECs were derived from three publications (Prado-Martinez et al. 2013; Xue et al. 2015; de Manuel et al.
2016; Kronenberg et al. 2018). These great ape genomes were mapped to the human reference genome (GRCh37/hg19), retrieved from the UCSC Genome Browser, using the Burrows-Wheeler Aligner. The alignments were further processed with Picard tools to remove duplicates, and variants were called using the Genome Analysis Toolkit. Moreover, the variants (in variant call format) of chimpanzee, bonobo, gorilla, and orangutan were visualized in the Integrative Genomics Viewer with their respective reference genomes (Pantro4, Gorgor3, and PonAbe2) and annotation files. All archaic hominin raw and processed files were obtained from the Max Planck Institute for Evolutionary Anthropology (Reich et al. 2010; Castellano et al. 2014; Prufer et al. 2014; Slon et al. 2018) (http://cdna.eva.mpg.de/neandertal). BED coordinates of the SIGLEC and CMAH genes are provided for human and each ape lineage (chimpanzee, gorilla, and orangutan) in supplementary file 1, Supplementary Material online. Additionally, BED coordinates of further polymorphisms present in CD33rSIGLECs, with allele frequencies in great ape populations, are provided in supplementary file 2, Supplementary Material online.

The genes involved in sialic acid biology (67 genes) were tested for overlap with regions displaying signatures of ancient selective sweeps (Peyregne et al. 2017). The method used to identify these signatures of archaic selection relies on a hidden Markov model to detect extended regions in the genome where the Neanderthal and Denisovan lineages fall outside the human variation (Meyer et al. 2012; Prufer et al. 2014). This method can only detect events that occurred between the split of modern and archaic humans around 0.5 Ma (Prufer et al. 2017) and the split of modern human populations from each other around 0.2 Ma (Schiffels and Durbin 2014).
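The overlap screen between sialic acid biology genes and candidate sweep regions can be illustrated with a minimal Python sketch. It is not the study's pipeline: the `Interval` type, the function names, and the toy coordinates are hypothetical, and intervals are assumed to be half-open, BED-style coordinates.

```python
from collections import namedtuple

# Half-open, BED-style interval: start is inclusive, end is exclusive.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def overlaps(a, b):
    """Return True if two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_sweeps(genes, sweeps):
    """Return the sorted names of genes overlapping any candidate sweep region.

    genes  -- dict mapping gene name to Interval (hypothetical input layout)
    sweeps -- list of Interval objects for candidate sweep regions
    """
    return sorted(
        name for name, gene in genes.items()
        if any(overlaps(gene, sweep) for sweep in sweeps)
    )


if __name__ == "__main__":
    # Toy coordinates for illustration only; not real gene positions.
    toy_genes = {
        "GENE_A": Interval("chr19", 100, 200),
        "GENE_B": Interval("chr19", 500, 600),
    }
    toy_sweeps = [Interval("chr19", 150, 160)]
    print(genes_in_sweeps(toy_genes, toy_sweeps))
```

An empty result from such a screen would correspond to the outcome reported in this study, where no sialic acid biology gene overlapped a candidate sweep region.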
For events with a selective advantage of 0.5% or larger, with an origin of the beneficial mutation as old as 600,000 years ago, the false positive rate of the method is lower than 0.1% and its true positive rate is larger than 65% (Peyregne et al. 2017). We also note that signals of positive selection are not detectable if the selection coefficient is smaller than 0.1%. The Ensembl database (release 82) (Aken et al. 2016) was used to annotate each gene with hg19 coordinates from transcription start to transcription end. Furthermore, L1CAM, PECAM, CMAH, SIGLEC13, SIGLEC16, and SIGLEC17 were excluded because they either fall in filtered regions not considered for the ancient sweep screen or could not be mapped to hg19 coordinates. As regulatory regions may also have been the target of positive selection, we extended the start and end coordinates 1 Mb upstream and downstream, respectively. If a neighboring gene was within 1 Mb, we only extended the coordinates until 5 or 1 kb from the transcription start or end of this neighboring gene. In order to test whether the lack of signatures of ancient selection is statistically significant, the candidate regions of ancient selection were randomly placed in the genome and the random placements of all regions were iterated 1,000 times, counting how often no overlap with genes involved in sialic acid biology was detected. The depletion in selection is not significant (356 of the 1,000 random sets never overlap; P value = 0.356).

Paraffin sections were deparaffinized in xylene and rehydrated in decreasing concentrations of ethanol and Tris-buffered saline-Tween (TBST). Following this, endogenous binding sites and peroxidases were blocked with 1% bovine serum albumin/TBST and 0.3% H2O2/TBST. This was followed by blocking of endogenous biotin, and then heat-induced antigen retrieval was performed in citrate buffer, pH 6.
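The random-placement test used in the sweep analysis (candidate regions shuffled 1,000 times, counting the sets with no gene overlap) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the placement scheme (each region dropped uniformly within a chromosome chosen in proportion to the room available), the function names, and the tuple layout are hypothetical, and gene intervals are assumed to be already extended. With 1,000 iterations, a returned value of 0.356 would correspond to the 356 non-overlapping sets reported in this study.

```python
import random


def place_randomly(region_lengths, chrom_sizes, rng):
    """Drop each region at a random position (assumed placement scheme)."""
    placed = []
    for length in region_lengths:
        # Only chromosomes long enough to hold the region are eligible.
        fits = [(c, size - length) for c, size in chrom_sizes.items() if size >= length]
        if not fits:
            raise ValueError("region longer than every chromosome")
        chroms = [c for c, _ in fits]
        weights = [room + 1 for _, room in fits]  # number of valid start positions
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randint(0, chrom_sizes[chrom] - length)  # inclusive bounds
        placed.append((chrom, start, start + length))
    return placed


def any_overlap(genes, regions):
    """True if any gene shares a base with any region (half-open intervals)."""
    return any(
        g_chrom == r_chrom and g_start < r_end and r_start < g_end
        for g_chrom, g_start, g_end in genes
        for r_chrom, r_start, r_end in regions
    )


def fraction_without_overlap(genes, region_lengths, chrom_sizes,
                             iterations=1000, seed=0):
    """Fraction of random placements in which no gene overlaps any region."""
    rng = random.Random(seed)
    misses = sum(
        not any_overlap(genes, place_randomly(region_lengths, chrom_sizes, rng))
        for _ in range(iterations)
    )
    return misses / iterations
```

A large fraction means that finding no overlap is common by chance alone, so the absence of overlap in the observed data would not indicate a significant depletion.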
Primary antibodies (mouse anti-Siglec-7 and rabbit anti-Siglec-13) or mouse IgG were then overlaid at optimal dilutions, and slides were incubated overnight at 4 °C in a humid chamber. Specific binding was detected using a biotinylated anti-mouse or anti-rabbit (for Siglec-13) secondary antibody, followed by horseradish peroxidase (HRP)-streptavidin, then biotinyl tyramide enhancement, and then HRP-streptavidin again. Substrate color was developed, nuclei were counterstained with hematoxylin, and the slides were aqueous mounted for viewing and digital photomicrography using an Olympus BH2 microscope with an Olympus Magnafire camera.

Multiple Derived, Fixed, or Polymorphic Genomic SIGLEC Variants Are Present in All Modern Human Populations

Table 1 summarizes events and frequencies in human populations based on the 1000 Genomes Project data and compares these with archaic hominin and great ape genomes. All human genomic changes in CD33-related SIGLECs (affecting 8 out of 13 members of this class of genes) were present across modern human populations. Genome-wide association studies identified a derived CD33-linked allele (rs3865444(A), rs12459419(T)) that is protective against late onset Alzheimer's disease (Schwarz et al. 2016). The linked single-nucleotide polymorphism was found to be variable across human populations, with the highest frequency in Native American populations (48%) and the lowest in African populations (5%). The previously reported Alu-mediated human SIGLEC13 deletion (Wang, Mitra, Secundino, et al. 2012) and SIGLEC17 (Siglec-P3) pseudogenization, due to open reading frame disruption and mutation of a critical arginine residue in its V-set domain (Wang, Mitra, Secundino, et al. 2012), appear to be fixed in all humans. SIGLEC12, SIGLEC14, and SIGLEC16 show polymorphic pseudogenization at variable frequencies across human populations, as detailed in table 1.
SIGLEC12 also harbors a human-universal mutation of the critical arginine residue in the V-set domain (Arg -> Cys, first V-set domain), abrogating its ability to recognize sialic acid. An additional SIGLEC12 inactivation mutation (rs16982743) was found at an overall 18.6% frequency, highest in Africans (37%) and lowest in East Asians (5%). Another SIGLEC12 polymorphism, a frameshift (rs66949844), averages 59% across human populations (highest in Native Americans and lowest in East Asians). Additionally, based on the two linked alleles present in SIGLEC16 (rs12611411(T) and rs12984584(C)), the pseudogene variant (SIGLEC16P) is present at higher frequency than the functional allele (SIGLEC16) in human populations (Wang, Mitra, Cruz, et al. 2012). Because these SIGLEC clusters are undergoing multiple gene conversions, some regions are highly prone to low mapping quality and hence some variants are not well defined, for example, the SIGLEC14 fusion-deletion, which is highly prevalent in East Asian populations (Yamanaka et al. 2009; Ali et al. 2014). Due to the lack of high coverage sequence data, these regions have not been well genotyped in nonhuman genomes. The CMAH exon deletion is confirmed to be universal to humans.

Our previous human-ape comparisons involved the incomplete draft of a single chimpanzee genome, a few additional incomplete great ape sequences, and a small number of human genomes (Chimpanzee Sequencing and Analysis Consortium 2005). Thus, it remained possible that the genomic changes were actually a common feature of such genes across all these taxa. The availability of newly annotated versions of great ape genomes (Prado-Martinez et al. 2013; Xue et al. 2015; de Manuel et al. 2016; Kronenberg et al. 2018) now allows for much better control against ascertainment bias, by comparing human-specific polymorphisms with multiple different genomes for each ape species, capturing potential polymorphisms in the latter as well (Sullivan et al. 2017).
We observed that great ape genomes share none of the mutations observed in human populations. Instead, we only observed independent changes in critical arginine residues in the V-set domain of SIGLEC5/14, rendering them unable to recognize sialic acids (table 1). The essential arginine change (Arg -> His) in SIGLEC5 was found to be polymorphic in chimpanzee and gorilla, whereas sequences in orangutan could not be resolved due to low mappability. The SIGLEC14 essential arginine change (Arg -> His) was found to be polymorphic in chimpanzee, bonobo, and gorilla. However, sequences of SIGLEC14 in orangutan were not determined in most of the species due to low coverage, except for one that harbors an essential arginine change to tyrosine (Arg -> Tyr). The CMAH gene is functional in all great apes as expected, given that both types of sialic acids (Neu5Ac and Neu5Gc) are present in all these species (Muchmore et al. 1998). Overall, human genomes harbored a far greater number of structural/functional changes in CD33rSIGLEC genes (see fig. 1 and supplementary file 2, Supplementary Material online).

To further understand the timing of these human genomic variants, we looked for their presence in archaic human genomes, that is, Neanderthal and Denisovan. The exon deletion leading to loss-of-function of the CMAH gene is shared by these two archaic genomes, as expected based on the existing estimates of fixation of the mutation over 2 Ma (Hayakawa et al. 2001). Notably, almost all other human CD33rSIGLEC variants (except for CD33-linked variants protective against late onset Alzheimer's disease) were also present in these archaic hominin genomes at variable frequency (listed in table 1 and fig. 1), placing their origin before the human-Neanderthal common ancestor (∼0.6 Ma) (Prufer et al. 2017; Hajdinjak et al. 2018).
Due to the unavailability of well-annotated sequences, the SIGLEC14 fusion/deletion could not be determined in archaic hominins. The presence of all genomic SIGLEC changes in human populations indicates that they predate the common ancestor of modern humans ∼0.2–0.3 Ma (Hublin et al. 2017). Consistent with this observation, the recent article by Moon et al. (2018) found that the patterns of genetic variation of most CD33rSIGLEC genes did not significantly deviate from neutral expectations, and the few that did significantly deviate from neutrality experienced either soft sweeps or population-specific hard sweeps. Using a method that allows detection of selection in deeper time (∼0.5 Ma) (Peyregne et al. 2017), we also found no evidence of strong selection after the ancestors of modern and archaic humans (Neanderthal-Denisovan) split from each other around 500,000 years ago. As selection may also target neighboring regulatory regions, we defined regulatory domains around each gene and looked for overlaps with candidate sweep regions. Using these extended gene coordinates, we again found no overlap with candidate regions for selective sweeps, suggesting that none of the genes involved in sialic acid biology exhibits strong signatures of selection more recently than 0.5 Ma in the common ancestral population of all modern humans. The description of extended lineage sorting (ELS) is provided in figure 2.

In addition to these multiple, polymorphic complex genomic changes in the human CD33rSIGLEC gene cluster, there appear to be unusual (derived) human-specific expression patterns of CD33rSiglecs (in nonhemopoietic cells) in locations such as placental trophoblast (SIGLEC6) (Kang et al. 2011), ovarian fibroblasts (SIGLEC11/16) (Wang et al. 2011), amniotic epithelium (SIGLEC5/14) (Ali et al. 2014), and microglia (SIGLEC11/16) (Hayakawa et al. 2005).
In each of the above instances, we have previously reported human-specific expression and lack of expression in chimpanzee tissue samples (in some instances also other available great ape tissues). With regard to the recent report of SIGLEC7 expression in human pancreatic islets (Yamaguchi et al. 2017), we now show that such expression is missing in chimpanzee pancreatic islets (fig. 3). We also found that Siglec-13, which is deleted in hominins, is highly expressed in chimpanzee intestinal epithelium and skin (fig. 3). Current and previously published data are summarized in table 2.

Fig. 2. Extended lineage sorting (ELS). (A) Relationship of an archaic genome (Meyer et al. 2012; top rectangle) to modern humans (bottom rectangles) at any given position along their genomes. There are two types of genealogies: the archaic falls either outside the modern human variation (external genealogy) or inside (internal genealogy). The internal genealogy is the most frequent in the genome. However, if a mutation spread among the ancestors of modern humans (blue dot) more recently than the population split with the ancestors of archaic humans, the archaic will be expected to fall outside the modern human variation in the genomic region around the mutation that has not been unlinked by recombination (blue rectangles). This region is expected to be large if the mutation was positively selected and spread quickly in the population, not leaving enough time for recombination to break its linkage with other mutations (black dots). (B) Time scales for the signatures of selection in humans. The tree represents the simplified population history of Neanderthals, Denisovans, and modern humans (Prufer et al. 2017). The colors indicate the time scales investigated by the ELS scan (red) and other methods (blue) used to detect events of positive selection on the modern human lineage.
Although we obviously cannot detect tissue-specific expression in Neanderthals or Denisovans, it is reasonable to consider the possibility that these unusual human-specific expression patterns arose during the course of the large-scale changes affecting the CD33rSIGLEC gene cluster during the proposed "hotspot" and would thus have already existed in both of those extinct relatives.

Recent availability of sequences of many modern human genomes (1000 Genomes Project Consortium et al. 2015), archaic hominins (Green et al. 2010; Reich et al. 2010, 2011; Meyer et al. 2012, 2016; Castellano et al. 2014; Prufer et al. 2014, 2017; Sawyer et al. 2015; Kuhlwilm et al. 2016; Slon et al. 2018), and great apes (Prado-Martinez et al. 2013; Xue et al. 2015; de Manuel et al. 2016) provides a tool for understanding the role of evolution in shaping genetic architecture and allows us to identify the genetic basis of phenotypic traits of various species (Haffter et al. 1996; Bryk and Tautz 2014; Valenzano et al. 2015; Castellano and Munch 2020). Our earlier studies noted multiple genomic changes affecting sialic acid biology, and we suggested the possibility of a hotspot in humans affecting these pathways compared with our closest evolutionary relatives (Altheide et al. 2006). However, the limited availability of annotated and high coverage genomes at the time left the question unresolved, and our suggestion has been recently tested by others (Moon et al. 2018), who could not find evidence for strong selective sweeps in current human populations. We herewith show that apart from the fixed CMAH null mutation and increased expression of ST6GAL1 (Gagneux et al.
2003), most of the human-specific changes affecting sialic acid biology are found in the SIGLEC gene cluster on chromosome 19, and that although great ape genomes do not show many changes in this cluster, almost all the human changes are also found in archaic genomes of Neanderthals and Denisovans (Reich et al. 2010; Castellano et al. 2014; Prufer et al. 2014; Slon et al. 2018; Bokelmann et al. 2019). In keeping with this overall conclusion, there was no evidence for strong selective pressure in this cluster after ∼0.6 Ma, when the human lineage diverged from the Neanderthal-Denisovan common ancestor. With regard to timing and order of changes, the most likely possibility is that the CMAH mutation first dramatically altered sialic acids throughout the body, possibly initiating a series of events that resulted in SIGLEC cluster changes. The complex and dynamic relationship between host and pathogens is often characterized by the Red Queen hypothesis, whereby slowly evolving hosts have to keep changing to keep pace with the pressure exerted by more rapidly evolving pathogens. Sialic acids form one primary interface, "the molecular frontier," in this evolutionary arms race. The loss-of-function of CMAH and the changes in ST6GAL1 expression undoubtedly led to multiple alterations of human-pathogen interactions. The direct consequence was the change in human cell surfaces, leading to overexpression of Neu5Ac, escape from Neu5Gc-specific pathogens, and a shift in favor of human pathogens that interact with Neu5Ac. There are many other known and possible biological consequences of human Neu5Gc loss (Varki 2009; Okerblom and Varki 2017), including protection from various Neu5Gc-recognizing pathogens such as malaria caused by Plasmodium reichenowi in African great apes (Martin et al. 2005), Escherichia coli K99 gastroenteritis (Kyogashima et al. 1989), transmissible gastroenteritis coronavirus (Schwegmann-Wessels and Herrler 2006), and simian virus 40 (Campanero-Rhodes et al. 2007).
Conversely, CMAH loss likely made humans susceptible to Neu5Ac-preferring pathogens such as Vibrio cholerae (causative agent of the human-specific disease cholera) (Alisson-Silva et al. 2018) and to typhoid fever, caused by a secreted bacterial toxin that preferentially binds Neu5Ac-terminated glycans (Deng et al. 2014). Another human-specific pathogen, Streptococcus pneumoniae, recognizes free Neu5Ac released from human cells by its secreted sialidase (Hentrich et al. 2016). Yet another secondary consequence appears to have been the evolution of pathogens that express Neu5Ac on their surfaces as a way to engage CD33-related Siglecs and downregulate innate immune responses in the human host. Again, the existence of many human-specific pathogens that display Neu5Ac supports this scenario (Angata 2018). For example, Group B Streptococcus type III interacts with Siglec-9 through its sialylated capsular polysaccharide (CPS) and inhibits the inflammatory response of neutrophils (Carlin et al. 2009). Likewise, nontypeable Haemophilus influenzae expressing sialic acid on its lipooligosaccharide binds to Siglec-5 and reduces cytokine production by myeloid cells (Angata et al. 2013), and the E. coli K1 strain, which causes meningitis in neonates and urinary tract infection, engages Siglec-11 and escapes killing (Hayakawa et al. 2017). Also, as expected, nonhuman primate Siglec-9 prefers binding to Neu5Gc, whereas human Siglec-9 prefers binding to Neu5Ac (Sonnenburg et al. 2004). Likewise, human Siglec-5 and CD33 prefer binding to Neu5Ac, compared with the baboon CD33, which strongly prefers Neu5Gc (Padler-Karavani et al. 2014). These findings imply that Siglecs underwent adaptation in new environments to "catch up" with the changes in the human sialome caused by the hominin CMAH mutation. Some human Siglecs still prefer binding to Neu5Gc (Angata 2018), perhaps due to incomplete adaptation to the derived, human Neu5Ac-dominant sialome.
Although these genomic changes are ancient, evidence also suggests that existing human polymorphisms are associated with several diseases such as chronic obstructive pulmonary disease, asthma, Alzheimer's disease, and meningitis (Yamanaka et al. 2009; Gao et al. 2010; Ali et al. 2014; Schwarz et al. 2016). In addition, human Siglec-XII and chimpanzee Siglec-12 are expressed on macrophages and luminal epithelia (Mitra et al. 2011). However, human Siglec-XII harbors a universal mutation (R122C) that makes the protein unable to recognize sialic acid. Interestingly, chimpanzee Siglec-12 and human Siglec-XII with its arginine experimentally restored strongly prefer Neu5Gc (Mitra et al. 2011). These results suggest a scenario where Siglec-12 lost an endogenous ligand and is thus being eliminated from the population. Prior studies have reported human-specific Siglec expression changes in placenta (Siglec-6) (Brinkman-Van der Linden et al. 2007), brain microglia (CD33/Siglec-11/16) (Hayakawa et al. 2005; Schwarz et al. 2016), amniotic epithelium (Siglec-5/14) (Ali et al. 2014), ovarian fibroblasts (Siglec-11/16) (Wang et al. 2011), and NK cells (Siglec-17) (Wang, Mitra, Secundino, et al. 2012). We here add to these expression differences by showing that Siglec-7 is upregulated in human but not chimpanzee pancreatic islets, and that the SIGLEC13 deletion resulted in a loss of Siglec-13 expression from human epithelia. Many of these expression changes could represent secondary consequences of multiple genomic changes that occurred in this gene cluster earlier in the hominin lineage. Overall, it is likely that the selective pressures driving all these changes were most prominent sometime after the split from the common ancestor of human and chimpanzee, but before the split from the Human-Neanderthal-Denisovan common ancestor.
Taken together, our data suggest that innate immune encounters with pathogens markedly altered hominin Siglec biology between 0.6 and 2 Ma, potentially affecting human evolution. Notably, this is the time period when a variety of changes were occurring in genus Homo, including exploration of new habitats (White et al. 1993; Semaw et al. 1997; deMenocal 2011), striding bipedalism and running (Bramble and Lieberman 2004; Lieberman 2015), and scavenging and hunting, involving butchery of animal carcasses with stone tools (Semaw et al. 1997; O'Connell et al. 2002; McPherron et al. 2010; Sayers and Lovejoy 2014; Harmand et al. 2015; Baird et al. 2016), activities that may have increased risk of injury, novel infections, and use of fire (Smith et al. 2015). In this regard, it is also interesting that human-like elimination of Cmah in mice enhances running ability (Okerblom et al. 2018) as well as macrophage activation.

Supplementary data are available at Genome Biology and Evolution online.
A global reference for human genetic variation
The Ensembl gene annotation system
Siglec-5 and Siglec-14 are polymorphic paired receptors that modulate neutrophil and amnion signaling responses to group B Streptococcus
Human evolutionary loss of epithelial Neu5Gc expression and species-specific susceptibility to cholera
System-wide genomic and biochemical comparisons of sialic acid biology among primates and rodents: evidence for two modes of rapid evolution
Possible influences of endogenous and exogenous ligands on the evolution of human
Cloning and characterization of a novel mouse Siglec, mSiglec-F: differential evolution of the mouse and human (CD33) Siglec-3-related gene clusters
A second uniquely human mutation affecting sialic acid biology
Loss of Siglec-14 reduces the risk of chronic obstructive pulmonary disease exacerbation
No evidence for recent selection at FOXP2 among diverse human populations
Injury, inflammation and the emergence of human-specific genes
N-Glycolyl groups of nonhuman chondroitin sulfates survive in ancient fossils
A genetic analysis of the Gibraltar Neanderthals
Endurance running and the evolution of Homo
Loss of N-glycolylneuraminic acid in human evolution. Implications for sialic acid recognition by siglecs
Human-specific expression of Siglec-6 in the placenta
Copy number variants and selective sweeps in natural populations of the house mouse (Mus musculus domesticus). Front Genet
Global analysis of alternative splicing differences between humans and chimpanzees
N-Glycolyl GM1 ganglioside as a receptor for simian virus 40
Group B Streptococcus suppression of phagocyte functions by protein-mediated engagement of human Siglec-5
Population genomics in the great apes
Patterns of coding variation in the complete exomes of three Neandertals
Initial sequence of the chimpanzee genome and comparison with the human genome
A mutation in human CMP-sialic acid hydroxylase occurred after the Homo-Pan divergence
Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution
Extensive changes in the expression of the opioid genes between humans and chimpanzees
Human genetics: muscling in on hominid evolution
Chimpanzee genomic diversity reveals ancient admixture with bonobos
Anthropology. Climate and human evolution
Host adaptation of a bacterial toxin from the human pathogen Salmonella Typhi
Non-coding RNA genes and the modern RNA world
Molecular evolution of FOXP2, a gene involved in speech and language
A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice
Construction and analysis of a human-chimpanzee comparative clone map
Evolution of glycan diversity
Human-specific regulation of alpha 2-6-linked sialic acids
Polymorphisms in the sialic acid-binding immunoglobulin-like lectin-8 (Siglec-8) gene are associated with susceptibility to asthma
Sexual selection by female immunity against paternal antigens can fix loss of function alleles
A draft sequence of the Neandertal genome
The identification of genes with unique and essential functions in the development of the zebrafish, Danio rerio
Reconstructing the genetic history of late Neanderthals
3.3-million-year-old stone tools from Lomekwi 3, West Turkana, Kenya
Alu-mediated inactivation of the human CMP-N-acetylneuraminic acid hydroxylase gene
A human-specific gene in microglia
Fixation of the human-specific CMP-N-acetylneuraminic acid hydroxylase pseudogene and implications of haplotype diversity for human evolution
Coevolution of Siglec-11 and Siglec-16 via gene conversion in primates
Streptococcus pneumoniae senses a human-like sialic acid profile via the response regulator CiaR
New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens
Positive selection of a gene family during the emergence of humans and African apes
Preeclampsia leads to dysregulation of various signaling pathways in placenta
Understanding the recent evolution of the human genome: insights from human-chimpanzee genome comparisons
Maximum reproductive lifespan correlates with CD33rSIGLEC gene number: implications for NADPH oxidase-derived reactive oxygen species in aging
MicroRNA biogenesis: coordinated cropping and dicing
Evolution at two levels in humans and chimpanzees
High-resolution comparative analysis of great ape genomes
Ancient gene flow from early modern humans into Eastern Neanderthals
Escherichia coli K99 binds to N-glycolylsialoparagloboside and N-glycolyl-GM3 found in piglet small intestine
Human locomotion and heat loss: an evolutionary perspective
Evolution of human-chimpanzee differences in malaria susceptibility: relationship to human genetic loss of N-glycolylneuraminic acid
Evidence for stone-tool-assisted consumption of animal tissues before 3.39 million years ago at Dikika, Ethiopia
Revealing the world of RNA interference
A high-coverage genome sequence from an archaic Denisovan individual
Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins
SIGLEC12, a human-specific segregating (pseudo)-gene, encodes a signaling molecule expressed in prostate carcinomas
Examination of signatures of recent positive selection on genes involved in human sialic acid biology
A structural difference between the cell surfaces of humans and the great apes
Loss of Siglec expression on T lymphocytes during human evolution
Evolution of genetic and genomic features unique to the human lineage
Male strategies and Plio-Pleistocene archaeology
Biochemical, cellular, physiological, and pathological consequences of human loss of N-glycolylneuraminic acid
Human-like Cmah inactivation in mice increases running endurance and decreases muscle fatigability: implications for human evolution
Loss of CMAH during human evolution primed the monocyte-macrophage lineage toward a more inflammatory and phagocytic state
Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts
Rapid evolution of binding specificities and expression patterns of inhibitory CD33-related Siglecs in primates
Detecting ancient positive selection in humans using extended lineage sorting
Great ape genetic diversity and population history
The complete genome sequence of a Neanderthal from the Altai Mountains
A high-coverage Neandertal genome from Vindija Cave in Croatia
Genetic history of an archaic hominin group from Denisova Cave in Siberia
Denisova admixture and the first modern human dispersals into Southeast Asia and Oceania
Origins of de novo genes in human and chimpanzee
Nuclear and mitochondrial DNA sequences from two Denisovan individuals
Blood, bulbs, and bunodonts: on evolutionary ecology and the diets of Ardipithecus, Australopithecus, and early Homo
Inferring human population size and separation history from multiple genome sequences
Human-specific derived alleles of CD33 and other genes protect against postreproductive cognitive decline
Paired Siglec receptors generate opposite inflammatory responses to a human-specific pathogen
Sialic acids as receptor determinants for coronaviruses
2.5-million-year-old stone tools from Gona, Ethiopia
The genome of the offspring of a Neanderthal mother and a Denisovan father
The significance of cooking for early hominin scavenging
A uniquely human consequence of domain-specific functional adaptation in a sialic acid-binding receptor
Myosin gene mutation correlates with anatomical changes in the human lineage
An evolutionary medicine perspective on Neandertal extinction
The African turquoise killifish genome provides insights into evolution and genetic architecture of lifespan
A chimpanzee genome project is a biomedical imperative
Multiple changes in sialic acid biology during human evolution
Since there are PAMPs and DAMPs, there must be SAMPs? Glycan "self-associated molecular patterns" dampen innate immunity, but pathogens can mimic them
Siglecs-the major subfamily of I-type lectins
Symbol nomenclature for graphical representations of glycans
Evolution of siglec-11 and siglec-16 genes in hominins
Specific inactivation of two immunomodulatory SIGLEC genes during human evolution
Expression of Siglec-11 by human and chimpanzee ovarian stromal cells, with uniquely human ligands: implications for human ovarian physiology and pathology
DNA sequence and comparative analysis of chimpanzee chromosome 22
New discoveries of Australopithecus at Maka in Ethiopia
Hominin taxic diversity: fact or fantasy
Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding
Chemical synthesis and evaluation of a disialic acid-containing dextran polymer as an inhibitor for the interaction between Siglec 7 and its ligand
Deletion polymorphism of SIGLEC14 and its functional implications
Associate editor: Naruya Saitou
We thank members of the Varki