key: cord-0318187-6z5f2gz3 authors: Barreiro, Luis B.; Patin, Etienne; Neyrolles, Olivier; Cann, Howard M.; Gicquel, Brigitte; Quintana-Murci, Lluís title: The Heritage of Pathogen Pressures and Ancient Demography in the Human Innate-Immunity CD209/CD209L Region date: 2005-11-30 journal: The American Journal of Human Genetics DOI: 10.1086/497613 sha: 7fa6a89af06e588d737c868cdcebceaee14d7532 doc_id: 318187 cord_uid: 6z5f2gz3 The innate immunity system constitutes the first line of host defense against pathogens. Two closely related innate immunity genes, CD209 and CD209L, are particularly interesting because they directly recognize a plethora of pathogens, including bacteria, viruses, and parasites. Both genes, which result from an ancient duplication, possess a neck region, made up of seven repeats of 23 amino acids each, known to play a major role in the pathogen-binding properties of these proteins. To explore the extent to which pathogens have exerted selective pressures on these innate immunity genes, we resequenced them in a group of samples from sub-Saharan Africa, Europe, and East Asia. Moreover, variation in the number of repeats of the neck region was defined in the entire Human Genome Diversity Panel for both genes. Our results, which are based on diversity levels, neutrality tests, population genetic distances, and neck-region length variation, provide genetic evidence that CD209 has been under a strong selective constraint that prevents accumulation of any amino acid changes, whereas CD209L variability has most likely been shaped by the action of balancing selection in non-African populations. In addition, our data point to the neck region as the functional target of such selective pressures: CD209 presents a constant size in the neck region populationwide, whereas CD209L presents an excess of length variation, particularly in non-African populations. 
An additional interesting observation came from the coalescent-based CD209 gene tree, whose binary topology and time depth (∼2.8 million years ago) are compatible with an ancestral population structure in Africa. Altogether, our study has revealed that even a short segment of the human genome can uncover an extraordinarily complex evolutionary history, including different pathogen pressures on host genes as well as traces of admixture among archaic hominid populations.

Infectious diseases have been paramount among the threats to health and survival for most of human evolutionary history (Haldane 1949; Lederberg 1999; Harpending and Rogers 2000; Cooke and Hill 2001). The interaction of the human host with a wide variety of pathogens has been accompanied by genetic adaptations to spatially and temporally fluctuating selective pressures imposed by the infectious agents. Numerous studies have sought the genetic imprint of natural selection imposed by pathogen pressures in human genes involved in immune response or, more generally, in host-pathogen interactions (Vallender and Lahn 2004). For example, natural selection has acted on such genes as MHC, β-globin, G6PD, IL-2, IL-4, TNFSF5, the Duffy blood group genes, and CCR5 (Ohta 1991; Hughes et al. 1994; Flint et al. 1998; Hamblin and Di Rienzo 2000; Tishkoff et al. 2001; Bamshad et al. 2002; Sabeti et al. 2002; Verrelli et al. 2002). However, little is known about genetic variation of genes involved in direct recognition of pathogens, or pathogens' products, and virtually no studies have investigated the extent to which pathogens have exerted selective pressures on the innate immune system. The phylogenetically ancient innate immune system governs the initial detection of pathogens and stimulates the first line of host defense (Medzhitov and Janeway 1998a; Janeway and Medzhitov 2002).
Recognition of pathogens is mediated by phagocytic cells through germline-encoded receptors, known as "pattern recognition receptors," which detect pathogen-associated molecular patterns that are characteristic products of microbial physiology (Kimbrell and Beutler 2001; Janeway and Medzhitov 2002). This initial interaction is then translated into a set of endogenous signals that ultimately lead to the induction of the adaptive immune response (Medzhitov and Janeway 1998b). In recent years, the C-type lectin receptors have received much attention in the area of innate immunology, yielding novel functional insights into the primary interface between host and pathogens (Medzhitov 2001; Cook et al. 2003; Fujita et al. 2004; Geijtenbeek et al. 2004; McGreal et al. 2004). In this context, two prototypic members of the C-type lectin-receptor family are particularly interesting, since they can act as both cell-adhesion receptors and pathogen-recognition receptors. These lectins include CD209 (DC-SIGN: dendritic cell-specific ICAM-3 grabbing nonintegrin [MIM 604672]) and its close relative CD209L (L-SIGN: liver/lymph node-specific ICAM-3 grabbing nonintegrin [MIM 605872]) (Curtis et al. 1992; Geijtenbeek et al. 2000b, 2004; Soilleux et al. 2000; Pohlmann et al. 2001). These lectin-coding genes are located on chromosome 19p13.2-3, within an ∼26-kb segment, and result from a duplication of an ancestral gene (Bashirova et al. 2003; Soilleux 2003).

Figure 1. Scaled diagram of the CD209/CD209L genomic region. Sequenced regions are represented in gray. For CD209, we sequenced a total of 5,500 bp per chromosome, and, for CD209L, 5,391 bp per chromosome. The neck region corresponding to exon 4 and composed of seven coding repeats is also shown.
An additional characteristic of both CD209 and CD209L is the presence of a neck region, primarily made up of seven highly conserved 23-aa repeats, that separates the carbohydrate-recognition domain involved in pathogen binding from the transmembrane region. This neck region presents high nucleotide identity between repeats, both within each molecule and between CD209 and CD209L. It has been shown that this region plays a crucial role in the oligomerization and support of the carbohydrate-recognition domain; therefore, it influences the pathogen-binding properties of these two receptors (Soilleux et al. 2000, 2003; Feinberg et al. 2005). In regard to expression profiles, CD209 is expressed primarily on phagocytic cells, such as dendritic cells and macrophages, whereas CD209L expression is restricted to endothelial cells in liver and lymph nodes (Bashirova et al. 2001; Soilleux et al. 2001, 2002). As pathogen-recognition receptors, the two lectins have been shown to recognize a vast range of microbes, some of which are of major public health importance (Geijtenbeek et al. 2004). Indeed, CD209 captures bacteria such as Mycobacterium tuberculosis, Helicobacter pylori, and certain Klebsiella pneumoniae strains; viruses such as HIV-1, Ebola virus, cytomegalovirus, hepatitis C virus, Dengue virus, and SARS-coronavirus; and parasites like Leishmania pifanoi and Schistosoma mansoni (Geijtenbeek et al. 2000a; Alvarez et al. 2002; Colmenares et al. 2002; Halary et al. 2002; Appelmelk et al. 2003; Lozach et al. 2003; Tailleux et al. 2003; Tassaneetrithep et al. 2003; Bergman et al. 2004; Marzi et al. 2004). With regard to CD209L, studies to date have shown an interaction with a variety of viruses, including HIV, hepatitis C, Ebola, and coronavirus, as well as with the parasite Schistosoma mansoni (Bashirova et al. 2001; Alvarez et al. 2002; Gardner et al. 2003; Jeffers et al. 2004; Van Liempt et al. 2004).
In this context, the efficiency of the two lectins in pathogen recognition and subsequent processing may have important consequences for the quality of host immune responses and consequent pathogen control and/or clearance. An important step forward in the understanding of human adaptation to pathogens and control of infectious diseases is the description of the quality and quantity of genetic variation in genes involved in host recognition of infectious agents. Given the direct interaction of CD209 and CD209L with a large variety of pathogens, the CD209/CD209L genomic region provides an excellent model system to illustrate the extent to which pathogens have exerted selective pressures on host immunity genes. An additional feature that makes these genes highly interesting in evolutionary studies is that they are likely to have been influenced by similar genomic forces (recombination, mutation rates, etc.) because of their close physical proximity (∼15 kb), high nucleotide (73%) and amino acid (77%) identity, and identical exon-intron organization (Soilleux 2003) (fig. 1). In addition, it has been proposed that gene duplication of immunity genes is a molecular strategy developed by the host to enlarge its defense potential (Ohno 1970; Trowsdale and Parham 2004). A number of immune-system gene families have evolved, by gene duplication followed by natural selection, to provide responses to a wider range of pathogens, with well-documented examples in immunoglobulin and MHC genes (Hughes et al. 1994; Ota et al. 2000). In this context, duplicated genes in cis, like CD209 and CD209L, may have undergone differential selective pressures to enlarge the defense role of these lectins. To address these complex issues, we performed a sequence-based survey of the entire CD209/CD209L region in a panel of individuals of different ethnic origins.
Here, we report evidence showing that these two closely related innate immunity genes have gone through completely different evolutionary processes that are reflected in their current patterns of diversity. In addition, our study provides novel insights into how pathogens have shaped the patterns of variability of immunity genes resulting from gene duplication.

Sequence variation of the CD209/CD209L region was determined in 41 sub-Saharan Africans, 43 Europeans, and 43 East Asians, for a total of 254 chromosomes from the Human Genome Diversity Panel (HGDP)-CEPH panel. More-detailed information about the composition of the three major ethnic groups can be found in table 1. The variation in the repeat number of the neck region of CD209 and CD209L was defined in the entire HGDP-CEPH panel, comprising 1,064 DNA samples from 52 worldwide populations. In addition, the orthologous regions for both genes were sequenced in four chimpanzees (Pan troglodytes). The sequenced fragments of the CD209/CD209L genomic region are shown in figure 1. The entire CD209 region (including exons, introns, and ∼1 kb of the 5′ UTR corresponding to the promoter region) was sequenced, for a total of 5.5 kb per individual. For CD209L, we sequenced a total of ∼5.4 kb per individual, following the same approach used for CD209, with the exception of the neck region. That region was genotyped for its number of repeats, since it turned out to be highly polymorphic, which prevented direct sequencing. Genotyping was performed by a single PCR amplification followed by migration in 2% agarose gels. Human primers were used to both amplify and sequence the orthologous regions in chimpanzees. However, because of polymorphisms specific to the chimpanzee lineage, we could not obtain the entirety of the sequence. Thus, 4.9 kb (90% of the total) of the chimpanzee CD209 sequence was obtained, and 5.3 kb (98% of the total) of CD209L.
Detailed information on primer sequences and PCR amplification conditions is available on request. All nucleotide sequences were obtained using the Big Dye terminator kit and the 3100 automated sequencer from Applied Biosystems. Sequence files and chromatograms were inspected using the GENALYS software (Takahashi et al. 2003; Centre National de Genotypage). As a measure of quality control, when new mutations were identified in primer binding regions, new primers were designed and sequence reactions were repeated, to avoid allele-specific amplification. All singletons observed in our data set were systematically reamplified and resequenced. On the basis of the levels of diversity observed in the CD209/CD209L genomic region, we calculated the average number of pairwise differences (π) and Watterson's estimator (θ) (Watterson 1975) [...] (Fay and Wu 2000). P values for the different tests were estimated from 10⁴ coalescent simulations under an infinite-site model, with use of a fixed number of segregating sites and the assumption of no recombination, which has been shown to be a conservative assumption (Gilad et al. 2002). In parallel, we estimated P values for all these tests, using the empirical distribution obtained from sequencing data of 132 genes in a panel of 24 African Americans and 23 European Americans (Akey et al. 2004). All these analyses, together with the interspecies McDonald-Kreitman (McDonald and Kreitman 1991) and K_A/K_S (Kimura 1968) tests, were performed using the DnaSP package (Rozas et al. 2003). Genetic distances between populations (F_ST) and heterozygosity values were estimated using the Arlequin package (Schneider et al. 2000). F_ST statistical significance was assessed using 10,000 bootstrap replications.
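As an illustration of the summary statistics named above, the following minimal Python sketch computes the number of segregating sites, the average number of pairwise differences, Watterson's estimator, and Tajima's D from a set of aligned haplotypes. It is not the DnaSP implementation used in the study; the function name and the toy haplotypes are hypothetical, and the values are per locus rather than per site.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, pi, theta_w, tajima_d) for equal-length haplotype strings.

    pi and theta_w are per-locus values (divide by sequence length for
    per-site values). Tajima's D follows the standard 1989 formulas.
    """
    n = len(haplotypes)
    length = len(haplotypes[0])

    # Number of segregating (polymorphic) sites.
    seg = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)

    # Average number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: S divided by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg / a1

    if seg == 0:
        return seg, pi, theta_w, 0.0  # D is undefined without variation

    # Variance constants for Tajima's D.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajima_d = (pi - theta_w) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return seg, pi, theta_w, tajima_d


# Hypothetical toy alignment of four chromosomes, for illustration only.
toy = ["AAT", "AAT", "ACT", "GCT"]
seg, pi, theta_w, d = diversity_stats(toy)
```

A positive D (π larger than θ) reflects an excess of intermediate-frequency variants, and a negative D reflects an excess of rare variants; significance would still have to be assessed by simulation, as described in the text.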
To test for a deficit or an excess of heterozygosity in the neck region of CD209 and CD209L, we used BOTTLENECK (Cornuet and Luikart 1996) to compute, for each geographic region, the distribution of the heterozygosity expected from the observed number of alleles, given the sample size (n), under the assumption of mutation-drift equilibrium. This distribution was obtained through simulation of the coalescent process of n genes under two mutational models, the infinite-site model and the stepwise mutation model. In addition, to obtain information on the fraction of genetic variance in the neck region that is due to intra- and interpopulation differences, we performed an analysis of molecular variance (AMOVA), using the Arlequin package (Schneider et al. 2000). The AMOVA results were compared with those of 377 microsatellites analyzed in the same population panel. Haplotype reconstruction was performed by use of the Bayesian statistical method implemented in Phase (v.2.1.1) (Stephens and Donnelly 2003). We applied the algorithm five times, using different randomly generated seeds, and consistent results were obtained across runs. After haplotype reconstruction, linkage disequilibrium (LD) between pairs of SNPs was computed using Lewontin's D′ index (Lewontin 1964). For this analysis, only markers presenting a minor allele frequency (MAF) >10% were considered, since rare alleles have been shown to present a higher probability of being in significant LD than do common ones (Reich et al. 2001). The graphic display of the LD plots was constructed using GOLD (Abecasis and Cookson 2000; Center for Statistical Genetics). To support the existence of a recombination hotspot in the region under study, we used the hotspot-recombination model implemented in Phase (v.2.1.1). Under this model, we assumed that there was, at most, one hotspot of unknown position.
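The pairwise LD measure mentioned above can be sketched as follows. This is a minimal illustration of Lewontin's normalized disequilibrium coefficient for two biallelic sites on phased haplotypes, not the code used by the authors; the function name, the "0"/"1" allele coding, and the example haplotypes are assumptions made for the example.

```python
def lewontin_d_prime(haplotypes, i, j):
    """Signed Lewontin's D' between biallelic sites i and j.

    `haplotypes` is a list of phased haplotypes written as strings of
    '0' (one allele) and '1' (the other allele).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # allele frequency, site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # allele frequency, site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D

    # Normalize by the largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("at least one site is monomorphic")
    return d / d_max


# Hypothetical haplotypes: complete association, then no association.
complete = lewontin_d_prime(["00", "00", "11", "11"], 0, 1)
independent = lewontin_d_prime(["00", "01", "10", "11"], 0, 1)
```

The absolute value |D′| is what is usually plotted; it equals 1 whenever at most three of the four possible two-site haplotypes are present, which is one reason rare alleles tend to inflate it.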
We then estimated the background population-recombination rate (ρ) and the relative intensity of any recombination hotspot. To obtain better estimates, we increased the number of iterations of the final run of the algorithm tenfold. All our estimations were obtained by averaging results of five independent runs with use of different seed numbers. Since the model used is Bayesian, we could also estimate, for each population, the posterior probability of a hotspot of intensity >1 (λ > 1) and >10 (λ > 10). We obtained the gene tree and estimated the time of the most recent common ancestor (T_MRCA) for CD209, using the maximum-likelihood coalescent method implemented in GENETREE (Griffiths and Tavare 1994). The mutation rate μ for each gene was estimated on the basis of the net divergence between humans and chimpanzees, under the assumptions that the species separation occurred 5 million years ago (MYA) and that the generation time is 20 years. Using this μ and the maximum-likelihood estimate of θ (θ_ML), we estimated the effective population size parameter (N_e). With the assumption of a generation time of 20 years and the estimated N_e, the coalescence time, scaled in 2N_e units, was converted into years. The coalescence process implemented in SIMCOAL2 (Laval and Excoffier 2004) allowed us to estimate the probability of the T_MRCA for CD209, through 2 × 10⁴ simulations, with use of both the number of observed segregating sites and the estimated N_e.

We determined sequence diversity in the CD209 and CD209L genes (fig. 1) as well as length variation of the neck region in 254 chromosomes originating from three major ethnic groups: sub-Saharan Africans, Europeans, and East Asians. In addition, the orthologous sequences were obtained in four chimpanzees, to infer the ancestral state at each site, to estimate the divergence between humans and chimpanzees, and to perform a number of interspecies neutrality tests.
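The conversions described in the methods above (from the scaled mutation parameter to an effective population size, and from coalescent time units to years) can be sketched in a few lines. The relation θ = 4N_eμ for a diploid autosomal locus is an assumption of this sketch; it reproduces the N_e of 13,636 reported in the results from the reported θ_ML of 8.4 and mutation rate of 1.54 × 10⁻⁴ per gene per generation. Function names are hypothetical.

```python
def effective_population_size(theta, mu_per_generation):
    """N_e from theta = 4 * N_e * mu (diploid autosomal locus assumed)."""
    return theta / (4 * mu_per_generation)


def coalescent_time_to_years(t_scaled, n_e, generation_years=20):
    """Convert a time expressed in units of 2 * N_e generations into years."""
    return t_scaled * 2 * n_e * generation_years


# Values reported in the article for CD209.
n_e = effective_population_size(theta=8.4, mu_per_generation=1.54e-4)

# With this N_e and 20-year generations, one coalescent time unit
# corresponds to 2 * N_e * 20 years.
years_per_unit = coalescent_time_to_years(1.0, n_e)
```

Because the conversion is linear, any uncertainty in the assumed generation time or species split carries over proportionally to the dated T_MRCA.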
For CD209, we identified a total of 79 SNPs and 2 indels, including 5 nonsynonymous, 5 synonymous, and 71 noncoding variants. The five nonsynonymous SNPs were all located in the neck region (exon 4): SNPs 1839 (Arg→Gln), 1888 (Glu→Asp), and 1908 (Arg→Gln) achieved a frequency of ∼15%, and SNP 1970 (Leu→Val), a frequency of 6%. These mutations were restricted to the African sample. SNP 1472 (Ala→Thr) was observed as a singleton in an East Asian individual. For CD209L, we identified 64 SNPs and 2 indels, including 4 nonsynonymous and 62 noncoding variants. The four nonsynonymous variants were located in different exons: SNP 141 (Thr→Ala) in exon 2, SNP 3476 (Asp→Asn) in exon 5, SNP 4268 (Thr→Ala) in exon 6, and SNP 5580 (Arg→Gln) in exon 7. All these mutations were singletons except SNP 3476, which presented high frequencies for its derived allele in all geographic regions: 97.6% in Africans, 57% in Europeans, and 77% in East Asians. All variable sites were in Hardy-Weinberg equilibrium for both CD209 and CD209L, after Bonferroni correction for multiple testing. The allelic composition of CD209 and CD209L haplotypes and their frequency distribution in the three major ethnic groups are illustrated in figure 2, along with the haplotype composed of the ancestral allelic state of each SNP inferred from chimpanzee data. For CD209, we identified 42 different haplotypes, with an overall heterozygosity of 84% (table 2). Three major haplotypes (H2, H29, and H40) accounted for ∼50% of the African variability, whereas they were at very low frequency (H2 at ∼5%) or absent (H29 and H40) in Europeans and East Asians (fig. 2A). In turn, the two haplotypes (H1 and H3) that accounted for 58% and 83% of the European and East Asian variability, respectively, were observed at very low frequency (H1 at 6%) or were even absent (H3) in Africa.
However, H3, which had a frequency of 36% and 20% in Europe and East Asia, respectively, is just a one-step mutation (SNP 871) from H2, the most frequent haplotype in the African sample. The most interesting observation of the CD209 haplotype variability was the presence of a highly divergent haplotype cluster. This cluster, which contains haplotypes 40-42 (referred to here as "cluster A"), differs from all other haplotypes (referred to here as "cluster B") by 35 fixed positions (fig. 2A). Cluster A is Africa specific and is present at a frequency of ∼15%, whereas cluster B is present in the remaining African and all non-African samples. It is worth noting that three (SNPs 1839, 1888, and 1908) of the five nonsynonymous mutations identified for this gene are unique to cluster A. In all cases, these three mutations were segregating together, with the exception of one haplotype, H41, which does not contain SNP 1839. Samples from cluster A are geographically wide- [...] 2B), with an overall heterozygosity of 94% (table 2). Only one haplotype (H38), at a frequency of ∼15%, was shared among the three continental regions. To assess the degree of population differentiation, if any, we computed Wright's F_ST (Wright 1931), using haplotype frequencies.

Figure 2. Inferred haplotypes for CD209 (A) and CD209L (B). The chimpanzee sequence was used to deduce the ancestral state at each position, except for the CD209L positions 1232, 1236, and 1240. For those polymorphisms, the ancestral state was considered to be the most frequent allele. Dark boxes correspond to the derived state at each position. The numbers on the right of the figure indicate the absolute frequency of each haplotype in the different populations studied. Repeat-number variation in the neck region of each gene is reported in the gray columns with the column heads "NR." Indel polymorphisms are referred to as "1" for insertion and "0" for deletion.
F_ST estimates were significant (P < .0001) for all population comparisons, indicating continental differentiation for both CD209 and CD209L. However, substantial differences were observed between the two genes: the overall F_ST for CD209 among Africans, Europeans, and East Asians was 0.15, whereas CD209L presented a threefold lower F_ST value of 0.05. For both genes, the largest F_ST values were observed between African and East Asian populations, with F_ST values of 0.22 for CD209 and 0.07 for CD209L. The average nucleotide diversity (π) was strikingly different, both between the two genes and among populations (table 2). Globally, π values were three- to fivefold lower for CD209 (3-7 × 10⁻⁴) than for CD209L (∼16 × 10⁻⁴), except for African populations, for whom the CD209 π value was unusually high (26 × 10⁻⁴) because of the presence of the highly divergent cluster A. Indeed, when cluster A was excluded from the analysis, the African π value dropped to 8 × 10⁻⁴. To estimate the substitution rate of each region and to reveal possible mutational differences that could explain the strong contrast observed in nucleotide-diversity patterns, we determined the human-chimpanzee divergence for both genes. The average net number of differences between the two species was 77.3 substitutions (or 0.0157 substitutions per nucleotide) for CD209 and 90.6 substitutions (or 0.0171 substitutions per nucleotide) for CD209L. Under the assumption that the human-chimpanzee speciation occurred 5 MYA, we obtained similar nucleotide-substitution rates per site per year (CD209, 1.57 × 10⁻⁹; CD209L, 1.70 × 10⁻⁹). To assess the patterns of LD in the CD209/CD209L region, haplotypes for the entire genomic region were reconstructed using markers with an MAF >10%. D′ measures among these markers were estimated for African and non-African populations independently; the graphical representation of LD levels is illustrated in figure 3.
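The substitution rates quoted above follow from dividing the net human-chimpanzee divergence by twice the time since the species split, because substitutions accumulate along both lineages. A minimal sketch of that arithmetic, with a hypothetical function name and the 5-MYA split taken from the text:

```python
def substitution_rate_per_site_per_year(net_divergence, split_years=5e6):
    """Per-site, per-year rate from net divergence between two species.

    The factor of 2 accounts for the two lineages that have diverged
    independently since the split.
    """
    return net_divergence / (2 * split_years)


# Net divergences per nucleotide reported in the article.
rate_cd209 = substitution_rate_per_site_per_year(0.0157)
rate_cd209l = substitution_rate_per_site_per_year(0.0171)
```

With the rounded divergences given in the text, this yields about 1.57 × 10⁻⁹ for CD209 and about 1.71 × 10⁻⁹ for CD209L; the small difference from the reported 1.70 × 10⁻⁹ presumably reflects rounding of the published divergence value.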
Two distinct regions, which correspond to either CD209 or CD209L, showed strong LD and are separated by a boundary that corresponds to the intergenic region. For CD209, a block of intragenic LD was observed in both African and non-African populations. For the African sample, 89% of all pairwise comparisons indicated significant levels of LD, whereas, for non-Africans, all D′ pairwise comparisons were significant. The magnitude of intragenic recombination (and/or gene conversion) of CD209L was slightly higher than for CD209. Nevertheless, considerable and significant levels of LD were observed between sites: 83% of all LD pairwise comparisons were significant in the African group, and 99% were in the non-African sample. Overall, CD209 exhibited a blocklike structure in both groups, whereas CD209L presented lower (although mostly significant) LD levels, in particular in the non-African sample. The strong decay in LD observed in the intergenic region (fig. 3), which spans only ∼14 kb, suggests the occurrence of a number of recombination events.

Figure 3. Pairwise D′ LD plots in non-African and African populations. European and East Asian samples were plotted together as "non-Africans" because they showed similar levels of LD (data not shown). Red tags indicate the physical position of each SNP across the genomic region studied. Blue and green lines label the SNPs (MAF > 10%) used for CD209 and CD209L, respectively, in the LD plot. For CD209, 47 SNPs presented an MAF > 10% in the African sample and 5 in the non-African, whereas, for CD209L, 18 SNPs showed an MAF > 10% in Africans and 20 in non-Africans. The high prevalence of SNPs with MAF > 10% for CD209 in Africa is due to the presence of the highly divergent cluster A, which presents 35 diagnostic variants with a frequency of 15%.
To test the hypothesis of a possible recombination hotspot situated within this region, recombination parameters across the entire CD209/CD209L region (∼26 kb) were computed for the three populations, by use of the recombination model implemented in Phase (v.2.1.1) (fig. 4). This model (Stephens and Donnelly 2003) estimates the position and relative intensity of the hotspot (λ) as compared with the background population recombination rate (ρ) (see the "Material and Methods" section). A λ value of 1 corresponds to absence of recombination-rate variation, whereas λ values >1 indicate the presence of a hotspot. The model detected the occurrence of a hotspot in the intergenic region, with Africans presenting a λ of 18, whereas Europeans and East Asians exhibited λ values of 63 and 53, respectively (fig. 4). We estimated the posterior probabilities of a hotspot of any kind, Pr(λ > 1), and of at least 10 times the background recombination rate, Pr(λ > 10). Pr(λ > 1) was 100% for all population groups, and Pr(λ > 10) was 64% for Africans, 97% for Europeans, and 92% for East Asians. Thus, our data clearly indicate a relative increase of the recombination levels between the two genes, which suggests the occurrence of a hotspot of recombination, the magnitude of which varies among the major ethnic groups. However, our data do not include intergenic SNPs; therefore, the exact location and width of the recombination hotspot within the intergenic region remain unclear, since this observation would be consistent with either an intense narrow hotspot or a weaker but wider hotspot. The identification of a strong decay in LD between CD209 and CD209L facilitated the interpretation of neutrality tests, because the noise introduced by hitchhiking effects between the genes is reduced.
We applied Tajima's D and Fay and Wu's H tests to determine whether these statistics significantly deviated from expectations under neutrality, using both coalescent simulations and the empirical distribution obtained from Akey et al. (2004). Globally, Tajima's D test indicated different tendencies for the two genes (table 2). CD209 always yielded negative values for Tajima's D but never achieved significance to reject the hypothesis of neutrality, whereas CD209L yielded significantly positive values for non-African populations, with use of both [...] (table 2). To evaluate the selective pressures at the protein level, we performed two interspecies tests: K_A/K_S, which gives the ratio of nonsynonymous and synonymous changes between species, and the McDonald-Kreitman test, which tests the null hypothesis that the ratio of the number of fixed differences to polymorphisms is the same for both nonsynonymous and synonymous mutations. For the K_A/K_S test, CD209 and CD209L showed similar values, 0.34 and 0.37, respectively. For the McDonald-Kreitman test, the hypothesis of neutrality was rejected for only CD209, because of a clear lack of nonsynonymous polymorphic sites (table 3). The identical genomic organization of CD209 and CD209L is extended to the neck region, which, in both genes, encodes a tract of seven coding repeats of 23 aa each (fig. 1) (Soilleux et al. 2000). A previous study has shown that the length of the neck region of CD209L varied between individuals of European descent (Bashirova et al. 2001). To investigate the degree of polymorphism of the neck region in both CD209 and CD209L, we genotyped it in the entire HGDP-CEPH panel (1,064 individuals from 52 worldwide populations).

Figure 4. Estimates of the hotspot intensity (λ) for Africans, Europeans, and East Asians. Estimates of the population recombination rate (ρ) for each population as well as the posterior probabilities of λ > 1 and λ > 10 are also reported in the key.
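The McDonald-Kreitman test described above compares a 2 × 2 table of fixed versus polymorphic, nonsynonymous versus synonymous changes. The sketch below tests that table with a two-sided Fisher's exact test written with the standard library only; this choice of test statistic is an assumption of the sketch (the article does not state which one was used), the function names are hypothetical, and the counts in the example are invented for illustration and are not the values of table 3.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def mcdonald_kreitman_p(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """P value for equal nonsynonymous:synonymous ratios among fixed
    differences and polymorphisms (the McDonald-Kreitman null hypothesis)."""
    return fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)


# Invented counts, for illustration only.
p_value = mcdonald_kreitman_p(fixed_nonsyn=3, fixed_syn=1, poly_nonsyn=1, poly_syn=3)
```

A deficit of nonsynonymous polymorphisms relative to nonsynonymous fixed differences, as reported for CD209, would show up as a small P value together with a skewed polymorphism row.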
Striking differences were observed between the two genes (see fig. 5 and table 4 for detailed allele frequencies in each population). For CD209, virtually no variation was observed, and the 7-repeat allele accounted for 99% of the total variability. Despite this limited variation, eight different alleles were observed, with an allele size range of 2-10 repeats, not including a 9-repeat allele. The geographic region that presented the highest variability was the Middle East, with five of the eight different alleles observed (fig. 5A and table 4). For CD209L, a completely different pattern emerged, with strong variation in allelic frequencies of different repeat numbers. Of the seven alleles observed (4- to 10-repeat allele size classes), the three most common overall were the 7-repeat (57.42%), the 5-repeat (23.92%), and the 6-repeat (11.37%) alleles. European, Asian, and Pacific populations presented a mosaic composition of different allelic classes, whereas 7- and 6-repeat alleles accounted for most (96%) of the African diversity (fig. 5B). The strong difference in the neck-region lengths between the two genes was consequently visible in the heterozygosity values: CD209 exhibited an overall heterozygosity of only 2%, whereas CD209L presented a value of 54% (table 5). Our results showed that the levels of heterozygosity observed at CD209 were considerably lower than expected, regardless of the mutation model considered (i.e., Infinite Site or Stepwise Mutation Models) (table 5). In strong contrast, although not statistically significant for individual populations, CD209L exhibited a pattern of an excess of heterozygosity in all populations.

The table is available in its entirety in the online edition of The American Journal of Human Genetics.

The low levels of intragenic recombination observed in CD209 allowed maximum-likelihood coalescent analysis (Griffiths and Tavare 1994) for estimation of the time scale of the origin and evolution of this gene.
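The heterozygosity values discussed above for the neck-region repeat alleles can be illustrated with the usual unbiased estimator of expected heterozygosity (gene diversity), computed from allele counts. This is a sketch under the assumption that this estimator is the one reported; the function name and the allele counts are hypothetical and are not taken from table 4 or table 5.

```python
def expected_heterozygosity(allele_counts):
    """Unbiased expected heterozygosity, n / (n - 1) * (1 - sum(p_i ** 2)),
    where n is the number of sampled chromosomes."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((count / n) ** 2 for count in allele_counts)
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample of 100 chromosomes in which one repeat allele
# accounts for 99% of the sample.
h_low = expected_heterozygosity([99, 1])

# Hypothetical sample with several alleles at intermediate frequencies.
h_high = expected_heterozygosity([57, 24, 11, 5, 3])
```

A locus dominated by a single allele gives a value near 2% in the first example, whereas alleles at intermediate frequencies push the value above 50%, the kind of contrast the text describes between CD209 and CD209L.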
Since this method assumes an infinite-site model without recombination, the same analysis for CD209L was not conducted because of the substantial number of recombinant haplotypes observed. For CD209, only 29 of the 254 chromosomes analyzed had to be excluded, as did a single segregating site (SNP 939). The resulting CD209 gene tree estimate, rooted with the chimpanzee sequence (i.e., the chimpanzee sequence was used to define ancestral/derived status of human mutations), is shown in figure 6. The tree is partitioned into two deep branches that correspond to haplotype clusters A and B. African samples were observed on both sides of the deepest node of the tree (i.e., in both clusters A and B), whereas non-African samples are restricted to one branch of the tree (i.e., cluster B). The maximum-likelihood estimate of θ (θ_ML) for CD209 was 8.4. On the basis of this θ_ML value and the estimated mutation rate (1.54 × 10⁻⁴ per gene per generation), the effective population size (N_e) was 13,636, a value comparable to most figures reported in the literature (for a review, see Tishkoff and Verrelli [2003]). The T_MRCA of the CD209 tree was then estimated at 2.8 ± 0.22 MYA, one of the oldest T_MRCA values estimated so far in the human genome (Excoffier 2002).

The CD209/CD209L region possesses a number of characteristics that make it a powerful tool for evolutionary inference. These two genes are not in LD, despite their very close physical vicinity (∼15 kb), and each of them behaves as an independent genetic entity. Moreover, our results suggest that the CD209/CD209L region is a uniform landscape of genomic forces, since the two lectin-coding genes present similar mutation rates, as well as high nucleotide identity and conserved exon-intron organization (fig. 1). Our diversity study revealed completely different patterns for the two genes. First, levels of nucleotide diversity (π) were found to be much lower for CD209 than for CD209L (table 2).
On the basis of 1.42 million SNPs, the International SNP Map Working Group defined 7.5 × 10^-4 as the average value of nucleotide diversity for the human genome and showed that 95% of all bins presented π values varying from 2.0 × 10^-4 to 15.8 × 10^-4 (Sachidanandam et al. 2001). In addition, an independent study analyzed nucleotide and haplotype diversity for 313 genes and likewise reported an average π value for those genes (Stephens et al. 2001). In this context, the π values observed for CD209L (16-18 × 10^-4) are at least twofold higher than average genome estimates and fall at the upper limit of the 95% CI defined by the SNP Consortium (Sachidanandam et al. 2001). This contrast in nucleotide diversity between the two genes can be explained either by a disparity in local mutation rates or by actual differences in selective pressures. However, no major differences in mutation rates (1.57 × 10^… vs. 1.70 × 10^…) were observed between the two homologues, nor was there substantial variation in GC content, which has been positively correlated with mutation rates and levels of polymorphism (Sachidanandam et al. 2001; Smith et al. 2002; Waterston et al. 2002; Hellmann et al. 2003). Indeed, the GC content for CD209 (53.7%) was slightly higher than that observed for CD209L (50.9%), which reinforces the idea that different selective pressures may have been the driving force behind the distinct patterns of diversity observed. Second, the patterns of repeat variation in the neck region also turned out to be strikingly different between the two genes. CD209 showed a heterozygosity of only 2%, whereas CD209L presented an extraordinarily high level of worldwide diversity, with an overall heterozygosity of 54% (table 5 and fig. 5). Although the neck regions of both genes share 92% nucleotide identity, nonuniform mutation rates could, again, explain the patterns observed.
However, this does not seem to be the case, since mutation-rate variation should influence the number of alleles observed rather than their frequencies, which are subject either to genetic drift or to natural selection. Indeed, we observed an even higher number of repeat alleles for CD209 (eight alleles) than for CD209L (seven alleles) (table 4 and fig. 5). Overall, differences in genomic forces seem to be insufficient to explain the contrasting patterns observed at both the sequence and neck-region length variation levels; therefore, differential selective pressures acting on these genes become the most plausible scenario.
For CD209, not only nucleotide diversity but also intercontinental F_ST values (0.15) were in conformity with previous worldwide estimates (Harpending and Rogers 2000; Akey et al. 2002; Cavalli-Sforza and Feldman 2003). Among the frequency-spectrum-based tests, only Fay and Wu's H test detected an excess of high-frequency derived alleles for the African and East Asian samples, a picture that may be interpreted as the result of a selective sweep. However, the significantly negative value observed in Africa is, again, exclusively due to the presence of cluster A, since 22 of the 35 fixed SNPs distinguishing it from cluster B corresponded to the derived allelic status in the latter cluster. Because cluster B accounts for 85% of the African variability, a clear excess of high-frequency derived alleles was observed. The extent to which the presence of this cluster is due to either natural selection or population structure will be discussed in detail below. For East Asia, the significance of the H test is also questionable when accounting for the confounding effects of demography. Indeed, when we plotted our H value against the empirical distribution of 132 H values from non-African populations (Akey et al. 2004), the East Asian P value became nonsignificant (P = .36). This observation reinforces the idea that the H test is particularly sensitive to past bottlenecks and/or population subdivision (Przeworski 2002). Thus, regarding the global levels of sequence diversity, the CD209 locus seems to evolve under evolutionary neutrality.
Nevertheless, when we focused our analyses at the protein level, signs of natural selection were uncovered. Indeed, the McDonald-Kreitman test rejected neutrality for this gene because of a clear excess of polymorphic synonymous sites (i.e., a lack of nonsynonymous variants). In addition, when the number of synonymous sites (146) versus nonsynonymous sites (499) was compared with the observed number of synonymous (5) versus nonsynonymous (0) mutations, we detected a significant lack of nonsynonymous mutations (two-tailed Fisher exact test, P = 6.3 × 10^-4). These observations point to a strong selective constraint acting on CD209 that prevents the accumulation of amino acid replacements over time. Further support for a functional constraint in CD209 comes from the patterns of diversity observed in the neck region. In contrast to CD209L, virtually no variation was observed at CD209 (fig. 5A), with the 7-repeat allele accounting for 99% of the total variability. Moreover, the low levels of heterozygosity observed resulted in a consistent rejection of mutation-drift equilibrium in almost all geographical regions (table 5). The probability of finding such a low heterozygosity value, given the overall number of alleles observed, was estimated to be <0.2%, independent of the mutational model considered (table 5). Thus, the fact that no alleles other than the 7-repeat allele have increased in frequency, together with recent studies addressing the functional consequences of repeat-number variation in this region (Bernhard et al. 2004; Feinberg et al. 2005), strongly suggests a clearly reduced fitness of any allele other than the 7-repeat allele. Interestingly, it has been recently shown that a protein with two fewer repeats (a 5-repeat allele) results in a partial dissociation of the final tetramer, whereas a protein with <5 repeats exhibits a dramatic reduction in overall stability (Feinberg et al. 2005), with all these differences having a direct impact on the quality of ligand-binding functions (Bernhard et al. 2004). Taken together, the patterns of diversity observed at CD209 clearly point to a strong functional constraint acting on this gene and further support the proposed crucial role of this lectin in pathogen recognition and in the early steps of immune response (Geijtenbeek et al. 2000b, 2004).
Table footnotes: a, Populations are grouped as described by Rosenberg et al. (2002). b, AMOVA values are from our CD209L study; 95% CIs are defined from 377 autosomal microsatellites in the same population panel.
In clear contrast to its homologue, CD209L presented extremely elevated nucleotide-diversity levels. High levels of diversity can result either from a relaxation of the functional constraint, which allows the stochastic accumulation of new mutations, or from the action of balancing selection, which maintains over time two or more functionally different alleles (and all linked variation) at intermediate frequencies. Several lines of evidence lend support to the selective hypothesis. First, if CD209L nucleotide diversity has been driven by the action of balancing selection, population-genetics relationships would have been altered accordingly. In this context, diversity studies in neutral, or assumedly neutral, regions of the genome-such as the Y chromosome (Underhill et al. 2000; Hammer et al. 2001; Jobling and Tyler-Smith 2003), mtDNA (Wallace et al. 1999; Ingman et al. 2000; Mishmar et al. 2003), Alu insertions (Watkins et al. 2001), as well as some autosomal genes (Stephens et al.
2001; Akey et al. 2004)-showed that African populations are genetically more diverse than are non-Africans, an observation generally interpreted as support for the "Out of Africa" model for the origin of modern humans (Lewin 1987). For CD209L, even if we observed 1.5 times more segregating sites in African than in non-African populations, as indicated by the higher θ_W value found in Africa, similar values of nucleotide diversity were detected in the three groups, with Europeans presenting even higher π values than do Africans. This unusual scenario, which is at odds with neutral expectations, has already been described for other regions of the genome, such as the β-globin gene and the 5′ cis-regulatory region of CCR5, for which the action of balancing selection has been convincingly proposed (Harding et al. 1997; Bamshad et al. 2002). Second, balancing selection tends to increase within-population diversity while decreasing F_ST, compared with neutrally evolving loci (Cavalli-Sforza 1966; Harpending and Rogers 2000; Akey et al. 2002; Bamshad and Wooding 2003; Cavalli-Sforza and Feldman 2003). Indeed, our data are compatible with these predictions, since the 5% F_ST value observed for CD209L is threefold lower than that estimated for CD209 (15%) and is similar to that found, for example, for the bitter-taste receptor gene (5.6%), for which there is compelling evidence of balancing-selection action (Wooding et al. 2004). Third, results of our Tajima's D analysis were significantly positive for European and East Asian populations, because of the skew of the CD209L frequency spectrum toward an excess of intermediate-frequency alleles (table 2), a pattern that further supports the action of balancing selection. However, since the null model used to assess significance makes unrealistic assumptions about past population demography (i.e., constant population sizes), the rejection of the standard neutral model cannot be interpreted as unambiguous evidence of selection.
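To make the logic of the Tajima's D argument concrete, the following is a minimal sketch of the standard statistic (Tajima's 1989 formulation), which contrasts the mean number of pairwise differences with the number of segregating sites scaled by a harmonic sum. The function name and the input values are illustrative assumptions; they are not the CD209L values of table 2, and the sketch does not reproduce the significance testing used in this study.

```python
import math

def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D for a sample of n sequences.

    seg_sites: number of segregating sites (S) in the sample.
    mean_pairwise_diff: mean number of pairwise differences (per locus).
    """
    if n < 2 or seg_sites <= 0:
        raise ValueError("need n >= 2 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between the two estimators of theta.
    numerator = mean_pairwise_diff - seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return numerator / math.sqrt(variance)

# Hypothetical inputs, for illustration only.
d_rare_excess = tajimas_d(10, 16, 3.888889)   # excess of rare variants: D < 0
d_intermediate = tajimas_d(10, 16, 8.0)       # excess of intermediate-frequency variants: D > 0
print(round(d_rare_excess, 3), round(d_intermediate, 3))
```

A positive value, as reported here for European and East Asian samples, arises when pairwise diversity exceeds what the number of segregating sites predicts; as noted in the text, the sign alone cannot separate balancing selection from demographic effects such as a bottleneck.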
Indeed, the observation that only non-African populations showed a significant departure from neutrality raises the question of whether these patterns could have resulted instead from the bottleneck that occurred during the Out of Africa exodus. One way around this conundrum is to exploit the fact that demography affects the whole genome equally, whereas selection directs its effects toward specific loci. Thus, to correct for the confounding effects of demography, we plotted our results against the empirical distributions of Akey et al. (2004) for Tajima's D statistics. Our values remained significant for CD209L, which therefore reinforces the idea that the pattern observed is unlikely to be the sole result of demography. Last, if the patterns of variation in CD209L represent the molecular signature of balancing selection, at least in non-Africans, then a functional target of such a selective regime is needed. In this context, the neck region constitutes an excellent candidate, since it plays a major mediating role in the orientation and flexibility of the carbohydrate-recognition domain. Since this domain is directly involved in pathogen recognition, neck-region length variation has important consequences for the pathogen-binding properties of these lectins (Mitchell et al. 2001; Bernhard et al. 2004; Feinberg et al. 2005). In perfect agreement with the results of our sequence-based data set, higher diversity in repeat variation was observed in the neck region among non-African populations (Native Americans excepted). Outside Africa, at least three alleles account for most population diversity, whereas, in Africa, the 6- and 7-repeat alleles alone account for 96% of the global variability (fig. 5B).
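The link between allele-frequency composition and the heterozygosity figures quoted for the neck region can be illustrated with the usual gene-diversity formula, H = 1 - Σp_i². The sketch below is an assumption-laden illustration rather than the computation behind table 5: it applies no small-sample correction, the even split of the rare CD209 alleles is hypothetical (only the 99% frequency of the 7-repeat allele comes from the text), and the three-allele mixture is invented to show the contrast.

```python
def gene_diversity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2), without small-sample correction."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

# CD209: 7-repeat allele at 99% (from the text); the remaining 1% is split
# evenly across the seven rare alleles (hypothetical assumption).
cd209_freqs = [0.99] + [0.01 / 7] * 7
h_cd209 = gene_diversity(cd209_freqs)

# Hypothetical three-allele mixture, standing in for a "mosaic" population.
h_mosaic = gene_diversity([0.5, 0.3, 0.2])

print(round(h_cd209, 4), round(h_mosaic, 2))
```

Under these assumptions the CD209 value comes out close to the 2% heterozygosity reported in the text, whereas any population in which several alleles sit at intermediate frequencies yields a much higher value.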
Again, the higher diversity observed out of Africa could be due to a greater relaxation of the functional constraint on the neck region in non-African than in African populations, which would lead to a random accumulation of proteins with varying neck-region lengths among non-Africans. Conversely, these patterns could also be explained by the action of balancing selection in non-Africans and could therefore point to the neck region as the functional target of such a selective regime. To evaluate the plausibility of these two conflicting scenarios, we compared the variation in the CD209L neck region with that inferred from 377 neutral autosomal microsatellites typed elsewhere for the same population panel (Rosenberg et al. 2002). We reasoned that if CD209L diversity has been shaped only by demography (i.e., bottleneck out of Africa), the distribution of genetic variance at different hierarchical levels should be comparable to that inferred through the neutral markers. On the other hand, if selection has driven the CD209L neck-region diversity, population-genetics distances would be influenced accordingly and would therefore differ from neutral expectations. Indeed, the AMOVA values inferred for CD209L fell systematically outside the 95% CI defined for the microsatellite data set (table 6). We observed that Europe, Asia, the Middle East, and Oceania each exhibited lower-than-expected diversity among populations within the same region. A reduction of genetic distances between populations is expected under balancing selection; therefore, the results from the CD209L neck region favor, once again, the action of this selective regime in most non-African populations over the neutral hypothesis. One may argue that the differences in the proportions of genetic variance between our data and those of Rosenberg et al. (2002) could be due to differences in the pace of mutation between microsatellite loci and our repeated neck region, which could be considered a "coding minisatellite."
However, under neutrality, differences in mutation rate should have a similar and proportional effect in all population comparisons and should influence all values with a similar tendency (i.e., higher or lower values). This is not the case: populations within Europe, the Middle East, Central/South Asia, East Asia, and Oceania turned out to be genetically closer than expected, whereas populations within Africa and the Americas exhibited the opposite pattern (table 6), which makes it highly unlikely that mutation-rate differences influenced our conclusions. Taken together, the integration of the results from levels of nucleotide and amino acid diversity, neutrality tests, population-genetics distances, and neck-region length variation in CD209 and CD209L clearly points to a situation in which CD209 has been under a strong selective constraint that prevents accumulation of any amino acid changes over time, whereas CD209L variability has most likely been driven by the action of balancing selection, at least in non-African populations.
In apparent contrast with the strong selective constraint described for CD209, we observed an unusual excess of diversity, with 35 fixed differences separating the two basal branches of the gene tree (fig. 6). In addition, we estimated a T_MRCA of 2.8 ± 0.22 MYA, a time that places the most recent common ancestor of CD209 back in the Pliocene epoch, before the estimated time for the origins of the genus Homo, ∼1.9 MYA (Wood 1996; Wood and Collard 1999). A number of studies have already reported loci that present unusually deep coalescent times (Harris and Hey 1999; Zhao et al. 2000; Webster et al. 2003; Garrigan et al. 2005a, 2005b), but our estimate for CD209 remains one of the deepest T_MRCA values yet reported (Excoffier 2002).
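A back-of-the-envelope calculation helps show why a T_MRCA of 2.8 MYA counts as deep. The sketch below uses two textbook relations for a neutral, randomly mating diploid population: θ = 4·N_e·μ, and an expected T_MRCA of 4·N_e·(1 - 1/n) generations for n sampled chromosomes. The values of θ_ML, the mutation rate, and the 225 retained chromosomes (254 minus 29) come from the Results; the 25-year generation time and all function names are assumptions made here for illustration, and this is not the coalescent method used in the study.

```python
THETA_ML = 8.4            # maximum-likelihood theta for CD209 (from the text)
MU_PER_GENE = 1.54e-4     # mutation rate per gene per generation (from the text)
N_CHROMOSOMES = 254 - 29  # chromosomes retained in the gene-tree analysis (from the text)
GENERATION_YEARS = 25     # ASSUMPTION: not stated in this section
OBSERVED_TMRCA_YEARS = 2.8e6

def effective_size(theta, mu):
    """Solve theta = 4 * N_e * mu for N_e (diploid, autosomal locus)."""
    return theta / (4.0 * mu)

def expected_tmrca_generations(n_e, n):
    """Neutral expectation E[T_MRCA] = 4 * N_e * (1 - 1/n) generations."""
    return 4.0 * n_e * (1.0 - 1.0 / n)

n_e = effective_size(THETA_ML, MU_PER_GENE)
expected_years = expected_tmrca_generations(n_e, N_CHROMOSOMES) * GENERATION_YEARS
ratio = OBSERVED_TMRCA_YEARS / expected_years

print(round(n_e), round(expected_years), round(ratio, 2))
```

With these inputs the first relation recovers the effective population size of 13,636 quoted in the Results, and the observed T_MRCA is roughly twice the neutral expectation. A different generation time would rescale the expected age proportionally, so the ratio is indicative only.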
The probability of finding such a deep coalescence time under a scenario of a random-mating population was estimated, through coalescent simulations (Laval and Excoffier 2004), to be very low (P = .018) (see fig. 7). In addition to the unexpected antiquity of the CD209 locus, we observed a peculiar tree topology made of two highly divergent and frequency-unbalanced lineages, cluster A embracing only 2 internal haplotypes and cluster B comprising the remaining 23 (fig. 6). Different hypotheses can account for such elongated and divergent haplotype patterns. For instance, the high levels of nucleotide identity between CD209 and CD209L could have led to gene conversion between the two genes, an event that would explain the outlier position of cluster A in the context of CD209 phylogeny. We reasoned that, if gene conversion had occurred, the derived alleles distinguishing clusters A and B in CD209 would correspond to the allelic state observed at their homologous positions in CD209L. Of all positions, only four fit this criterion. In addition, these positions were not physically clustered, which therefore excludes a major gene-conversion event as the explanation of the divergent CD209 phylogeny. Two other circumstances may be responsible for the topology and the time depth of the CD209 gene tree: long-standing balancing selection or ancient population structure, with Africa, in both cases, being the arena of such events (i.e., cluster A is restricted to Africa). Several lines of evidence argue against the balancing-selection hypothesis. First, under this selective regime, one would expect that Tajima's D test would also point in this direction by yielding significantly positive values, which is not the case (table 2). Second, such long-standing balancing selection in Africa would have entailed a number of recombinant haplotypes between clusters A and B, which, again, is not the case, as illustrated by the high LD levels at CD209 (fig. 3).
Third, a claim of balancing selection at this locus must imply a functional difference between the two balanced alleles. It is true that three nonsynonymous mutations, situated in the neck region, separate clusters A and B, and they could correspond to the alleles under selection. But, if the neck region is the target of selection, it is more likely that the balanced alleles would correspond to different numbers of repeats rather than to point nucleotide variation within each repeat tract, as observed for CD209L and suggested by functional studies (Bernhard et al. 2004; Feinberg et al. 2005). Since no variation in the number of repeats was detected between the two clusters, we predict that there are no major functional differences between the two lineages. Taken together, maintenance of ancient lineages by balancing selection does not seem to be responsible for the observed haplotype divergence. In this view, the patterns observed are best explained by an ancestral population structure on the African continent. Indeed, several studies have already proposed that African populations must have been more strongly subdivided and isolated than non-African ones (Harris and Hey 1999; Labuda et al. 2000; Excoffier 2002; Goldstein and Chikhi 2002; Harding and McVean 2004; Satta and Takahata 2004; Garrigan et al. 2005a). In particular, a recent study of the Xp21.1 locus presented convincing statistical evidence that supports the hypothesis that our species does not descend from a single, historically panmictic population (Garrigan et al. 2005a). The divergent haplotype pattern observed at the Xp21.1 locus prompted those authors to explain their data under the isolation-and-admixture (IAA) model and/or a metapopulation model (Harding and McVean 2004; Wakeley 2004). Indeed, as observed for CD209, under an IAA model, the two basal branches are expected to be longer than those under a Wright-Fisher model, depending on the length of time subpopulations spent in isolation.
The extent to which the IAA model fits the data depends on the number of mutations, referred to as "congruent sites," occurring in the two basal branches of the genealogy. For Xp21.1, 10 congruent sites over 24 polymorphisms were observed (i.e., ∼42% of the total number of sites). We applied the same approach to CD209 and obtained a very similar percentage of ∼45%, in good accordance with the IAA model. Our observations, together with a number of autosomal diversity studies, show that modern human diversity appears to have kept genetic traces of admixture among archaic hominid populations. However, a number of questions remain unanswered, such as the time when these admixture events occurred (i.e., before or after the appearance of anatomically modern humans), the precise quantitative contribution of ancient genetic material to our modern gene pool, and the geographic provenance of these genetic vestiges.
The need for continuous evolution of both the human host and the pathogens is predicted by the Red Queen hypothesis (Van Valen 1973; Bell 1982), in reference to the remark of the Red Queen to Alice in Through the Looking Glass (Carroll 1872): "Now, here, you see, it takes all the running you can do, to keep in the same place." This metaphor provides a conceptual framework for understanding how interactions between the two species lead to constant natural selection for adaptation and counteradaptation. In this context, one feature exploited by the host immunity genes to increase their defense potential is gene duplication: one duplicate retains the currently useful function of the encoded protein, while its twin is liberated to mutate and possibly acquire novel functions (Ohno 1970; Trowsdale and Parham 2004). The lectins CD209 and CD209L represent a prototypic model of the duplicated progeny of an ancestral gene that interact with a vast spectrum of pathogens.
Our results clearly indicate that these duplicated genes have evolved, and might still evolve, under completely different evolutionary pressures. Whereas one, CD209, shows signals of strong conservation, its paralogue, CD209L, exhibits an excess of sequence diversity compatible with the action of balancing selection. In addition, the strong contrast observed in length variation of the neck region between the two genes may have important consequences in medical genetics. In this context, association studies are now needed that correlate length variation of the neck region and susceptibility to infectious diseases whose etiological agents are known to interact with one (or both) of these lectins. More generally, our study has revealed that even a short segment of the human genome can help uncover an extraordinarily complex evolutionary history, including different pathogen pressures on host immunity genes, as well as traces of ancient population structure in the African continent. The coming years will certainly bring unprecedented large data sets of sequence diversity, genomewide and populationwide, with each genomic region possibly revealing a different aspect of human history. The integration of all these apparently independent pieces of the same reality will provide us with a much broader and more realistic view of the demographic history of the human species, as well as of human adaptation to the different environmental conditions imposed not only by pathogens but also by other major factors such as climate and nutritional resources. 
GOLD-graphical overview of linkage disequilibrium
Population history and natural selection shape patterns of genetic variation in 132 genes
Interrogating a high-density SNP map for signatures of natural selection
C-type lectins DC-SIGN and L-SIGN mediate cellular entry by Ebola virus in cis and in trans
Cutting edge: carbohydrate profiling identifies new pathogens that interact with dendritic cell-specific ICAM-3-grabbing nonintegrin on dendritic cells
Signatures of natural selection in the human genome
A strong signature of balancing selection in the 5′ cis-regulatory region of CCR5
A dendritic cell-specific intercellular adhesion molecule 3-grabbing nonintegrin (DC-SIGN)-related protein is highly expressed on human liver sinusoidal endothelial cells and promotes HIV-1 infection
Novel member of the CD209 (DC-SIGN) gene family in primates
Helicobacter pylori modulates the T helper cell 1/T helper cell 2 balance through phase-variable interaction between lipopolysaccharide and DC-SIGN
Proteomic analysis of DC-SIGN on dendritic cells detects tetramers required for ligand binding but no association with CD4
A human genome diversity cell line panel
Through the looking glass. Macmillan, London
Cavalli-Sforza LL (1966) Population structure and human evolution
The application of molecular genetic approaches to the study of human evolution
Dendritic cell (DC)-specific intercellular adhesion molecule 3 (ICAM-3)-grabbing nonintegrin (DC-SIGN, CD209), a C-type surface lectin in human DCs, is a receptor for Leishmania amastigotes
Toll-like receptors and the genetics of innate immunity
Genetics of susceptibility to human infectious disease
Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data
Sequence and expression of a membrane-associated C-type lectin that exhibits CD4-independent binding of human immunodeficiency virus envelope glycoprotein gp120
Human demographic history: refining the recent African origin model
Hitchhiking under positive Darwinian selection
Extended neck regions stabilize tetramers of the receptors DC-SIGN and DC-SIGNR
The population genetics of the haemoglobinopathies
The lectin-complement pathway-its role in innate immunity and evolution
L-SIGN (CD 209L) is a liver-specific capture receptor for hepatitis C virus
Deep haplotype divergence and long-range linkage disequilibrium at Xp21.1 provide evidence that humans descend from a structured ancestral population
Evidence for archaic Asian ancestry on the human X chromosome
DC-SIGN, a dendritic cell-specific HIV-1-binding protein that enhances trans-infection of T cells
Identification of DC-SIGN, a novel dendritic cell-specific ICAM-3 receptor that supports primary immune responses
Self- and nonself-recognition by C-type lectins on dendritic cells
Mycobacteria target DC-SIGN to suppress dendritic cell function
Evidence for positive selection and population structure at the human MAO-A gene
Human migrations and population structure: what we know and why it matters
Sampling theory for neutral alleles in a varying environment
Human cytomegalovirus binding to DC-SIGN is required for dendritic cell infection and target cell trans-infection
Disease and evolution
Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus
Hierarchical patterns of global human Y-chromosome diversity
Archaic African and Asian lineages in the genetic ancestry of modern humans
A structured ancestral population for the evolution of modern humans
Genetic perspectives on human origins and differentiation
X chromosome evidence for ancient human histories
A neutral explanation for the correlation of diversity with recombination rates in humans
Natural selection at the class II major histocompatibility complex loci of mammals
Mitochondrial genome variation and the origin of modern humans
Innate immune recognition
CD209L (L-SIGN) is a receptor for severe acute respiratory syndrome coronavirus
The human Y chromosome: an evolutionary marker comes of age
The evolution and genetics of innate immunity
Evolutionary rate at the molecular level
Archaic lineages in the history of modern humans
SIMCOAL 2.0: a program to simulate genomic diversity over large recombining regions in a subdivided population with a complex history
Haldane (1949) on infectious disease and evolution
Africa: cradle of modern humans
The interaction of selection and linkage. II. Optimum models
DC-SIGN and L-SIGN are high affinity binding receptors for hepatitis C virus glycoprotein E2
DC-SIGN and DC-SIGNR interact with the glycoprotein of Marburg virus and the S protein of severe acute respiratory syndrome coronavirus
Adaptive protein evolution at the Adh locus in Drosophila
Divergent roles for C-type lectins expressed by cells of the innate immune system
Toll-like receptors and innate immunity
Decoding the patterns of self and nonself by the innate immune system
Natural selection shaped regional mtDNA variation in humans
A novel mechanism of carbohydrate recognition by the C-type lectins DC-SIGN and DC-SIGNR: subunit organization and binding to multivalent ligands
Evolution by gene duplication
Role of diversifying selection and gene conversion in evolution of major histocompatibility complex loci
Evolution of vertebrate immunoglobulin variable gene segments
DC-SIGNR, a DC-SIGN homologue expressed in endothelial cells, binds to human and simian immunodeficiency viruses and activates infection in trans
The signature of positive selection at randomly chosen loci
Linkage disequilibrium in the human genome
Genetic structure of human populations
DnaSP, DNA polymorphism analyses by the coalescent and other methods
Detecting recent positive selection in the human genome from haplotype structure
A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms
The distribution of the ancestral haplotype in finite stepping-stone models with population expansion
Deterministic mutation rate variation in the human genome
DC-SIGN (dendritic cell-specific ICAM-grabbing non-integrin) and DC-SIGN-related (DC-SIGNR): friend or foe?
DC-SIGN; a related gene, DC-SIGNR; and CD23 form a cluster on 19p13
Placental expression of DC-SIGN may mediate intrauterine vertical transmission of HIV
Constitutive and induced expression of DC-SIGN on dendritic cell and macrophage subpopulations in situ and in vitro
Haplotype variation and linkage disequilibrium in 313 human genes
A comparison of Bayesian methods for haplotype reconstruction from population genotype data
DC-SIGN is the major Mycobacterium tuberculosis receptor on human dendritic cells
Statistical method for testing the neutral mutation hypothesis by DNA polymorphism
Automated identification of single nucleotide polymorphisms from sequencing data
DC-SIGN (CD209) mediates dengue virus infection of human dendritic cells
Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance
Patterns of human genetic diversity: implications for human evolutionary history and disease
Mini-review: defense strategies and immunity-related genes
Y chromosome sequence variation and the history of human populations
Positive selection on the human genome
Van Die I (2004) Molecular basis of the differences in binding properties of the highly related C-type lectins DC-SIGN and L-SIGN to Lewis X trisaccharide and Schistosoma mansoni egg antigens
A new evolutionary law
Evidence for balancing selection from nucleotide sequence analyses of human G6PD
Metapopulation models for historical inference
Mitochondrial DNA variation in human evolution and disease
Initial sequencing and comparative analysis of the mouse genome
Patterns of ancestral human diversity: an analysis of Alu-insertion and restriction-site polymorphisms
On the number of segregating sites in genetical models without recombination
Common 5′ β-globin RFLP haplotypes harbour a surprising level of ancestral sequence mosaicism
Human evolution
The human genus
Natural selection and molecular evolution in PTC, a bitter-taste receptor gene
Evolution in Mendelian populations
Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22
We warmly acknowledge Guillaume Laval for useful suggestions on the use of SIMCOAL software, Laurent Excoffier and Francesca Luca for stimulating discussions, and two reviewers for constructive comments on the first version of the manuscript. L.B.B. was supported by Fundação para a Ciência e a Tecnologia fellowship SFRH/BD/18580/2004.
The URLs for data presented herein are as follows: