key: cord-0948457-8h848lsx authors: Zeberg, Hugo; Pääbo, Svante title: The MERS-CoV receptor gene is among COVID-19 risk factors inherited from Neandertals date: 2020-12-12 journal: bioRxiv DOI: 10.1101/2020.12.11.422139 sha: 7f74c0fe8ee64563e7ed88f9d319f50eba489d53 doc_id: 948457 cord_uid: 8h848lsx

In the current SARS-CoV-2 pandemic, two genetic regions derived from Neandertals have been shown to increase and decrease, respectively, the risk of falling severely ill upon infection. Here, we show that 2-8% of people in Eurasia carry a variant promoter region of the DPP4 gene inherited from Neandertals. This gene encodes an enzyme that serves as a receptor for the coronavirus MERS-CoV and is currently not believed to be a receptor for SARS-CoV-2. However, the Neandertal DPP4 variant doubles the risk of becoming critically ill with COVID-19.

2020). We find that, under the rare disease assumption, the Neandertal-like alleles are associated with a ~80% increase in the risk per allele of being hospitalized upon infection with SARS-CoV-2 (Supplementary Table S1). The most strongly associated SNP (rs117888248) has an odds ratio of 1.84 (95% CI: 1.41-2.41, p = 7.7e-6). For carriers of this allele, the risk of requiring mechanical ventilation is increased by ~109% (OR = 2.09, 95% CI: 1.44-3.03, p = 1.2e-4). The Neandertal-like alleles form a ~26.3 kb haplotype (r² > 0.8). Of the 15 SNPs defining the haplotype (Supplementary Table S1), 14 carry alleles seen in heterozygous or homozygous form in a Neandertal genome (Prüfer et al. 2017). This haplotype is derived from Neandertals (p = 0.023) according to a published formula (Huerta-Sanchez et al. 2014) and using parameters as previously described ( 1B ). Both these risk haplotypes have stronger effect sizes than the protective Neandertal haplotype on chromosome 12, which decreases the risk of becoming severely ill by ~23% (Zeberg and Pääbo 2020b).
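As a reading aid, the odds ratios quoted above can be converted back into a log-odds effect, an approximate standard error and a Wald p-value. The sketch below is not the authors' analysis: it assumes a symmetric 95% Wald interval on the log-odds scale (z = 1.96) and a two-sided normal test, and the function name `wald_from_odds_ratio` is hypothetical. Because the published odds ratios and intervals are rounded, the recomputed p-values should only be expected to fall near the reported 7.7e-6 and 1.2e-4, not to match them exactly.

```python
import math


def wald_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover (beta, SE, z, two-sided p) from an odds ratio and its 95% CI.

    Assumption: the interval is a Wald interval, i.e. symmetric around
    log(OR) with half-width z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


def percent_risk_increase(odds_ratio):
    """Under the rare disease assumption the OR approximates the relative risk."""
    return (odds_ratio - 1.0) * 100.0


# Values quoted in the text for rs117888248.
hosp = wald_from_odds_ratio(1.84, 1.41, 2.41)   # hospitalization
vent = wald_from_odds_ratio(2.09, 1.44, 3.03)   # mechanical ventilation

for label, odds_ratio, (beta, se, z, p) in (
    ("hospitalization", 1.84, hosp),
    ("ventilation", 2.09, vent),
):
    print(
        f"{label}: +{percent_risk_increase(odds_ratio):.0f}% risk, "
        f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.1e}"
    )
```

The ~109% figure for mechanical ventilation is simply (2.09 - 1) x 100; the same conversion applied to the hospitalization odds ratio of 1.84 gives 84%, which the text rounds to ~80%.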
The Neandertal DPP4 haplotype is present in ~1% of Europeans, ~2.5% of South Asians, ~4% of East Asians, and ~0.7% of admixed Americans (Fig. 1C). It is absent among Africans south of the Sahara. We calculated the statistical significance of the association between the Neandertal DPP4 haplotype and severe COVID-19 under the null hypothesis that Neandertal haplotypes have no impact on COVID-19. Because only a fraction of the Neandertal genome is found among present-day humans, and because Neandertal haplotypes are on average longer than other haplotypes, the statistical power to detect associations with Neandertal haplotypes is better than for genome-wide analyses. When we consider Neandertal haplotypes that are present at a frequency of >1% among Europeans in the 1000 Genomes Project and are identified in previously published maps of Neandertal contributions, we find that the effective number of hypotheses is 5,761, yielding an 'introgression-wide' significance threshold of 8.7e-6 (Supplementary Material). Thus, under the null hypothesis that Neandertal gene variants have no impact on COVID-19, the association of the DPP4 haplotype with severe disease is significant. It was recently shown that the spike protein of SARS-CoV-2 binds to DPP4 (Li et al.

The combination of large effect sizes and a small number of Neandertal loci (and a correspondingly smaller number of multiple tests requiring correction) may allow associations with infectious disease susceptibility to be detected in smaller cohorts than if all variants across the genome are considered.
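The introgression-wide threshold quoted above is consistent with a Bonferroni-style correction of a 0.05 family-wise error rate over the 5,761 effective hypotheses. The short sketch below reproduces that arithmetic and compares the top SNP's p-value against both thresholds. The 0.05 error rate and the reading of the suggestive threshold as 1/M (one expected false positive per scan) are assumptions inferred from the reported numbers, and the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_effective_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_effective_tests


M_EFF = 5761             # effective number of hypotheses reported in the text
GENOME_WIDE = 5e-8       # standard genome-wide significance threshold
TOP_SNP_P = 7.7e-6       # rs117888248, hospitalization, as reported

# Assumption: a family-wise error rate of 0.05 underlies the 8.7e-6 threshold.
iwas_threshold = bonferroni_threshold(0.05, M_EFF)
# Assumption: the suggestive threshold corresponds to 1 / M_EFF.
suggestive_threshold = 1.0 / M_EFF

passes_iwas = TOP_SNP_P < iwas_threshold
passes_gwas = TOP_SNP_P < GENOME_WIDE

print(f"introgression-wide threshold: {iwas_threshold:.2e}")
print(f"suggestive threshold:         {suggestive_threshold:.2e}")
print(f"top SNP passes introgression-wide threshold: {passes_iwas}")
print(f"top SNP passes genome-wide threshold:        {passes_gwas}")
```

Under these assumptions the association clears the introgression-wide threshold but not the genome-wide one, which is why the choice of null hypothesis matters for this locus.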
For the DPP4 locus, we estimate that approximately twice as many patients as are currently available in HGI will be needed to achieve an 80% probability of detecting the association between DPP4 and severe COVID-19 at the standard genome-wide significance threshold (p<5e-

The three Neandertal genomes available to date, which vary in age between ~120,000 years and ~50,000 years and come from Europe and southern Siberia, are all homozygous for the risk variants on chromosome 2. Furthermore, the late Neandertal genome in Europe, which is most closely related to the Neandertals that mixed with modern humans, was also homozygous for the risk variants

To exclude Neandertal-like variants due to incomplete lineage sorting, we further required the resulting haplotypes to have a length of at least 10 kb. Using these criteria, we identify 40,055 SNPs. We use these SNPs to estimate the effective number of hypotheses in European genomes from the 1000 Genomes Project using the Genetic Type I Error Calculator (Li et al. 2011). This yields a significance threshold of 8.7e-6 and a suggestive threshold of 1.7e-4 for a Neandertal "introgression-wide association study" ("IWAS").

Sample size needed to detect the DPP4 haplotype

As shown above, the power to detect a variant is improved if only introgressed Neandertal haplotypes are considered, albeit under a different null hypothesis. We calculated the sample size needed to achieve genome-wide significance (p<5e-8) using standard techniques (Pirinen et al. 2020). If there is a non-zero effect, i.e., $\beta \neq 0$, then the z-score is distributed as $z \sim N(\beta/\mathrm{SE}, 1)$ and $z^2 \sim \chi^2_1\left((\beta/\mathrm{SE})^2\right)$. The parameter $(\beta/\mathrm{SE})^2$ is known as the non-centrality parameter and scales linearly with sample size. The non-central chi-squared distribution was used to calculate the probability of observing a sufficiently strong association, i.e., the statistical power.
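The power calculation described above can be sketched with the standard library alone, because for one degree of freedom the non-central chi-squared tail reduces to normal tails: if $z \sim N(\mu, 1)$, then $P(z^2 > t^2) = 1 - \Phi(t - \mu) + \Phi(-t - \mu)$. The sketch below is an illustration under stated assumptions, not the authors' code: it treats the observed association as the true effect, recovers the current z-score from the reported p = 7.7e-6, and scales the non-centrality parameter linearly with a hypothetical sample-size multiplier `k`. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_from_two_sided_p(p):
    """Absolute z-score corresponding to a two-sided p-value."""
    return -STD_NORMAL.inv_cdf(p / 2.0)


def power_one_df(ncp, alpha):
    """P(chi2 with 1 df and non-centrality ncp exceeds the alpha critical value).

    Uses the identity z^2 ~ chi2_1(mu^2) for z ~ N(mu, 1).
    """
    t = z_from_two_sided_p(alpha)
    mu = math.sqrt(ncp)
    return (1.0 - STD_NORMAL.cdf(t - mu)) + STD_NORMAL.cdf(-t - mu)


def power_at_multiplier(k, observed_p=7.7e-6, alpha=5e-8):
    """Power when the sample size is k times the current one.

    Assumption: the true effect equals the observed one, so the current
    non-centrality parameter is the squared observed z-score.
    """
    ncp_now = z_from_two_sided_p(observed_p) ** 2
    return power_one_df(k * ncp_now, alpha)


for k in (1, 2, 3):
    print(f"sample size x{k}: power = {power_at_multiplier(k):.3f}")
```

Under these assumptions the power at the current sample size is low, reaches roughly 80% when the sample size is doubled, and approaches 99% when it is tripled.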
To reach 80% power to detect the Neandertal DPP4 haplotype, we find that approximately twice the current sample size is needed. For a 99% detection probability, the sample size needs to triple.

The archaic genomes are available at the server of the Max Planck Institute for Evolutionary Anthropology (http://cdna.eva.mpg.de/neandertal/Vindija/VCF/ and https://bioinf.eva.mpg.de/jbrowse/) and the modern human genomes at the 1000 Genomes Project server

Supplementary Table S1. SNPs in linkage disequilibrium with rs117888248 and the corresponding Neandertal alleles. LD data are from the 1000 Genomes Project; "Vindija" refers to a Croatian Neandertal genome (https://bioinf.eva.mpg.de/jbrowse/).

A global reference for human genetic variation
Identifying and Interpreting Apparent Neanderthal Ancestry in African Individuals
Middle East respiratory syndrome coronavirus (MERS-CoV): announcement of the Coronavirus Study Group
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA
Ancient gene flow from early modern humans into Eastern Neanderthals
Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets
The MERS-CoV Receptor DPP4 as a Candidate Binding Target of the SARS-CoV-2 Spike. iScience
COVID-19 and diabetes mellitus: from pathophysiology to clinical management
A high-coverage Neandertal genome from Chagyrskaya Cave
Dipeptidyl-peptidase IV (CD26)-role in the inactivation of regulatory peptides
Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins
Genetic mechanisms of critical illness in Covid-19. medRxiv
The evolutionary history of Neanderthal and Denisovan Y chromosomes
Deeply divergent archaic mitochondrial genome provides lower time boundary for African gene flow into Neanderthals
The complete genome sequence of a Neanderthal from the Altai Mountains
A high-coverage Neandertal genome from Vindija Cave in Croatia
The Genomics of Human Local Adaptation
The date of interbreeding between Neandertals and modern humans
DPP-4 inhibition and COVID-19: From initial concerns to recent expectations
The nature of Neanderthal introgression revealed by 27,566 Icelandic genomes
Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC
Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
Emerging WuHan (COVID-19) coronavirus: glycan shield and structure prediction of spike glycoprotein and its interaction with human CD26
Mathematical properties of the r² measure of linkage disequilibrium
Resurrecting surviving Neandertal lineages from modern human genomes
Structure of MERS-CoV spike receptor-binding domain complexed with human receptor DPP4
The major genetic risk factor for severe COVID-19 is inherited from Neanderthals
A genetic variant protective against severe COVID-19 is inherited from Neandertals
A pneumonia outbreak associated with a new coronavirus of probable bat origin

We are indebted to the COVID-19 Host Genetics Initiative (HGI) for making the GWAS data available, to Tomislav Maricic for valuable input, and to the Max Planck Society and the NOMIS Foundation for funding.