key: cord-1035109-bd5mdgmf
authors: Roberts, G. H. L.; Park, D. S.; Coignet, M. V.; McCurdy, S. R.; Knight, S. C.; Partha, R.; Rhead, B.; Zhang, M.; Berkowitz, N.; Science Team, A.; Haug Baltzell, A. K.; Guturu, H.; Girshick, A. R.; Rand, K. A.; Hong, E. L.; Ball, C. A.
title: AncestryDNA COVID-19 Host Genetic Study Identifies Three Novel Loci
date: 2020-10-09
journal: nan
DOI: 10.1101/2020.10.06.20205864
sha: 6b38db793fd1596b7a5e617e56202378fd547b1b
doc_id: 1035109
cord_uid: bd5mdgmf

Human infection with SARS-CoV-2, the causative agent of COVID-19, leads to a remarkably diverse spectrum of outcomes, ranging from asymptomatic to fatal. Recent reports suggest that both clinical and genetic risk factors may contribute to COVID-19 susceptibility and severity. To investigate genetic risk factors, we collected over 500,000 COVID-19 survey responses between April and May 2020 with accompanying genetic data from the AncestryDNA database. We conducted sex-stratified and meta-analyzed genome-wide association studies (GWAS) for COVID-19 susceptibility (positive nasopharyngeal swab test, n_cases=2,407) and severity (hospitalization, n_cases=250). The severity GWAS replicated associations with severe COVID-19 near ABO and SLC6A20 (P<0.05). Furthermore, we identified three novel loci with P<5x10^-8. The strongest association was near IVNS1ABP, a gene involved in influenza virus replication, and was associated only in males. The other two novel loci harbor genes with established roles in viral replication or immunity: SRRM1 and the immunoglobulin lambda locus. We thus present new evidence that host genetic variation likely contributes to COVID-19 outcomes and demonstrate the value of large-scale, self-reported data as a mechanism to rapidly address a health crisis.

Novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, precipitated a pandemic with >21 million cases and >760,000 deaths worldwide as of 28 August 2020 [2]. Outcomes of SARS-CoV-2 infection in the United States are diverse; most infections result in mild illness that can be managed at home, yet ~14% of cases are hospitalized and ~5% are fatal [3]. Epidemiological studies have identified clinical risk factors for severe COVID-19 that include older age, male sex, and common health conditions such as hypertension, diabetes, and obesity [4,5]. Reports of higher susceptibility to [6,7,8] and severity of [9] SARS-CoV infections in men could suggest important biological differences in immune response to SARS-CoV-2 in men relative to women [10].

In addition to clinical risk factors, emerging evidence suggests that host genetic variation may contribute to COVID-19 susceptibility and severity. Ellinghaus et al. conducted a genome-wide association study (GWAS) of COVID-19 cases with respiratory failure and identified two loci that achieved genome-wide significance: one signal on chromosome (chr) 9 near the ABO gene, which determines blood type, and one signal on chr 3 near a cluster of genes with known immune function including SLC6A20, CXCR6, CCR1, CCR2, and CCR9 [11]. Additionally, a small whole-exome sequencing study identified TLR7, an X-chromosome gene involved in interferon signal induction, in four male patients with severe COVID-19. To validate rapidly emerging results from new studies such as the Ellinghaus study, further investigation in independent datasets is needed. Furthermore, investigation in larger datasets with increased statistical power may detect additional, novel host genetic variation relevant to susceptibility and severity.

To replicate and discover non-genetic [6] and genetic associations with COVID-19 outcomes, we engaged AncestryDNA members who have consented to research in the United States, with 18 million total individuals in the global network.
On April 22, 2020, we released a 54-question COVID-19 survey intended to assess exposure, risk factors, symptomatology, and demographic information that had been previously identified as associated with COVID-19 susceptibility and severity in the evolving pandemic. In under two months, over 500,000 COVID-19 survey responses were collected with a 95% survey completion rate.

From these self-reported outcomes, we constructed two phenotypes: one intended to assess susceptibility, in which individuals who reported a positive COVID-19 test were compared to those who reported a negative test (referred to throughout as "susceptibility"), and one intended to assess severity, in which individuals who were hospitalized with COVID-19 were compared to COVID-19 positive individuals who were not hospitalized (referred to throughout as "hospitalization"). To identify novel genetic determinants of these outcomes, we conducted a GWAS for each phenotype in a cohort of European-descent individuals. Sex-stratified GWAS were performed to investigate potential biological sex-driven differences in immune response to SARS-CoV-2 infection, and the GWAS results were meta-analyzed to maximize statistical power for variant discovery.

To validate the COVID-19 survey, we examined how representative our cohort is with respect to the Table 1). The first survey question assesses testing status, and in total, 3,733, or 13.4% of respondents who were tested, reported a positive test result (Supplementary Table 2), a proportion comparable to

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted October 9, 2020.

hospitalization assess aspects of COVID-19 infection severity.
As summarized in Table 2, we observed nominal replication of the lead SNPs at these two loci in our hospitalization analysis at both the ABO locus (P=0.022) and the SLC6A20 locus (P=0.020). For both loci, consistent risk alleles and directions of effect were observed, but with generally smaller odds ratios (ORs) than those reported by Ellinghaus and colleagues. We did not observe significant associations at either locus in the susceptibility analysis (Supplementary Table 4).

We conducted a sex-stratified GWAS adjusted for orthogonal age, orthogonal age^2, array version, and PC1-12 (Supplementary Table 5) to investigate possible differences in genetic associations with COVID-19 outcomes in males and females, and meta-analyzed the resulting summary statistics to maximize statistical power. For the susceptibility phenotype, the male GWAS, female GWAS, and meta-analysis had genomic inflation factors of 1.00, 1.01, and 1.00, respectively (Supplementary Figure 3). The hospitalization GWAS had genomic inflation factors of 1.04, 1.01, and 0.99 for males, females, and the meta-analysis, respectively (Supplementary Figure 4). In total, three novel loci surpassed the genome-wide significance threshold of 5x10^-8 in at least one study: two separate loci on chr 1 and one locus on chr 22 (Table 3 and Figure 1, Supplementary Figures 5 and 6).
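As an illustration of how genomic inflation factors like those reported above are commonly computed, the following is a minimal sketch rather than the authors' pipeline. It assumes the standard definition (median observed 1-df chi-square statistic divided by the median expected under the null), and the function name `genomic_inflation` and the evenly spaced example P-values are hypothetical.

```python
from statistics import NormalDist, median

# Expected median of a chi-square distribution with 1 degree of freedom
# (about 0.455), obtained from the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for a list of two-sided P-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to chi2 = z^2 where z = Phi^-1(p / 2).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical input: evenly spaced P-values, mimicking a well-calibrated GWAS.
n = 999
uniform_p = [(i - 0.5) / n for i in range(1, n + 1)]
lam = genomic_inflation(uniform_p)
print(round(lam, 3))
```

Values close to 1, such as those reported for the six analyses here, are usually read as showing little systematic inflation of test statistics.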
In the susceptibility analysis, the most significant association was represented by lead SNP rs6668622, with the association present in males only (P=3.28x10^-9; OR=0.69) (Figure 1c-d) and absent in females (P=0.37; OR=0.96) (Figure 1e-f). In the sex-combined meta-analysis, the rs6668622 association did not reach genome-wide significance (P=3.83x10^-5; OR=0.87). Consistent with the differential association observed in men and women, there is significant heterogeneity of effect (I^2) for rs6668622 between the male and the female studies (I^2=94; P=1.6x10^-5; Supplementary Table 6). This signal is intergenic and the nearest protein-coding genes to rs6668622 are IVNS1ABP (~150Kb) and SWT1 (~288Kb) Figure 7).

In the hospitalization meta-analysis, a locus on chr 1 surpassed genome-wide significance. The lead SNP, rs111972040, is uncommon with a MAF of approximately 1%, but with a large estimated effect size in the hospitalization meta-analysis (P=8.38x10^-9; OR=8.29). Although rs111972040 did not achieve genome-wide significance in either sex-stratified GWAS, the estimated ORs in the male (P=3.46x10^-3; OR=6.50) and female (P=8.01x10^-7; OR=9.37) studies were large and there was no significant heterogeneity of effect between males and females (I^2=0; P=0.63; Supplementary Table 6). The variant rs111972040 is a non-coding transcript variant in the gene SRRM1, and NCMAP, CLIC4, RCAN3, NIPAL3, and RUNX3 are all within 500kb (Supplementary Figure 9).
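To make the relationship between the sex-stratified and meta-analyzed estimates for rs111972040 concrete, here is a hedged sketch of a fixed-effects inverse-variance weighted meta-analysis with Cochran's Q and I^2. It is not the METAL implementation. It assumes the reported P-values are two-sided Wald P-values, so that a standard error can be back-calculated from each OR, and the helper names `se_from_or_and_p` and `ivw_meta` are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_or_and_p(odds_ratio, p_value):
    """Recover (log-OR, standard error) from an OR and a two-sided Wald P-value."""
    beta = math.log(odds_ratio)
    z = abs(NormalDist().inv_cdf(p_value / 2.0))
    return beta, abs(beta) / z


def ivw_meta(estimates):
    """Fixed-effects inverse-variance weighted meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    # Two-sided P-value from the normal approximation.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q and I^2 (percent of variation attributed to heterogeneity).
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Closed-form chi-square tail; valid only for df == 1 (two studies).
    p_het = math.erfc(math.sqrt(q / 2.0)) if df == 1 else float("nan")
    return {"or": math.exp(beta), "p": p, "i2": i2, "p_het": p_het}


# Sex-stratified hospitalization estimates for rs111972040 quoted in the text.
male = se_from_or_and_p(6.50, 3.46e-3)
female = se_from_or_and_p(9.37, 8.01e-7)
result = ivw_meta([male, female])
print(result)
```

Under these assumptions the combined OR should land near the reported 8.29 with I^2 of 0. Small differences in the P-values are expected, because the inputs are rounded and the published analysis used the full summary statistics.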
To assess whether clinical risk factors other than age and sex had an effect on these associated loci, we additionally adjusted for other associated risk factors including body mass index (BMI) and having one or more pre-existing health conditions (asthma, bone marrow transplant, cancer, cardiovascular disease, kidney disease, chronic obstructive pulmonary disease (COPD), diabetes, hypertension, organ transplant, autoimmune disease, immunodeficiency, or lung conditions). For all three novel lead SNPs, the estimated effect sizes remained relatively consistent, though the association no longer reached genome-wide significance in the hospitalization analysis, likely in part due to the small sample size and the additional decrease in sample size for these extended analyses (Supplementary Table 7). Both loci associated with susceptibility remained genome-wide significant even after adjusting for these additional clinical risk factors.

We estimated narrow-sense heritability (h^2), defined as the proportion of phenotypic variation due to additive genetic factors [17], using the linear mixed model approach (GCTA-GREML) [18] and autosomal genome-wide imputed variants. The underlying assumption of this linear mixed model approach is that the included common variants all contribute equally small effects to heritability [19].

To our knowledge, these are the first estimates of heritability for COVID-19 susceptibility and severity (hospitalization) based on empirical genetic similarity, though one twin-based study estimated heritability of COVID-19 symptoms [20]. As shown in Supplementary

To identify genetic determinants of COVID-19 susceptibility and severity, we conducted GWAS of self-reported COVID-19 outcomes in a population of survey respondents with European ancestry. To explore possible differences in biological response to SARS-CoV-2 infection, we analyzed both susceptibility and severity outcomes via sex-stratified GWAS and sex-combined meta-analyses. In total, three novel loci surpassed genome-wide significance in one or more study, with lead SNPs rs6668622 (male susceptibility P=3.2x10^-9), rs73166864 (susceptibility P=1.56x10^-8), and rs111972040 (hospitalization P=8.38x10^-9) near IVNS1ABP, the immunoglobulin lambda locus, and SRRM1, respectively.

The most significantly associated SNP in any study was rs6668622 with the susceptibility outcome. The nearest gene to rs6668622 is IVNS1ABP, which encodes Influenza Virus NS1A Binding Protein. This protein is known to bind with influenza virulence factor NS1 and this interaction appears to promote influenza viral gene expression [21]. The variant rs6668622 is a known, strong expression quantitative trait locus (eQTL) in lung tissue for IVNS1ABP [22,23], suggesting that risk variation might impact mRNA abundance of IVNS1ABP. Strikingly, haploinsufficiency of IVNS1ABP appears to associate with primary immunodeficiency [24], suggesting IVNS1ABP may play a role in cellular response to other pathogens besides influenza. It is unclear why this association is present only in males, though it may provide a clue as to why males appear to be at higher risk of COVID-19 infection, hospitalization, and mortality [7-12]. We speculate that sex hormones or behavioral differences might trigger generally different
cellular responses to SARS-CoV-2 infection in men and in women [10], and one such difference may involve differential expression of IVNS1ABP.

Another locus, represented by the intergenic lead SNP rs73166864, was associated with the susceptibility outcome with similar effects in men and women. This signal is ~75Kb away from the immunoglobulin lambda locus, a region that undergoes somatic recombination in B-cells and encodes proteins used to construct the antigen-binding light chain region of antibodies [16]. It is unclear what the functional consequence of this intergenic variation might be, but proximity to such an important region for antibody generation is intriguing.

In addition to identifying novel associations, the severity GWAS replicated findings from a previous COVID-19 respiratory failure GWAS [11] that identified two loci: the blood type ABO gene and a cluster of immune genes near SLC6A20. We observed consistent directions of effect at both loci, but the replication P-values were nominal, and we observed smaller estimated ORs; however, these observations are not surprising. The original study considered a more severe outcome (respiratory failure) than our analogous severity study (hospitalization). Furthermore, we included only 250 cases that were hospitalized in our study relative to the 1,980 cases with respiratory failure in the Ellinghaus study [11], thus the nominal P-values in this study may simply reflect lower statistical power for our
severity analysis. Lastly, the winner's curse suggests that overestimated effect sizes are expected in the discovery study [26].

We estimated heritability for both outcomes to assess the contribution of common genetic variation to COVID-19 susceptibility and severity. The liability heritability estimates were generally small: h^2=0.00 (SE=0.07) for susceptibility and h^2=0.14 (SE=0.58) for hospitalization. However, for hospitalization, the sample sizes were small for this type of analysis and standard errors were correspondingly high. A key assumption underlying the heritability estimates is that many small effects were distributed across the entire genome [19]. Thus, the low heritability estimates might not simply reflect low genetic contribution, but rather could suggest a genetic architecture involving a limited number of high-effect variants or a large contribution by uncommon and rare variants.

A key limitation of our data is that COVID-19 cases who suffered very severe or fatal infections are less likely to participate in our survey, which consequently results in undersampling cases with severe outcomes. We also restricted the analysis to individuals of European ancestry due to small sample sizes in other genetic ancestry cohorts for the susceptibility and severity outcomes in this early phase of COVID-19 survey data collection. As the COVID-19 survey cohort grows, future analyses will focus on increased ancestral diversity to increase generalizability. Finally, we lack an independent replication cohort for our novel findings and will rely on future ascertainment of additional survey respondents and COVID-19 GWAS consortia [27] efforts to determine whether our findings are reproducible.
In summary, we collected over 500,000 self-reported COVID-19 outcomes in under two months and conducted one of the largest genetic studies of infection susceptibility and severity to date, thus demonstrating the value of large-scale self-reported data as a mechanism to rapidly address a serious health crisis. We identified three novel loci, all of which harbor genes with established roles in viral replication or immunity, and one of which may provide insight into why men appear to be differently affected by COVID-19 than women. We thus add to growing evidence that host genetic variation contributes to COVID-19 susceptibility and severity and suggest identification of such genetic risk factors may provide profound insight into pathogenesis of the novel coronavirus.

Methods

Ethics statement. All data for this research project was from subjects who provided prior informed consent to participate in AncestryDNA's Human Diversity Project, as reviewed and approved by our external institutional review board, Advarra (formerly Quorum). All data was de-identified prior to use.

Study population. Self-reported COVID-19 outcomes were collected through the Personal Discoveries Project®, a survey platform available to AncestryDNA customers via the web and mobile applications.
To participate in the COVID-19 survey, participants must meet the following criteria: they must be 18 years of age or older, be a resident of the United States, be an existing AncestryDNA customer who has consented to participate in research [28], and be able to complete a short survey. The survey is designed to assess self-reported COVID-19 positivity and severity, as well as susceptibility and known risk factors including community exposure and known contacts with individuals diagnosed with COVID-19.

Phenotype definitions. Two phenotypes were assessed: one for susceptibility and one for severity of COVID-19. Cases for the COVID-19 susceptibility phenotype were defined as individuals who responded to the question, "Have you been swab tested for COVID-19, commonly referred to as coronavirus?" as "Yes, and was positive". Responders who answered "Yes, and was negative" were defined as controls for the susceptibility study. Cases for the severity phenotype were defined as individuals who reported testing positive for COVID-19 and responded to the question, "Were you hospitalized due to these symptoms?" as "Yes". Severity controls reported testing positive for COVID-19, but reported no hospitalization related to COVID-19. Table 3 and Supplementary Figure 10).

Statistical analyses.
We used the COVID-19 Host Genetics Initiative (HGI) analysis plan version 1 to guide our analyses [35]. A key recommendation in this plan is to analyze males and females separately when possible; therefore, we conducted four separate GWAS in total: susceptibility in males, susceptibility in females, hospitalization in males, and hospitalization in females. Sex was determined from genotype data. For each GWAS, a fixed-effects logistic regression model was implemented with PLINK2.0 with either the susceptibility or severity phenotype as the primary outcome and imputed genotype dosage value as the primary predictor. The following were included as fixed-effect covariates: PCs 1-12 (described above), array platform, orthogonal age, and orthogonal age^2. Orthogonal polynomials were used to eliminate collinearity between age and age^2 and were calculated in R version 3.6.0 with base function poly(age, degree=2). We additionally used PLINK2.

For each phenotype, summary statistics for males and females were combined using a fixed-effects inverse-variance weighted meta-analysis, implemented with METAL [36] (version released 25 March 2011). Unless otherwise noted, all variant ORs are adjusted for the 15 covariates described above.
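The orthogonal age terms described above can be illustrated with a short sketch. This is not the R `poly()` implementation, only a plain Gram-Schmidt construction that, as far as we can tell, should produce the same centered, mutually orthogonal, unit-norm linear and quadratic columns up to sign. The function name `orthogonal_age_terms` and the example ages are hypothetical.

```python
import math


def orthogonal_age_terms(ages):
    """Return (linear, quadratic) age columns: centered, orthogonal, unit norm.

    Requires at least three distinct ages.
    """
    n = len(ages)
    mean = sum(ages) / n
    lin = [a - mean for a in ages]  # centering makes it orthogonal to the intercept
    sq = [x * x for x in lin]
    sq_mean = sum(sq) / n
    slope = sum(s * x for s, x in zip(sq, lin)) / sum(x * x for x in lin)
    # Remove the intercept and linear components from the squared term.
    quad = [s - sq_mean - slope * x for s, x in zip(sq, lin)]

    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    return unit(lin), unit(quad)


# Hypothetical ages, for illustration only.
ages = [23, 35, 41, 58, 67, 72]
age1, age2 = orthogonal_age_terms(ages)
dot = sum(a * b for a, b in zip(age1, age2))
print(abs(dot) < 1e-12)
```

Because the two columns are uncorrelated by construction, entering both as covariates avoids the collinearity that raw age and age^2 would introduce into the logistic regression.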
For all individual GWAS and meta-analyses, we considered the European genome-wide significance threshold [37] of two-tailed P<5x10^-8 to represent a significant association. For lead SNPs at loci surpassing P<5x10^-8, we performed extended analyses in which we additionally adjusted for BMI and self-reported affliction with one or more of any of the following health conditions: asthma, bone marrow transplant, cancer, cardiovascular disease, kidney disease, COPD, diabetes, hypertension, organ failure requiring transplant, autoimmune disease, immunodeficiency, and/or "other" lung condition.

Replication. To our knowledge, the only published GWAS of COVID-19 outcomes to date was performed by Ellinghaus et al. [11]. We compared P-values and OR estimates from our analyses to the two lead variants reported by Ellinghaus: rs657152, representing a region on chr 9 near ABO; and rs11385942, representing a region on chr 3 near SLC6A20. The variant rs11385942 is a small indel and indels are not present in the HRC reference panel; we therefore examined the association with rs11385942 using rs17713054, a SNP in perfect LD (R^2=1, D'=1) in a European population from LDpair [38]. The allele rs17713054-A corresponds to rs11385942-AG.
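For readers unfamiliar with the LD measures used to justify the proxy SNP, the sketch below computes R^2 and D' from haplotype and allele frequencies using the standard textbook definitions. It does not reproduce LDpair, and the frequencies are hypothetical values chosen to show what perfect LD looks like, not estimates for rs17713054 or rs11385942.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical perfect LD: the two alleles always occur on the same haplotype.
r2_perfect, dprime_perfect = ld_stats(p_ab=0.08, p_a=0.08, p_b=0.08)

# Hypothetical partial LD, for contrast.
r2_partial, dprime_partial = ld_stats(p_ab=0.05, p_a=0.08, p_b=0.10)
print(r2_perfect, dprime_perfect, r2_partial, dprime_partial)
```

When both measures equal 1, the two variants carry the same information in that population, which is why association results at the proxy SNP can stand in for the untyped indel.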
References

Presence of genetic variants among young men with severe COVID-19
COVIDView: a weekly surveillance summary of U.S. COVID-19 activity - key updates for Week 22, ending
American Community Survey (ACS) 1-year estimates data profiles
To Mail or To Web: Comparisons of Survey Response Rates and Respondent Characteristics
Immunoglobulin light chain gene rearrangements, receptor editing and the development of a self-tolerant antibody repertoire
Heritability in the genomics era - concepts and misconceptions
GCTA: a tool for genome-wide complex trait analysis
Common SNPs explain a large proportion of heritability for human height
Self-reported symptoms of covid-19 including symptoms most predictive of SARS-CoV-2 infection, are heritable
Structural-functional interactions of NS1-BP protein with the splicing and mRNA export machineries for viral and host gene expression
Genetic effects on gene expression across human tissues
Combined Expression QTLs from 44 Tissues from GTEx (midpoint release, V6) (IVNS1ABP)
Whole-genome sequencing of a sporadic primary immunodeficiency cohort
Quantitative phosphoproteomics reveals extensive cellular reprogramming during HIV-1 entry
Quantifying and correcting for the winner's curse in genetic association studies
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
A prospective analysis of genetic variants associated with human lifespan. G3 (Bethesda)
AncestryDNA Ethnicity Estimate 2020 White Paper
AncestryDNA Matching White Paper
FlashPCA2: principal component analysis of biobank-scale genotype datasets
Reference-based phasing using the Haplotype Reference Consortium panel
Eagle v2.4.1 user manual
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic
COVID-19 Host Genetics Pilot Analysis Plan
METAL: fast and efficient meta-analysis of genomewide association scans
Estimation of the multiple testing burden for genomewide association studies of nearly all common variants
Estimating missing heritability for disease from genome-wide association studies