key: cord-0749812-x1el4pmd
authors: Dickson, Samuel P.; Hendrix, Suzanne B.; Brown, Bruce L.; Ridge, Perry G.; Nicodemus‐Johnson, Jessie; Hardy, Marci L.; McKeany, Allison M.; Booth, Steven B.; Fortna, Ryan R.; Kauwe, John S.K.
title: GenoRisk: A polygenic risk score for Alzheimer's disease
date: 2021-09-30
journal: Alzheimers Dement (N Y)
DOI: 10.1002/trc2.12211
sha: afaa192a0a2f0517f273307ee3d96af41ec99645
doc_id: 749812
cord_uid: x1el4pmd

INTRODUCTION: Recent clinical trials are considering inclusion of more than just apolipoprotein E (APOE) ε4 genotype as a way of reducing variability in analysis of outcomes.
METHODS: Case‐control data were used to compare the capacity of age, sex, and 58 Alzheimer's disease (AD)–associated single nucleotide polymorphisms (SNPs) to predict AD status using several statistical models. Model performance was assessed with Brier scores and tenfold cross‐validation. Genotype and sex × age estimates from the best performing model were combined with age and intercept estimates from the general population to develop a personalized genetic risk score, termed age‐ and sex‐adjusted GenoRisk.
RESULTS: The elastic net model that included age, age × sex interaction, allelic APOE terms, and 29 additional SNPs performed the best. This model explained an additional 19% of the heritable risk compared to APOE genotype alone and achieved an area under the curve of 0.747.
DISCUSSION: GenoRisk could improve the risk assessment of individuals identified for prevention studies.

optimal-performing model that improves upon prior scores. These results were combined to develop a personalized genetic risk score, termed "age and sex-adjusted GenoRisk," that can be used for reducing variability in clinical outcome models as well as personal risk assessment.
3. Future Directions: This first version of GenoRisk was restricted to genetic variation that had been previously associated with AD.
Future studies will incorporate genome-wide screens for variation as well as additional environmental factors.

A variety of risk factors can alter the likelihood of AD and disease course, including sleep, 7 exercise, [8] [9] [10] nutrition, 8, 9, 11, 12 sex, and age. 13 Moreover, AD heritability estimates range from 50% to 80%, [14] [15] [16] [17] suggesting a large genetic component. The main genetic determinants of AD risk are variants in the gene apolipoprotein E (APOE), which account for one quarter of the heritability. 14, 17 Areas under the curve (AUCs) associated with APOE alone range from 0.65 to 0.8 across different populations. This range in APOE association is likely due to additional genetic variation or environmental factors that differ across populations and influence AD onset and progression. Genome-wide association studies (GWAS) have identified more than 50 additional single nucleotide polymorphisms (SNPs) that contribute to the heritability of AD. 18, 19, [20] [21] [22] [23] [24] [25] [26] [27] [28] Identification of these genetic risk factors explains an additional 25% to 55% of heritability and also highlights possible biological pathways of disease onset and progression. 17, 29 Genetic risk scores that incorporate additional genetic variation to predict AD status range from an AUC of 0.57 (20 SNPs) to 0.8 (using > 200,000 SNPs including APOE).

Despite the increase in genetic associations derived from the GWAS era, and the resulting increase in predictive ability afforded by genetic and lifestyle factors, most studies control only for APOE ε4 genotype (≈25% of heritability) in their predictive algorithms. Moreover, studies derive their risk scores from individual populations, which may not yield accurate risk estimates. Polygenic risk score (PRS) approaches that combine literature-derived odds ratios benefit from the large sample sizes that are usually available for the estimation of the odds ratios for each individual SNP.
However, many recent PRS are based on and also validated against results from the International Genomics of Alzheimer's Project meta-analysis. [29] [30] [31] [32] [33] [34] [35] These PRS models rely on correlational assumptions that are difficult to verify and that rarely allow adjustment for patient-level factors such as sex and age. The phenotypic risk score using raw data has the advantage of simultaneous calculation of risk for correlated SNPs but is often based on a smaller sample size. However, because the model is built on raw data instead of meta-analysis data using odds ratios from each SNP considered for the model, the age effect can be directly estimated. Although case-control studies are limited because control individuals may eventually develop AD, we accounted for this limitation by adjusting for the age of the individuals and weighting our estimates by population-based age distributions. Phenotypic prediction model approaches afford the most personalized approach because they are based on raw data.

In this article we used data from four independent genetic studies of AD to predict genetic risk. Moreover, we compared the efficacy of various phenotypic prediction models and used this information to create and validate a model comprising 29 SNPs that predicts an individual's risk of developing AD, independent of age, sex, and genetic risk factors, termed the GenoRisk score. The GenoRisk model was found to be highly efficacious in its prediction of AD risk with an AUC of 0.747, a score that is comparable to PRSs reported in the literature with an equivalent number of SNPs 29 (AUC = 0.73, N = 28 SNPs) for early-onset AD. The age- and sex-adjusted GenoRisk model further personalizes an individual's AD risk while detecting a broad range of risks within each APOE isoform, allowing refinement in risk assessment for ɛ4 carriers and additional isoform combinations.
Overall, the GenoRisk model is an accurate predictive model able to measure genetic risk within a population beyond the basic APOE genotypes.

The final GenoRisk model was calibrated using data from the 1000 Genomes Project participants to match the prevalence of SNPs in the general population; 1000 Genomes Project data were also used for GenoRisk score transformation.

A total of 58 SNPs were selected through a literature search of all articles investigating the genetics of AD. 18, 19, 36, 20, [22] [23] [24] [25] [26] [27] [28] The SNPs considered are listed in Table S1 and Table S2 in supporting information. Because the Alzheimer's Disease Genetics Consortium (ADGC) data were collected on several different platforms, not all SNPs were available in all datasets; therefore, the panel of SNPs listed in Table S1 was imputed using data from the 1000 Genomes Project. Imputation was performed within each population, and the results were subsequently combined as described in Ridge et al. 37 38

APOE has three isoforms (ɛ2, ɛ3, and ɛ4) that are characterized by varying combinations of two SNPs, rs429358 and rs7412. Therefore, two different genetic models were tested for APOE:
1. Classification of APOE allelic variants (ɛ2, ɛ3, and ɛ4) using rs429358 and rs7412 genotypes and incorporating them into an allelic model;
2. Coding rs429358 and rs7412 into ɛ2, ɛ3, and ɛ4 isoforms and using a genotypic model for the six possible APOE genotypes.

Modeling: Creating a prediction score for a specific phenotype, such as AD, usually uses one of two broad approaches:
1. PRS: A risk score is created by combining odds ratios from the literature using an approach that assumes independence between SNPs. SNPs are either pruned 29

To this end, four general statistical methods were tested: logistic, probit, lasso, and elastic net regression. Lasso and elastic net were based on logistic regression. The elastic net initially used λ = 0.5 but was refined through cross-validation. All statistical models included a genetic component along with age and sex.
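As a rough illustration of how competing models of this kind can be scored with Brier scores under tenfold cross-validation, the following Python sketch may help. It is not the authors' code: the function names, the prevalence-only baseline "model", and the toy data are all assumptions made for illustration, and any fitted model (logistic, lasso, elastic net) could be plugged in through the hypothetical `fit` and `predict` callables.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_brier(fit, predict, X, y, k=10, seed=0):
    """Return one held-out Brier score per fold."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        probs = predict(model, [X[i] for i in test_idx])
        scores.append(brier_score(probs, [y[i] for i in test_idx]))
    return scores


# Hypothetical baseline "model": predict the training-set case fraction for everyone.
def fit_prevalence(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_prevalence(model, X_test):
    return [model] * len(X_test)


# Synthetic toy data (not from the study): 100 subjects, 40% cases.
X = [[0.0] for _ in range(100)]
y = [1] * 40 + [0] * 60
fold_scores = cross_validated_brier(fit_prevalence, predict_prevalence, X, y)
print(sum(fold_scores) / len(fold_scores))
```

Lower Brier scores indicate better-calibrated probabilities, so a candidate model is only useful if its cross-validated score falls below that of a baseline like the one above.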
Models were tested with and without the age × sex interaction term. In addition to the standard genetic models described above, a genetic model that incorporated the odds ratios estimated from previous studies was tested. In this class of models, all the estimates for the odds ratios from previous studies were combined into a single score for each individual, and that score was included in the model as a covariate with age, sex, and APOE status. The scores were either the sum of the effects of the SNPs (additive) or the product of the effects of the SNPs (multiplicative).

Linkage disequilibrium (LD), the correlation between SNPs that tends to increase for SNPs that are physically closer to each other on the chromosome, can lead to overfitting when combining estimates that were calculated independently. To account for this, we used LD pruning, a method of eliminating lower risk SNPs that are correlated with higher risk SNPs, to create an LD-pruned multiplicative score and an LD-pruned additive score. LD pruning was performed by calculating the LD between each pair of SNPs that shared a chromosome. When two SNPs had r² > 0.2, the SNP with the lower effect as estimated from the literature was dropped.

is 20 when the probability of having AD at age 85 is 10%.

Unconditional: The Silverman dataset 41 was used to derive age, sex, and model intercept estimates that are more representative of a normal population. A preliminary logit curve was fit to the Silverman results. The curve was assumed to represent the quantile median (ɛ3/ɛ3 genotype) and to be balanced between males and females. The model also assumes that the data are representative of median risk from the other 29 SNPs in the GenoRisk model. These Silverman age and intercept estimates were combined with the GenoRisk genotypic and sex estimates above to make the final absolute AD algorithm, which outputs an estimated probability of developing AD.
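The combination described above can be sketched as a single logistic expression in which population-level terms (intercept, age) and model-derived terms (sex × age, SNP effects) are summed on the logit scale. The Python below is a minimal sketch under that assumption; the coefficient values, the placeholder SNP names `snp_a` and `snp_b`, and the coding of sex are hypothetical and are not the published GenoRisk estimates.

```python
import math


def sigmoid(z):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def unconditional_risk(age, is_male, dosages,
                       intercept, age_coef, sex_age_coef, snp_coefs):
    """Estimated probability of AD from terms summed on the logit scale.

    dosages and snp_coefs are dicts keyed by SNP id; each dosage counts
    alleles relative to the reference genotype (missing SNPs count as 0).
    """
    genetic = sum(coef * dosages.get(snp, 0) for snp, coef in snp_coefs.items())
    sex_term = sex_age_coef * age * (1 if is_male else 0)
    return sigmoid(intercept + age_coef * age + sex_term + genetic)


# Hypothetical coefficients, chosen only for illustration.
POPULATION = {"intercept": -12.0, "age_coef": 0.12}
GENETIC = {"sex_age_coef": -0.002, "snp_coefs": {"snp_a": 0.30, "snp_b": -0.15}}

baseline = unconditional_risk(80, False, {}, **POPULATION, **GENETIC)
carrier = unconditional_risk(80, False, {"snp_a": 2}, **POPULATION, **GENETIC)
print(baseline < carrier)
```

Keeping the population-derived and model-derived coefficients in separate dictionaries mirrors the idea that the two sets of estimates come from different data sources.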
Conditional: Assuming that a person's unconditional risk is r₀, then his or her risk conditional on not-currently-having-AD is:

Violin plots comparing the Brier scores derived from 10-fold cross-validation for 21 of the 25 models tested. Elastic net, lasso, logistic, and probit are shown in purple, green, red, and blue, respectively. The scores for the other four models are not shown here because their Brier scores were so much higher than the other models that it altered the scale of the figure and made comparison between the remaining models more difficult.

The summaries of the Brier scores from the 40 repetitions of cross-validation on the 25 models are presented in Table 1 and in Figure 1. The eight highest performing models were the regularization methods

The GenoRisk model is independent of age and sex; however, age and sex effects are known to alter absolute risk for various diseases including AD. While the ADGC data are useful for estimating coefficients for the genetic risk factors and are appropriate to make estimates for differences between sexes, only an appropriately designed prospective study can accurately estimate the age-related incidence of AD. To finalize the model and be able to provide accurate estimates of the age-related risk of developing AD based on the GenoRisk score, we used estimates of cumulative risk of developing AD from Silverman et al. 41 A logit curve was fit to the dataset that excluded parents and siblings of individuals with early onset AD and very late onset AD. This is expected to give a more accurate representation of the risk of developing AD in the general population as it excludes individuals known to be biased Figure S1 in supporting information. The logit curve was fit using the data from individuals ≤ 85 years of age.
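To make the age-restricted logit fit concrete, here is a small Python sketch. The source does not state how the curve was fit, so ordinary least squares on logit-transformed cumulative risk is an assumption, as are the function names and the synthetic data points (which are generated from a known curve and are not the Silverman values).

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def fit_logit_curve(ages, cumulative_risks, max_age=85):
    """Least-squares line through (age, logit(risk)) for ages <= max_age.

    Returns (intercept, slope) such that
    risk ~ 1 / (1 + exp(-(intercept + slope * age))).
    """
    pts = [(a, logit(r)) for a, r in zip(ages, cumulative_risks) if a <= max_age]
    n = len(pts)
    mean_a = sum(a for a, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in pts)
    sxz = sum((a - mean_a) * (z - mean_z) for a, z in pts)
    slope = sxz / sxx
    return mean_z - slope * mean_a, slope


# Synthetic points: an exact logistic curve up to age 85, plus a flattened
# value at age 90 to mimic slower risk accrual in the oldest group.
true_b0, true_b1 = -12.0, 0.12
ages = [60, 65, 70, 75, 80, 85, 90]
risks = [1 / (1 + math.exp(-(true_b0 + true_b1 * a))) for a in ages[:-1]] + [0.16]

b0, b1 = fit_logit_curve(ages, risks)                     # ages <= 85 only
b0_all, b1_all = fit_logit_curve(ages, risks, max_age=90)  # includes age 90
print(b1, b1_all)
```

In this toy setup, including the flattened age-90 point pulls the fitted slope down, which is the kind of distortion the age restriction is meant to avoid.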
Incorporation of individuals over 85 years resulted in a less reliable model as this subpopulation appears to have a reduced rate of risk accrual over time; this is likely due to competing risk within this age group. 42 Estimates (age and intercept) from the logit fit were combined with the GenoRisk genotypic and sex estimates to generate age- and sex-adjusted GenoRisk estimates appropriate for the general population. Age- and sex-specific risk given a subject who does not currently have AD (conditional risk) may also be calculated. An example is presented in Figure 4.

We developed a genetic prediction model of AD, termed GenoRisk

Although APOE ɛ4 is the single greatest genetic determinant of AD risk (apart from rare mutations in PS1, PS2, and APP responsible for autosomal-dominant familial AD), there is a relatively wide range of risk within a given APOE genotype, as shown in this and other studies. 29 In some cases, individuals with a low-risk APOE genotype (e.g., ɛ3/ɛ3) may be revealed to have a higher overall genetic risk than some patients with a high-risk APOE genotype (e.g., ɛ3/ɛ4) when other genetic variants are accounted for, and vice versa. Currently, many trials use APOE ɛ4 status alone to segregate participants into high-risk and low-risk categories, which may reduce study power.

The utility of GenoRisk for assessing personalized risk can be demonstrated using an example of a 72-year-old male who wants to know his risk of developing AD. Initial assessment by sex only indicates that his risk will follow the population average for males (Figure 5; dashed blue line

This research was funded by Affirmativ Diagnostics.
Suzanne Hendrix, as the owner of Pentara, received support from Affirmativ Diagnostics for the manuscript and received contract work from AC Immune, Acumen, ADCS, ADDF, Affirmativ Diagnostics, Alector, Alkahest, Allergan, Alzheon, Amylyx, Apodemus, Athira,

The "rights" of precision drug development for Alzheimer's disease
Alzheimer's disease drug development pipeline
Alzheimer's disease drug development pipeline: few candidates, frequent failures
Alzheimer's disease polygenic risk score as a predictor of conversion from mild-cognitive impairment
Apolipoprotein E genotype and sex risk factors for Alzheimer's disease: a meta-analysis
Neuropathologically defined subtypes of Alzheimer's disease with distinct clinical characteristics: a retrospective study
Sleep fragmentation and the risk of incident Alzheimer's disease and cognitive decline in older persons
Potentially modifiable lifestyle factors, cognitive reserve, and cognitive function in later life: a cross-sectional study
Defining optimal brain health in adults: a presidential advisory from the American Heart Association/American Stroke Association
Aerobic exercise and neurocognitive performance: a meta-analytic review of randomized controlled trials
Association of Mediterranean diet with mild cognitive impairment and Alzheimer's disease: a systematic review and meta-analysis
MIND diet associated with reduced incidence of Alzheimer's disease. Alzheimer's Dement
Sex and gender differences in Alzheimer's disease dementia
The genetic landscape of Alzheimer's disease: clinical implications and perspectives
Role of genes and environments for explaining Alzheimer's disease
How heritable is Alzheimer's disease late in life? Findings from Swedish twins
Alzheimer's disease: analyzing the missing heritability
Gene-wide analysis detects two new susceptibility genes for Alzheimer's disease
Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease
A rare mutation in UNC5C predisposes to late-onset Alzheimer's disease and increases neuronal cell death
Replication of CLU, CR1, and PICALM associations with Alzheimer's disease
TREM2 variants in Alzheimer's disease
Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease
Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease
Meta-analysis confirms CR1, CLU, and PICALM as Alzheimer's disease risk loci and reveals interactions with APOE genotypes
Genome-wide association study of Alzheimer's disease
Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease
Linkage, whole genome sequence, and biological data implicate variants in RAB10 in Alzheimer's disease resilience
The genetic risk of Alzheimer's disease beyond APOE ε4: systematic review of Alzheimer's genetic risk scores
Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis
Genetic assessment of age-associated Alzheimer's disease risk: development and validation of a polygenic hazard score
Using an Alzheimer's disease polygenic risk score to predict memory decline in black and white Americans over 14 years of follow-up
Polygenic score prediction captures nearly all common genetic risk for Alzheimer's disease
Polygenic risk of Alzheimer's disease is associated with early- and late-life processes
Replication of CLU, CR1, and PICALM associations with Alzheimer's disease
Assessment of the genetic variance of late-onset Alzheimer's disease
Progranulin polymorphism rs5848 is associated with increased risk of Alzheimer's disease
International Schizophrenia Consortium S. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
APOE and Alzheimer's disease: a major gene with semi-dominant inheritance
Familial patterns of risk in very late-onset Alzheimer's disease
Smoking, death, and Alzheimer's disease: a case of competing risks

PCT/US2013/065327; and participated in Alzheon, Cortexyme, and Janssen DSMB or Advisory Board in the last 36 months. Samuel P. Dickson received support from Pentara, which in turn received support from Affirmativ Diagnostics for the manuscript. S. B. Booth received support from Affirmativ Diagnostics for the manuscript; received travel support from her employer for work events; ADX provides profit sharing incentives with additional salary which were broadly applied to any work performed for the company (including manuscript generation). J. S. K. Kauwe received support for the manuscript from Brigham Young University and has received honoraria from UH Hilo.