key: cord-0962752-or00ec5p
authors: Sanchez-Roige, Sandra; Fontanillas, Pierre; Jennings, Mariela V.; Bianchi, Sevim B.; Huang, Yuye; Hatoum, Alexander S.; Sealock, Julia; Davis, Lea K.; Elson, Sarah L.; Palmer, Abraham A.
title: Genome-wide association study of problematic opioid prescription use in 132,113 23andMe research participants of European ancestry
date: 2021-11-02
journal: Mol Psychiatry
DOI: 10.1038/s41380-021-01335-3
sha: f2d2ba1faff1c8a314ce0620d06438995dc6aa35
doc_id: 962752
cord_uid: or00ec5p

The growing prevalence of opioid use disorder (OUD) constitutes an urgent health crisis. Ample evidence indicates that risk for OUD is heritable. As a surrogate (or proxy) for OUD, we explored the genetic basis of using prescription opioids ‘not as prescribed’. We hypothesized that misuse of opiates might be a heritable risk factor for OUD. To test this hypothesis, we performed a genome-wide association study (GWAS) of problematic opioid use (POU) in 23andMe research participants of European ancestry (N = 132,113; 21% cases). We identified two genome-wide significant loci (rs3791033, an intronic variant of KDM4A; rs640561, an intergenic variant near LRRIQ3). POU showed positive genetic correlations with the two largest available GWAS of OUD and opioid dependence (r(g) = 0.64, 0.80, respectively). We also identified numerous additional genetic correlations with POU, including alcohol dependence (r(g) = 0.74), smoking initiation (r(g) = 0.63), pain relief medication intake (r(g) = 0.49), major depressive disorder (r(g) = 0.44), chronic pain (r(g) = 0.42), insomnia (r(g) = 0.39), and loneliness (r(g) = 0.28). Although POU was positively genetically correlated with risk-taking (r(g) = 0.38), conditioning POU on risk-taking did not substantially alter the magnitude or direction of these genetic correlations, suggesting that POU does not simply reflect a genetic tendency towards risky behavior. Lastly, we performed phenome- and lab-wide association analyses, which uncovered additional phenotypes that were associated with POU, including respiratory failure, insomnia, ischemic heart disease, and metabolic and blood-related biomarkers. We conclude that opioid misuse can be measured in population-based cohorts and provides a cost-effective complementary strategy for understanding the genetic basis of OUD.

Opioid use disorders (OUD) represent a global epidemic [1] . Every day, 128 people in the United States die after overdosing on opiates. The pathway to opiate addiction has changed over the last few decades. In the 1960s, more than 80% of people who began using opioids initiated with heroin [2] ; by 2013, nearly 80% of opioid users reported that their first regular opioid was a prescription pain reliever [3] . The misuse of and addiction to opiates-including prescription pain relievers, heroin, and synthetic opioids such as fentanyl-is thus a serious emergency that affects public health as well as social and economic welfare. More recently, the COVID-19 pandemic made it increasingly difficult for individuals with OUD to access treatment and impacted mental health, triggering both initial and continued use of opioids [4] , which is likely to further increase rates of OUD [5] .

Although OUD is known to be moderately heritable [6, 7] , genomic discovery has been severely limited due to the complexity of obtaining large, well-characterized samples of cases and opiate-exposed controls [6, [8] [9] [10] [11] [12] [13] [14] . The largest genome-wide association study (GWAS) to date (~10 K cases) identified only one locus in OPRM1, which encodes the µ-opioid receptor [7] . Larger sample sizes are urgently needed to identify additional loci associated with OUD.

An alternative, complementary approach is to study the genetic liability for OUD across different stages, particularly the transition from use to misuse. By taking this approach, we can explore specific aspects of OUD liability, such as initial use, subjective drug response, transition to hazardous use, dependence, withdrawal, and relapse. This strategy can be applied in population-based cohorts, allowing for a dramatic increase in sample sizes for a fraction of the cost.

In the present study, we pursued a novel strategy in which we measured problematic prescription opioid use (POU) in 23andMe research participants (n = 132,113) of European ancestry. We asked a single question "Have you ever in your life used prescription painkillers (taken not as prescribed), e.g., Vicodin, Oxycontin?", and conducted a GWAS defining cases as those who answered 'yes'. We used a variety of bioinformatic analyses to explore the relationship between POU and OUD as well as other substance use disorder-related behaviors. We also explored POU's genetic relationship with various psychiatric, behavioral, and medical conditions.

We utilized a cohort of 132,113 male and female research participants of European ancestry. All participants were drawn from the customer base of 23andMe, Inc., a direct-to-consumer genetics company. Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (www.eandireview.com). During 4 months in 2015 and 14 months in 2018-2020, participants responded to a decisionmaking survey that, depending on branching logic, included up to 139 questions pertaining to aspects of impulsivity and substance use and abuse. A single item in this survey asked, "Have you ever in your life used prescription painkillers (taken not as prescribed), e.g., Vicodin, Oxycontin?". A total of 132,113 individuals responded, with 27,805 (10,164 males) answering 'yes' (cases) and the remaining 104,308 (36,246 males) answering 'no' (controls). Controls were not screened for prior opioid use, so they represent a combination of individuals who have taken opioids but only as prescribed, and others who have never taken an opioid. Only individuals who were categorized as being of European ancestry based on empirical genotype data [15] were included in this study. Demographic information about this sample is presented in Supplementary Table 1 .

DNA extraction and genotyping were performed on saliva samples by CLIA-certified and CAP-accredited clinical laboratories of Laboratory Corporation of America. Quality control, imputation, and genome-wide analysis were performed by 23andMe (Supplementary Table 2 ; see Supplemental Material and [16, 17] for further information about genotyping, imputation, and quality control).

As previously described [17, 18] , 23andMe's analysis pipeline performs logistic regression assuming an additive model for allelic effects (Supplementary Material). Covariates included age (inverse-normal transformed), sex, the top five principal components of ancestry, and indicator variables for genotyping platforms. P values were not corrected for genomic control.

We used a variety of bioinformatic methods to further characterize the loci identified by the GWAS. First, we used the default version (v1.3.6a) of the FUMA web-based platform [19] to identify independent SNPs (r 2 < 0.10) and to study their functional consequences. We also used MAGMA v1.08 [19, 20] to perform competitive gene-set and pathway analyses. SNPs were mapped to 18,546 protein-coding genes from Ensembl (build 85). We applied a Bonferroni correction based on the total number of genes tested (p < 2.56E−06). Gene sets were obtained from Msigdb v7.0 ("Curated gene sets", "GO terms"). We also used Hi-C coupled MAGMA (H-MAGMA; [21] ) to assign non-coding (intergenic and intronic) SNPs to genes based on their chromatin interactions. Exonic and promoter SNPs were assigned to genes based on physical position. H-MAGMA uses four Hi-C datasets, which were derived from fetal brain, adult brain, iPSC-derived neurons, and iPSCderived astrocytes (https://github.com/thewonlab/H-MAGMA). We applied a Bonferroni correction based on the total number of gene-tissue pairs tested (p < 9.55E−07).

Lastly, we used S-MultiXcan v0.7.0 (an extension of S-PrediXcan v0.6.2 [22] ) to identify specific eQTL-linked genes associated with POU. This approach uses genetic information to predict transcript abundance in 13 brain tissues, and tests whether the predicted transcripts correlate with POU. S-PrediXcan uses pre-computed tissue weights from the Genotype-Tissue Expression (GTEx) v8 project database (https://www.gtexportal. org/) as the reference transcriptome dataset. For S-PrediXcan and S-MultiXcan analyses, we chose to use sparse (elastic net) prediction models, which are available at http://predictdb.hakyimlab.org/. We applied a conservative Bonferroni correction based on the total number of gene-tissue pairs tested (14, 159 gene-tissue pairs tested; p < 3.53E−06).

We examined the 17 genes that were significantly associated with POU in the MAGMA gene-based analysis (10% FDR) for known interactions with prescription medications using the Drug Gene Interaction Database v3.0 (dgidb.genome.wustl.edu) [23] . We used the Anatomical Therapeutic Chemical (ATC) classification system to determine the second-level classification of each medication we identified. ATC classifications for medications were retrieved from the Kyoto Encyclopedia of Genes and Genomics (KEGG; https://www.genome.jp/kegg/drug/) and the World Health Organization Collaborating Center for Drug Statistics Methodology (https://www.whocc.no/atc_ddd_index/). We used the R package circlize v0.4.1 [24] to visualize the interactions between each gene and the ATC classifications of drugs it interacts with.

We used the LD Score regression (LDSC [25] ) python package to estimate the heritability explained by SNPs (SNP-h 2 ). We used pre-computed LD scores ("eur_w_ld_chr/"), which are publicly available (https://data. broadinstitute.org/alkesgroup/LDSCORE/). LD scores were computed for every SNP using individuals from European ancestry from the 1000 Genomes Project. We restricted the analysis to well-imputed SNPs, filtered to HapMap3 SNPs, with MAF above 1%. We removed InDels, structural variants, strand-ambiguous SNPs and SNPs with extremely large effect sizes (χ 2 > 80). Heritability was calculated on the liability scale by accounting for differences in population prevalence (4%) and sample prevalence (21%). The population prevalence was retrieved from the 2018 National Survey on Drug Use and Health [26] .

We used LDhub or local LDSC [25] to calculate genetic correlations (r g ) between POU and 935 other traits or diseases (852 from LDHub and 83 local) [25] . Local traits were selected based on previously known phenotypic associations between OUD and other substance use disorder phenotypes and related traits (e.g., cannabis use disorder, various measures of impulsivity) not available on LDhub. We used the standard Benjamini-Hochberg false discovery rate correction (FDR 5%) to correct for multiple testing. We also calculated a Bonferroni correction for 935 comparisons (p < 5.35E−05); however, this correction is overly conservative because many of the 935 traits are highly correlated with one another.

We used mtCOJO [24] to individually condition the POU summary statistics on loci associated with other comorbid traits, including risk-taking behavior [27] , smoking [28] , cannabis use disorder [29] , alcohol dependence [30] , chronic pain [31] , and OUD [32] . This analysis allowed us to examine whether the genetic associations with POU would be preserved when controlling for those covariate phenotypes. To test as many SNPs while preserving computational efficiency, we used a p value threshold of 2.00E−05, 5.00E−08,1.00E−05, 1.00E−05, 5.00E−08, and 1.00E−05, respectively, for risk-taking behavior, smoking, cannabis use disorder, alcohol dependence, chronic pain, and OUD. We then computed genetic correlations using the POU summary statistics adjusted for the covariates of interest.

Previous studies have shown that consumption and misuse/dependence phenotypes have a distinct genetic architecture. To explore whether POU clustered more with consumption or misuse/dependence phenotypes, we used a data-driven unsupervised machine learning method known as agglomerative hierarchical clustering analysis (HCA) [33] . HCA forms clusters iteratively by creating groups and successively joining or splitting those groups based on a prespecified algorithm [33] . Agglomerative nesting (AGNES) is a bottom-up process focused on individual traits to structure. Agglomerative clustering was chosen as this allowed us to compare different algorithms to maximize for the dissimilarity on each branch, with Ward's minimum variance method performing best. All models were fit in R using the cluster package [33] .

The product of HCA is a dendrogram, formed with multiple brackets called "branches". Phenotypes on the same branch are more similar to Polygenic risk score phenome-wide and lab-wide association scans in BioVU

To explore whether pleiotropic effects for POU are associated with an array of other medical ailments and biomarkers, we performed phenome-wide and lab-wide association scans (PheWAS, LabWAS), respectively. These analyses were conducted using data from the Vanderbilt University Medical Center (VUMC). The project was approved by the VUMC Institutional Review Board (IRB #160302, #172020, #190418). VUMC is an integrated health system with individual-level health data from electronic health records (EHR) for about 3.2 million patients. The VUMC biobank contains clinical data from EHR as well as biomarkers obtained from laboratory assessments. A portion of the individuals from VUMC also have accompanying array genotyping data. This cohort, with over 66,903 patients, is called BioVU [34, 35] .

For each of the unrelated genotyped individuals of European ancestry from BioVU, we computed polygenic risk scores (PRS) for POU using the PRS-CS "auto" version [34] . Genotyping and quality control for this cohort have been extensively described [35, 36] .

To identify associations between the PRS for POU and clinical phenotypes, we performed a PheWAS in BioVU. We fitted a logistic regression model to each of 1338 case/control disease phenotypes ("phecodes") to estimate the odds of each diagnosis given the POU PRS, while adjusting for sex, median age of the longitudinal EHR, and the first ten PCs. Analyses were conducted using the PheWAS v0.12R package [37] . We required the presence of at least two International Disease Classification codes mapped to a PheWAS disease category (Phecode Map 1.2; https://phewascatalog.org/phecodes) and a minimum of 100 cases for inclusion of a phecode. The disease phenotypes included 145 circulatory system, 120 genitourinary, 119 endocrine/metabolic, 125 digestive, 117 neoplasms, 91 musculoskeletal, 85 sense organs, 76 injuries & poisonings, 65 dermatological, 76 respiratory, 69 neurological, 64 mental disorders, 42 infectious diseases, 42 hematopoietic, 34 congenital anomalies, 37 symptoms, and 31 pregnancy complications.

To identify associations between POU PRS and biomarkers, we performed a LabWAS [36] in BioVU. We implemented the pipeline already established by Dennis et al. [36] . Broadly, LabWAS uses the median, INT-transformed ageadjusted values from the QualityLab pipeline in a linear regression to determine the association with the input POU PRS variable. We controlled for the same covariates as for the PheWAS analyses, excluding median age because the pipeline corrects for age using cubic splines with four knots.

Lastly, to explore whether the POU PRS effects could be mediated by the diagnosis of substance use disorders (SUD) or OUD in EHR, we repeated the PheWAS and LabWAS analyses after adjustment for SUD (N = 2304 cases) and OUD (N = 1799 cases). We applied FDR (5%) correction for both PheWAS and LabWAS analyses.

Genome-wide association analysis, biological annotation, gene, and transcriptome-based association analyses We examined 11,311,983 SNPs in all 132,113 study participants. The inflation factor of the GWAS was λ GC = 1.097 with an LDSC intercept of 1.004 (SE = 0.008), suggesting that the majority of the inflation was due to polygenicity. The SNP heritability of POU on the liability scale was SNP-h 2 = 0.04 ± 0.01.

We identified two genome-wide significant loci: rs3791033 near the genes KDM4A and PTPRF (p = 3.80E−08), and rs640561 near LRRIQ3 (p = 3.80E−08; Fig. 1 , Supplementary Table 3 and Supplementary Fig. 1 and 2 for locus zoom plots). Both loci are on chromosome 1, but rs3791033 and rs640561 are independent (r 2 < 0.001).

A phenome-wide scan in the UK Biobank (UKB; N > 360,000) revealed that rs3791033 has been previously implicated in smoking phenotypes (e.g., ever smoked regularly, p = 1.01E−12), other psychiatric conditions, such as ADHD (p = 2.76E−10), and educational outcomes, such as educational attainment (p = 1.01E −10). KDM4A, which is the nearest gene, has been previously implicated with similar traits across several independent GWAS studies (e.g. [27, 38, 39] ; Supplementary Table 4 ). The other gene in this region, PTPRF, has been previously identified in studies of smoking and other substance use behaviors [18, 27, 40] ; Supplementary Table 4) .

For the second genome-wide significant SNP (rs640561), a phenome-wide scan in UKB revealed nominal associations (p > 7.00E-03) with neurological, metabolic and other psychiatric traits, such as ADHD (Supplementary Table 4 ). LRRIQ3, which is the nearest gene, has previously been associated with lifetime smoking [27] .

We did not observe any GWAS-significant (p = 5.00E−08) or even nominally significant (p < 0.05) associations with any of the SNPs that have been previously associated with OUD and opioid dependence ( [6, [8] [9] [10] [11] [12] [13] Supplementary Table 5 ). However, the largest available GWAS of OUD supported our finding for rs640561 (p = 6.42E-03; Supplementary Table 5 ).

In addition to the GWAS, we performed several gene-based analyses. KDM4A was implicated by both MAGMA (FDR 5%) and H-MAGMA (Supplementary Tables 6 and 7) . PTPRF was implicated by H-MAGMA (Supplementary Table 7 ). None of these analyses identified LRRIQ3. H-MAGMA analysis also identified ARTN (previously showing a nominal association with tea consumption in UKB; Supplementary Table 7) . S-MultiXcan did not identify any significant genes (Supplementary Table 8) .

Lastly, gene-set analysis in MAGMA identified one significant gene set, which is involved in the activation of Phospholipase D (Supplementary Table 9 ). Intriguingly, activation of Phospholipase D2 modulates agonist-induced µ-opioid receptor desensitization and resensitization [41] .

For this analysis we relaxed the MAGMA FDR threshold from 5 to 10%, which produced a set of 17 genes; 3 of these 17 genes (KDM4A, PTPRF, CACNA2D2) had a total of 464 interactions with drugs belonging to ATC drug classes (see Fig. 2 Table 10 ). The most abundant second-level ATC classifications identified were antibacterials for systemic use, psycholeptics, and rs640561) . The x-axis shows chromosomal position and the y-axis shows significance on -log10 scale. The horizontal red line denotes genomewide significance (p = 5.00E-08).

gynecological antiinfectives and antiseptics. Interactions with KDM4A included both dopamine agonists and agents, as well as adrenergic agents, drugs used in alcohol dependence such as disulfiram, opioid anesthetics, such as sufentanil citrate, and antidepressants.

We performed a series of genetic correlations and identified consistent, moderate-to-strong associations with well-known OUD comorbid factors. We considered 935 traits, which represented 17 categories (substance use, psychiatric, impulsivity, personality, lifestyle, education, cognitive, health, longevity, metabolic, eyes, teeth, sleep, pain, reproductive, medication and anthropometric). We identified significant genetic associations between POU and 253 of these 935 traits (FDR 5%; Fig. 3, Supplementary Table 11) .

Notably, POU showed positive genetic correlations with OUD as measured by the Million Veteran Program (MVP, r g = 0.64, p = 9.97E−09, European), with opioid dependence versus unexposed controls (European) as measured by the Psychiatric Genomics Consortium (PGC, r g = 0.80, p = 3.24E−04; opioid dependence versus opioid exposed controls showed no significant heritability and therefore could not be tested), and with opioid medication use (r g = 0.46, p = 2.25E−12) in participants from UKB.

Beyond opioid-related traits, we also identified strong genetic correlations with alcohol, nicotine, and cannabis-related traits, from initiation (nicotine, r g = 0.63, p = 7.14E−38; cannabis, r g = 0.35, p = 1.06E−06), to high levels of consumption (drinks per week, r g = 0.36; p = 2.11E−15), to misuse (AUDIT-P: r g = 0.42, p = 5.60E−08) and dependence (alcohol, r g = 0.74, p = 7.21E−08; the Fagerström Test for Nicotine Dependence, r g = 0.60, p = 2.98E −10; cannabis use disorder, r g = 0.63, p = 2.36E−13), and other smoking behaviors, such as maternal smoking around birth (r g = 0.64, p = 3.44E−26). Interestingly, hierarchical HCA with AGNES found that POU clustered closely with substance use disorders, as opposed to consumption phenotypes ( Supplementary Fig. 3) . We also identified a number of associations with UKB pain phenotypes, including positive genetic correlations with several pain ICD diagnoses (e.g., pain in throat, pelvic pain, r g range = [0.37-0.39], p < 8.22E−05), and prevalent pain conditions such as back and knee pain, and headaches (r g range = [0.21-0.42], p < 8.50E−03). As expected, we found positive genetic associations with chronic pain (r g = 0.42, p = 4.04E−17) and higher pain relief medication intake (e.g., Ibuprofen, r g = 0.49; p = 1.61E−10).

We also identified positive genetic associations between POU and other behavioral and psychiatric traits, such as mood swings (r g = 0.40, p = 1.04E−13), risk-taking behavior (r g = 0.38, p = 2.09E −08), major depressive disorder (r g = 0.44, p = 1.59E−10), insomnia (r g = 0.39, p = 6.88E−11), loneliness (r g = 0.28, p = 4.81E−06), and irritability (r g = 0.27, p = 1.74E−05).

Lastly, POU was negatively correlated with educational attainment (e.g., obtaining a college or university degree, r g = −0.47; p = 6.13E−25) and intelligence (r g = −0.41; p = 7.22E−11).

These genetic correlations generally remained consistent after conditioning POU on risk-taking behavior, nicotine, and cannabis use disorders (Supplementary Tables 12, 13 , and 14, respectively; Fig. 4 ). These associations were also broadly consistent when we conditioned on pain (Supplementary Tables 16) and OUD  (Supplementary Tables 17) . Intriguingly, when we corrected for alcohol dependence some of the associations increased, particularly with OUD and opioid dependence, cannabis use disorder, and depressive symptoms, whereas some associations dramatically decreased, such as nicotine dependence as measured via FTND (Supplementary Table 15 ) [42] .

We conducted a PheWAS with EHR data to test the association between polygenic risk for POU and liability across thousands of medical conditions from hospital-based cohorts. Our PheWAS identified significant associations between POU and 53 medical traits across 13 categories (i.e., psychiatric disorders, respiratory, circulatory system, symptoms, sense organs, dermatologic, injuries and poisonings, neurological, digestive, infectious diseases, neoplasms, genitourinary, endocrine/metabolic; Supplementary  Table 18 and Fig. 5 ). Similar to the genetic correlations, the strongest associations were with substance use disorders, including tobacco use disorder (OR = 1.15, p = 7.47E−25), alcoholrelated disorders (OR = 1.12, p = 7.69E−05) and substance addiction and disorders (OR = 1.17, p = 4.67E−12), which included both opiates and other substances.

Another similarity between the PheWAS and the genetic correlations was the positive association between POU and pain phenotypes measured in EHR, such as acute pain (OR = 1.05, p = 2.09E−04) and chest pain (OR = 1.04, p = 8.90E−04). POU was also significantly associated with various psychiatric disorders, including mood disorders (OR = 1.08, p = 1.01E−11), anxiety disorders (OR = 1.09, p = 1.98E−11), depression (OR = 1.07, p = 1.29E−08) and posttraumatic stress disorder (OR = 1.18, p = 1.60E−07).

Finally, we discovered additional associations between POU and other EHR traits, including respiratory failure (OR = 1.06, p = 4.70E −05), insomnia (OR = 1.06, p = 1.75E-03) and ischemic heart disease (OR = 1.06, p = 3.57E−06).

We repeated the PheWAS analyses after adjusting for SUD and OUD diagnoses (Supplementary Tables 19 and 20 ). The magnitude of the effects persisted for most traits even though the strength of the association for a few conditions was diminished, particularly when controlling for general SUD.

In addition to the PheWAS, we also examined laboratory results, which we refer to as LabWAS. The LabWAS identified 15 biomarkers significantly associated with POU (Supplementary  Table 21 and Fig. 5 ). For instance, POU was associated with metabolic biomarkers, including increased triglyceride in serum or plasma (beta = 0.04, p = 1.41E−12) and blood glucose (beta = 0.02, p = 4.02E−5), and lower HDL cholesterol (beta = −0.03, p = 7.21E−07), calcium (beta = −0.01, p = 8.00E-04), folate (beta = −0.04, p = 8.70E-04), and calcitriol (beta = −0.02, p = 1.97E−03). We also found associations with blood-related biomarkers, such as high erythrocyte distribution (beta = 0.03, p = 3.14E−11), increased leukocytes (beta = 0.02, p = 9.68E −07), reduced carbon dioxide in serum/plasma (beta = −0.02, p = 3.42E−7) and creatinine (beta = −0.04, p = 2.15E−04).

Similar to the PheWAS results, the above-mentioned associations persisted after we corrected for SUD and OUD diagnoses (Supplementary Tables 22 and 23) , suggesting the biomarkers are primarily affected by genetic liability for POU rather than substance use status.

In this study, we performed a GWAS of problematic opioid use ('ever taking prescription painkillers not as prescribed') in 132,113 23andMe research participants of European ancestry. This represented a novel approach to studying OUD in a population-based cohort. Our results show that this single question captured a genetic signal that is correlated with signals from well-characterized cohorts that have been clinically diagnosed with OUD. Notably, the genetic correlations with OUD persisted even after correcting for risk-taking behavior and other putatively similar dimensional phenotypes. We also identified novel associations with laboratory-based biomarkers, demonstrating the overarching impact of POU on health. While previous GWAS of OUD have used clinically ascertained cohorts, our results suggest that POU provides a cost-effective alternative to diagnosed OUD that is viable in non-clinically ascertained populations, making it possible to rapidly obtain large sample sizes that can aid in OUD gene discovery.

Obtaining a large enough sample size to effectively identify risk loci has been a common obstacle for OUD GWAS [43] . Our study represents a new and qualitatively different way of characterizing individuals at high risk for OUD. Our approach is stimulated by the idea of fractionating OUD and looking at POU as an early stage of misuse [44] . In particular, we were motivated to characterize the common mechanism of taking opioids not as prescribed, which can lead to abuse. Sometimes referred to as minimal phenotyping, where a complex trait is reduced to a single (yes or no) question [44] , the polygenic signal of POU is nevertheless informative for aspects of OUD risk that is not intended to be a 'noisy' measurement of the true underlying disease.

The most important finding in this study is the identification of the polygenic architecture of POU that is highly comparable to the findings from GWAS of clinically ascertained OUD cohorts. We observed a positive association of POU with OUD and opioid dependence measured by two of the largest available GWAS, MVP, and PGC, and with a GWAS of opioid medication use in UKB participants. As might have been expected, we also observed strong positive genetic correlations with alcohol dependence and tobacco smoking, as well as with various psychiatric traits associated with OUD, including mood swings, risk-taking, anxiety, depression, and insomnia. These sets of genetic correlations mirror those from previous GWAS of clinically ascertained OUD samples [6] [7] [8] . However, we also showed that the overlap is not complete, and whether POU could be an early manifestation of risk for subsequent OUD is not directly explored by our study.

We initially speculated that some of the genetic signal associated with POU could be confounded by genetic factors unrelated to opioid misuse, such as risk-taking behaviors, comorbid substance use (such as tobacco or cannabis use disorders, and even OUD diagnosis) or pain. Previous reports have indicated that genetic studies using an unscreened sample for relevant comorbid conditions [6] , including all individuals being exposed to opioids at least once, can introduce biases in the genetic analyses. However, when we adjusted for these known risk factors, the signal remained consistent, suggesting that we are capturing a signal that is specific to opioids and not merely confounded by signals associated with the secondary phenotypes. This study has also revealed novel biological insights for opioid misuse. While we were unable to replicate the previously reported GWAS signals associated with OUD, we discovered two novel loci associated with POU. Rs640561 replicated in the largest available GWAS of OUD ( [7] ). Both lead SNPs (rs3791033, rs640561) and nearby genes have shown previous associations with other psychiatric traits, particularly smoking phenotypes, which are known to be highly comorbid with OUD and were also strongly correlated with POU. In particular, one of the strongest signals included variants in KDM4A, which is a protein-coding gene that belongs to the Jumonji domain 2 (JMJD2) family of histone demethylases [45] . KDM4 demethylases catalyze the removal of the methyl marks (H3K9me3, H3K36me3) , thereby regulating a range of crucial biological functions [46] , including cell differentiation and proliferation. Intriguingly, the JMJD2 or KDM4 family have been shown to be important epigenetic regulators in cancer cells [47] [48] [49] and, relevant to this study, reward circuitry in depression [50] and alcohol withdrawal [51, 52] . The fact that KDM4A has known interactions with medications that are used to treat other psychiatric conditions, such as depression, or target dopaminergic or serotonergic systems, as well as drugs used in opioid anesthetics, suggests that KDM4 demethylases may be implicated in relevant biology for opioid behaviors and could represent promising druggable targets [53] . However, the biology of this locus could be more complex. Although the credible set analysis suggested that the most probable causal variants are located in KDM4A, the lead variant in this region, rs3791033, is an eQTL for several other nearby genes PTPRF, HYI, ST3GAL3, MED8, CCDC24, ATP6V0B, WDR65, ARTN, TIE1. Our second strongest signal was rs640561, which is near the gene LRRIQ3, which is a protein-coding gene of leucine rich repeats and IQ Motif Containing 3. rs640561, or SNPs in strong LD, have been previously associated with other substance use traits, including smoking [54] [55] [56] and alcohol consumption [56] , schizophrenia [57] , education attainment and math ability [58] , but the mechanism whereby this SNP affects OUD liability is largely unknown. Lastly, the gene-set analyses revealed that the polygenic signal for POU is implicated in pathways that modulate µ-opioid receptor sensitization. Taken together, these findings suggest a potential avenue for identifying new therapeutic targets for problematic opioid use.

We examined the genetic associations between POU PRS and thousands of clinical diagnoses and biomarkers measured in a hospital-based setting via PheWAS and LabWAS analyses. These analyses revealed significant associations for a wide range of health conditions, implicating POU across all major body systems, some of which had been previously associated with various clinical OUD cohorts and others that are novel.

For example, we consistently identified strong associations between POU and poor metabolic function (e.g., increased triglyceride levels, lower HDL cholesterol, high blood glucose levels), which can result in serious downstream health consequences. These results are consistent with previous studies showing poor metabolic function in individuals who misuse opioids [59] , and in morphine-induced rats [60] . Some metabolic markers, such as calcium and folate, have been previously associated with opioid misuse [61, 62] , whereas others (calcitriol) were identified as novel. Although it has been previously speculated that reduced metabolic biomarkers (calcium, folate) in high opioid users could be a direct consequence of repeated exposure to opioids, our results suggest that at least part of the previously reported correlations are due to common genetic factors that influence opiate use, calcium and folate. POU PRS was also associated with blood-related biomarkers, including increased erythrocyte and decreased lymphocyte counts, consistent with previous phenotypic studies [63, 64] . We observed an association between POU PRS and decreased carbon dioxide and creatinine levels, contrary to a previous finding showing that opioids can decrease sensitivity of peripheral chemoreceptors in the lung, leading to increased carbon dioxide levels [65] , or no significant difference of creatinine levels among the heroin and opium participants studied [61] , suggesting that some of these discrepancies may be associated with environmental consequences of OUD. It is also possible that some of the relationships that we identified could be a consequence of smoking and alcohol consumption, since high POU PRS was associated with tobacco and alcohol use disorders; whether the associations we report are a consequence of those relatively common behaviors is not directly tested in this study.

This study is not without limitations. The screening and composition of the control group is crucial for studies of substance use and abuse [6, 31, 43] . Our controls indicated they had never used opiates not as prescribed, which could include individuals who had simply never used an opiate, along with individuals who had anywhere from minimal to extensive experience with opiates but had never deviated from their prescribed use. POU might be even more useful if we had additional data that allowed us to exclude individuals that have never used opioids. Using an unscreened control group can lead to considerable phenotypic heterogeneity across samples [7, 66] . Similarly, it is unclear what type of prescription painkillers the subjects included in our study may have used. Future studies with improved phenotyping around this topic and greater sample size could be even more productive. A second important limitation is our inability to evaluate whether the individuals included in our analyses suffered from mild versus severe pain. Although high genetic predisposition for chronic pain may itself be a risk factor for OUD, this concern was partially addressed by our mtCOJO analysis in which we conditioned on pain. That analysis revealed that the association between POU and OUD persevered even after correcting for pain, suggesting that POU was not correlated with OUD via its ability to capture genetic predisposition to pain. Lastly, the 23andme participant base was not ascertained for OUD and is more educated and has a higher socioeconomic status than the broader population; therefore, it is possible that a similar misuse phenotype captured in other higher risk populations may yield different results.

In summary, we have shown that the genetic signature for opioid misuse that we were able to capture via self-report in an unselected population is similar to genetic risk for OUD. Our work sets the stage for future analyses incorporating a multivariate framework (e.g., genomic Structural Equation Modeling), and larger sample sizes. Our approach provides new, cost-efficient tools for genetic research related to OUD and provides insights into an intermediate behavior that may set the stage for later transition to OUD.

We have provided summary statistics for the top 10,000 SNPs (Supplementary  Table 24 ). Full GWAS summary statistics will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Interested investigators should email dataset-reques-t@23andme.com and reference this paper for more information.

The prescription opioid and heroin crisis: a public health approach to an epidemic of addiction

The changing face of heroin use in the United States: a retrospective analysis of the past 50 years

Associations of nonmedical pain reliever use and initiation of heroin use in the United States

Opioid use disorder and COVID-19: crashing of the crises

The changing opioid crisis: development, challenges and opportunities

Leveraging genome-wide data to investigate differences between opioid use vs. opioid dependence in 41,176 individuals from the Psychiatric Genomics Consortium

Association of OPRM1 functional coding variant with opioid use disorder: a Genome-Wide Association Study

Cisexpression quantitative trait loci mapping reveals replicable associations with heroin addiction in OPRM1

Evidence of CNIH3 involvement in opioid dependence

Genome-wide association study identifies a regulatory variant of RGMA associated with opioid dependence in European Americans

Genomewide association study of opioid dependence: multiple associations mapped to calcium and potassium pathways

Genome-wide association study of phenotypes measuring progression from first cocaine or opioid use to dependence reveals novel risk genes

Genome-wide association study of therapeutic opioid dosing identifies a novel locus upstream of OPRM1

Genome-wide association of heroin dependence in Han Chinese

Ancestry composition: a novel, efficient pipeline for ancestry deconvolution. bioRxiv

Webbased, participant-driven studies yield novel genetic associations for common traits

Identification of 15 genetic loci associated with risk of major depression in individuals of European descent

Large-scale cognitive GWAS meta-analysis reveals tissue-specific neural expression and potential nootropic drug targets

Functional mapping and annotation of genetic associations with FUMA

MAGMA: generalized gene-set analysis of GWAS Data

A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles

Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics

DGIdb: mining the druggable genome

Circlize implements and enhances circular visualization in R

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Key substance use and mental health indicators in the United States: results from the 2018 National Survey on Drug Use and Health

Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences

Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use

A large-scale genome-wide association study meta-analysis of cannabis use disorder

Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders

Genome-wide association study of multisite chronic pain in UK Biobank

GWAS including 82,707 subjects identifies functional coding variant in OPRM1 gene associated with opioid use disorder

Cluster: cluster analysis basics and extentions. R Software

Polygenic prediction via Bayesian regression and continuous shrinkage priors

Development of a large-scale de-identified DNA biobank to enable personalized medicine

Genetic risk for major depressive disorder and loneliness in sex-specific associations with coronary artery disease

R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment

Attention-Deficit/Hyperactivity Disorder and lifetime cannabis use: genetic overlap and causality

Gene discovery and polygenic prediction from a 1.1-million individuals

Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N = 112 151)

Phospholipase D2 modulates agonist-induced µ-opioid receptor desensitization and resensitization: PLD2 modulates µ-opioid receptor desensitization

Expanding the genetic architecture of nicotine dependence and its shared genetics with multiple traits

A review of opioid addiction genetics

Emerging phenotyping strategies will advance our understanding of psychiatric genetics

Reversal of histone lysine trimethylation by the JMJD2 family of histone demethylases

Histone lysine demethylase (KDM) subfamily 4: structures, functions and therapeutic potential

Targeting USP1-dependent KDM4A protein stability as a potential prostate cancer therapy

KDM4/JMJD2 histone demethylases: epigenetic regulators in cancer cells

Genetic alterations of KDM4 subfamily and therapeutic effect of novel demethylase inhibitor in breast cancer

Histone lysine demethylases of JMJD2 or KDM4 family are important epigenetic regulators in reward circuitry in the etiopathology of depression

Genome-wide association study identifies 5q21 and 9p24.1 (KDM4C) loci associated with alcohol withdrawal symptoms

Dysregulation of the histone demethylase KDM6B in alcohol dependence is associated with epigenetic regulation of inflammatory signaling pathways

PRDM12: new opportunity in pain research

Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences

Leveraging polygenic functional enrichment to improve GWAS power

Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use

Genome-wide association analysis identifies 13 new risk loci for schizophrenia

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

Which criteria to use to identify metabolic syndrome among patients with addictive disorders?: Observations among patients with alcohol and opioid dependence syndrome

Morphine-induced alterations in plasma and tissue cholesterol levels

Opium and heroin alter biochemical parameters of human's serum

Metabolomics reveals biomarkers of opioid use disorder

Complete blood count alterations due to the opioid use: what about the lymphocyte-related ratios, especially in monocyte to lymphocyte ratio and platelet to lymphocyte ratio?

Changes in mu opioid receptors and rheological properties of erythrocytes among opioid abusers

The effects of opioids on the lung

Genomewide association analysis of opioid use disorder: a novel approach using clinical data

FUNDING MVJ, SBB, YH, SSR, and AAP were supported by funds from the California Tobacco-Related Disease Research Program (TRDRP; Grant Number 28IR-0070 and T29KT0526). SBB was also supported by P50DA037844, SSR was also supported by NIDA DP1DA054394. JS was supported by 1F31MH124306-01A1. ASH was funded by T32DA007261. The dataset(s) used for the PheWAS/LabWAS analyses described were obtained from Vanderbilt University Medical Center's BioVU, which is supported by numerous sources: institutional funding, private agencies, and federal grants. These include the NIH funded Shared Instrumentation Grant S10RR025141; and CTSA grants UL1TR002243, UL1TR000445, and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962, R01HD074711; and additional funding sources listed at https://victr.vumc.org/biovu-funding/. LKD obtained support from 1R01MH113362, 1R01MH118233 and 1R56MH120736.

PF and SLE are employees of 23andMe, Inc., and hold stock or stock options in 23andMe. The other authors report no conflict of interest.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41380-021-01335-3.Correspondence and requests for materials should be addressed to 23andMe Research TeamSandra Sanchez-Roige or Abraham A. Palmer.Reprints and permission information is available at http://www.nature.com/ reprintsPublisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.