key: cord-0439618-w5wk0ojw authors: Nian, Yi; Hu, Xinyue; Zhang, Rui; Feng, Jingna; Du, Jingcheng; Li, Fang; Chen, Yong; Tao, Cui title: Mining On Alzheimer's Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing date: 2022-02-17 journal: nan DOI: nan sha: 61bb6381d0bfb221babe5cac7cd819195200c862 doc_id: 439618 cord_uid: w5wk0ojw To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study relations between Alzheimer's disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train with knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among three knowledge graph completion models, TransE outperformed the other two (MR = 13.45, Hits@1 = 0.306). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model which indicates that our approach can inform reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses. 
Neurodegenerative diseases are a heterogeneous group of disorders characterized by the progressive degeneration of the structure and function of the central or peripheral nervous system [1]. Common neurodegenerative diseases, such as Alzheimer's disease (AD) and related dementias (ADRD), are usually incurable, irreversible, and difficult to halt. AD/ADRD are multi-factorial and complex neurodegenerative diseases characterized by progressive memory loss and severe dementia with neuropsychiatric symptoms [2]. An estimated 5.8 million Americans aged 65 and older (12.6%) were living with AD/ADRD in 2020, and this number is projected to reach 13.8 million by 2050 [3]. The high prevalence of AD/ADRD creates huge medical and social burdens. The total costs of health care, long-term care and hospital services for all Americans with AD/ADRD were estimated at $305 billion in 2020 [3]. The high failure rate of AD/ADRD drug development amplifies these demographic and financial challenges. Given the increasing prevalence of the disease, finding innovative ways to develop effective drugs is an urgent need.
Drug repurposing is a strategy for identifying new uses of approved or investigational drugs outside the scope of their original medical indications [4]. There are three main computational approaches to discovering drug repurposing evidence: network-based methods, text mining and natural language processing (NLP) approaches, and machine learning-based approaches [5]. Motivated by the observation that biological entities in the same module of a biological network share similar characteristics, network-based approaches aim to find modules (subnetworks or cliques) based on the topological structure of the network. NLP approaches usually involve identifying biological entities and mining new knowledge from the scientific literature.
Machine learning-based approaches, in turn, apply models such as logistic regression, support vector machines (SVM), random forests (RF), and deep learning (DL) to identify drug repurposing signals. Computational drug repurposing offers various advantages over developing entirely new drugs, including a lower risk of failure and of unknown side effects or complications, more efficient use of development funds, and shortened development timelines [6]. Developments in high-throughput screening technologies have catapulted computational drug repurposing to the forefront of attractive drug discovery approaches, because the vast amounts of available data could lead to new clues for drug repurposing that individual projects could not possibly reveal.
Knowledge graphs can provide comprehensive and semantic representations of heterogeneous data and have been successfully leveraged in many biomedical applications, including drug repurposing [7]. For example, several recent studies used knowledge graph-based approaches for COVID-19 drug repurposing [8] [9] [10]. Sosa et al. applied knowledge graph embedding methods to drug repurposing for rare diseases [11]. Malas et al. leveraged the semantic properties of a knowledge graph to prioritize drug candidates for Autosomal Dominant Polycystic Kidney Disease (ADPKD) [12]. However, to the best of our knowledge, knowledge graph-based approaches have rarely been applied to AD/ADRD drug repurposing.
The objective of this paper is to study potential relations between Alzheimer's disease and dietary supplements, chemicals, and drugs using a knowledge graph-based approach. Studies have indicated that some drugs, chemicals or food supplements could be related to preventing or delaying neurodegeneration and cognitive decline [13].
However, further research is needed to better understand the underlying mechanisms and to reveal potential interactions with clinical and pharmacokinetic factors. In this paper, we encode biomedical concepts and their rich relations into a knowledge graph through literature mining [14]. Literature mining is a data mining technique that identifies entities such as genes, diseases, and chemicals in the literature, discovers global trends, and facilitates hypothesis generation based on existing knowledge. It enables researchers to study a massive amount of literature quickly and to reveal hidden relations between entities that are hard to discover through manual analysis. More specifically, we introduce a biomedical knowledge graph that focuses on AD/ADRD and use it to discover underlying relations between chemicals, drugs, dietary supplements and AD/ADRD. The methods section describes how we constructed the knowledge graph and how we leveraged graph embedding methods to score and predict candidates. We also present several rankings of candidates and a comparison of different graph embedding algorithms.
Knowledge graph construction
SemMedDB provided 113,863,366 triples and 20,943,461 entities in total, covering 68 types of relations and 133 subject/object pairs. After the rule-based filtering described in the Preprocessing section, 2,811,329 triples remained, with a total of 128,177 subjects and objects. Further BERT-based filtering left 1,672,110 triples and 128,177 subjects/objects. Deduplicating the triples before training the graph embedding algorithms left 791,827 triples and 128,177 subjects/objects. These 791,827 triples were split into 649,924/113,031/28,872 triples for the training/test/validation sets, respectively.
The split is time-based: triples published before 2019 form the training set, triples from 2019 to 2020 form the validation set, and triples published after 2020 form the test set. Table 1 shows the performance of three widely used graph completion methods trained on our knowledge graph: TransE is a translational distance model, while DistMult and ComplEx are semantic matching models. TransE performs best among these graph embedding algorithms, with a Mean Rank (MR) of 10.53 and a hit ratio at 10 (Hits@10) of 0.58. We therefore used the TransE model to predict potential candidates. Specifically, the final model embeds nodes into 250-dimensional vectors and is trained with a learning rate of 0.01 and an L2 distance metric.
We found that some potential candidates might be relevant to AD prevention and treatment. Based on the training data and our scoring function, we identified the top-ranked subjects that connect to AD-related concepts through the predicates treat or prevent. Tables 2, 4, and 6 show the top 10 entities by number of appearances for the drug, chemical, and dietary supplement categories, respectively.
For the treat relation, we found evidence in related literature and clinical trials supporting seven out of ten entities (Table 2) and six out of ten triples (Table 3). All drugs that appear in Table 4 also appear in Table 2, while Table 2 contains some additional drugs: Local corticosteroid, acyclovir, metronidazole, Cam, and Dexamethasone. Specifically, corticosteroids might become part of a multi-agent regimen for Alzheimer's disease and also have applications in other neurodegenerative disorders [16]. Our model indicates that Valacyclovir, an antiviral medication, might also have an effect in AD/ADRD prevention. While we did not find evidence that Acyclovir is directly related to AD/ADRD, a recent study suggests that Valacyclovir antiviral therapy could be used to reduce the risk of dementia [17].
A study demonstrated that antibiotic (ABX) cocktail-mediated perturbations of the gut microbiome (high-dose kanamycin, gentamicin, colistin, metronidazole, and vancomycin) in two independent transgenic lines lead to a reduction in Aβ deposition in male mice, and that these perturbations underlie the observed reductions in brain amyloidosis, a hallmark of Alzheimer's Disease [18]. Tacrolimus [19] is in a phase two clinical trial, starting 12/1/2021, that investigates its neurobiological effects in persons with MCI and dementia. An early study also indicated that high doses of prednisolone reduce amyloid, which resulted in some delay of cognitive decline [20] [21]. Propranolol [22] has shown efficacy in reducing cognitive deficits in Alzheimer's transgenic mice. According to Joseph [16], a short pulse of high-dose intrathecal methylprednisolone, dexamethasone or triamcinolone may result in detectable slowing of Alzheimer's disease.
As for the prevent relation, we found evidence that supports seven of the ten triple predictions (Table 3), and all drugs in this table also appear in Table 3. For example, a 2021 study shows that Amifostine, which appears in our top 4 triple predictions, could mitigate cognitive injury induced by heavy-ion radiation [23]. Betaine could be a promising candidate for arresting homocysteine (Hcy)-induced AD-like pathological changes and memory deficits [24]. Mazurek et al. show that Oxytocin could interfere with the formation of memory in experimental animals and contribute to memory disturbance associated with Alzheimer's disease [25].
For the treat relationship prediction, we found supporting evidence for seven of the top ten entities (Table 4) and eight of the top ten triple predictions (Table 5). For the treat relation, Table 4 and Table 5 overlap in several entities: Amifostine, Chlorhexidine, Amiloride, Etazolate, and licopyranocoumarin.
As discussed in the Drug section, Amifostine, which appears in our top 1 triple prediction, could mitigate cognitive injury induced by heavy-ion radiation [23]. Moreover, a study has shown that oral pathogens can, in some circumstances, reach the brain, potentially affecting memory and causing dementia [26]. Since chlorhexidine can be used to reduce Methicillin-resistant Staphylococcus aureus (MRSA) and improve oral health, it might be a potential candidate for the treatment of Alzheimer's Disease. Several studies mentioned the neuroprotective activity of Tetracycline and its derivatives [27] [28]. Amiloride inhibits Na+/H+ exchangers (NHEs), which have been associated with the development of mental disorders or Alzheimer's disease [29]. In addition, we found an earlier clinical trial in which Etazolate was tested in mild to moderate AD [30]. Licopyranocoumarin, a compound from herbal medicine, was shown to have a neuroprotective effect in Parkinson's disease [31].
Dexrazoxane and Forskolin only appear in Table 4. A 2019 study implies that Dexrazoxane may serve as an effective neuroprotectant to treat neurodegeneration and has potential clinical value in terms of Parkinson's disease (PD) therapeutics [32]. Forskolin shows neuroprotective effects in APP/PS1 Tg mice and may be a promising drug for the treatment of patients with AD [33]. In addition, Tetracycline and propargylamine only show up in Table 5. As noted above, several studies have mentioned the neuroprotective activity of Tetracycline and its derivatives [27] [28]. The beneficial effects of propargylamine and its pro-survival/neurorescue activities relevant to Alzheimer's Disease have been discussed in several studies [34] [35].
For the prevent relation, we found six out of ten triples that are related to AD, and all six corresponding chemicals also appear in Table 4.
Recent studies show that antibiotics such as Fluoroquinolones, Amoxicillin, Clarithromycin, and Ampicillin can have therapeutic effects on Alzheimer's Disease [36] [37]. Although we have not found that Cortisone has a direct effect on Alzheimer's Disease, common anti-inflammatory drugs do have some treatment effects [38]. An earlier study reported the use of allopurinol for the treatment of aggressive behaviour in patients with dementia [39]. In addition, Ceftriaxone (CEF) appears in Table 4. It significantly attenuated amyloid deposition and the neuroinflammatory response, and a study has confirmed the potential of CEF as a promising treatment against cognitive decline from the early stages of AD progression [40].
Since there is little evidence that food can directly treat or prevent Alzheimer's Disease, we focus on triples with the affect relationship. Among the top 10 predictions in Table 7, we found that dietary fiber (three times), tea (three times), rice, and honey may all reduce the risk of AD/ADRD; they also appear in Table 6. Dietary fiber may have a protective impact on brain Aβ burden in older adults, and this finding may assist in the development of dietary approaches that prevent AD onset [41]. Moreover, according to [42], green tea intake might reduce the risk of dementia and cognitive impairment. Another study shows that honey can be a rich source of cholinesterase inhibitors and therefore may play a role in AD treatment [43]. Previous studies have also shown that dietary choline intake (e.g., from eggs (egg yolk) and fruits) is associated with better cognitive performance [44]. Increasing dietary intake of minerals could also reduce the risk of dementia. For example, research found a link between potassium levels and the diagnosis of cognitive impairment in Mexican-Americans [45].
In addition, one recent study indicates that highly water pressurized brown rice could ameliorate cognitive dysfunction and reduce the levels of amyloid-β, a major protein responsible for AD/ADRD [46]. Coffee drinking may be associated with a decreased risk of dementia/AD. This may be mediated by caffeine and/or other mechanisms such as antioxidant capacity and increased insulin sensitivity [47]. Existing literature provides a reasonably strong scientific rationale to encourage testing whether ketamine (or its metabolites) has procognitive effects in Alzheimer's patients [48]. Last but not least, based on the available literature, a nutraceutical formulation containing N-acetylcysteine among other compounds has shown some pro-cognitive benefits in Alzheimer's patients [49].
In this study, we built a framework to construct and analyze a knowledge graph that links AD/ADRD-related biomedical knowledge from PubMed to facilitate drug repurposing. More specifically, we focused on identifying potentially new relationships between AD/ADRD and chemicals, drugs and food supplements, respectively. Our analysis indicated that the pipeline can be used to identify biomedical concepts that are semantically close to each other as well as to reveal relationships between biomedical elements and diseases of interest. Linking sparse knowledge from the fast-growing literature would benefit existing knowledge and information retrieval and may promote the uncovering of new knowledge. This framework is flexible and can be used for other applications such as multi-omics applications, therapeutic discovery, and clinical decision support for neurodegenerative diseases as well as other diseases. The knowledge graph we constructed can facilitate data-driven knowledge discovery and new hypothesis generation.
A breadth of possibilities exists to further improve this framework.
First, our knowledge graph leveraged SemMedDB, an existing database that contains triples extracted from PubMed articles. While we tried to improve accuracy using a BERT-based approach, other NLP techniques could be implemented to further improve the accuracy of information extraction. Second, in addition to knowledge extracted from the literature, we could also incorporate triples from well-acknowledged biomedical databases to further enrich the knowledge graph. Third, we leveraged three state-of-the-art knowledge graph embedding models in this research. In the future, we will investigate new strategies to extend embeddings to cope with sparse and unreliable data as well as multiple relationships. Last but not least, we only focused on the top 10 ranked triples for evaluation in this paper. We were able to identify supporting evidence for most of them, which indicates that our approach can inform reliable new knowledge. In addition, we only incorporated 2.8M triples in our knowledge graph due to computational resource limits. Further investigation of additional triples is needed and could potentially lead to new hypotheses for AD treatment and prevention.
We constructed a knowledge graph using biomedical concepts and relations extracted from PubMed literature with NLP tools. The extracted triples were then further filtered based on statistics and NLP models. The remaining subject-relation-object triples were used to build the knowledge graph. We then applied graph embedding algorithms to identify potential candidates for AD treatment and prevention. An overview of this pipeline is shown in Figure 1.
To construct the knowledge graph, we directly obtained triples from SemMedDB [50], a database of triples that are automatically extracted from the biomedical literature by the Natural Language Processing (NLP) tool SemRep [51].
Subject and object arguments are normalized to concepts defined in the UMLS with unique identifiers (CUIs). The triples take the form subject-predicate-object.
The original data obtained directly from SemMedDB contained a large number of triples, but not all of them are useful for finding candidates for AD/ADRD treatment or prevention. We applied rules similar to those in [8] to exclude unrelated subject/object and predicate types. More specifically, we eliminated triples involving generic biomedical concepts such as Activities & Behaviors, Concepts & Ideas, Objects, Occupations, Organizations, and Phenomena. The remaining triples were filtered based on their degree centrality (A_in, A_out) and their G² score, which indicates the strength of association between a subject and an object. Specifically, the degree centrality (A_in, A_out) was calculated from the adjacency matrix M, where M_ij indicates an edge from node i to node j, as:
$$A_{in}(j) = \sum_{i} M_{ij}, \qquad A_{out}(i) = \sum_{j} M_{ij}$$
The G² score is calculated from the statistical relation between two contingency tables, the observation table and the expectation table [52]:
$$G^2 = 2 \sum_{i,j,k} O_{ijk} \ln \frac{O_{ijk}}{E_{ijk}}$$
where O_ijk represents the items in the observation table and E_ijk represents the items in the expectation table. Finally, these three scores were normalized to [0, 1] and summed into a final score. To keep the knowledge graph at a size that the graph embedding algorithms could handle, we only kept about 2.5M triples. To ensure that AD-related triples are included in the knowledge graph, we kept all triples related to Alzheimer's disease terms in the UMLS, regardless of the elimination criteria above. Table 9 in the additional file section summarizes the AD-related UMLS concepts we kept in this process. In the end, 2.8M triples were left in our knowledge graph.
We leveraged about 6,000 annotations from a previous study [15] and used them as the training data for PubMedBERT fine-tuning.
These annotations were manually labeled with 1 or 0, where 1 indicates that the triple and its relationship exist and are correct, and 0 indicates that the triple does not exist or is incorrect. PubMedBERT took as text input the subject, object, and predicate type as well as the sentence from which they were extracted. The model obtained an F-1 score of 0.82, Recall of 0.91 and Precision of 0.75 on the validation set, and an F-1 score of 0.83, Recall of 0.89 and Precision of 0.78 on these annotations.
Knowledge graph embedding is a promising approach to graph completion tasks [53]. It embeds entities and relations into a vector space and uses a scoring function to evaluate the probability that a given triplet (h, r, t) is true. We leveraged three popular knowledge graph embedding methods, TransE, DistMult and ComplEx, for our knowledge graph completion task. To train on this knowledge graph, the three models perform negative sampling by corrupting triplets (h, r, t) to form either (h', r, t) or (h, r, t'), where h' and t' are the negative samples. Therefore, if y = ±1 is the label for positive and negative triplets and f is the scoring function, the logistic loss is computed according to [54] as:
$$L = \sum_{(h, r, t)} \log\left(1 + \exp\left(-y \cdot f(h, r, t)\right)\right)$$
Table 8. Scoring functions f(h, r, t) of the three models.
Model       Scoring function
TransE      -||h + r - t||
DistMult    h^T diag(r) t
ComplEx     Re(h^T diag(r) conj(t))
TransE [55] is one of the earliest translational distance models. The model projects heads, tails and relations into the same space, where the relation is interpreted as a translation vector r so that the head and tail are connected by the relation with low error. The score function is the negative of this distance, as shown in Table 8. TransE has disadvantages in dealing with 1-to-N, N-to-1, and N-to-N relations. For example, if Alzheimer's Disease could be affected by different food supplements, the TransE model might learn similar embeddings for all these food supplements.
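As a rough illustration of the training objective described above, the following sketch scores a triple with TransE (negative L2 distance), corrupts it to obtain a negative sample, and evaluates the logistic loss. The entity names, the two-dimensional toy embeddings, and the helper names (`transe_score`, `corrupt`, `logistic_loss`, `score_triple`) are hypothetical and chosen only for illustration; this is not the training code used in this study.

```python
import math
import random


def transe_score(h, r, t):
    """Negative L2 distance between (h + r) and t; higher means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def corrupt(triple, entities, rng):
    """Build a negative sample by replacing either the head or the tail."""
    h, r, t = triple
    replacement = rng.choice([e for e in entities if e not in (h, t)])
    return (replacement, r, t) if rng.random() < 0.5 else (h, r, replacement)


def logistic_loss(score, label):
    """log(1 + exp(-y * f)) with y = +1 for positive and -1 for negative triples."""
    return math.log1p(math.exp(-label * score))


# Toy embeddings (hypothetical values, not learned from any data).
entity_emb = {
    "drug_a": [0.1, 0.2],
    "drug_b": [0.9, 0.9],
    "alzheimers_disease": [0.6, 0.1],
}
relation_emb = {"treats": [0.5, -0.1]}


def score_triple(triple):
    h, r, t = triple
    return transe_score(entity_emb[h], relation_emb[r], entity_emb[t])


positive = ("drug_a", "treats", "alzheimers_disease")
negative = corrupt(positive, list(entity_emb), random.Random(0))
loss = logistic_loss(score_triple(positive), +1) + logistic_loss(score_triple(negative), -1)
```

In this toy setup the positive triple satisfies h + r ≈ t, so its score is close to zero, while the corrupted triple receives a clearly negative score.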
Semantic matching models like DistMult [56] use similarity-based scoring functions that associate each entity with a vector to capture its latent semantics. In this model, each relation is represented as a diagonal matrix that models pairwise interactions between latent factors through a bilinear function, as shown in Table 8. Since the scoring function of DistMult is symmetric in h and t, it cannot handle asymmetric relationships. Complex Embeddings (ComplEx) [57] introduces complex-valued embeddings to solve this problem. Specifically, the scoring function can be expanded as:
$$f(h, r, t) = \mathrm{Re}\left(h^{\top} \mathrm{diag}(r)\, \bar{t}\right) = \langle \mathrm{Re}(r), \mathrm{Re}(h), \mathrm{Re}(t) \rangle + \langle \mathrm{Re}(r), \mathrm{Im}(h), \mathrm{Im}(t) \rangle + \langle \mathrm{Im}(r), \mathrm{Re}(h), \mathrm{Im}(t) \rangle - \langle \mathrm{Im}(r), \mathrm{Im}(h), \mathrm{Re}(t) \rangle$$
where $\langle a, b, c \rangle = \sum_{k} a_k b_k c_k$.
Candidates scoring for repurposing
We focused on three kinds of predictions for candidate selection in this research: dietary supplement candidates, chemical candidates, and clinical drug candidates. The clinical drug and chemical categories were extracted from the UMLS, and we used iDISK [15] as a reference for dietary supplements. For each type of candidate, the model iterates over all possible triples (h_i, r_j, t_k), where h_i ranges over all nodes of the particular candidate type, r_j over all relations, and t_k over all nodes related to Alzheimer's Disease. In knowledge graph embedding-based approaches, the scoring function φ(h, r, t) is defined in terms of the embeddings of entities and relations; i.e., h, r, and t are embedded into a vector space, and φ is defined by operations over these embeddings. All three models project entities and relations to lower-dimensional embeddings but use different scoring functions. TransE uses the distance between the sum of the head and relation embeddings and the tail embedding as its scoring function, while DistMult and ComplEx use bilinear maps to define their scoring functions. For drugs and chemicals, we used two types of relations (i.e., treat and prevent) for prediction in this paper, since the focus of the paper is drug repurposing.
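The candidate enumeration described above can be sketched as follows, assuming embeddings are already available. The snippet scores every (head, relation, tail) combination with a bilinear scoring function and keeps the highest-scoring triples. The entity and relation names, the toy complex-valued embeddings, and the function names (`distmult_score`, `complex_score`, `rank_candidates`) are hypothetical; the sketch only illustrates the idea and is not the pipeline used in this study.

```python
def distmult_score(h, r, t):
    """Bilinear score h^T diag(r) t; symmetric in h and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """Re(h^T diag(r) conj(t)) on complex-valued embeddings; can be asymmetric."""
    return sum((hi * ri * ti.conjugate()).real for hi, ri, ti in zip(h, r, t))


def rank_candidates(heads, relations, tails, entity_emb, relation_emb, score_fn, top_k=10):
    """Score every (h, r, t) combination and return the top_k highest-scoring triples."""
    scored = [
        (score_fn(entity_emb[h], relation_emb[r], entity_emb[t]), h, r, t)
        for h in heads
        for r in relations
        for t in tails
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy complex-valued embeddings (hypothetical values, not learned from any data).
entity_emb = {
    "drug_a": [1 + 2j, 0.5 - 1j],
    "drug_b": [-1 + 0j, 0.2 + 0.3j],
    "alzheimers_disease": [0.5 + 1j, 1 - 0.5j],
}
relation_emb = {"treats": [1 + 1j, 0.5 + 0j], "prevents": [0 + 1j, 1 - 1j]}

top = rank_candidates(
    heads=["drug_a", "drug_b"],
    relations=["treats", "prevents"],
    tails=["alzheimers_disease"],
    entity_emb=entity_emb,
    relation_emb=relation_emb,
    score_fn=complex_score,
    top_k=3,
)
```

Passing a different `score_fn` swaps the model without changing the enumeration; `distmult_score` expects real-valued embeddings, whereas `complex_score` expects complex-valued ones.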
For dietary supplements, on the other hand, we focused on the "affect" relationship, since it might be relatively challenging to detect top-ranked direct relationships between dietary supplements and AD treatment/prevention.
We leveraged the time-slicing technique that is commonly used in literature mining [58] to evaluate our triple prediction approach. We trained all three models using data before 1/1/2019 to see whether we could predict triples that were first published after this date.
Neurodegenerative diseases. Latest research and news
Neurodegeneration and Alzheimer's disease (AD). What can proteomics tell us about the Alzheimer's brain?
Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm
Drug repositioning: identifying and developing new uses for existing drugs
A review of computational drug repurposing
Drug repurposing: progress, challenges and recommendations
A review of biomedical datasets relating to drug discovery: A knowledge graph perspective
Drug repurposing for COVID-19 via knowledge graph completion
Drug repurposing for the treatment of COVID-19: A knowledge graph approach. Epub ahead of print
Knowledge graph-based approaches to drug repurposing for COVID-19
A literature-based knowledge graph embedding method for identifying drug repurposing opportunities in rare diseases
Drug prioritization using the semantic properties of a knowledge graph
Nutrition, brain aging, and neurodegeneration
iDISK: the integrated DIetary Supplements Knowledge base
Intrathecal corticosteroids might slow Alzheimer's disease progression. Neuropsychiatric Disease and Treatment
Antiviral therapy: Valacyclovir treatment of Alzheimer's disease (VALAD) trial: protocol for a randomised, double-blind, placebo-controlled, treatment trial
Synergistic depletion of gut microbial consortia, but not individual antibiotics, reduces amyloidosis in APPPS1-21 Alzheimer's transgenic mice
A Pilot Open Labeled Study of Tacrolimus in Alzheimer's Disease
Intrathecal corticosteroids might slow Alzheimer's disease progression
The amyloid cascade hypothesis in Alzheimer's disease: It's time to change our mind
Propranolol reduces cognitive deficits, amyloid and tau pathology in Alzheimer's transgenic mice
Amifostine (WR-2721) mitigates cognitive injury induced by heavy ion radiation in male mice and alters behavior and brain connectivity
Betaine attenuates Alzheimer-like pathological changes and memory deficits induced by homocysteine
Oxytocin in Alzheimer's disease: postmortem brain levels
Oral microbiota and Alzheimer's disease: Do all roads lead to Rome?
Impact of minocycline on neurodegenerative diseases in rodents: a meta-analysis
Tetracycline repurposing in neurodegeneration: focus on Parkinson's disease
Implications of sodium hydrogen exchangers in various brain diseases
A Study to Determine the Clinical Safety/Tolerability and Exploratory Efficacy of EHT 0202 as Adjunctive Therapy to Acetylcholinesterase Inhibitor in Mild to Moderate Alzheimer's Disease (EHT0202/002)
Identification of licopyranocoumarin and glycyrurol from herbal medicines as neuroprotective compounds for Parkinson's disease
Antioxidant and anti-inflammatory effects of dexrazoxane on dopaminergic neuron degeneration in rodent models of Parkinson's disease
Protective effects of forskolin on behavioral deficits and neuropathological changes in a mouse model of cerebral amyloidosis
Propargylamine-derived multi-target directed ligands for Alzheimer's disease therapy
The novel multitarget iron chelating and propargylamine drug M30 affects APP regulation and processing activities in Alzheimer's disease models
Association between antibiotic treatment of Chlamydia pneumoniae and reduced risk of Alzheimer dementia: A nationwide cohort study in Taiwan
Antibiotics, gut microbiota, and Alzheimer's disease
Aspirin, steroidal and non-steroidal anti-inflammatory drugs for the treatment of Alzheimer's disease. Cochrane Database Syst Rev
Allopurinol for the treatment of aggressive behaviour in patients with dementia
Neuroprotective effects of ceftriaxone involve the reduction of Aβ burden and neuroinflammatory response in a mouse model of Alzheimer's disease
Associations of dietary protein and fiber intake with brain and blood amyloid-β
Green tea intake and risks for dementia, Alzheimer's disease, mild cognitive impairment, and cognitive impairment: A systematic review
Honey as the potential natural source of cholinesterase inhibitors in Alzheimer's disease
Associations of dietary choline intake with risk of incident dementia and with cognitive performance: the Kuopio Ischaemic Heart Disease Risk Factor Study
The link between potassium and mild cognitive impairment in Mexican-Americans
Highly water pressurized brown rice improves cognitive dysfunction in senescence-accelerated mouse prone 8 and reduces amyloid beta in the brain
Caffeine as a protective factor in dementia and Alzheimer's disease
Ketamine: A neglected therapy for Alzheimer disease
Evaluation of the neuroprotective potential of N-acetylcysteine for prevention and treatment of cognitive aging and dementia
SemMedDB: a PubMed-scale repository of biomedical semantic predications
Broad-coverage biomedical relation extraction with SemRep
Extending the log-likelihood measure to improve collocation identification
Learning entity and relation embeddings for knowledge graph completion
DGL-KE: Training knowledge graph embeddings at scale
Translating embeddings for modeling multi-relational data
Embedding entities and relations for learning and inference in knowledge bases
Complex embeddings for simple link prediction
Literature based discovery: Models, methods, and trends
Funding
Publication costs are funded by the National Institute on Aging of NIH under Award Number RF1AG072799. This research was supported by NIH grants under Award Numbers RF1AG072799, R01AI130460, and R01AT009457.
Abbreviations
AD - Alzheimer's Disease
MR - Mean Rank
Hits@1/3/10 - Hit ratio at one/three/ten
ADRD - Alzheimer's disease and related dementias
NLP - natural language processing
SVM - support vector machine
RF - random forest
The authors declare that they have no competing interests.
Authors' contributions
CT conceived the research project. YN, JD and CT designed the pipeline and method. YN implemented the deep learning model of the study and prepared the manuscript. JF and FL conducted the result interpretation. XH and LB prepared the data and ran the pipeline. RZ, YC, and YZ provided expertise and suggestions on data filtering and model design, especially for the dietary supplement data. All authors proofread the paper and provided valuable suggestions. All the authors have read and approved the final manuscript.