key: cord-0810117-9p95k43s
authors: Sato, Kenichiro; Ihara, Ryoko; Suzuki, Kazushi; Niimi, Yoshiki; Toda, Tatsushi; Jimenez‐Maggiora, Gustavo; Langford, Oliver; Donohue, Michael C.; Raman, Rema; Aisen, Paul S.; Sperling, Reisa A.; Iwata, Atsushi; Iwatsubo, Takeshi
title: Predicting amyloid risk by machine learning algorithms based on the A4 screen data: Application to the Japanese Trial‐Ready Cohort study
date: 2021-03-24
journal: Alzheimers Dement (N Y)
DOI: 10.1002/trc2.12135
sha: 9ca25ceecb40be2cd5475612fecda895118d07ae
doc_id: 810117
cord_uid: 9p95k43s

BACKGROUND: Selecting cognitively normal elderly individuals with higher risk of brain amyloid deposition is critical to the success of prevention trials for Alzheimer's disease (AD).
METHODS: Based on the Anti‐Amyloid Treatment in Asymptomatic Alzheimer's Disease study data, we built machine‐learning models and applied them to our ongoing Japanese Trial‐Ready Cohort (J‐TRC) webstudy participants registered within the first 9 months (n = 3081) of launch to predict standard uptake value ratio (SUVr) of amyloid positron emission tomography.
RESULTS: Age, family history, online Cognitive Function Instrument and CogState scores were important predictors. In a subgroup of J‐TRC webstudy participants with known amyloid status (n = 37), the predicted SUVr corresponded well with the self‐reported amyloid test results (area under the curve = 0.806 [0.619–0.992]).
DISCUSSION: Our algorithms may be usable for automatic prioritization of candidate participants with higher amyloid risks to be preferentially recruited from the J‐TRC webstudy to in‐person study, maximizing efficiency for the identification of preclinical AD participants.

by amyloid positron emission tomography (PET) or lowered levels of Aβ42 in the cerebrospinal fluid (CSF), 8 there has been an increasing concern about the labor and cost of eligibility screening for amyloid status.
9, 10 Importantly, individuals with preclinical AD cannot be identified through memory clinics because of the lack of symptoms and motivation to visit hospitals. Thus, there has been a compelling need for a sustainable system that facilitates efficient recruitment of eligible asymptomatic amyloid-positive participants who are willing to be enrolled in AD clinical trials. 11

Recently, there have been a couple of worldwide movements to build cohorts of preclinical AD individuals who are eligible for clinical trials of disease-modifying therapies (DMTs) for AD. Among these projects, the Trial-Ready Cohort for Preclinical/Prodromal Alzheimer's Disease (TRC-PAD) in the United States has applied an innovative, two-layered structure consisting of a web-based feeder registry (APT webstudy), from which eligible individuals are referred for in-person, clinical, PET, and biomarker assessments, to systematically screen participants who have high risks for elevated brain amyloid deposition and construct a trial-ready cohort (TRC-PAD) for prevention trials. [12] [13] [14]

In Japan, we have started a close collaboration with the TRC-PAD team and adopted the basic framework of the webstudy and TRC-PAD. In the Japanese Trial-Ready Cohort (J-TRC) for preclinical and prodromal AD launched in October 2019, cognitively normal elderly volunteers are first invited to register in the J-TRC webstudy at home by themselves and provide basic demographics, and are then monitored for their web-based cognitive performance every 3 months. Among the J-TRC webstudy population, those who may have a higher probability of brain amyloid deposition are further referred to the in-person J-TRC onsite study. As of summer 2020, more than 3000 elderly volunteers have registered in the J-TRC webstudy within the first 9 months since its launch, despite the global impact of the COVID-19 outbreak, and ≈50% of the registrants have been repeating the scheduled remote cognitive tests.
To recruit eligible individuals for the J-TRC onsite study, we need to identify J-TRC webstudy participants with a higher likelihood of having elevated brain amyloid deposition or a higher risk of cognitive decline (Figure 1), in reference to the risk factors clarified by earlier preclinical AD studies such as having APOE ε4 allele(s), 6,15-17 older age, 6, 15, 17 family history of dementia, 6 worse Preclinical Alzheimer Cognitive Composite (PACC) score at screening, 6 or worse serial change in Cognitive Function Instrument (CFI), 6 along with the results of online cognitive tests. Importantly, establishing machine learning-based algorithms by incorporating these potential risk factors 9,10,14 will greatly help us to determine the priority with which we should invite individual webstudy participants. At the J-TRC webstudy phase, which is conducted entirely online without in-person visits, however, we cannot use some of the important associated factors (e.g., APOE genotype or PACC scores) and have to rely solely on the demographic data of age, sex, family history of dementia or AD, education years, current employment status, degree of alcohol intake, and degree of regular exercise, as well as the online cognitive scores of CFI and CogState. The limitations in the kinds of available data, as well as the lack of reference amyloid results from the J-TRC onsite study, will inevitably lessen the predictive performance.

1. Systematic review: The PubMed database was searched to identify large-scale web-based clinical studies that try to identify individuals with preclinical Alzheimer's disease (AD) and enroll them in a trial-ready cohort for future AD prevention clinical trials. The present study was identified as the first attempt conducted in the Japanese population using the data from the Japanese Trial-Ready Cohort (J-TRC) webstudy.

2.
Interpretation: Our models using Anti-Amyloid Treatment in Asymptomatic Alzheimer's Disease trial data predicted amyloid burden in each J-TRC webstudy participant, and the predicted amyloid accumulation corresponded well with the self-reported prior amyloid status in a small subgroup of J-TRC webstudy participants. 3. Future directions: Our prediction algorithms may be usable for automatic prioritization of candidate participants with higher amyloid risks to be preferentially recruited from the J-TRC webstudy to the in-person study, and maximize the efficiency for the identification of preclinical AD participants. The Anti-Amyloid Treatment in Asymptomatic Alzheimer's Disease (A4) study 18 is a phase 3 randomized, double-blind, placebo-controlled secondary prevention trial of solanezumab versus placebo in clinically normal older individuals with evidence of elevated Aβ on screening PET being conducted at 67 sites in the United States, Canada, Australia, and Japan. The initial screening data of the A4 study were recently made publicly available for AD studies, 6 which encompass most of the corresponding variables as the J-TRC webstudy, including two Cogstate tests performed 2 to 3 months apart prior to randomization and CFI, as well as the standard uptake value ratio (SUVr) by 18 F-florbetapir amyloid PET, providing us with the ideal training reference for developing algorithms to predict the amyloid risks in asymptomatic elderly individuals. In this study, we describe our attempts to establish machine-learning algorithms based on the A4 screening data and apply the predicted SUVr calculated from the variables available in the J-TRC webstudy, to the efficient recruitment of the participants to the J-TRC onsite study by prioritizing the invitation to those who potentially have the highest risks for elevated amyloid in brain. The following data handling and analyses were performed using R 3.5.1 (R Foundation for Statistical Computing). 
FIGURE 1 Schematic outline of the J-TRC study. Cognitively normal volunteers of 50 to 85 years participate in the J-TRC webstudy by web-based remote cognitive assessment of CFI and CogState every 3 months (A). Those who may have an increased risk of elevated amyloid are further referred to the J-TRC onsite study (B), to conduct detailed assessment including cognitive functions and amyloid status. The J-TRC onsite study eventually aims to build a large (e.g., n > 300) Japanese cohort of asymptomatic, amyloid-positive (i.e., preclinical AD) cases being ready for clinical trials of disease-modifying drugs in Japan.

The J-TRC study for [...] (Figure 1A), and those who are predicted to have an increased risk of elevated brain amyloid or cognitive decline will further be referred to the J-TRC onsite study (Figure 1B), to conduct detailed in-person cognitive assessments, APOE genotyping, blood biomarker testing, and determination of brain amyloid status by amyloid PET. The J-TRC onsite study, which is designed based on the TRC-PAD in-person study in the United States, eventually aims to build a large (e.g., n > 300) Japanese cohort of preclinical AD individuals being ready for clinical trials in Japan (Figure 1C).

We reviewed the datasets of the J-TRC webstudy participants who registered from October 31, 2019 to June 17, 2020, comprising 4429 registered in total (whether eligible or not). General inclusion criteria in this analysis were defined as follows: participants who completed the registration and demographics input, gave informed consent for study participation, have no prior history of being diagnosed with dementia or AD, and are between 50 and 85 years at the time of registration.
We used the following clinical and cognitive features from the J-TRC webstudy data, which are available in common in the A4 screening and J-TRC webstudy datasets, to include in the predictive models: age, sex (male or female: binary), education years, presence or absence of a family history (either parents or siblings) of AD or dementia (binary), online CFI score completed by study participants under an unsupervised condition at screening, and online CogState total score completed up to two times (second at 3 months after initial CogState). We included serial CogState scores because of the potential usefulness of "loss of practice effect" in the cognitive scores of amyloid-positive participants. 16 We converted the final education of each participant to numerical education years as follows: graduated from high school = 12 years, graduated from university/college = 16 years, and graduated from postgraduate school = 18 years. We eventually included n = 3081 unique eligible cases from the J-TRC webstudy cohort.

We used the screening datasets of the A4 study obtained from the Laboratory of Neuro Imaging (LONI) (https://ida.loni.usc.edu) in October 2019 with the approval of the data access committee. As the prediction target, we used the degree of amyloid accumulation in the A4 study cohort, as represented by the SUVr (the value corresponding to "Composite_Summary" in the "A4_PETSUVR.csv" file); a threshold of SUVr ≥ 1.15 was used to define elevated brain amyloid 6 (visual evaluation was not taken into account). We used the same clinical and cognitive features at the screening stage of the A4 study as those obtained from the J-TRC webstudy data. The CogState score was obtained from the two Computerized Cognitive Composite tests conducted during screening (first time at screening visit 1, the second at screening visit 3 prior to amyloid disclosure).
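The education coding and the SUVr threshold described above can be sketched as follows. This is a minimal illustration, not the authors' code (their analyses were run in R): the category labels `high_school`, `university_or_college`, and `postgraduate` and the function names are hypothetical, since the webstudy's actual response codes are not given in the text.

```python
# Hypothetical category labels; only the year values (12, 16, 18) come from the text.
EDUCATION_YEARS = {
    "high_school": 12,
    "university_or_college": 16,
    "postgraduate": 18,
}

# Elevated brain amyloid is defined in the text as composite SUVr >= 1.15.
SUVR_THRESHOLD = 1.15


def education_to_years(final_education):
    """Map a final-education category to numeric years; None if unmapped."""
    return EDUCATION_YEARS.get(final_education)


def is_amyloid_elevated(suvr):
    """Dichotomize a composite SUVr value using the >= 1.15 threshold."""
    return suvr >= SUVR_THRESHOLD


if __name__ == "__main__":
    print(education_to_years("university_or_college"))
    print(is_amyloid_elevated(1.20))
```

Returning `None` for an unmapped category is a design choice of this sketch; it leaves the decision of how to treat such records to the caller.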
The Z score of each of the following items in CogState normalized within the eligible A4 cases, that is, log response time in Detection, log response time in Identification, accuracy in One Card Learning, and accuracy in One Back, 20 was calculated, and the four Z scores were summed to obtain the total CogState score.

FIGURE 2 Processing workflow of our study. Because the A4 study is mostly composed of participants with non-Asian race, while the participants in J-TRC webstudy are Asian (Japanese), we at first built a model fitted to either the A4 non-Asian training subgroup (A[a]) or the A4 random-split training subgroup (A[b]), then evaluated its performance on the A4 test subgroup (B), and applied the model to the J-TRC webstudy participants (C & D). To evaluate the predictive performance on the A4 test subgroup (B), we calculated MAE and RMSE. The consistency of the informed previous amyloid results with the predicted SUVr (C) was tested by AUC in a subset of J-TRC webstudy participants who reported the results. A4, Anti-Amyloid Treatment in Asymptomatic Alzheimer's Disease; AUC, area under the curve; J-TRC, Japanese Trial-Ready Cohort; MAE, mean absolute error; RMSE, root mean squared error; SUVr, standard uptake value ratio

The intervals between the two CogState screenings at visit 1 and visit 3 in the A4 study were estimated to be ≈3 months, which were close to those in the J-TRC webstudy, although those in the A4 study might be slightly shorter by study protocol. We also used participants' racial data to separate the whole A4 screening data into the non-Asian and Asian subgroup datasets. Samples with missing data in the above modeling features were excluded from the analysis. Eventually, we included n = 4446 unique eligible cases from the A4 screening cohort.
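The total CogState score described in this section (four item Z scores, normalized within the cohort and summed) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the field names are hypothetical, the sample standard deviation is assumed because the text does not say which was used, and no sign flip is applied to the response-time items because the text only states that the four Z scores were summed.

```python
from statistics import mean, stdev

# Hypothetical field names for the four CogState items named in the text.
COGSTATE_ITEMS = (
    "detection_log_rt",
    "identification_log_rt",
    "one_card_learning_acc",
    "one_back_acc",
)


def z_scores(values):
    """Standardize values against their own mean and sample SD (assumption)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def cogstate_totals(records):
    """Return one total per participant: the sum of the four item Z scores."""
    per_item = {
        item: z_scores([r[item] for r in records]) for item in COGSTATE_ITEMS
    }
    return [
        sum(per_item[item][i] for item in COGSTATE_ITEMS)
        for i in range(len(records))
    ]


def cogstate_change(first_total, second_total):
    """Difference between the second and the first total score."""
    return second_total - first_total


if __name__ == "__main__":
    # Toy records: three participants with identical values across items.
    toy = [{item: v for item in COGSTATE_ITEMS} for v in (1.0, 2.0, 3.0)]
    print(cogstate_totals(toy))
```

Because each item is standardized within the cohort, the totals always sum to approximately zero across participants, which is a quick sanity check on real data.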
In this study we intend to build a prediction model for the degree of Aβ deposition by fitting to the A4 screening data as a training reference and to apply the model to the J-TRC webstudy data, to obtain predicted SUVr in each registered J-TRC webstudy participant. Because the A4 study is mostly composed of participants with non-Asian race, whereas the participants in J-TRC webstudy are mostly Japanese, we used two different types of data splitting into the training and test datasets: non-Asian training subgroup and Asian test subgroup (Figure 2A[a]), and randomly split training subgroup and test subgroup (Figure 2A[b]). We at first built a model fitted to the A4 training subgroup (Figure 2A [...]

Notes: The number of the J-TRC webstudy participants represents that of those registered between October 31, 2019 and July 17, 2020. * The variable of race (i.e., Asian or not here) was not included in case of the race-based data splitting (Figure 2A [...]

[...] have participated in the A4 study screening conducted in Tokyo, Japan, we could not confirm how they actually knew their own amyloid status, due to the webstudy data specifications. The J-TRC webstudy has been approved by the University of Tokyo [...]

Basic characteristics are shown in Table 1, revealing some differences among the included three cohorts (A4 non-Asian subgroup, A4 Asian subgroup, and the whole J-TRC webstudy). We also listed the two sub- [...]

As plotted in Figure 3, the predictive performance of the models evaluated on the A4 non-Asian test subgroup was generally limited: the MAE was ≈0.10-0.125 (Figure 3A) and the RMSE was ≈0.15 (Figure 3A), regardless of the type of models (x-axis in Figure 3) or the algorithms (in drawn lines). The correlation coefficients (Figure 3A) in the models including CFI (i.e., models 2, 4, and 5) were higher than those in other models (i.e., models 1 and 3), especially in the algorithms of GLM, ElasticNet, GBM, and XGB.
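The three performance metrics reported here (MAE, RMSE, and the correlation coefficient between predicted and true SUVr) follow their standard definitions, which can be sketched as below. The function names and the toy values are illustrative assumptions and are not taken from the study data; the Pearson coefficient is assumed, as the text does not name the correlation type.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Toy SUVr values, not study data.
    true_suvr = [1.0, 1.2, 1.4]
    pred_suvr = [1.1, 1.2, 1.3]
    print(mae(true_suvr, pred_suvr))
    print(rmse(true_suvr, pred_suvr))
    print(pearson_r(true_suvr, pred_suvr))
```

Note that a prediction can correlate well with the truth while still having a nonzero MAE and RMSE, as in the toy example, where the predictions are compressed toward the mean.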
The predictive performance evaluated on the A4 random-split test subgroup also showed a similar performance distribution (Figure 3B). Therefore, we mainly used the GLM algorithm for the following calculations, because it is conventional and simple to calculate. Figure S1 in supporting information shows an example of the Y-Y plot of the predicted SUVr versus the true SUVr in the race-based data splitting (i.e., non-Asian and Asian here; Figure S1A) or in the random splitting (Figure S1B), showing a significant but low level of association in either case. In addition, consistent with such slight differences depending on the type of model, age, CFI scores, and family history were the top three important variables in the A4-fitted, GLM-based model (Figure S2 in supporting information). The variable importance revealed that the CogState score (the score at the first time, and the difference in scores between the second and the first) is also valid as a predictive variable, although its significance is lower than that of the above three variables. We then obtained the predicted SUVr on each of the J-TRC webstudy participants (Figure 2E [...] Figure 4B).

In this study, we built predictive models for the degree of amyloid depo- [...]

It should, however, be noted that the model achieved relatively suboptimal performance even upon predicting the SUVr in the A4-Asian [...] ) and random-splitting (Figure 2A[b]); both result in similar distributions of the performance metrics across different models/algorithms (Figure 3). This suggests that the models including CFI (i.e., models 2, 4, and 5 in Table 2) with the GLM algorithm consistently yielded good prediction performance regardless of the variability between the training and test datasets, and these models/algorithm settings might be a rough basis in the future actual training of predictive models within the J-TRC study participants only.
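The agreement between predicted SUVr and self-reported amyloid status is summarized in this paper as an area under the curve (AUC). A minimal sketch of one standard way to compute an AUC, the rank-based (Mann-Whitney) formulation, is given below. It is an illustration only: the function name and toy values are assumptions, and the confidence interval reported in the abstract is not reproduced here.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    scores: predicted values (e.g., predicted SUVr).
    labels: truthy for amyloid-positive, falsy for amyloid-negative.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("both positive and negative cases are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, not study data.
    predicted = [1.0, 1.2, 1.1, 1.3]
    reported_positive = [0, 0, 1, 1]
    print(auc(predicted, reported_positive))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a subgroup as small as the one described (n = 37); a sort-based rank sum would be preferable for large cohorts.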
Although the prediction performance of the models in the A4 test subgroup was relatively poor (R ≈0.2-0.3), we consider that this does not always matter, because the main purpose of our current study was to seek a good prediction model for the J-TRC webstudy participants, not for the A4 screening cases.

Our current approach has some limitations. First, the demographic registration to the webstudy is based on self-reporting, so the registered data are not validated, especially regarding the accuracy of the previous amyloid tests. Second, the A4 screening cohort (65-85 years old as inclusion criteria) is significantly older than the J-TRC webstudy cohort (50-85 years old as inclusion criteria), which might lessen the applicability of A4-fitted models to J-TRC cases, especially to younger participants between 50 and 65 years of age. Third, although we excluded those who noted a prior diagnosis of dementia or AD (247 among 3365 [7.4%] of those with eligible age, registration completed, and consent given) from the J-TRC webstudy population, the remaining participants have not been confirmed to be free of dementia. And fourth, simply prioritizing those with too-high CFI scores (e.g., CFI > 10) for invitation may lead to an increase in the number of individuals with mild cognitive impairment or dementia among the participants invited to the J-TRC onsite study; this may facilitate the inclusion of prodromal AD, but might confound the prediction of asymptomatic amyloid-positive (i.e., preclinical AD) participants. We may need to examine whether we can define the eligible range of CFI for more reliable recruitment of the asymptomatic amyloid-positive participants.

To conclude, we described our current provisional attempts to automatically extract the list of candidate participants of the J-TRC webstudy who should be preferentially invited to the in-person J-TRC onsite study, to increase the predictability of amyloid-positive, asymptomatic individuals.
To make this invitation process more systematic and efficient, we will continue to update the predicting models along with the progress of the identification of amyloid-positive individuals in the J-TRC onsite study, to confirm and secure the validity of this approach.

This study was supported by Japan Agency for Medical Research and Development grants JP17DK0207028 and JP19DK0207048.

The authors have no conflicts of interest to report.

1. Toward defining the preclinical stages of Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease
2. Tracking pathophysiological processes in Alzheimer's disease: an updated hypothetical model of dynamic biomarkers
3. Australian Imaging, Biomarkers, and Lifestyle Flagship Study of Ageing; Alzheimer's Disease Neuroimaging Initiative; Alzheimer's Disease Cooperative Study. The preclinical Alzheimer cognitive composite: measuring amyloid-related decline
4. The National Institute on Aging-Alzheimer's Association Framework on Alzheimer's disease: application to clinical trials
5. Prevalence of Cerebral Amyloid Pathology in Persons Without Dementia
6. Association of Factors With Elevated Amyloid Burden in Clinically Normal Older Individuals
7. Japanese and North American Alzheimer's Disease Neuroimaging Initiative studies: harmonization for international trials
8. A/T/N: an unbiased descriptive classification scheme for Alzheimer disease biomarkers
9. Assessing risk for preclinical β-amyloid pathology with APOE, cognitive, and demographic information
10. Alzheimer's Disease Neuroimaging Initiative* and the INSIGHT-preAD study. Reduction of recruitment costs in preclinical AD trials: validation of automatic pre-screening algorithm for brain amyloidosis
11. Global Alzheimer's Platform Trial Ready Cohorts for the Prevention of Alzheimer's Dementia
12. The Trial-Ready Cohort for Preclinical/Prodromal Alzheimer's Disease (TRC-PAD) Project: an Overview
13. TRC-PAD: accelerating Recruitment of AD Clinical Trials through Innovative Information Technology
14. Predicting Amyloid Burden to Accelerate Recruitment of Secondary Prevention Clinical Trials
15. The ARIC-PET amyloid imaging study: brain amyloid differences by age, race, sex, and APOE
16. Clinical and cognitive characteristics of preclinical Alzheimer's disease in the Japanese Alzheimer's Disease Neuroimaging Initiative cohort
17. Prescreening for European Prevention of Alzheimer Dementia (EPAD) trial-ready cohort: impact of AD risk factors and recruitment settings
18. The A4 study: stopping AD before symptoms begin?
19. Tracking early decline in cognitive function in older individuals at risk for Alzheimer disease dementia: the Alzheimer's Disease Cooperative Study Cognitive Function Instrument
20. Performance of the CogState computerized battery in the Mayo Clinic Study on Aging
21. caret: Classification and Regression Training
22. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models
23. Machine Learning Evaluation Metrics