key: cord-0935078-vq4djoi6
authors: Stricker, Nikki H.; Stricker, John L.; Karstens, Aimee J.; Geske, Jennifer R.; Fields, Julie A.; Hassenstab, Jason; Schwarz, Christopher G.; Tosakulwong, Nirubol; Wiste, Heather J.; Jack, Clifford R.; Kantarci, Kejal; Mielke, Michelle M.
title: A novel computer adaptive word list memory test optimized for remote assessment: Psychometric properties and associations with neurodegenerative biomarkers in older women without dementia
date: 2022-03-09
journal: Alzheimers Dement (Amst)
DOI: 10.1002/dad2.12299
sha: 8434bb0f57cb11e3aa453b6f09ff354eb7ef2f6f
doc_id: 935078
cord_uid: vq4djoi6

INTRODUCTION: This study established the psychometric properties and preliminary validity of the Stricker Learning Span (SLS), a novel computer adaptive word list memory test designed for remote assessment and optimized for smartphone use. METHODS: Women enrolled in the Mayo Clinic Specialized Center of Research Excellence (SCORE) were recruited via e‐mail or phone to complete two remote cognitive testing sessions. Convergent validity was assessed through correlation with previously administered in‐person neuropsychological tests (n = 96, ages 55–79) and criterion validity through associations with magnetic resonance imaging measures of neurodegeneration sensitive to Alzheimer's disease (n = 47). RESULTS: SLS performance significantly correlated with the Auditory Verbal Learning Test and measures of neurodegeneration (temporal meta‐regions of interest and entorhinal cortical thickness, adjusting for age and education). Test–retest reliabilities across two sessions were 0.71–0.76 (two‐way mixed intraclass correlation coefficients). DISCUSSION: The SLS is a valid and reliable self‐administered memory test that shows promise for remote assessment of aging and neurodegenerative disorders.

Remote cognitive assessment has transitioned from an important research goal 1 to an immediate research and clinical need due to COVID-19. This need has underscored the lack of well-validated, reliable, and well-normed tests available for remote assessment. 2 internet users. 4-6 Identification and monitoring of early cognitive decline due to Alzheimer's disease (AD) is an important priority for the field. To help address the critical need for a sensitive and brief remote memory measure, we developed a computer adaptive word list learning test to detect the early changes in learning in preclinical and prodromal AD, 7, 8 the Stricker Learning Span (SLS). We transformed the traditional verbal word list memory test paradigm in several ways, resulting in a novel supra-span learning and memory paradigm that takes full advantage of computer-based administration. The SLS uses computer adaptive testing principles that alter the difficulty of the test to match participant performance, extending both the floor and the ceiling. In addition, we included an open-source measure of processing speed, the Symbols Test. 9 Processing speed measures are routinely incorporated in composite cognitive measures designed to detect early preclinical changes due to their known sensitivity to cognitive aging, AD, and other neurodegenerative disorders. 10 The aims of this study were to (1) 

Participants were recruited from the Specialized Center of Research 

Words are visually presented to facilitate reliable self-administration and ensure consistency across device types. Item memory after each learning trial is tested via four-choice recognition (see Figure 1). Participants receive a one-word practice item to ensure comprehension of task instructions. If incorrect across three practice trial attempts, the SLS is discontinued. The first learning trial consists of eight words to remember. Words are presented sequentially for 1 second on, 1 second off. Following a computer adaptive testing approach, the number of words presented on each subsequent learning trial stays the same, increases, or decreases based on percentage of correct responses. 11 This computer adaptive testing method helps to determine the maximum "learning span" over five trials. High performers will be exposed to up to 23 words, whereas low performers will be shown a decreasing number of words across learning trials (floor = 2 items presented).
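The adaptive rule is described only qualitatively above, so the following Python sketch is an illustration rather than the SLS algorithm itself. The accuracy thresholds (`up`, `down`) and the step size are hypothetical. A step of 5 was chosen only because it lets a perfect performer see 8 + 13 + 18 + 23 + 23 = 85 words, which is consistent with the 0-85 range reported for trials 1-5. The starting length (8), floor (2), ceiling (23), and number of trials (5) come from the text.

```python
def next_list_length(current, prop_correct, step=5, up=0.75, down=0.40,
                     floor=2, ceiling=23):
    """Return the number of words to present on the next learning trial.

    `step`, `up`, and `down` are hypothetical illustration values;
    the source does not state the actual thresholds or step size.
    """
    if prop_correct >= up:
        proposed = current + step      # good performance: lengthen the list
    elif prop_correct < down:
        proposed = current - step      # poor performance: shorten the list
    else:
        proposed = current             # otherwise keep the same length
    return max(floor, min(ceiling, proposed))


def simulate_list_lengths(props_correct, start=8, **rule):
    """List lengths across trials, given the proportion correct on each trial."""
    lengths = [start]
    for p in props_correct[:-1]:       # the last trial has no successor
        lengths.append(next_list_length(lengths[-1], p, **rule))
    return lengths


high = simulate_list_lengths([1.0] * 5)    # perfect performer
low = simulate_list_lengths([0.25] * 5)    # performer near chance on 4-choice items
```

Under these assumed settings, the high performer reaches the 23-word ceiling by the fourth trial and the low performer reaches the 2-word floor by the third.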

The SLS uses an item bank of 92 high-frequency words extracted from the SUBTLEX-US corpus, 12 as common words are easier to recall but harder to recognize. 13 Four-word item bins were matched based on word characteristics (imageability, length, semantic category, syllables), with a range of difficulty based on imageability ratings. 14 Subsequent word bins have successively declining imageability ratings, increasing the difficulty level to raise the ceiling. We predict this will increase sensitivity to early changes in preclinical AD or other disorders with subtle impact on memory performance. Randomization of words (target vs. foil in each bin) occurs at each testing session to provide endless alternative forms and reduce practice effects. Bin order is randomized for each trial to increase difficulty and reduce recency effects, and the last item presented is never the first tested.
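The two randomization steps described above can be sketched in Python as follows. This is a minimal illustration, not the published implementation: the function names are hypothetical, and the rule used to keep the last presented item from being tested first (swapping it with a randomly chosen later position) is one possible choice among several.

```python
import random


def build_session(bins, rng):
    """Per session: pick one target per four-word bin; the rest become foils."""
    session = []
    for words in bins:
        target = rng.choice(words)
        foils = [w for w in words if w != target]
        session.append({"target": target, "foils": foils})
    return session


def trial_orders(n_items, rng):
    """Per trial: return (presentation order, test order) over item indices.

    The first tested item is forced to differ from the last presented item.
    """
    presentation = list(range(n_items))
    rng.shuffle(presentation)
    test = presentation[:]
    rng.shuffle(test)
    if n_items > 1 and test[0] == presentation[-1]:
        j = rng.randrange(1, n_items)          # assumed repair rule: swap
        test[0], test[j] = test[j], test[0]
    return presentation, test


# Placeholder words; the real item bank is 92 words in 23 matched bins.
rng = random.Random(0)
bins = [[f"word{b}_{i}" for i in range(4)] for b in range(23)]
session = build_session(bins, rng)
presentation, test = trial_orders(8, rng)      # first trial presents 8 items
```

Because targets are re-drawn within each bin at every session, the same 92-word bank yields a different form each time while keeping targets and foils matched on word characteristics.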

The primary outcome variable is the maximum learning span, defined as the maximum number of words correctly identified on any learning trial (max span, range 0-23). Secondary outcome measures include learning total correct (trials 1-5 correct, range 0-85), delay (range 0-23), and sum of trials (trials 1-5 + delay; range 0-108). Use of a composite score was also explored by creating a z-score using the mean (standard deviation [SD]) of all session 1 data for max span, 1-5 total and delay, then averaging across these three z-scores. Not all participants had the opportunity to complete the delay trial because the delay was added mid-study following our planned iterative approach to test development. We initially hypothesized that the max span would correlate well with traditional measures of delayed memory; however, correlations were lower than expected between max span and Auditory Verbal Learning Test (AVLT) delayed recall in our initial subset of participants (Pearson r = 0.17, n = 23). Thus, we added a delay trial after the Symbols Test on January 17, 2021. The maximum items presented during any learning trial are tested at delay (mean delay = 3.7 minutes).
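To make the outcome definitions concrete, the sketch below computes the primary and secondary scores and the exploratory composite in Python. Function and key names are hypothetical. The use of the sample SD (n - 1 denominator) is an assumption, as the text does not specify it, and the composite is computed only for participants who have all three component scores.

```python
from statistics import mean, stdev


def sls_scores(trial_correct, delay_correct=None):
    """Score one session from the number correct on each of the 5 learning trials."""
    total = sum(trial_correct)
    return {
        "max_span": max(trial_correct),          # primary outcome (0-23)
        "trials_1_5_total": total,               # learning total (0-85)
        "delay": delay_correct,                  # 0-23, None if not administered
        "sum_of_trials": None if delay_correct is None else total + delay_correct,
    }


def composite_z(participant, session1_cohort):
    """Average of z-scores for max span, trials 1-5 total, and delay.

    Each z-score uses the mean and SD of all available session 1 data.
    """
    zs = []
    for key in ("max_span", "trials_1_5_total", "delay"):
        ref = [p[key] for p in session1_cohort if p[key] is not None]
        zs.append((participant[key] - mean(ref)) / stdev(ref))
    return mean(zs)


# Made-up example data for three participants.
cohort = [
    sls_scores([6, 8, 9, 10, 10], 10),
    sls_scores([8, 12, 15, 18, 20], 18),
    sls_scores([7, 10, 12, 14, 15], 14),
]
z_low = composite_z(cohort[0], cohort)
```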

The Symbols Test is an open-source measure of processing speed with previously demonstrated validity and reliability. 9 This measure was developed by Jason Hassenstab, Ph.D., and is part of the Ambulatory Research and Cognition app (ARC). 15 For each trial, participants identify which of two symbol pairs on the bottom of the screen matches one of three symbol pairs presented at the top of the screen. The original version used in ARC studies includes up to 28 brief 12-item trials taken over the course of 7 consecutive days. In this shortened version, the primary outcome variable is average correct item response time (correct RT, sec) across four 12-item trials. Secondary outcome variables were also explored (see Tables for definitions).
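As a small illustration of the primary Symbols Test outcome, the Python sketch below averages response times over correct items only. It assumes that correct items are pooled across the four trials before averaging; the text does not say whether per-trial means are averaged instead, and the data layout is hypothetical.

```python
from statistics import mean


def mean_correct_rt(trials):
    """Mean response time (seconds) over correct items, pooled across trials.

    `trials` is a list of trials; each trial is a list of
    (is_correct, rt_seconds) tuples, one per item.
    """
    rts = [rt for trial in trials for is_correct, rt in trial if is_correct]
    return mean(rts) if rts else None


# Made-up example with two short trials.
example = [[(True, 1.0), (False, 5.0)], [(True, 2.0), (True, 3.0)]]
correct_rt = mean_correct_rt(example)
```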

A neuropsychological test battery was administered by a psychometrist under the supervision of a board-certified neuropsychologist (JAF). We examined validity of the SLS using AVLT sum of trials 16 as the primary outcome (secondary outcomes included trial 5, trials 

Brain MRI was conducted on 3T scanners (Prisma, Siemens) with a 3D magnetization prepared rapid acquisition gradient-echo (MPRAGE) sequence. The resulting images were tissue-class segmented using Unified Segmentation 17 in SPM12 with population-optimized priors and settings from the Mayo Clinic Adult Lifespan Template (MCALT) (https://www.nitrc.org/projects/mcalt/). These segmentations were used to sum the total intracranial volume (ICV) and estimate cortical thickness using Advanced Normalization Tools (ANTs) diffeomorphic registration-based cortical thickness (DiReCT). 18, 19 ANTs' symmetric normalization was used to warp the MCALT_ADIR122 atlas for computing regional measurements. 20 We derived entorhinal cortical thickness and a temporal meta-region of interest (ROI; previously referred to as an AD-signature composite ROI). The temporal meta-ROI is composed of the voxel-number weighted average cortical thickness of six temporal lobe ROIs (entorhinal cortex, fusiform, parahippocampal, midtemporal, inferior temporal, angular gyrus). 19 This temporal meta-ROI was previously derived using Youden's index criteria to separate cognitively unimpaired from clinically diagnosed and autopsy-confirmed AD patients and tested for diagnostic reliability and accuracy; it is sensitive to but not specific for AD. 19

Test Drive; SD, standard deviation. a n = 89 White, n = 1 African American, n = 2 Asian, n = 4 unknown. b n = 87 Non-Hispanic, n = 2 choose not to disclose, n = 7 unknown.

and regression models were fit separately within men (n = 90) and women (n = 66).
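The voxel-number weighted average that defines the temporal meta-ROI in the imaging methods above can be written as a short Python function. This is a sketch of the arithmetic only; the thickness values and voxel counts in the example are made up, and the function name is hypothetical.

```python
def weighted_mean_thickness(rois):
    """Voxel-number weighted average cortical thickness.

    `rois` maps ROI name -> (mean thickness in mm, number of voxels).
    """
    total_voxels = sum(n_voxels for _, n_voxels in rois.values())
    weighted_sum = sum(thickness * n_voxels for thickness, n_voxels in rois.values())
    return weighted_sum / total_voxels


# Illustrative (not real) values for the six temporal meta-ROI regions.
example_rois = {
    "entorhinal": (3.2, 1000),
    "fusiform": (2.7, 4000),
    "parahippocampal": (2.8, 1500),
    "midtemporal": (2.9, 5000),
    "inferior_temporal": (2.8, 4500),
    "angular": (2.6, 4000),
}
meta_roi = weighted_mean_thickness(example_rois)
```

Weighting by voxel count means larger regions contribute proportionally more to the composite than small regions such as the entorhinal cortex, which is why entorhinal thickness is also reported separately.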

We report Pearson bivariate correlation coefficients to assess convergent validity with in-person neuropsychological measures. Test-retest reliability is determined by computing single-rating, absolute-agreement, two-way mixed intraclass correlations (ICCs) with 95% confidence intervals (CIs) around the ICCs. 22 ICCs are interpreted using recommended ranges. 23 
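For readers unfamiliar with this ICC form, the sketch below computes a single-rating, absolute-agreement, two-way ICC from the standard ANOVA mean squares, ICC = (MSR - MSE) / (MSR + (k - 1) MSE + (k / n)(MSC - MSE)), where n is the number of participants and k the number of sessions. It is a plain-Python illustration, not the software used in the study, and it omits the 95% CI.

```python
def icc_a1(scores):
    """Single-rating, absolute-agreement, two-way ICC.

    `scores` is a list of rows: one row per participant, one column per session.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up data: every participant scores exactly 1 point higher at session 2.
shifted = [[1, 2], [2, 3], [3, 4]]
icc_shifted = icc_a1(shifted)
```

In the made-up example, rank order is perfectly preserved, yet the ICC falls below 1 because the absolute-agreement form penalizes the systematic session difference. This is why this form is a conservative choice when practice effects are possible.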

We included all participants who initiated a MTD session from study initiation 

Median time to complete the first test session was 15.1 minutes (Table S2; subtest completion times are also provided).

Test-retest reliability across two sessions for SLS learning variables was good (at or above 0.75 ICC; see 

No SLS measures showed significant practice effect (CI included 0; see Table 2 ). SLS delay showed evidence for a small but non-significant decrease in performance at session 2 (d = -0.20). All Symbols Test variables showed a small practice effect, with significantly faster performance at session 2.

Distributional properties of SLS variables were similar to in-person administered AVLT recall measures ( 

Age correlations with SLS (-0.05 to -0.24) were larger in magnitude than age correlations with AVLT (0.00 to -0.12; Table S3 in 

SLS showed significant correlations with AVLT variables (Table 3) (Table 3 and Tables S4 and S5 

This study examines feasibility, psychometric properties, and convergent validity of web-based, self-administered neuropsychological tests using the MTD platform. In addition, we examine criterion validity through associations with biomarkers of neurodegeneration sensitive to AD.

Consistent with our flexible platform, participants used a variety of devices to complete the tests in approximately 15 minutes. Although we specifically encouraged use of smartphones, only half of participants chose to use a smartphone; 47% used personal computers and 3% tablets. Once a session was initiated, most participants (98%) completed the full first session, suggesting acceptability of the platform and subtests. We predict that participants' technological literacy with their own specific and preferred devices will translate to high feasibility in other populations as well. The self-administered web-based design relying on visual presentation of stimuli eliminates other potential confounds that may occur when list-learning tests are administered orally via telephone, videoconferencing, or automated recordings, such as misinterpretation of words spoken due to hearing problems or suboptimal audio quality. 28, 29 Though infrequent, we were able to capture reports of environmental interference and participant comments that may impact interpretation of session results. We have previously observed lower performance on other self-administered cognitive measures at home versus in clinic. 30 Madero et al. 31 similarly reported the presence of distraction in a minority of remote cognitive assessment sessions (7%), which had a negative impact on performance. Future work will examine whether participant self-report of interference can help reduce variability introduced by testing in an unsupervised environment.

Overall, the psychometric properties of the SLS and Symbols

The SLS is a novel test designed to be sensitive to changes in memory encoding by expanding upon existing list-learning paradigms. Typical recognition formats (yes/no response to test items and distractor items) are less sensitive to mild cognitive impairment (MCI) and AD dementia than spontaneous verbal free recall. 41 In contrast, when a more challenging 4-choice recognition format is used, recognition paradigms can show sensitivity to AD dementia that is comparable to free recall. 42 The current study suggests that our computer adaptive and 4-choice recognition approach is simulating recall as designed, demonstrated by significant correlations between AVLT and SLS variables, and illustrated by example learning curves for high and low performers (Figure 2). We predict the SLS will have a lower floor than recall-based memory measures in individuals with cognitive impairment. The Symbols Test also showed significant correlations with person-administered measures of processing speed.

Structural neuroimaging markers of neurodegeneration, including temporal meta-ROI and entorhinal cortical thickness, were significantly associated with SLS performances, providing preliminary support for SLS criterion validity. Word list recall was associated with an alternative "AD-signature" cortical thickness ROI in a group of adults without significant psychiatric or neurological history (age range 21-78), 43 are needed to examine associations with amyloid and tau biomarkers, explain the theory underlying test development (in preparation), examine diagnostic accuracy in well-characterized clinical groups, and to apply factor analytic methods to better establish convergent and divergent validity in a larger sample. 48 There is increasing interest in developing digital tools to detect and track preclinical and prodromal stages of AD. 15 MTD helps address several emerging needs for digital tools, including a multi-device web-based platform that can increase representativeness of samples through ease of access, inclusion of methods to capture the presence of test interference in an unsupervised environment, and use of computer adaptive and multi-trial test design to help counteract the expected increased variability in performance with unsupervised and/or remote assessment methods.

Our results support the feasibility of MTD and strong psychometric properties of the SLS and Symbols Test in a sample of female older adults. In addition, the SLS is correlated with biomarkers of neurodegeneration sensitive to AD. MTD shows potential as an equitable platform for self-administered cognitive measures to increase access for research and clinical use.

The authors wish to thank the participants and staff at the Spe- 

Early detection of mild cognitive impairment (MCI) in an at home setting

Validity of teleneuropsychology for older adults in response to COVID-19: a systematic and critical review

Accelerate: Building and Scaling High Performing Technology Organizations. IT Revolution Press

Mobile Technology and Home Broadband

No participant left behind: conducting science during COVID-19

Association of deficits in short-term learning and Aβ and hippocampal volume in cognitively normal adults

Brain substrates of learning and retention in mild cognitive impairment diagnosis and progression to Alzheimer's disease

Reliability and validity of ambulatory cognitive assessments

Measuring cognition and function in the preclinical stage of Alzheimer's disease

Measuring working memory capacity in children using adaptive tasks: example validation of an adaptive complex span

Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English

Parametric effects of word frequency in memory for mixed frequency lists

Extensions of the Paivio, Yuille, and Madigan (1968) norms

Current advances in digital cognitive assessment for preclinical Alzheimer's disease

Age, sex, and APOE epsilon4 effects on memory, brain structure, and beta-amyloid across the adult life span

Unified segmentation

Registration based cortical thickness measurement

A large-scale comparison of cortical thickness and volume methods for measuring Alzheimer's disease severity

A reproducible evaluation of ANTs similarity metric performance in brain image registration

Different definitions of neurodegeneration produce similar amyloid/neurodegeneration biomarker group findings

Intraclass correlations: uses in assessing rater reliability

A guideline of selecting and reporting intraclass correlation coefficients for reliability research

The statistical interpretation of pilot trials: should significance thresholds be reconsidered

Community screening for dementia: the Mini Mental State Exam (MMSE) and Modified Mini-Mental State Exam (3MS) compared

Mayo normative studies: regression-based normative data for the auditory verbal learning test for ages 30-91 years and the importance of adjusting for sex

Neuropsychological tests' norms above age

Hearing loss and verbal memory assessment among older adults

The impact of presentation modality on cognitive test performance for adults with hearing loss

Longitudinal comparison of in clinic and at home administration of the Cogstate brief battery and demonstrated practice effects in the mayo clinic study of aging

Environmental distractions during unsupervised remote digital cognitive assessment

The robust reliability of neuropsychological measures: meta-analyses of test-retest correlations

Computerized cognitive assessment in primary care to identify patients with suspected cognitive impairment

Measuring episodic memory across the lifespan: NIH Toolbox Picture Sequence Memory Test

Reliability and validity of a computerized neurocognitive test battery, CNS vital signs

Sensitivity and test-retest reliability of the international shopping list test in assessing verbal learning and memory in mild Alzheimer's disease

Reliability and validity of a homebased self-administered computerized test of learning and memory using speech recognition

Scoring higher the second time around: meta-analysis of practice effects in neuropsychological assessment

The psychology and neuroscience of forgetting

Comparison of PC and iPad administrations of the Cogstate Brief Battery in the Mayo Clinic Study of Aging: assessing cross-modality equivalence of computerized neuropsychological tests

Diagnostic accuracy of memory measures in Alzheimer's dementia and mild cognitive impairment: a systematic review and meta-analysis

Early detection of memory impairment in Alzheimer's disease: a neurocognitive perspective on assessment

Is the Alzheimer's disease cortical thickness signature a biological marker for memory?

Fractionating the Rey auditory verbal learning test: distinct roles of large-scale cortical networks in prodromal Alzheimer's disease

Fractionating verbal episodic memory in Alzheimer's disease

Associations between verbal learning slope and neuroimaging markers across the cognitive aging spectrum

Memory in the aging brain: doubly dissociating the contribution of the hippocampus and entorhinal cortex

Construct validity: advances in theory and methodology

Additional supporting information may be found in the online version of the article at the publisher's website.