key: cord-0481375-kw8bggrg authors: Candemir, Sema; Nguyen, Xuan V.; Prevedello, Luciano M.; Bigelow, Matthew T.; D.White, Richard; Erdal, Barbaros S. title: Predicting Rate of Cognitive Decline at Baseline Using a Deep Neural Network with Multidata Analysis date: 2020-02-24 journal: nan DOI: nan sha: 511ed5bb962d419e443d9e70e966381a30a9d148 doc_id: 481375 cord_uid: kw8bggrg

This study investigates whether a machine-learning-based system can predict the rate of cognitive decline in patients with mild cognitive impairment (MCI) by processing only the clinical and imaging data collected at the initial visit. We build a predictive model based on a supervised hybrid neural network that uses a 3-dimensional convolutional neural network (3D-CNN) to analyze magnetic resonance imaging (MRI) volumes and integrates non-imaging clinical data at the fully connected layer of the architecture. The analysis is performed on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results confirm that there is a correlation between cognitive decline and the data obtained at the first visit. The system achieved an area under the receiver operating characteristic curve (AUC) of 66.6% for cognitive-decline class prediction.

Mild Cognitive Impairment (MCI) is an intermediate stage between Cognitively Normal (CN) and Alzheimer's Disease (AD) [1]. Patients with MCI have varied prognoses: the cognitive functions of some MCI patients remain stable, without progression to AD [2] [3]. Although no treatment has yet succeeded in reversing cognitive decline, therapy to slow its progression is likely to be most beneficial if it is applied early [4] [5]. In this study, we investigate whether a machine-learning-based system can predict the rate of cognitive decline in patients diagnosed with MCI by processing only the clinical and imaging data obtained at the initial visit.
Prior studies have reported on biomarkers and the prediction of MCI-to-AD conversion [6] [7] [8] [4] [9]. In contrast, we investigate the feasibility of predicting the "rate of cognitive decline" in MCI patients at their first visit by processing only the baseline MRI and routinely collected clinical data. We use a deep-learning-based predictive model that integrates imaging and non-imaging clinical data (demographic information) in the same neural network architecture. The analysis is performed on the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset¹ (cf. Section 2.1).

To that end, we build a predictive model based on a supervised neural network. The model classifies the patient's cognitive condition as either slowly deteriorating/stable or rapidly deteriorating. It processes the data obtained at the baseline visit, which comprise three main components: 1) MRI brain images, 2) scalar volumetric features, and 3) demographics. MRI brain scans are provided to the network as sequential DICOM images. The scalar volumetric features are the volumes of selected brain substructures, extracted using FreeSurfer [10]: total intracranial volume, whole-brain volume, and the regional volumes of the hippocampus, entorhinal cortex, fusiform gyrus, and medial temporal lobe. The demographic variables are age, gender, years of education, ethnicity, and race.

¹ Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
The proposed model is illustrated in Figure 2. We supervise the predictive model with the change in Mini-Mental State Examination (MMSE) scores [11] [12]. The MCI subjects are grouped clinically according to (i) slow cognitive decline over 3 years and (ii) fast cognitive decline over 3 years. The neural network is a fully automated, deep-learning-based hybrid model that contains a 3-dimensional convolutional neural network (3D-CNN) to analyze the MRI volume and integrates non-imaging clinical data at the fully connected layer of the architecture.

The data used in this study were obtained from the ADNI database [13], an ongoing multi-center study. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The subjects in the dataset were diagnosed as AD, MCI, Significant Memory Loss, or Cognitively Normal (CN) based on MMSE scores. Enrolled subjects are followed for up to 3 years, with visits at 3, 6, 12, 18, 24, and 36 months. For our research, we use data from ADNI patients who were clinically diagnosed with MCI at their baseline visits, a total of 569 subjects. The demographics of the subjects are summarized in Table 1.

We use the rate of decline in MMSE scores to supervise the system. The MMSE is a 30-point cognitive assessment test [11] [12]. Changes in MMSE scores across follow-up visits reflect the patient's cognitive capabilities: a decrease in MMSE score indicates cognitive deterioration, whereas the scores of a patient whose cognition is stable remain relatively constant. We model the change in MMSE scores by fitting a line to the scores obtained at the follow-up visits; the slope of this line indicates the rate of cognitive loss.
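The line fit described above can be sketched as an ordinary least-squares slope computed per patient. This is a minimal illustration, not the authors' code: it assumes visit times are expressed in months since baseline, that the fit is plain unweighted least squares, and that the function name `mmse_slope` and the example scores are hypothetical.

```python
from typing import Sequence


def mmse_slope(months: Sequence[float], scores: Sequence[float]) -> float:
    """Ordinary least-squares slope of MMSE score versus time (points per month)."""
    if len(months) != len(scores) or len(months) < 2:
        raise ValueError("need at least two (month, score) pairs of equal length")
    n = len(months)
    mean_t = sum(months) / n
    mean_s = sum(scores) / n
    # Slope = covariance(time, score) / variance(time).
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(months, scores))
    var = sum((t - mean_t) ** 2 for t in months)
    if var == 0:
        raise ValueError("visit times must not all be identical")
    return cov / var


# Hypothetical patient: MMSE drifts from 28 to 25 over 36 months.
visits = [0, 6, 12, 24, 36]
mmse = [28, 28, 27, 26, 25]
print(round(mmse_slope(visits, mmse), 4))
```

A negative slope means the score is falling; the steeper (more negative) the slope, the faster the decline.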
Because MMSE scores decrease as cognition deteriorates, a patient with faster cognitive deterioration has a more negative slope (a larger absolute value), while a slope close to zero indicates stable cognition. In this document, the term rate of cognitive decline refers to this slope. Because the predictive model is binary, the rate of cognitive decline is converted to a binary label using a threshold of −0.05 points/month: rapid deterioration is defined as a decrease of more than 0.05 points/month, i.e., more than 0.6 points/year. The distribution of the rate of cognitive decline among the study subjects is shown in Figure 1.

The predictive model learns a mapping from the input data to the target output. Let V be the imaging sequence, D the corresponding clinical data, y the target class, and f(·) the mapping function between input data and output labels. The model can then be formulated as

$$ y_i = f(V_i, D_i), \qquad i = 1, \dots, N, $$

where N is the number of patients with MCI in the training data. Clinical data include age, gender, baseline MMSE score, education, ethnicity, and race. We also use brain volumes as supporting scalar features; these are computed with FreeSurfer, an open-source software package for analyzing and visualizing structural and functional neuroimaging data [10]. We use whole-brain volume and the regional volumes of the hippocampus, entorhinal cortex, fusiform gyrus, and medial temporal lobe as scalar features. The brain volumes of each subject are available in the ADNI dataset [13]. We pose the problem as a supervised classification task, with training subjects divided into two groups based on their MMSE score changes (cf. Figure 1). The output variable y ∈ {0, 1} denotes the target class: 0 represents the "slowly deteriorating/stable" class and 1 represents the "rapidly deteriorating" class. The proposed system is illustrated in Figure 2. We apply pre-processing to each MRI volume V and the corresponding clinical data D before training.
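The thresholding step reduces to a single comparison. The sketch below assumes the slope is given in MMSE points per month and treats a slope of exactly −0.05 as "slowly deteriorating/stable", since the text defines rapid decline as a decrease exceeding 0.6 points/year; `decline_label` and `RAPID_DECLINE_THRESHOLD` are hypothetical names.

```python
RAPID_DECLINE_THRESHOLD = -0.05  # MMSE points per month (= -0.6 points per year)


def decline_label(slope_per_month: float,
                  threshold: float = RAPID_DECLINE_THRESHOLD) -> int:
    """Return 1 ("rapidly deteriorating") if the slope is below the threshold,
    otherwise 0 ("slowly deteriorating/stable")."""
    return 1 if slope_per_month < threshold else 0


# Hypothetical slopes (points/month) for four patients.
slopes = [-0.089, -0.05, -0.01, 0.02]
print([decline_label(s) for s in slopes])
```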
The MRI sequences are skull-stripped, i.e., non-cerebral tissue (calvarium, scalp, and dura) is removed [14]. The skull-stripping algorithm, which is based on a U-Net architecture [15] trained on skull-stripping datasets [16], reduces the size of the volumes to be processed and hence speeds up training. The regional volume features are divided by each subject's whole-brain volume for normalization. The demographic data contain categorical variables (e.g., gender and ethnicity), which are converted to numeric values. Because neural networks require consistently scaled inputs, we normalize the images, scalar volumetric features, and clinical data to the range [0, 1].

The learning algorithm is a supervised neural network with a hybrid architecture of three main components: (i) a 3D-CNN that learns brain morphology and patterns, (ii) integration of scalar volumetric features, and (iii) integration of non-imaging data (demographic information) at the fully connected layer. The 3D-CNN processes the MRI scans and models the patterns and structures in the brain volume: earlier layers capture low-level features of brain detail, while deeper layers learn more abstract features. The architecture consists of three 3D convolutional layers with 3 × 3 × 3 kernels and 32, 64, and 128 filters, respectively; the number of convolutional layers was chosen empirically based on validation-set performance. Each convolutional layer is followed by a 2 × 2 × 2 max-pooling layer for feature reduction and spatial invariance. The architecture uses ReLU activations, which introduce nonlinearity into the system [17]. Each activation is followed by batch normalization [18], which normalizes the output of the preceding activation and helps mitigate overfitting and improve generalization.
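The scalar preprocessing described above (normalization of regional volumes by whole-brain volume, numeric encoding of categorical variables, and scaling to [0, 1]) might look roughly as follows. This is a hedged sketch: the text does not say how categories are encoded, so integer (label) encoding is an assumption, as are all function names and example values.

```python
from typing import Dict, List, Sequence


def normalize_regional_volumes(regional: Dict[str, float], whole_brain: float) -> Dict[str, float]:
    """Express each regional volume as a fraction of the subject's whole-brain volume."""
    if whole_brain <= 0:
        raise ValueError("whole-brain volume must be positive")
    return {name: vol / whole_brain for name, vol in regional.items()}


def encode_categorical(values: Sequence[str]) -> List[int]:
    """Label-encode a categorical column; categories are sorted so codes are deterministic."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical subject and cohort values, for illustration only.
ratios = normalize_regional_volumes({"hippocampus": 7000.0, "entorhinal": 3500.0}, 1_000_000.0)
genders = encode_categorical(["Female", "Male", "Female"])
ages = min_max_scale([60.0, 70.0, 80.0])
print(ratios, genders, ages)
```

In a cross-validated setup, the minimum and maximum would usually be computed on the training portion of each fold only and then reused for its validation and test portions, so that no test information leaks into the scaling.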
The output of the deepest convolutional layer is flattened and fed to the fully connected layer, where we also integrate the scalar volumetric features and demographic information; each scalar feature enters through its own node. The fully connected layer is followed by a dropout layer [19], which temporarily ignores randomly selected neurons during training so that the network cannot simply memorize the training data, thereby reducing overfitting. The system is compiled with the Adam optimizer [23] and a categorical cross-entropy loss function with L2-norm weight decay [20]. Weight decay penalizes large weights and thus increases the generalization capacity of the model.

Voxel-based convolutional neural networks are prone to overfitting because of the high-dimensional data, the large number of parameters, and the relatively small number of cases available to train them [21, 14, 22]. To address data scarcity, we enlarge the training data with augmentation: MRI volumes are flipped so that the left and right hemispheres are reversed [14] and randomly tilted by less than 5°. We also employ dropout [19] and weight decay [20] as regularization to increase the generalization capacity of the model.

The dataset used in the study consists of 569 subjects with MPRAGE (MRI) scans and corresponding clinical data (cf. Section 2.1). We perform 5-fold cross-validation to reduce the variability of the performance estimates caused by the relatively small dataset and to obtain a more robust estimate of generalization performance. In each fold, 60% of the dataset is used to train the model, 20% for model selection, and 20% to test the model. Each MPRAGE volume is resized to 116 × 130 × 83 voxels for processing.
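To make the fusion step concrete, the feature-map sizes through the three convolution/pooling stages can be computed for the 116 × 130 × 83 input. The text does not state the convolution padding or stride, so this sketch assumes stride-1 convolutions with either "same" or "valid" padding and non-overlapping 2 × 2 × 2 pooling that discards leftover voxels; the scalar-node count `n_scalar` is a hypothetical placeholder, because the feature lists in the text differ slightly.

```python
from typing import Sequence, Tuple


def conv3d_out(size: int, kernel: int = 3, padding: str = "same") -> int:
    """Output length along one axis of a stride-1 convolution."""
    if padding == "same":
        return size  # zero padding keeps the size unchanged
    if padding == "valid":
        return size - kernel + 1  # no padding: the kernel must fit inside the input
    raise ValueError(f"unknown padding: {padding!r}")


def pool3d_out(size: int, pool: int = 2) -> int:
    """Output length of non-overlapping max pooling; leftover voxels are dropped."""
    return size // pool


def flattened_size(shape: Tuple[int, int, int],
                   filters: Sequence[int] = (32, 64, 128),
                   padding: str = "same") -> int:
    """Number of values produced by flattening the deepest conv/pool block."""
    dims = list(shape)
    for _ in filters:  # one conv + pool stage per convolutional layer
        dims = [pool3d_out(conv3d_out(d, padding=padding)) for d in dims]
    depth, height, width = dims
    return depth * height * width * filters[-1]


n_flat = flattened_size((116, 130, 83))
n_scalar = 11  # hypothetical number of scalar (volumetric + demographic) nodes
fused_width = n_flat + n_scalar  # length of the vector entering the fully connected layer
print(n_flat, fused_width)
```

Under the "same" padding assumption the flattened 3D-CNN output has 286,720 values (172,032 with "valid" padding), so the handful of scalar nodes appended at the fully connected layer is small by comparison.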
We train using Adam optimization [23], which combines a momentum term (a moving average of past gradients) with per-parameter adaptive step sizes (based on a moving average of squared gradients) and typically converges faster than plain stochastic gradient descent. The learning rate is 10⁻⁵, with hyper-parameters β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸. Categorical cross-entropy is used as the loss function. As regularization, we apply dropout at the fully connected layer with a keep rate of 0.5. We also use […]

We built three models: (i) an imaging model based on a 3D-CNN component that uses information from the whole-brain MRI, (ii) a hybrid model that combines the 3D-CNN component with brain-volume scalar data and demographic information, and (iii) a simple model that processes only brain-volume scalar data and demographic information. We evaluate each model's classification of cognitive decline on the test set of each fold and average the evaluation metrics across the folds. The performance metrics used in the study are sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and AUC. Table 2 lists the performance metrics.

The correlation between morphological changes in the brain (e.g., parenchymal volume loss) and AD is well established [25] [26]. According to a prior study [27], (i) MCI subjects have moderate hippocampal atrophy; (ii) the brain morphology of non-converters is similar to that of CN subjects, whereas that of converters is more similar to AD; and (iii) converters show more severe neuropathological deterioration than non-converters. Given this correlation between pathological changes in brain morphology and AD stages, we first measured how well the pace of cognitive decline could be predicted by processing only the MRI scans with a 3D-CNN. This imaging model achieved 64.8% AUC for predicting the cognitive-decline class. The receiver operating characteristic (ROC) curve is shown in Figure 3 (top).
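The evaluation metrics listed above can be computed per fold from binary labels and predicted probabilities, then summarized across folds as mean and standard deviation. This sketch is illustrative rather than the authors' evaluation code: it assumes a case is predicted positive ("rapidly deteriorating") when its score is at least 0.5, and it computes AUC as the Mann-Whitney probability that a randomly chosen positive case outscores a randomly chosen negative one (ties count one half), which equals the area under the empirical ROC curve. All function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of the ROC AUC; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (zero denominator) are reported as NaN.
    return num / den if den else float("nan")


def fold_metrics(labels: Sequence[int], scores: Sequence[float],
                 threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics plus AUC for one test fold (1 = rapid decline)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "auc": auc(labels, scores),
    }


def summarize(folds: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric across folds (needs >= 2 folds)."""
    return {k: (mean([f[k] for f in folds]), stdev([f[k] for f in folds])) for k in folds[0]}
```

In use, `fold_metrics` would be called once per test fold and the five resulting dictionaries passed to `summarize`, mirroring the mean and standard deviation reported per metric.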
The hybrid model processes the MRI sequences, brain-volume scalar data, and demographic information (age, gender, years of education, ethnicity, and race). Table 3 lists the performance scores of the proposed system as the mean and standard deviation across the cross-validation folds. At a decision threshold of 0.5, the system achieved an accuracy of 61.1%, a PPV of 55.1%, a sensitivity of 51.8%, a specificity of 68%, and an NPV of 65.5%. The average AUC is 66.6%. Adding brain volume and demographic information as scalar inputs increased performance from 64.8% to 66.6% AUC (Figure 3, middle).

Voxel-based convolutional neural networks are prone to overfitting because of the high-dimensional data, the large number of parameters, and the lack of annotated subjects to train them optimally [21, 14, 22]. Although we use several regularization techniques, we still observed overfitting attributable to the 3D-CNN module of the hybrid system. In this experiment, we therefore remove the 3D-CNN module from the hybrid model and run the experiments using only brain-volume scalar data together with non-imaging clinical data. This system achieved 66.6% AUC for cognitive-decline class prediction (Figure 3, bottom).

In this study, we investigate whether a machine-learning-based system can predict cognitive decline in MCI patients at the initial visit by processing routinely collected clinical data. Unlike other studies that focus on predicting MCI-to-AD conversion or on AD/CN/MCI classification, we approach the problem as early prediction of the rate of cognitive decline in MCI patients. The ability to identify an individual's rate of cognitive decline could help clinicians develop early preventive treatment strategies. We compared the performance of three models for predicting the cognitive-decline class.
Our results confirm that there is a correlation between cognitive decline and the clinical data obtained at the first visit: the imaging model achieved 64.8% AUC, and adding brain volume and demographic information as scalar values increased performance to 66.6% AUC. Processing only the brain volumes (from FreeSurfer) and demographic information as scalar values yields results similar to those of the hybrid model. Although a patient's cognitive condition is mostly assessed from non-imaging clinical data (e.g., MMSE score, patient age) at the clinical visit, and MRI scans are generally acquired to exclude other brain pathology, our results show that structural MRI provides useful information related to the patient's cognitive condition and may further contribute to the clinical evaluation and follow-up of patients with MCI. A similar study [27] trains a convolutional neural network on regional patches extracted from the hippocampus and combines the extracted information with FreeSurfer brain data. Our results are consistent with that study in that combining CNN features with scalar brain features obtained with FreeSurfer increases prediction performance. However, our study differs in that our model (i) does not predict the probability of MCI-to-AD conversion but rather the rate of cognitive deterioration in MCI patients from first-visit data; (ii) identifies patterns within the whole-brain MRI instead of only the hippocampus; and (iii) uses less FreeSurfer brain data than [27]. Our system's performance is lower than that of published studies that investigate MCI-to-AD conversion or AD/CN classification. However, predicting cognitive decline is more challenging than AD/CN classification because of the subtle nature of the pathological changes [27].
Moreover, our system processes only data routinely collected at the first visit and therefore makes predictions from much less information than studies that incorporate follow-up data through time-sequence analysis. Therefore, the system's ability to identify an individual's rate of cognitive decline with 66.6% AUC is comparable to the performance of analogous models in the published literature.

References
[1] Neuropathologic alterations in mild cognitive impairment: a review.
[2] Mild cognitive impairment: clinical characterization and outcome.
[3] A multifactor approach to mild cognitive impairment.
[4] Predicting Alzheimer's disease progression using multi-modal deep learning approach.
[5] Disease-modifying therapies for Alzheimer's disease: challenges to early intervention.
[6] Deep learning-based feature representation for AD/MCI classification.
[7] Structural imaging biomarkers of Alzheimer's disease: predicting disease progression.
[8] Alzheimer's Disease Neuroimaging Initiative, et al. Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects.
[9] Prognosis of early-onset vs. late-onset mild cognitive impairment: Comparison of conversion rates and its predictors.
[10] Online; accessed.
[11] Mini-Mental State Examination (MMSE) for the detection of Alzheimer's disease and other dementias in people with mild cognitive impairment (MCI).
[12] The severe mini-mental state examination: a new neuropsychologic instrument for the bedside assessment of severely impaired patients with Alzheimer disease.
[13] Alzheimer's disease neuroimaging initiative.
[14] End-to-end Alzheimer's disease diagnosis and biomarker identification.
[15] U-net: Convolutional networks for biomedical image segmentation.
[16] Deepbrain - skull strip algorithm.
[17] Deep learning.
[18] Batch normalization: Accelerating deep network training by reducing internal covariate shift.
[19] Dropout: a simple way to prevent neural networks from overfitting.
[20] A simple weight decay can improve generalization.
[21] A survey on deep learning in medical image analysis.
[22] Single subject prediction of brain disorders in neuroimaging: promises and pitfalls.
[23] Adam: A method for stochastic optimization.
[24] Deep learning.
[25] Hippocampal atrophy on MRI in frontotemporal lobar degeneration and Alzheimer's disease.
[26] Global and local gray matter loss in mild cognitive impairment and Alzheimer's disease.
[27] Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment.