key: cord-0918320-zujuxmim authors: Park, Catherine; Mishra, Ramkinker; Sharafkhaneh, Amir; Bryant, Mon S.; Nguyen, Christina; Torres, Ilse; Naik, Aanand D.; Najafi, Bijan title: Digital Biomarker Representing Frailty Phenotypes: The Use of Machine Learning and Sensor-Based Sit-to-Stand Test date: 2021-05-08 journal: Sensors (Basel) DOI: 10.3390/s21093258 sha: f33374baff6990e5fd3d7559193d9387d5a49683 doc_id: 918320 cord_uid: zujuxmim Since conventional screening tools for assessing frailty phenotypes are resource intensive and unsuitable for routine application, efforts are underway to simplify and shorten the frailty screening protocol by using sensor-based technologies. This study explores whether machine learning combined with frailty modeling could determine the minimum set of sensor-derived features required to identify physical frailty and three key frailty phenotypes (slowness, weakness, and exhaustion). Older participants (n = 102, age = 76.54 ± 7.72 years) were fitted with five wearable sensors and completed a five times sit-to-stand test. Seventeen sensor-derived features were extracted and used for optimal feature selection based on a machine learning technique combined with frailty modeling. The mean of hip angular velocity range (indicator of slowness), mean of vertical power range (indicator of weakness), and coefficient of variation of vertical power range (indicator of exhaustion) were selected as the optimal features. A frailty model with the three optimal features had an area under the curve of 85.20%, a sensitivity of 82.70%, and a specificity of 71.09%. This study suggests that these three sensor-derived features could be used as digital biomarkers of physical frailty and of the slowness, weakness, and exhaustion phenotypes. Our findings could facilitate the future design of low-cost sensor-based technologies for remote physical frailty assessment via telemedicine.
According to the World Health Organization, by 2050, approximately 22% of the global population will be 60 years or older [1]. Physical frailty, defined as a state of increased vulnerability resulting from decline in reserve and function across multiple physiological systems, is common in older adults [2]. The condition can increase the risk of adverse health outcomes such as falls, poor quality of life, hospitalization, and mortality (see [3] for review). Although physical frailty is typically chronic and progressive [4, 5], it can be ameliorated or potentially reversed if identified and treated early [6] [7] [8]. Therefore, identifying older adults who are physically frail or at risk of becoming physically frail plays an important role in monitoring health conditions, planning appropriate health services, and designing and implementing interventions [9]. The medical profession generally relies on two common techniques to identify those at risk of physical frailty: the frailty phenotype and the frailty index. The Fried frailty phenotype assesses five criteria: unintentional weight loss, slowness, weakness, exhaustion, and low physical activity [10]. An individual is identified as pre-frail/frail when one or more of the five criteria are present. The frailty index assesses health deficits (e.g., symptoms, signs, disabilities, and diseases) [11] and is expressed as the ratio of the number of deficits present to the number of deficits considered. However, these physical frailty assessment tools are resource intensive [12], and thus they are largely unsuitable for telehealth assessment and monitoring. Primary healthcare providers also need simpler tools to administer physical frailty assessments [13, 14]. Wearable sensors, the internet of things (IoT), mobile technology, and cloud computing have encouraged medical device design engineers and researchers to develop frailty-related technologies [15].
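As a minimal illustration of the frailty index ratio described above, the computation is a single division with range checks. The function name and the deficit counts in the usage line are hypothetical; actual frailty indices define their own deficit lists [11].

```python
def frailty_index(deficits_present: int, deficits_considered: int) -> float:
    """Ratio of health deficits present to deficits considered (0.0 to 1.0)."""
    if deficits_considered <= 0:
        raise ValueError("deficits_considered must be positive")
    if not 0 <= deficits_present <= deficits_considered:
        raise ValueError("deficits_present must be between 0 and deficits_considered")
    return deficits_present / deficits_considered


# Hypothetical example: 7 of 40 assessed deficits are present
print(frailty_index(7, 40))  # 0.175
```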
We recently tested the effectiveness of five wearable sensors during a five times sit-to-stand test (5×STS) [16]; the sit-to-stand test is widely used in research and clinical practice to assess physical frailty and motor performance [17] [18] [19]. In that study [16], we found a strong correlation between sensor-based and manually recorded 5×STS durations, and we identified, for the first time, eight sensor-derived features as digital biomarkers for physical frailty and three key frailty phenotypes (slowness, weakness, and exhaustion). Despite these promising results, using five wearable sensors to extract eight sensor-derived features may limit practical application because of cost and computational burden. Acknowledging this limitation, the present study explores whether a machine learning technique combined with frailty modeling could determine the minimum set of sensor-derived features needed to identify physical frailty and the three key frailty phenotypes. We hypothesize that: (1) a machine learning technique combined with frailty modeling can determine the optimal features required to identify physical frailty and the three key frailty phenotypes (slowness, weakness, and exhaustion), and (2) fewer sensors can be used to capture the optimal features. This is a retrospective analysis of sensor data from 102 community-dwelling older adults or Veterans who participated in our previous work [16]. All participants were ambulatory volunteers aged 65 years or older who had no significant medical or psychiatric conditions and did not use assistive devices while standing and walking [16]. The study protocol was approved by the local institutional review boards, including those of the Michael E. DeBakey Veterans Affairs Medical Center, Baylor College of Medicine, and the University of Arizona. All participants read and signed the informed consent form.
The Fried frailty phenotype scored participants' physical frailty from 0 to 5 based on five criteria (weight loss, weakness, slowness, exhaustion, and low physical activity) [10]. Based on the results, participants were classified into a robust group (RG, Fried score of 0) or a pre-frail/frail group (FG, Fried score of 1 or higher). Before performing the 5×STS, all participants were fitted with five wireless wearable sensors (LegSys+™, BioSensics, Watertown, MA, USA) [20] attached with Velcro to elastic belts worn on the trunk, left and right thighs, and left and right shanks. Each sensor contained a tri-axial accelerometer and gyroscope, a Bluetooth module, a microcontroller, and a rechargeable battery. For the 5×STS, participants were instructed to sit on an ordinary chair and fold their arms across their chest. After being given the "go" instruction by a clinician, they performed the 5×STS as quickly as possible without resting their back or legs on the chair between repetitions [21]. All participants completed the 5×STS successfully, and there were no system malfunctions during any of the experimental trials. Each sensor wirelessly transmitted quaternion data at 100 Hz to custom software installed on a standard laptop. Detailed information about the raw sensor signal processing and the determination of sensor-derived features for the three key frailty phenotypes (slowness, weakness, and exhaustion) is available in our previous work [16]. Briefly, the sensor-based 5×STS duration and eight primary features were extracted from the five wearable sensors. The eight primary features were hip angle range, hip angular velocity range, hip power range, knee angle range, knee angular velocity range, knee power range, vertical velocity range, and vertical power range.
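The grouping rule above can be sketched as a small function that counts the Fried criteria met and assigns RG (score 0) or FG (score 1 or higher). The criterion identifiers and the function name are hypothetical labels chosen for illustration, not part of any published instrument.

```python
from __future__ import annotations

# Hypothetical identifiers for the five Fried criteria
FRIED_CRITERIA = frozenset(
    {"weight_loss", "weakness", "slowness", "exhaustion", "low_physical_activity"}
)


def fried_group(criteria_met: set[str]) -> tuple[int, str]:
    """Return the Fried score (0-5) and group: 'RG' if score < 1, else 'FG'."""
    unknown = set(criteria_met) - FRIED_CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    score = len(set(criteria_met))
    return score, ("RG" if score < 1 else "FG")


print(fried_group(set()))                       # (0, 'RG')
print(fried_group({"slowness", "exhaustion"}))  # (2, 'FG')
```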
The eight primary features were computed for each STS cycle, and their mean and coefficient of variation (CV, defined as the standard deviation divided by the mean) were computed across the five STS cycles. Therefore, the total number of sensor-derived features was 17 (i.e., sensor-based 5×STS duration + 8 primary features × 2 feature types (mean and CV)). Hip and knee angles, hip and knee angular velocities, and vertical velocity were obtained from the raw sensor signal processing [16]. Angular power was computed from moment of inertia (I), angular velocity (ω), and angular acceleration (α) as:

$$P_{\text{angular}} = I \cdot \alpha \cdot \omega \tag{1}$$

Hip and knee moments of inertia were estimated using adjusted Zatsiorsky-Seluyanov's segment inertia parameters [22], and hip and knee angular accelerations were computed as the time derivatives of hip and knee angular velocities. Vertical power was computed as the product of body mass, vertical velocity, and vertical acceleration. Vertical force was computed as the product of body mass and vertical acceleration. Vertical power scaled by body weight and height was computed as:

$$\text{Scaled vertical power} = \frac{\text{vertical power}}{m \cdot g \cdot \sqrt{g \cdot h}} \tag{2}$$

where m is body mass, h is height, and g is gravitational acceleration. Scaled vertical power is dimensionless, and its computation is analogous to the use of segment inertia parameters for calculating moment of inertia. Consistent with our previous work [16], the indicators of slowness were the mean of hip angular velocity range, mean of knee angular velocity range, and mean of vertical velocity range; the indicators of weakness were the mean of hip angle range, mean of hip power range, mean of knee angle range, mean of knee power range, and mean of vertical power range; and the indicators of exhaustion were the CVs of the eight primary features.
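The feature definitions above can be sketched with NumPy as follows. This is a minimal sketch, not the authors' processing pipeline: it assumes SI units, g = 9.81 m/s², angular power as torque (I·α) times angular velocity, vertical power as force (m·a) times velocity, and the sample standard deviation (ddof=1) for the CV. All identifiers are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

# Hypothetical identifiers for the eight primary features listed in the text
PRIMARY_FEATURES = (
    "hip_angle_range", "hip_angular_velocity_range", "hip_power_range",
    "knee_angle_range", "knee_angular_velocity_range", "knee_power_range",
    "vertical_velocity_range", "vertical_power_range",
)


def angular_power(inertia, omega, alpha):
    """Torque (I * alpha) times angular velocity omega, sample by sample."""
    return inertia * np.asarray(alpha, dtype=float) * np.asarray(omega, dtype=float)


def vertical_power(mass, velocity, acceleration):
    """Vertical force (m * a) times vertical velocity, sample by sample."""
    return mass * np.asarray(acceleration, dtype=float) * np.asarray(velocity, dtype=float)


def scaled_vertical_power(power, mass, height, g=G):
    """Dimensionless power P / (m * g * sqrt(g * h))."""
    return np.asarray(power, dtype=float) / (mass * g * np.sqrt(g * height))


def signal_range(samples):
    """Range (max - min) of a signal within one STS cycle."""
    samples = np.asarray(samples, dtype=float)
    return float(samples.max() - samples.min())


def summarize_cycles(per_cycle, duration_s):
    """Build the 17 features: duration plus mean and CV of each primary feature."""
    features = {"sts_duration": float(duration_s)}
    for name in PRIMARY_FEATURES:
        values = np.asarray(per_cycle[name], dtype=float)  # one value per STS cycle
        mean = values.mean()
        features[f"mean_{name}"] = float(mean)
        # CV = SD / mean; using the sample SD (ddof=1) is an assumption
        features[f"cv_{name}"] = float(values.std(ddof=1) / mean)
    return features


# Hypothetical per-cycle values (identical for every feature, for brevity)
per_cycle = {name: [1.0, 1.1, 0.9, 1.05, 0.95] for name in PRIMARY_FEATURES}
features = summarize_cycles(per_cycle, duration_s=12.3)
print(len(features))  # 17
```

In practice, each per-cycle value would come from `signal_range` applied to the hip, knee, or vertical signal segmented into one STS cycle; the segmentation itself is described in [16] and is not reproduced here.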
To select candidate features for optimal feature selection, either one-way analysis of variance (ANOVA) or the Mann-Whitney U test was applied to each of the 17 sensor-derived features, depending on the feature's normality as assessed by the Shapiro-Wilk test. Eight of the 17 sensor-derived features showed a significant difference between the RG and FG, and thus they were used as independent variables for optimal feature selection. Optimal feature selection used a recursive feature elimination technique with logistic regression modeling. The modeling used frailty status (0 (robust) or 1 (frail)) as the dependent variable and the eight significant sensor-derived features as independent variables. The recursive feature elimination technique, which ranks features by their contribution to model performance, was used to determine the smallest number of features that produces optimal performance [23]. The bootstrapping technique, which repeatedly resamples participants with replacement, was used to generalize the logistic regression modeling [24, 25]. Given the sample size (n = 102), the optimal feature selection used 2000 bootstrap iterations to improve the robustness of the logistic regression modeling, as recommended in the literature [25]. Figure 1 shows the flow chart of optimal feature selection using the recursive feature elimination technique and the bootstrapping technique. The bootstrapping technique splits participants' data into 2000 pairs of training and validation datasets, which enables calculating a 95% confidence interval (CI) for optimal feature selection. The five steps for recursive feature elimination are: (1) logistic regression models are created at each iteration loop.
The number of logistic regression models equals the number of significant sensor-derived features considered in each recursive loop (i.e., the first recursive loop considers the eight significant sensor-derived features, and the number decreases by 1 after each recursive loop). For all logistic regression models, the dependent variable is frailty status (i.e., 0 (robust) or 1 (frail)). For the nth logistic regression model, the independent variables are all significant sensor-derived features except the nth feature; (2) for each model at each iteration loop, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are calculated, because AUC is a widely accepted performance measure for evaluating ranking predictions [26]; (3) AUC values across the 2000 iterations are averaged for each model; (4) the sensor-derived feature with the lowest AUC value is removed; and (5) steps 1-4 repeat until only one sensor-derived feature remains (i.e., steps 1-4 constitute one recursive loop, for a total of eight recursive loops). Recursive feature elimination incorporating bootstrapping (i.e., 2000 pairs of resampled datasets) runs 70,000 loops in total.
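The five steps above can be sketched in Python with NumPy only. This is a minimal sketch under stated assumptions, not the authors' code: logistic regression is fitted by Newton-Raphson with a small L2 penalty added for numerical stability; each bootstrap training set is drawn with replacement and the out-of-bag participants serve as the validation set; and step (4) is read as removing the feature whose exclusion lowers the mean AUC the least (the leave-one-out model with the highest mean AUC). All function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-3):
    """Logistic regression by Newton-Raphson; returns [intercept, coef_1, ...].

    The small L2 penalty on all coefficients is an assumption for stability.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    penalty = ridge * np.eye(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -35.0, 35.0)))
        w = p * (1.0 - p)
        gradient = Xb.T @ (y - p) - penalty @ beta
        hessian = Xb.T @ (Xb * w[:, None]) + penalty
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def auc(y_true, score):
    """ROC AUC via the Mann-Whitney statistic (ties count as 1/2)."""
    pos, neg = score[y_true == 1], score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def bootstrap_splits(n, n_boot, rng):
    """Training = n draws with replacement; validation = out-of-bag (assumed)."""
    splits = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)
        val = np.setdiff1d(np.arange(n), train)
        splits.append((train, val))
    return splits


def mean_auc(X, y, cols, splits):
    """Average validation AUC of a model on feature columns `cols` across resamples."""
    scores = []
    for train, val in splits:
        if len(np.unique(y[train])) < 2 or len(np.unique(y[val])) < 2:
            continue  # skip resamples lacking one of the two classes
        beta = fit_logistic(X[np.ix_(train, cols)], y[train])
        scores.append(auc(y[val], beta[0] + X[np.ix_(val, cols)] @ beta[1:]))
    return float(np.mean(scores))


def recursive_feature_elimination(X, y, names, n_boot=2000, seed=0):
    """Rank features by backward elimination; returns (ranking, models per resample)."""
    splits = bootstrap_splits(len(y), n_boot, np.random.default_rng(seed))
    remaining = list(range(X.shape[1]))
    removed = []
    n_models = 0
    while len(remaining) > 1:
        # Steps 1-3: model n leaves out feature n; average its AUC over resamples
        loo_auc = {f: mean_auc(X, y, [c for c in remaining if c != f], splits)
                   for f in remaining}
        n_models += len(remaining)
        # Step 4 (interpreted): drop the feature whose exclusion costs the least AUC
        drop = max(loo_auc, key=loo_auc.get)
        remaining.remove(drop)
        removed.append(names[drop])
    return [names[remaining[0]]] + removed[::-1], n_models


# Synthetic demo (not study data): only feature f3 carries signal
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 8))
y_demo = (X_demo[:, 3] + 0.3 * rng.normal(size=120) > 0).astype(int)
ranking, n_models = recursive_feature_elimination(
    X_demo, y_demo, [f"f{i}" for i in range(8)], n_boot=30)
print(ranking[0], n_models)  # f3 35
```

With eight input features, each resample requires 8 + 7 + ... + 2 = 35 model fits, so 2000 resamples correspond to the 70,000 fits reported above; the demo uses only 30 resamples to keep it fast. Libraries such as scikit-learn offer similar building blocks, but a dependency-free version keeps each step of the procedure visible.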
Model performance was evaluated by AUC, sensitivity, specificity, and accuracy. Sensitivity and specificity are the ability of the logistic regression models to correctly identify participants with and without frailty, respectively. Accuracy is defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

where TP (true positive) and TN (true negative) represent the numbers of correctly identified frail and non-frail participants, respectively, and FP (false positive) and FN (false negative) represent the numbers of non-frail participants incorrectly identified as frail and frail participants incorrectly identified as non-frail, respectively. After the minimal set of optimal features was determined, the performance of the logistic regression model with these features was evaluated. For AUC, sensitivity, specificity, and accuracy, the mean and 95% CI were calculated from the validation datasets (i.e., 2000 iterations).

Table 1 reports participants' demographic characteristics for both groups, including statistical results. Weight and BMI were significantly higher for the FG than for the RG, whereas age, gender, and height did not differ significantly between the groups.
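The evaluation metrics described in the methods (sensitivity, specificity, accuracy as (TP + TN)/(TP + TN + FP + FN), and a mean with 95% CI over bootstrap iterations) can be sketched as below. How the authors built the CI is not stated here; the very narrow reported intervals (e.g., 85.04-85.36 around an AUC of 85.20%) are consistent with a confidence interval of the mean across 2000 iterations, so the sketch uses mean ± 1.96·SD/√n. That choice, and all function names, are assumptions; a percentile interval is a common alternative.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN with frail coded as 1 and robust as 0."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def mean_and_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI of the mean over bootstrap iterations."""
    v = np.asarray(values, dtype=float)
    mean = float(v.mean())
    half_width = float(z * v.std(ddof=1) / np.sqrt(len(v)))
    return mean, mean - half_width, mean + half_width


# Toy example (not study data): 3 frail and 2 robust participants
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
# sensitivity 2/3, specificity 0.5, accuracy 0.6
```

Applied to the per-iteration validation results, `mean_and_ci` would be called once per metric over the 2000 values.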
Statistical analysis showed that eight sensor-derived features differed significantly between the RG and FG. Figure 2 shows the results for the three sensor-derived features indicating slowness, including statistical significance. Compared to the RG, the FG had a significantly longer sensor-based 5×STS duration and significantly lower means of hip and knee angular velocity range. Figure 3 shows the results for the two sensor-derived features indicating weakness, including statistical significance. Compared to the RG, the mean of hip power range and mean of vertical power range were significantly lower for the FG. Figure 4 shows the results for the three sensor-derived features indicating exhaustion, including statistical significance. The CV of hip angular velocity range, CV of vertical velocity range, and CV of vertical power range were significantly higher for the FG than for the RG. Figure 5 shows model performance assessed by AUC, sensitivity, specificity, and accuracy as a function of the number of ranked sensor-derived features based on the recursive feature elimination technique with logistic regression modeling. Table 2 reports the rankings of the eight significant sensor-derived features and the associated frailty phenotype. Based on the selection criteria (the presence of slowness, weakness, and exhaustion indicators, and an AUC > 0.8; an AUC of 0.8 to 0.9 is considered excellent [27]), the mean of hip angular velocity range, mean of vertical power range, and CV of vertical power range were selected as the optimal features. A logistic regression model with the selected features had an AUC of 85.20% (95% CI = 85.04-85.36), a sensitivity of 82.70% (95% CI = 82.43-82.96), a specificity of 71.09% (95% CI = 70.72-71.46), and an accuracy of 78.35% (95% CI = 78.16-78.54).
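One reading of the selection criteria above is: take the smallest set of top-ranked features that includes at least one indicator of each phenotype and whose model exceeds an AUC of 0.8. The sketch below encodes that reading. The exact form of the rule, the function name, and the example inputs are assumptions; the inputs are invented placeholders, not the rankings in Table 2 or the AUCs in Figure 5.

```python
def select_optimal_features(ranking, phenotype, auc_by_k, threshold=0.80,
                            required=("slowness", "weakness", "exhaustion")):
    """Smallest top-k subset covering every required phenotype with AUC above threshold.

    ranking:   feature names ordered from rank 1 to rank K
    phenotype: feature name -> phenotype it indicates
    auc_by_k:  k -> mean AUC of the model using the top-k features
    """
    for k in range(1, len(ranking) + 1):
        top_k = ranking[:k]
        covered = {phenotype[name] for name in top_k}
        if set(required) <= covered and auc_by_k[k] > threshold:
            return top_k
    return None  # no subset meets both criteria


# Placeholder inputs for illustration only; these are NOT the Table 2 rankings
ranking = ["feat_a", "feat_b", "feat_c", "feat_d"]
phenotype = {"feat_a": "slowness", "feat_b": "slowness",
             "feat_c": "weakness", "feat_d": "exhaustion"}
auc_by_k = {1: 0.70, 2: 0.74, 3: 0.78, 4: 0.82}
print(select_optimal_features(ranking, phenotype, auc_by_k))
# ['feat_a', 'feat_b', 'feat_c', 'feat_d']
```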
The equation of the logistic regression model for the optimal features is:

$$g\big(p(ph)\big) = \ln\left(\frac{p(ph)}{1 - p(ph)}\right) = \beta_0 + \beta_1 ph_1 + \beta_2 ph_2 + \beta_3 ph_3 \tag{4}$$

where g and ln are the logit function and the natural logarithm, respectively, and p(ph) is the probability that the dependent variable equals 1 (i.e., that the participant is classified as frail), which ranges between 0 and 1. ph1, ph2, and ph3 denote the mean of hip angular velocity range, mean of vertical power range, and CV of vertical power range, respectively; β0 is the intercept (2.722), and β1, β2, and β3 are the coefficients (β1 = −0.022, β2 = 0.243, and β3 = 0.055). Table 3 reports the results of model validation. The mean AUC, sensitivity, specificity, and accuracy were 82.18%, 79.37%, 67.20%, and 73.91%, respectively.
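Applying the fitted model is the inverse logit of the linear predictor. The sketch below plugs in the reported intercept and coefficients. The text does not state the units or scaling of ph1 to ph3 (for example, whether angular velocity is in deg/s, whether vertical power is the scaled version, or whether the CV is a fraction or a percentage), so raw measurements may not reproduce the authors' predictions; the 0.5 decision threshold and all names are likewise assumptions.

```python
import math

# Coefficients as reported for the three-feature model
BETA_0 = 2.722   # intercept
BETA_1 = -0.022  # ph1: mean of hip angular velocity range
BETA_2 = 0.243   # ph2: mean of vertical power range
BETA_3 = 0.055   # ph3: CV of vertical power range


def frailty_probability(ph1, ph2, ph3):
    """Inverse logit of beta0 + beta1*ph1 + beta2*ph2 + beta3*ph3 (probability of frail)."""
    logit = BETA_0 + BETA_1 * ph1 + BETA_2 * ph2 + BETA_3 * ph3
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # numerically stable branch for large negative logits
    return z / (1.0 + z)


def classify(ph1, ph2, ph3, threshold=0.5):
    """Label as 'frail' when the probability reaches the (assumed) 0.5 cut-off."""
    return "frail" if frailty_probability(ph1, ph2, ph3) >= threshold else "robust"
```

For example, `frailty_probability(0, 0, 0)` returns the inverse logit of the intercept alone, about 0.938, and increasing ph1 lowers the probability because β1 is negative.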
This study demonstrated the use of a machine learning technique combined with frailty modeling (i.e., logistic regression modeling) for determining the optimal sensor-derived features required to identify physical frailty and three frailty phenotypes (slowness, weakness, and exhaustion).
The machine learning technique selected the mean of hip angular velocity range (indicator of slowness), mean of vertical power range (indicator of weakness), and CV of vertical power range (indicator of exhaustion) as the optimal sensor-derived features. The resulting model showed an excellent AUC (85.20%) and high sensitivity (82.70%), specificity (71.09%), and accuracy (78.35%). Unlike the published literature [28] [29] [30], this study was the first to show that the FG had slower, weaker, and more exhausted 5×STS performance than the RG. Physical frailty can be reversed when identified and treated early (see [7] for review), and routine, accurate assessment of physical frailty is a crucial part of intervention and treatment [2, 10]. Although a variety of physical frailty assessment tools exist [12, 13], the Fried frailty phenotype and the frailty index are widely used for in-person assessment and monitoring. They are inadequate, however, for remote assessment via telemedicine. For example, the frailty phenotype requires equipment for weakness assessment (e.g., a handgrip dynamometer) and sufficient physical space for slowness assessment (e.g., a 4.57 m walking test) [5], and the frailty index relies on patient-reported outcomes that are relatively subjective compared to the frailty phenotype [6]. Additionally, trained health professionals must administer the assessments in person and interpret the results. Given the poor compliance, increased medical costs, and stress on families or caregivers associated with in-person assessment, sensor-based physical frailty assessment tools offer a simple, fast, and objective assessment protocol irrespective of physical setting.
Our results indicate that the frailty model with the three optimal features (i.e., mean of hip angular velocity range, mean of vertical power range, and CV of vertical power range) had a lower AUC (85.20%, 95% CI = 85.04-85.36), specificity (71.09%, 95% CI = 70.72-71.46), and accuracy (78.35%, 95% CI = 78.16-78.54) than the frailty model with eight sensor-derived features (AUC 87.64%, 95% CI = 87.49-87.79; specificity 73.61%, 95% CI = 73.25-73.97; accuracy 79.18%, 95% CI = 78.98-79.39), as shown in Figure 5. However, both models showed an excellent AUC and high specificity and accuracy [23, 27], and they had similar sensitivity (three optimal features: 82.70%, 95% CI = 82.43-82.96; eight sensor-derived features: 82.44%, 95% CI = 82.19-82.69). Notably, the three optimal features are sufficient to identify physical frailty and the three key frailty phenotypes, and two wearable sensors (trunk and one thigh) can capture them. Compared to a five-sensor configuration, a two-sensor configuration has significant commercial advantages (e.g., lower manufacturing cost and simpler integration), as well as health care advantages, including easier use and minimal computation when analyzing and interpreting the results. Compared to other function tests (e.g., walking and strength tests), the 5×STS is simple, fast, safe, easily reproducible, and widely used in research and clinical practice [17] [18] [19] [31]. The 5×STS is an important component of the Short Physical Performance Battery, a clinical tool used to identify physical frailty [32]. Therefore, a sensor-based 5×STS could enable health professionals to quickly identify slowness, weakness, and exhaustion, which can help them recognize potentially modifiable risk factors and improve health outcomes and care strategies for older adults [33, 34].
The limitations of this study include possible mispredictions of physical frailty and the three frailty phenotypes, and a frailty model that may not generalize. We speculate that possible mispredictions arise from differences between the 5×STS and the Fried frailty phenotype method. For example, slow gait may not be reflected in a slowly performed 5×STS, and weak grip strength may not be reflected in a weakly performed 5×STS. This speculation is also supported by our previous finding that a rapid sensor-based elbow flexion/extension test with a machine learning approach identified physical frailty and frailty phenotypes diagnosed with the frailty index with an accuracy above 80%, that is, with a misprediction rate below 20% [23]. In addition, we attribute possible mispredictions to the binarization of the Fried frailty phenotype. Future research will focus on improving prediction rates by including additional sensor measurements (e.g., gait speed) and by using a multiclass classification method. Although we used the bootstrapping technique to generalize the logistic regression modeling, our current sample size (n = 102) and gender imbalance may not be sufficient given the cross-comparisons and combinations of participants in our analysis. Therefore, future research will use larger samples with balanced gender within and between groups. We also plan to use a telemedicine camera system (i.e., a laptop- or tablet-integrated camera), which we are currently developing, to record older adults performing the 5×STS without wearing sensors. Our aim is to demonstrate that a sensorless 5×STS could be an alternative when access to sensors is limited or unavailable.
This study demonstrated that the mean of hip angular velocity range, mean of vertical power range, and CV of vertical power range are the optimal features for identifying physical frailty and three key frailty phenotypes (slowness, weakness, and exhaustion) with a sensor-based 5×STS in older adults. Of note, the three optimal features can be extracted with a two-sensor configuration, which could facilitate the technological development and commercial application of low-cost, easy-to-use, and computationally efficient sensor-based frailty assessment tools. Reliance on telemedicine is expected to increase in the post-COVID era [35]. Our sensor-based or sensorless 5×STS could enable remote physical frailty assessment of older adults performing the 5×STS at home. Older adults, their families, and caregivers could also benefit from remote physical frailty assessment via telemedicine.

[2] Frailty in elderly people.
[3] Global incidence of frailty and prefrailty among community-dwelling older adults: A systematic review and meta-analysis.
[4] Managing frailty as a long-term condition.
[5] Frailty in older adults: Implications for end-of-life care.
[6] Frailty consensus: A call to action.
[7] Interventions to prevent or reduce the level of frailty in community-dwelling older adults: A scoping review of the literature and international policies.
[8] Frailty screening and interventions: Considerations for clinical practice.
[9] Screening for frailty: Older populations and older individuals.
[10] Frailty in older adults: Evidence for a phenotype.
[11] A global clinical measure of fitness and frailty in elderly people.
[12] Frailty measurement in research and clinical practice: A review.
[13] Instruments for the detection of frailty syndrome in older adults: A systematic review.
[14] Family physicians need easy instruments for frailty.
[15] Technology for home-based frailty assessment and prediction: A systematic review.
[16] Toward remote assessment of physical frailty using sensor-based sit-to-stand test.
[17] Test-retest reliability of the five-repetition sit-to-stand test: A systematic review of the literature involving adults.
[18] The repetitive five-times-sit-to-stand test: Its reliability in older adults.
[19] Interrater reliability of the five-times-sit-to-stand test. Home Health Care Manag.
[20] Gait and balance assessments as early indicators of frailty in patients with known peripheral artery disease.
[21] Lower extremity function and subsequent disability: Consistency across studies, predictive models, and value of gait speed alone compared with the short physical performance battery.
[22] Adjustments to Zatsiorsky-Seluyanov's segment inertia parameters.
[23] Toward using a smartwatch to monitor frailty in a hospital setting: Using a single wrist-wearable sensor to assess frailty in bedbound inpatients.
[24] Making bootstrap statistical inferences: A tutorial.
[25] An Introduction to the Bootstrap.
[26] The use of the area under the ROC curve in the evaluation of machine learning algorithms.
[27] Receiver operating characteristic curve in diagnostic test assessment.
[28] An evaluation of the 30-s chair stand test in older adults: Frailty detection based on kinematic parameters from a single inertial unit.
[29] The instrumented sit-to-stand test (iSTS) has greater clinical relevance than the manually recorded sit-to-stand test in older adults.
[30] Gait Velocity and Chair Sit-Stand-Sit Performance Improves Current Frailty-Status Identification.
[31] A short physical performance battery assessing lower extremity function: Association with self-reported disability and prediction of mortality and nursing home admission.
[32] Performance of the Short Physical Performance Battery in identifying the frailty phenotype and predicting geriatric syndromes in community-dwelling elderly.
[33] Frailty syndrome: Implications and challenges for health care policy.
[34] The impact of frailty on admission to home care services and nursing homes: Eight-year follow-up of a community-dwelling, older adult
[35] COVID-19 transforms health care through telemedicine: Evidence from the field.

The authors thank Ana Enriquez, Manuel Gardea, Ivan Maria, Luciana Narvaez, and Maria Noun for their help with data collection, IRB, and analysis. We also thank all participants for their time and effort, as well as the research coordinators, student helpers, and research interns who contributed to participant recruitment. The authors declare no conflict of interest.