key: cord-0854711-oxd80pbi authors: Rykov, Yuri; Thach, Thuan-Quoc; Bojic, Iva; Christopoulos, George; Car, Josip title: Digital Biomarkers for Depression Screening With Wearable Devices: Cross-sectional Study With Machine Learning Modeling date: 2021-10-25 journal: JMIR Mhealth Uhealth DOI: 10.2196/24872 sha: b0402eafa230c57faf6fef1526616d4835d19af1 doc_id: 854711 cord_uid: oxd80pbi BACKGROUND: Depression is a prevalent mental disorder that is undiagnosed and untreated in half of all cases. Wearable activity trackers collect fine-grained sensor data characterizing the behavior and physiology of users (ie, digital biomarkers), which could be used for timely, unobtrusive, and scalable depression screening. OBJECTIVE: The aim of this study was to examine the predictive ability of digital biomarkers, based on sensor data from consumer-grade wearables, to detect risk of depression in a working population. METHODS: This was a cross-sectional study of 290 healthy working adults. Participants wore Fitbit Charge 2 devices for 14 consecutive days and completed a health survey, including screening for depressive symptoms using the 9-item Patient Health Questionnaire (PHQ-9), at baseline and 2 weeks later. We extracted a range of known and novel digital biomarkers characterizing physical activity, sleep patterns, and circadian rhythms from wearables using steps, heart rate, energy expenditure, and sleep data. Associations between severity of depressive symptoms and digital biomarkers were examined with Spearman correlation and multiple regression analyses adjusted for potential confounders, including sociodemographic characteristics, alcohol consumption, smoking, self-rated health, subjective sleep characteristics, and loneliness. Supervised machine learning with statistically selected digital biomarkers was used to predict risk of depression (ie, symptom severity and screening status). 
We used varying cutoff scores from an acceptable PHQ-9 score range to define the depression group and different subsamples for classification, while the set of statistically selected digital biomarkers remained the same. For the performance evaluation, we used k-fold cross-validation and obtained accuracy measures from the holdout folds. RESULTS: A total of 267 participants were included in the analysis. The mean age of the participants was 33 (SD 8.6, range 21-64) years. Of the 267 participants, most were female (n=170, 63.7%). The majority of the participants were Chinese (n=211, 79.0%), single (n=163, 61.0%), and had a university degree (n=238, 89.1%). We found that a greater severity of depressive symptoms was robustly associated with greater variation of nighttime heart rate between 2 AM and 4 AM and between 4 AM and 6 AM; it was also associated with lower regularity of weekday steps-based circadian rhythms, estimated with nonparametric measures of interdaily stability and autocorrelation, and with fewer steps-based daily peaks. Despite several reliable associations, our evidence showed limited ability of digital biomarkers to detect depression in the whole sample of working adults. However, in balanced and contrasted subsamples comprising depressed participants and healthy participants with no risk of depression (ie, no or minimal depressive symptoms), the model achieved an accuracy of 80%, a sensitivity of 82%, and a specificity of 78% in detecting subjects at high risk of depression. CONCLUSIONS: The digital biomarkers identified here, which are based on behavioral and physiological data from consumer wearables, could detect increased risk of depression and have the potential to assist in depression screening, yet current evidence shows limited predictive ability. Machine learning models combining these digital biomarkers could discriminate between individuals with a high risk of depression and individuals with no risk.
Data: total, and sampled from weekdays and weekends separately. Variables: Sedentary, Sedentary.wd, Sedentary.we.

Light-intensity physical activity (LPA), mean. Average daily time of physical activity of intensity below 3 METs over the observation period (according to the CDC's physical activity guidelines) [2]. Data: total. Variable: LPA.

Moderate-intensity physical activity (MPA), mean. Average daily time of physical activity of intensity from 3 to 6 METs over the observation period (according to the CDC's physical activity guidelines) [2]. Data: total. Variable: MPA.

Vigorous-intensity physical activity (VPA), mean. Average daily time of physical activity of intensity above 6 METs over the observation period (according to the CDC's physical activity guidelines) [2]. Data: total. Variable: VPA.

Heart rate metrics

HR, mean, SD, and CV. Average, SD, and CV of HR (measured in beats per minute) over the observation period. SD and CV indicate the extent of stability/variability in HR.

Interdaily stability (IS). The ratio of the variance of the average 24-h data profile to the total variance of all days (data are summarized hourly):

$$\mathrm{IS} = \frac{N \sum_{h=1}^{p} (\bar{x}_h - \bar{x})^2}{p \sum_{i=1}^{N} (x_i - \bar{x})^2}$$

in which $N$ is the total number of data items, $p$ is the number of data items per day (24 in this case), $\bar{x}_h$ corresponds to each hour of the mean profile, $x_i$ represents each given hour of raw data, and $\bar{x}$ is the average of all data [5]. IS is a measure of the stability/regularity of the circadian rhythm over a series of 24-h cycles, i.e., the extent to which day-by-day activity follows a regular pattern; higher values indicate a more stable circadian rhythm.

Intradaily variability (IV). The mean square of differences between successive hourly data (i.e., the first derivative) normalized by the total variance of all days (data are summarized hourly):

$$\mathrm{IV} = \frac{N \sum_{i=2}^{N} (x_i - x_{i-1})^2}{(N-1) \sum_{i=1}^{N} (x_i - \bar{x})^2}$$

in which $N$ is the total number of data items, $x_i$ represents the data point of a given hour, and $\bar{x}$ is the average of all data. IV quantifies the fragmentation of periods of activity from periods of rest within a 24-h cycle [5].
Intradaily variability scores range from 0 to 2 and are typically below 1; higher values indicate a more fragmented rhythm and reflect shorter alternating periods of rest and activity rather than one extended active period during the daytime and one extended rest period at night. Data: steps data (hourly intervals), HR data (hourly intervals); total, weekdays. Variables: IV.st, IV.st.wd, IV.hr, IV.hr.wd.

Diurnal activity. The mean activity of the ten consecutive most active hours of the average daily activity profile. Data: steps.

Interdaily coefficient of variation (ICV). The 24-h mean of by-hour coefficients of variation (CV), where CV is the ratio of SD to average in each hour between days:

$$\mathrm{ICV} = \frac{1}{p} \sum_{h=1}^{p} \frac{\mathrm{SD}_h}{\bar{x}_h}$$

in which $p$ is the number of data points per day (24 in this case; data are aggregated hourly), $\mathrm{SD}_h$ is the SD, across the $N$ days, of the values $x_i$ corresponding to hour $h$, $\bar{x}_h$ represents the value of each hour from the mean 24-h profile, and $N$ is the number of days. Higher ICV indicates higher variation and a less stable rhythm. We proposed this metric as an alternative to IS that assesses rhythm stability with a different approach. Data: steps data, HR data; total, weekdays. Variables: ICV.st, ICV.st.wd, ICV.hr, ICV.hr.wd.

Rhythm autocorrelation (AC). The autocorrelation of the time series with a day-length lag, another alternative measure of rhythm stability; higher values indicate higher stability/similarity of data patterns across days:

$$\mathrm{AC} = \frac{\sum_{i=1}^{N-k} (x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$

in which $k$ is a day-length lag (e.g., 24 if data are aggregated hourly), $x_i$ represents the value of each interval, $\bar{x}$ is the average of all data, and $N$ is the total number of data points. Data: steps.

Daily peaks, mean and SD. The number of peaks per day in the time series. A robust peak detection algorithm was used to identify peaks (the following algorithm parameters were set for steps data: lag = 10, threshold = 10, influence = 0; for HR data: lag = 10, threshold = 2, influence = 0.25) [6]; the average and SD over the observation period were taken.
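As an illustration of the ICV and AC definitions above, the following minimal sketch computes both from an hourly series. Several choices are assumptions not stated in the source: the sample SD (N-1 denominator) is used for the by-hour CV, hours whose mean is zero are skipped because their CV is undefined, and AC uses the usual sample autocorrelation normalized by the total sum of squares. Function names are hypothetical.

```python
from statistics import fmean, stdev


def interdaily_cv(x, p=24):
    """ICV: mean over the p hours of (between-day SD / between-day mean)."""
    x = [float(v) for v in x]
    if len(x) < 2 * p or len(x) % p != 0:
        raise ValueError("series must cover at least two complete days")
    cvs = []
    for h in range(p):
        values = x[h::p]           # the same hour taken from every day
        mean_h = fmean(values)
        if mean_h > 0:             # assumption: skip hours with an undefined CV
            cvs.append(stdev(values) / mean_h)
    if not cvs:
        raise ValueError("ICV is undefined when every hourly mean is zero")
    return fmean(cvs)


def day_lag_autocorrelation(x, k=24):
    """AC: sample autocorrelation of the series at a day-length lag k."""
    x = [float(v) for v in x]
    if len(x) <= k:
        raise ValueError("series must be longer than the lag")
    grand_mean = fmean(x)
    d = [v - grand_mean for v in x]
    den = sum(v * v for v in d)
    if den == 0:
        raise ValueError("AC is undefined for a constant series")
    # zip pairs each point with the point k intervals later.
    num = sum(a * b for a, b in zip(d, d[k:]))
    return num / den


# Synthetic hourly values (illustrative only), repeated for 7 identical days.
day = [2, 1, 1, 1, 1, 2, 4, 9, 12, 8, 6, 9, 11, 8, 7, 6, 8, 10, 9, 7, 5, 4, 3, 2]
week = day * 7
print(interdaily_cv(week), day_lag_autocorrelation(week))
```

For a series of identical days the by-hour SDs are all zero, so ICV is 0, while AC at the day-length lag equals (D-1)/D for D days because the numerator sums over one fewer day than the denominator.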
Peaks indicate and quantify distinct behavioral and physiological activities happening over a day or transitions between these activities, while SD and CV are stability estimates. Data: steps data (aggregated into 15-minute intervals; intervals with fewer than 50 steps were recoded as zeros to reduce noise), HR data (aggregated into 15-minute intervals); total, weekdays.

Abbreviations: HR: heart rate; SD: standard deviation; CV: coefficient of variation (the ratio of an SD to a mean); REM: rapid eye movement; SOL: sleep onset latency; TB: time in bed; TST: total sleep time; IS: interdaily stability; IV: intradaily variability; ICV: interdaily coefficient of variation; SE: sleep efficiency; LPA: light-intensity physical activity; MPA: moderate-intensity physical activity; VPA: vigorous-intensity physical activity.

[1] Sedentary Behavior Research Network (SBRN): Terminology Consensus Project process and outcome.
[2] National Center for Chronic Disease Prevention and Health Promotion, Division of Nutrition and Physical Activity.
[3] Beyond fitness tracking: the use of consumer-grade wearable data from normal volunteers in cardiovascular and lipidomics research.
[4] Cosinor-based rhythmometry.
[5] A fresh look at the use of nonparametric analysis in actimetry.
[6] Peak signal detection in realtime timeseries data: robust peak detection algorithm (using z-scores).
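The daily-peaks metric described above relies on the z-score-based robust peak detection algorithm [6], parameterized by lag, threshold, and influence. The sketch below is a plain reimplementation of that moving-window scheme under stated assumptions: detection is run separately on each day of 15-minute data, a run of consecutive positive signals is counted as one peak, and the optional `floor` argument reproduces the recoding of low step counts to zero. None of these choices, nor the function names, are confirmed by the source.

```python
from statistics import fmean, pstdev, stdev


def zscore_peak_signals(y, lag, threshold, influence):
    """Return +1/-1/0 per point: deviation from a moving mean in SD units."""
    y = [float(v) for v in y]
    signals = [0] * len(y)
    if len(y) <= lag:
        return signals
    filtered = list(y)                  # series with signal points damped
    avg = fmean(filtered[:lag])
    sd = pstdev(filtered[:lag])
    for i in range(lag, len(y)):
        if abs(y[i] - avg) > threshold * sd:
            signals[i] = 1 if y[i] > avg else -1
            # influence controls how much a signal point shifts the baseline.
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
        window = filtered[i - lag + 1 : i + 1]
        avg = fmean(window)
        sd = pstdev(window)
    return signals


def count_peaks(signals):
    """Assumption: one peak per contiguous run of positive signals."""
    previous = [0] + list(signals[:-1])
    return sum(1 for prev, cur in zip(previous, signals) if cur == 1 and prev != 1)


def daily_peak_counts(days, lag, threshold, influence, floor=None):
    """Count peaks in each day; values below `floor` are recoded to zero first."""
    counts = []
    for day in days:
        if floor is not None:
            day = [0.0 if v < floor else float(v) for v in day]
        counts.append(count_peaks(zscore_peak_signals(day, lag, threshold, influence)))
    return counts


# Two synthetic days of 96 fifteen-minute step totals (illustrative values only).
day1 = [0] * 96
day1[20] = 30                  # below the 50-step floor, treated as noise
day1[40:42] = [200, 300]       # one sustained burst
day1[70] = 150                 # a second burst
day2 = [0] * 96
day2[50] = 500

# Steps parameters reported in the text: lag = 10, threshold = 10, influence = 0.
counts = daily_peak_counts([day1, day2], lag=10, threshold=10, influence=0, floor=50)
print(counts, fmean(counts), stdev(counts))
```

For HR data the text reports lag = 10, threshold = 2, and influence = 0.25, which would be passed in the same way without a floor. Because the first `lag` intervals of each day only initialize the baseline, this per-day variant cannot flag peaks in that opening window; running the detector over the continuous series and then splitting by day is an equally plausible reading.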