key: cord-0278794-v5lfleo0 authors: Dineley, Judith; Lavelle, Grace; Leightley, Daniel; Matcham, Faith; Siddi, Sara; Penarrubia-Mar'ia, Maria Teresa; White, Katie M.; Ivan, Alina; Oetzmann, Carolin; Simblett, Sara; Dawe-Lane, Erin; Bruce, Stuart; Stahl, Daniel; Haro, Josep Maria; Wykes, Til; Narayan, Vaibhav A.; Hotopf, Matthew; Schuller, Bjorn W.; Cummins, Nicholas title: Remote smartphone-based speech collection: acceptance and barriers in individuals with major depressive disorder date: 2021-04-17 journal: nan DOI: 10.21437/interspeech.2021-1240 sha: 2c9cec59469b8aaefa9ab7e1e4b03d4310d71447 doc_id: 278794 cord_uid: v5lfleo0 The ease of in-the-wild speech recording using smartphones has sparked considerable interest in the combined application of speech, remote measurement technology (RMT) and advanced analytics as a research and healthcare tool. For this to be realised, the acceptability of remote speech collection to the user must be established, in addition to feasibility from an analytical perspective. To understand the acceptance, facilitators, and barriers of smartphone-based speech recording, we invited 384 individuals with major depressive disorder (MDD) from the Remote Assessment of Disease and Relapse - Central Nervous System (RADAR-CNS) research programme in Spain and the UK to complete a survey on their experiences recording their speech. In this analysis, we demonstrate that study participants were more comfortable completing a scripted speech task than a free speech task. For both speech tasks, we found depression severity and country to be significant predictors of comfort. Not seeing smartphone notifications of the scheduled speech tasks, low mood and forgetfulness were the most commonly reported obstacles to providing speech recordings. Speech, both as a natural language processing (NLP) and computational paralinguistics signal, is increasingly regarded as a key digital phenotype for a range of health conditions [1] , [2] , particularly mental and neurological disorders [3] , [4] . With the increasing ubiquity of remote measurement technology (RMT), such as smartphones, wearables and other Internet-of-Things (IoT) devices, how we collect speech in speech-health studies is also changing. In-the-wild speech collection [5] - [9] enables research study participants to incorporate speech recordings into their daily lives, providing them with the flexibility of when and where to record their speech. Variables that may influence the paralinguistic and linguistic content of someone's speech, such as depression severity, can also be measured at the same time. However, moving from the laboratory to real-life environments comes with the need to understand participants' motivations, facilitators and barriers that may affect the remotely-collected speech data. There have been minimal efforts to understand the challenges of remote speech collection. Simblett et al. [10] , conducted a systematic review of RMT approaches as a healthcare management tool and identified that most studies did not assess the usability and acceptability of RMT. The studies reviewed also did not target any specific RMT signal, so no speech-specific conclusions can be drawn from this literature. Few studies in individuals with mental health disorders have explicitly explored user engagement with in-the-wild speech collection. One study [11] collected speech samples to track clinical changes in 47 individuals with serious mental illnesses. The authors implemented an exit survey, where 22 of 24 respondents gave positive statements on their experience. While promising, the survey was not fully described, and only limited results were reported. The study also collected speech via a dial-in system accessible using any phone, so participants did not actually interact with any RMT. Given the growing number of speech-health studies [12] - [21] , understanding the factors related to motivation, convenience, accessibility and usability of remote speech data collection tools is vital to maximise the research value of this unique health signal. This paper describes a survey designed to understand some of these factors in individuals diagnosed with major depressive disorder (MDD) and enrolled in the multicentre Remote Assessment of Disease and Relapse -Central Nervous System (RADAR-CNS) research programme [22] , an observational, longitudinal, prospective study collecting multimodal RMT data in six European countries. Based on survey responses, we specifically report how comfortable participants felt completing scripted and free speech tasks, and differences in comfort level with sociodemographic factors and depression severity. We also quantify the obstacles perceived by participants to have prevented speech collection, and the accuracy of participants' reporting of their level of speech task completion. Participants with MDD were asked to complete two speechrecording tasks every two weeks at the same time as questionnaires assessing depression severity [23] and selfesteem [24] . In a scripted speech task (SS), the participants read aloud an extract from Aesop's fable, The North Wind and the Sun [25] . The second task was a free-speech activity (FS) in which participants were asked to speak about what they were looking forward to in the following seven days [3] , [26] . Speech collection began in the UK in August 2019 and in December 2019 in Spain. Recordings were collected using the open-source RADAR-base m-health platform [27] . Participants recorded their speech on smartphones using a custom-designed app that produced notifications each time speech recordings were scheduled. English and Castilian Spanish versions of the app were available. Once recorded, speech data were encrypted and sent to a secure server. Sociodemographic data, including age at enrolment, gender and years in education, were collected for all participants. Depression severity was also assessed using the Inventory of Depressive Symptomatology -Self-Reported (IDS-SR) [28] , conducted every three months as part of the RADAR-CNS MDD protocol [29] . Participants' individual speech task completion rates were calculated as a percentage of the scheduled tasks. The survey was designed to minimise the burden on study participants, who already had several tasks to complete as part of RADAR-CNS. It was developed in a smartphone-compatible format using the Qualtrics survey software platform, which predicted a completion time of eight minutes. Participants were asked a maximum of 12 questions that were mostly multiple choice, all of which were optional (see associated multimedia file). Questions were developed based on anecdotal feedback from participants, discussions with service users in the RADAR-CNS Patient Advisory Board and an RMT engagement survey reported in the literature [30] . The first question asked participants how often they completed the speech recordings on a five-point scale from every time to never. The next two questions asked how comfortable participants felt completing each speech task, also on a 5-point scale, from extremely uncomfortable to extremely comfortable. Two further questions asked participants to select the barriers that prevented them from completing speech recordings at least once, divided loosely into practical and other, with a total of 17 possible barriers. Respondents could also name barriers not listed. These questions addressed the issues of perceived speech task completion, acceptance and barriers to completion; these were identified as core factors to examine in this study. Participants received survey invitations on 14 th October 2020 (UK) and on 4 th November 2020 (Spain) via an emailed link. All MDD participants in the UK and Spain at that time were invited. The survey links were tagged with participants' unique, anonymous study identification numbers to extract matching sociodemographic, depression severity and speech completion data. Responses collected up to 25th January 2021 (UK) and 18th January 2021 (Spain) were pooled for analysis. Response cohort: The effect of sociodemographic variables (age, gender, years in education and country), depression severity and speech task completion rates in survey responders and nonresponders were assessed using logistic regression with bootstrapping to assess the generalisability of the responses. Depression severity was quantified as each individual's IDS score within six weeks of the survey launch. Assessment of comfort: Linear regression with bootstrapping was conducted to identify significant associations between comfort levels for both speech tasks and sociodemographic factors as well as depression severity. Analyses were undertaken using the IBM SPSS Statistics 27 software. Perceived barriers preventing speech recording: The most commonly reported barriers were identified, in addition to the range, mean and standard deviation of the number of barriers. Accuracy of self-reported speech task completion: Selfreported completion was compared with actual speech task completion obtained via the participant identification numbers. A member of the RADAR-CNS Patient Advisory Board provided feedback on the results of these analyses. Survey links were emailed to the 384 participants (UK: 254, Spain: 130), who comprised 76% of the total of 505 enrolments in the two countries. We received responses from 209 participants (UK: 135, Spain: 74), a response rate of 54%. Table 1 details the sociodemographic data, depression severity and speech task completion rates of those who were invited to do the survey and those who responded. Results of a logistic regression model with survey response status as the outcome and sociodemographic factors and depression severity as predictors indicate that older participants were significantly more likely to respond to the survey (Wald χ 2 = 8.80, p < 0.005, odds ratio = 1.03). Gender (Wald χ 2 = 0.84, p > 0.05, odds ratio = 1.31), years in education (Wald χ 2 = 0.004, p > 0.05, odds ratio = 1.00), country (Wald χ 2 = 1.24, p > 0.05, odds ratio = 0.70) and depression severity (Wald χ 2 = 0.45, p > 0.05, odds ratio = 1.01), were all non-significant. Participants who completed the speech survey demonstrated higher completion rates for the scheduled speech tasks than those who did not complete the survey. The average completion rates for survey responders were 51% (SS) and 44% (FS) versus 31% (SS) and 26% (FS) for non-responders. A logistic regression with completion rates of both speech tasks as predictor variables and survey response status as the outcome indicates that participants with higher SS completion were more likely to complete the survey (Wald χ2 = 6.59, p = 0.01, odds ratio = 6.93), though there was no association with FS completion (Wald χ2 = 0.33, p > 0.05, odds ratio = 1.58). This result is unsurprising; the SS task appears before the FS task in the study app, so participants who did not complete the SS task are less likely to have completed the FS task. Also, as the survey indicates (Section 3.2), participants are less comfortable completing the free speech task. The most frequently reported comfort levels were extremely comfortable (SS) and neither comfortable nor uncomfortable (FS) (Figure 1 ). Responses to the FS task are likely to contain personal information, and so the task may provoke an emotional response [3] , [26] . This could have made participants more uncomfortable than when reading aloud a set text. Restrictions due to the COVID-19 pandemic could also have made thinking of things to say in the FS task more challenging. Linear regression demonstrated that depression severity and country were significant predictors of comfort in both speech tasks (Table 2, Figure 2 ). We speculate that the higher cognitive impairment and fatigue associated with severe depression [31] , [32] contributes to the decreased comfort seen with increasing depression severity in both tasks. Depression severity was a stronger predictor of comfort than country in SS, while country was a stronger predictor of comfort in FS. Age, gender and years of education were not significant predictors of comfort in either speech task. The R 2 values (SS: 0.29, FS: 0.16), suggest other factors could also account for the difference in comfort levels. Barriers that prevented participants from completing speech recordings on at least one occasion included practical and technical barriers, such as being unable to find a quiet place to record, and other issues such as low mood (Figure 3 ). In total, 104 (50%) of survey responders reported encountering at least one such barrier. The number of barriers reported per participant ranged from zero to eight (mean = 0.9, SD = 1.2). The most commonly reported barriers were (1) not seeing the app notifications for the speech tasks (36 responses), (2) low mood (33 responses) and (3) forgetfulness (19 responses). RADAR participants receive up to four notifications at the same time -for depression, self-esteem and COVID-19 questionnaires, followed by speech. Potentially, after completing the questionnaires, participants are less motivated to spend time and effort on the speech task, and this could be exacerbated with more severe depression. The majority of participants responding to the survey reported completing the speech task every time or on most occasions that it was scheduled. However, in a striking finding, many respondents markedly overestimated their completion rates in both speech tasks (Figure 4 ). We speculate that several factors contribute to this discrepancy. For example, after completing depression and self-esteem questionnaires, plus a COVID-19 questionnaire since the pandemic began, participants may not notice the notifications for the speech task. Other factors may include recall and social desirability biases, as well as missed app notifications. As the popularity of in-the-wild data collection paradigms grow in the speech community, we need to better understand the associated barriers and facilitators, particularly in health applications. To the best of the authors' knowledge, this preliminary analysis is the first to focus on the experiences of individuals recording their speech using RMT for health research. Gaining insights in this area is vital to maximise research value and participant comfort in future studies. The core issues described raise several questions for investigation that will be addressed in future work. These include the investigation of technical issues that lower speech recording rates and scenarios that may cause participants to overestimate their speech recording rates. The RADAR-CNS speech survey has since been released at the remaining MDD study site and all three sites in the multiple sclerosis arm of the study. The RADAR-CNS project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 115902. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA (www.imi.europa.eu). This communication reflects the views of the RADAR-CNS consortium and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein. The funding bodies have not been involved in the design of the study, the collection or analysis of data, or the interpretation of data. We would like to thank all members of the RADAR-CNS patient advisory board, who all have experience of living with or supporting those who are living with depression, epilepsy or multiple sclerosis. Figure 3 : Self-reported obstacles preventing speech task completion. 1 -didn't see the notifications, 2 -low mood, 3 -forgot 4 -felt unwell in some other way or too tired, 5 -couldn't find a quiet place, 6 -felt self-conscious, 7 -technical problems with phone or app, 8 -didn't have time, 9 -already contribute enough to RADAR-CNS, 10 -privacy concerns, 11 -not interested in recording speech, 12 -speech part of the app difficult to use, 13 -instructions not clear, 14 -other. Speech analysis for health: Current state-of-the-art and the increasing impact of deep learning Natural language processing in mental health applications using non-clinical texts A review of depression and suicide risk assessment using speech analysis Automated assessment of psychiatric disorders using speech: A systematic review Multimodal Data Collection for Social Interaction Analysis Inthe-Wild SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild The Speakers in the Wild (SITW) speaker recognition database Barriers to and Facilitators of Engagement With Remote Measurement Technology for Managing Health: Systematic Review and Content Analysis of Findings Clinical state tracking in serious mental illness through computational analysis of speech Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech Automatic Assessment of Depression From Speech via a Hierarchical Attention Transfer Network and Attention Autoencoders A comparison of acoustic and linguistics methodologies for Alzheimer's dementia recognition Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition Fully automated cognitive screening tool based on assessment of speech and language Understanding speech and swallowing difficulties in individuals with Huntington disease: Validation of the HDQLIFE Speech Difficulties and Swallowing Difficulties Item Banks Domain adaptation for enhancing Speech-based depression detection in natural environmental conditions using dilated CNNs Voice analysis as an objective state marker in bipolar disorder Mood state prediction from speech of varying acoustic quality for individuals with bipolar disorder A Framework for Biomarkers of COVID-19 Based on Coordination of Speech-Production Subsystems The PHQ-8 as a measure of current depression in the general population Itemwording and the dimensionality of the Rosenberg Self-Esteem Scale: do they matter? International Phonetic Association, Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology RADAR-Base: Open Source Mobile Health Platform for Collecting, Monitoring, and Analyzing Data Using Sensors, Wearables, and Mobile Devices The Inventory of Depressive Symptomatology (IDS): Clinician (IDS-C) and Self-Report (IDS-SR) ratings of depressive symptoms Remote assessment of disease and relapse in major depressive disorder (RADAR-MDD): A multi-centre prospective cohort study protocol Engagement Strategies for Self-Monitoring Symptoms of Bipolar Disorder With Mobile and Wearable Technology: Protocol for a Randomized Controlled Trial The many faces of fatigue in major depressive disorder A meta-analysis of depression severity and cognitive function