key: cord-0729946-psk0pxsl authors: Elbeji, A.; Zhang, L.; Higa, E.; Fischer, A.; Despotovic, V.; Nazarov, P. V.; Aguayo, G. A.; Fagherazzi, G. title: Development of a vocal biomarker for fatigue monitoring in people with COVID-19 date: 2022-03-02 journal: nan DOI: 10.1101/2022.03.01.22271496 sha: 48b3642ab630f91ad45aab50f589fe22ed5ccfe5 doc_id: 729946 cord_uid: psk0pxsl
Objective: To develop a vocal biomarker for fatigue monitoring in people with COVID-19.
Design: Prospective cohort study.
Setting: Predi-COVID data collected between May 2020 and May 2021.
Participants: A total of 1772 voice recordings were used to train an AI-based algorithm to predict fatigue, stratified by gender and smartphone operating system (Android/iOS). The recordings were collected from 296 participants tracked for two weeks following SARS-CoV-2 infection.
Primary and secondary outcome measures: Four machine learning algorithms (logistic regression, k-nearest neighbors, support vector machine, and a soft voting classifier) were used to train and derive the fatigue vocal biomarker. A t-test was used to compare the distribution of the vocal biomarker between the two classes (Fatigue and No Fatigue).
Results: The final study population was 56% women, with a mean (SD) age of 40 (13) years. Women were more likely to report fatigue (P<.001). We developed four models, for Android female, Android male, iOS female, and iOS male users, with weighted AUCs of 79%, 85%, 86%, and 82% and mean Brier scores of 0.15, 0.12, 0.17, and 0.12, respectively. The vocal biomarker derived from the prediction models successfully discriminated COVID-19 participants with and without fatigue (t-test P<.001).
Conclusions: This study demonstrates the feasibility of identifying and remotely monitoring fatigue using voice. Vocal biomarkers, digitally integrated into telemedicine technologies, are expected to improve the monitoring of people with COVID-19 or Long-COVID.
The Predi-COVID study is supported by the Luxembourg National Research Fund (FNR) (Predi-COVID, grant number 14716273), the André Losch Foundation, and the Luxembourg Institute of Health. Competing interests: None declared.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. This version was posted March 2, 2022.
Coronavirus disease 2019 (COVID-19) is a global outbreak. As of 4 August 2021, more than 199 million confirmed cases of COVID-19 had been detected worldwide, with more than 4 million deaths reported by the World Health Organization 1. The COVID-19 pandemic has greatly affected populations and healthcare systems worldwide, putting whole healthcare systems under pressure and requiring national or regional lockdowns 2. Finding solutions that allow healthcare providers to focus on the most important and urgent patients was, and still is, critical. The outbreak continues to affect people, many of whom suffer from a range of acute symptoms such as fatigue. Fatigue is a common symptom in patients with COVID-19; it can impair quality of life and treatment adherence and can be associated with numerous complications 3. Recent findings showed that fatigue is a major symptom of the frequently reported Long-COVID syndrome. After recovery from the acute disease caused by the earlier SARS outbreak, up to 60% of patients reported chronic fatigue 12 months later 4, which supports the need for long-term monitoring solutions for these patients. In general, fatigue can be of two types, physical and mental 5, and can involve a lack of energy, an inability to start and perform everyday activities, and a lack of desire to do things.
In the context of COVID-19, determinants of fatigue have been categorized as central and psychological factors; the latter might also be caused indirectly by pandemic-related fear and anxiety 6, 7. Fatigue affects men and women differently and has previously been shown to be reported differently by the two sexes: men and women differ in anatomy and physiology, which results in significant sex differences in fatigability 8. Digital health technologies such as telemedicine, artificial intelligence (AI), and big data predictive analytics have the potential to minimize the damaging effects of COVID-19 by improving responses to public health problems at the population level 9.
Voice is a promising source of digital data: it is rich, user-friendly, inexpensive to collect, and non-invasive, and it can be used to develop vocal biomarkers that characterize disease states. Previous research was mostly conducted on neurodegenerative diseases, such as Parkinson's disease 11 and Alzheimer's disease 12. Other studies confirm the relationship between voice disorders and fatigue, e.g., in chronic fatigue syndrome (CFS). The neuromuscular, neuropsychological, and hormonal dysfunction associated with CFS can influence phonation and articulation and alter the tension, viscosity, and thickness of the tissue of the larynx, tongue, and lips, leading to decreased voice quality 13. Increased fatigue affects voice characteristics such as pitch, word duration 14, and the timing of articulated sounds 15. Fatigue-related vocal changes are observed more often in consonant sounds that require a high average airflow 16.
In the context of the COVID-19 pandemic, respiratory sounds (e.g., coughs, breathing, and voice) have also been used as sources of information to develop COVID-19 screening tools 17, 18, 19. However, no previous work has investigated the association between voice and COVID-19 symptoms. We hypothesized that fatigue and voice are associated in patients with COVID-19 and that an AI-based model can be trained to identify fatigue and subsequently generate a digital vocal biomarker for fatigue monitoring. We tested this hypothesis using data from the large hybrid prospective Predi-COVID cohort study.
This project uses data from the Predi-COVID study 20. Predi-COVID is a hybrid cohort study that started in May 2020 in Luxembourg and enrolled participants who met all of the following requirements: (1) a signed informed consent form; (2) SARS-CoV-2 infection confirmed by PCR at one of Luxembourg's certified laboratories; and (3) age 18 years or older. The study combines data from the national surveillance system, which covers virtually all COVID-19-positive patients. Biological samples, electronic patient-reported outcomes, and smartphone voice recordings were collected to identify vocal biomarkers of respiratory syndromes and fatigue. More details about the Predi-COVID study can be found elsewhere 20. The National Research Ethics Committee of Luxembourg (study number 202003/07) gave a favorable opinion on the study in April 2020. Health Inspection collaborators made the initial phone contact with potential participants.
Those who consented to participate were contacted by a qualified nurse from the Clinical and Epidemiological Investigation Center (CIEC, Luxembourg Institute of Health), who outlined the study and arranged home or hospital visits. Participants were followed for up to a year using a smartphone app to collect voice data. To ensure a minimum quality level, participants were asked to record in a quiet environment while keeping a certain distance from the microphone, and an audio example of what was required was provided. All participants were invited to record two types of audio. Type 1 audio required participants to read paragraph 1 of article 25 of the Universal Declaration of Human Rights 21 in their preferred language (French, German, English, or Portuguese); Type 2 audio required them to sustain the [a] vowel without breathing for as long as they could (see Supplementary Online Material 1 for more details). Predi-COVID collects data in conformity with the German Society of Epidemiology's good practice guidelines 22. To draft the manuscript, we followed the TRIPOD criteria for reporting AI-based model development and validation, as well as the corresponding checklist.
All Predi-COVID participants recruited between May 2020 and May 2021 who reported their fatigue status ("I feel well" as "No Fatigue"; "I am fatigued" or "I don't feel well" as "Fatigue") on the same day as their audio recordings during the 14 days of follow-up were included in this study 23. As a result, several audio recordings per participant were available for both audio types 24.
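The inclusion rule above (each recording paired with a same-day fatigue answer within the 14-day follow-up) can be sketched in plain Python. This is a minimal illustration, not the study's code: the record layout, the function name `label_recordings`, and the choice to treat days 0 to 13 after inclusion as the follow-up window are all assumptions.

```python
from datetime import date, timedelta

# Mapping of the daily self-reported answers quoted in the text to the
# binary training label.
ANSWER_TO_LABEL = {
    "I feel well": "No Fatigue",
    "I am fatigued": "Fatigue",
    "I don't feel well": "Fatigue",
}
FOLLOW_UP_DAYS = 14  # assumption: days 0-13 after inclusion form the window


def label_recordings(recordings, answers, inclusion_day):
    """Attach a same-day fatigue label to each recording (hypothetical schema).

    recordings:    list of (participant_id, recording_date, audio_type)
    answers:       dict {(participant_id, date): answer_text}
    inclusion_day: dict {participant_id: date of inclusion}
    """
    labelled = []
    for pid, day, audio_type in recordings:
        start = inclusion_day[pid]
        # Keep only recordings made inside the follow-up window.
        if not start <= day < start + timedelta(days=FOLLOW_UP_DAYS):
            continue
        # Keep only recordings with a recognized answer on the same day.
        answer = answers.get((pid, day))
        if answer not in ANSWER_TO_LABEL:
            continue
        labelled.append((pid, day, audio_type, ANSWER_TO_LABEL[answer]))
    return labelled


# Usage with made-up data: one participant, three recordings.
recs = [("P1", date(2020, 6, 1), 1), ("P1", date(2020, 6, 3), 2), ("P1", date(2020, 6, 20), 1)]
answers = {("P1", date(2020, 6, 1)): "I am fatigued", ("P1", date(2020, 6, 3)): "I feel well"}
print(label_recordings(recs, answers, {"P1": date(2020, 6, 1)}))
```

Recordings without a same-day answer, or outside the window, are simply dropped, which is why one participant can contribute several labeled recordings of each audio type.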
The audio recordings were collected in two formats: 3gp (Android devices) and m4a (iOS devices). We trained one model for each combination of smartphone operating system and user gender (male/female). This stratification was performed to minimize data heterogeneity and to account for sex as a potential confounder. All raw audio recordings were pre-processed (Figure 1). They were first converted to .wav files, and recordings lasting less than 2 seconds were excluded. Next, audio clustering (DBSCAN) was performed on basic features (duration; the average, sum, and standard deviation of signal power; and fundamental frequency) to detect outliers and exclude poor-quality recordings. Finally, peak normalization was used to boost the volume of quiet recordings, and leading and trailing silences longer than 350 milliseconds were trimmed.
We used transfer learning for feature extraction because it is suited to small training databases 25. In transfer learning, a model is first trained on a large dataset, and the learned representation is then transferred and applied to the target dataset, here ours. Compared with models built from scratch, this reduces the amount of data required, shortens training time, and improves performance 26. Convolutional neural networks require a fixed input size, whereas the audio instances in our dataset were of variable length. To address this, zero-padding was used to set the duration of each audio file to 50 seconds (the maximum length in our database).
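A minimal NumPy/scikit-learn sketch of the pre-processing chain described above follows. It assumes clips are already decoded to mono float arrays (the 3gp/m4a to .wav conversion is out of scope), uses half the zero-crossing rate as a stand-in F0 estimate, and treats the DBSCAN settings (`eps`, `min_samples`), the silence amplitude threshold, and the minimum silence length as hypothetical parameters; all function names are illustrative. `drop_outliers` would run once over the features of all clips that pass the 2-second check, before normalization and trimming.

```python
import numpy as np
from sklearn.cluster import DBSCAN

MIN_DURATION_S = 2.0      # clips shorter than 2 s are excluded (stated in the text)
TARGET_DURATION_S = 50.0  # clips are zero-padded to 50 s (stated in the text)


def basic_features(signal, sr):
    """Duration, signal-power statistics and a rough F0 estimate for one clip."""
    signal = np.asarray(signal, dtype=float)
    power = signal ** 2
    duration = len(signal) / sr
    # Assumption: half the zero-crossing rate stands in for the fundamental
    # frequency; the study's own F0 estimator is not specified.
    crossings = np.count_nonzero(np.diff(np.signbit(signal).astype(np.int8)))
    f0_proxy = crossings / (2.0 * duration)
    return [duration, power.mean(), power.sum(), power.std(), f0_proxy]


def drop_outliers(feature_rows, eps=0.5, min_samples=5):
    """Indices of clips that DBSCAN does not label as noise (label -1)."""
    X = np.asarray(feature_rows, dtype=float)
    # Standardize each feature so a single eps is meaningful across columns.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.flatnonzero(labels != -1)


def peak_normalize(signal, peak=1.0):
    """Scale the whole clip so its largest absolute sample equals `peak`."""
    signal = np.asarray(signal, dtype=float)
    max_abs = np.max(np.abs(signal))
    return signal if max_abs == 0 else signal * (peak / max_abs)


def trim_silence(signal, sr, min_silence_s, threshold=0.01):
    """Drop a leading/trailing run below `threshold` if it lasts at least min_silence_s."""
    loud = np.flatnonzero(np.abs(signal) >= threshold)
    if loud.size == 0:
        return signal[:0]
    start, end = loud[0], loud[-1] + 1
    min_len = int(round(min_silence_s * sr))
    if start < min_len:  # leading silence too short: keep it
        start = 0
    if len(signal) - end < min_len:  # trailing silence too short: keep it
        end = len(signal)
    return signal[start:end]


def zero_pad(signal, sr, target_s=TARGET_DURATION_S):
    """Right-pad with zeros (truncating anything longer) to a fixed length."""
    n = int(round(target_s * sr))
    out = np.zeros(n)
    m = min(n, len(signal))
    out[:m] = signal[:m]
    return out


# Usage on a synthetic 3 s tone with 1 s of silence on each side.
sr = 8_000
t = np.arange(3 * sr) / sr
tone = 0.2 * np.sin(2 * np.pi * 200 * t)
clip = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
padded = zero_pad(trim_silence(peak_normalize(clip), sr, min_silence_s=0.5), sr)
```

Standardizing the features before DBSCAN matters because duration (seconds) and summed power live on very different scales; without it, a single column would dominate the distance used to find outliers.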
To increase the amount of information fed to the classifiers, Type 1 and Type 2 audios were concatenated and used as a single input to the learning models. All audio recordings were first resampled to 8 kHz and then converted to Mel spectrograms using the Librosa library in Python. The hop length was 2048 samples, and the number of Mel bands was set to 196. The Mel spectrograms were passed through the VGG19 convolutional neural network architecture provided by Keras, pre-trained on the ImageNet database 27. This approach, presented in […]
This large number of features is computationally expensive. Principal Component Analysis (PCA) 28 was therefore used for dimensionality reduction, retaining the number of relevant components that explain most of the variance in the data.
We divided our data into "Fatigue" and "No Fatigue" groups based on the participants' answers in the inclusion and daily fatigue assessments of Predi-COVID. Participants were characterized with descriptive statistics: means and standard deviations for quantitative variables, and counts and percentages for qualitative variables. The two device groups (3gp, Android users; m4a, iOS users) were compared using Student's t-test for continuous variables and the χ² test for categorical variables.
Model performance was assessed using […] accuracy, F1-score, precision, and recall. The Brier score was also used to evaluate the calibration of the selected models.
The predicted probability of being classified as fatigued by the best model was taken as our final vocal biomarker, which may be used as a quantitative metric to monitor fatigue. To evaluate it, we compared the distribution of the vocal biomarker between the two classes (Fatigue and No Fatigue) using a t-test.
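The Mel-spectrogram settings described above (8 kHz sampling, hop length 2048, 196 Mel bands) fix the size of the input passed to the CNN. The small calculation below estimates that size. It assumes librosa's default centred framing (`center=True`, default `n_fft=2048`), under which a clip of N samples yields 1 + ⌊N / hop⌋ frames, and assumes the concatenated input is two 50-second padded clips (100 s); neither the FFT size nor the exact concatenation length is stated in the text. A call such as `librosa.feature.melspectrogram(y=y, sr=8000, hop_length=2048, n_mels=196)` on such an input would be expected to return an array of this shape.

```python
SR = 8_000      # resampling rate stated in the text (Hz)
HOP = 2_048     # hop length stated in the text (samples)
N_MELS = 196    # number of Mel bands stated in the text
CLIP_S = 50     # each recording zero-padded to 50 s
N_CLIPS = 2     # assumption: Type 1 and Type 2 padded clips concatenated end to end


def n_frames(n_samples, hop_length, n_fft=2048, center=True):
    """Number of STFT frames under librosa's framing convention."""
    if center:
        # With center=True librosa pads n_fft // 2 samples on both sides.
        n_samples += 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


n_samples = SR * CLIP_S * N_CLIPS
mel_shape = (N_MELS, n_frames(n_samples, HOP))
print(mel_shape)                            # (196, 391)
print(f"{HOP / SR:.3f} s between frames")   # 0.256 s between frames
```

The large hop keeps the time axis short (about four frames per second), which is one way to keep a 100-second input manageable for an ImageNet-sized network.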
The final study population comprised 296 participants, 165 of whom were women (56%), with a mean age of 40 years (SD = 13). To record both audio types, 109 (37%) participants used Android smartphones (3gp format) and 187 (63%) used iOS devices (m4a format). We found no difference in the distribution of age, gender, body mass index, smoking, antibiotic use, or asthma between the two device types (P>.05). The overall rate of comorbidities was relatively low: 31 (10%) participants used antibiotics and only 12 (4%) had asthma. More details are shown in Table 1.
We reduced the features extracted from the Mel spectrograms to the top 250 components with PCA, explaining 97% and 99% of the variance in the data for the iOS and Android audio sets, respectively. We then compared the performance of the machine learning algorithms to select the best models for deriving the vocal biomarkers. The voting classifier was selected for male iOS users, with an AUC of 82% and an overall accuracy, precision, recall, and F1-score of 84%. The model selected for female iOS users was the SVM, with an overall precision of 80% and an AUC of 86%. For male Android users, the selected model was the voting classifier, with a precision and recall of 89%, an F1-score of 88%, and a weighted AUC of 85%. For female Android users, the SVM was selected, with an overall precision of 79% and an AUC of 79%. More details are shown in Table 2.
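The model comparison can be sketched with scikit-learn as below. The four candidate families (logistic regression, k-nearest neighbors, SVM, and a soft voting classifier) and the PCA step come from the text; the standardization before PCA, the voting classifier's members (assumed here to be the other three models), all hyperparameters, the 75/25 hold-out split, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def base_models(seed=0):
    """Fresh instances of the three single classifiers named in the text."""
    return {
        "logreg": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
        # probability=True lets the SVM return P(Fatigue) via predict_proba.
        "svm": SVC(probability=True, random_state=seed),
    }


def build_candidates(n_components=250, seed=0):
    """Each candidate = scaling -> PCA -> classifier; soft voting averages probabilities."""
    candidates = base_models(seed)
    candidates["soft_vote"] = VotingClassifier(
        estimators=list(base_models(seed).items()), voting="soft")
    return {name: make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
            for name, clf in candidates.items()}


def compare(candidates, X, y, seed=0):
    """Hold-out AUC and weighted F1 per candidate (y: 1 = Fatigue, 0 = No Fatigue)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    results = {}
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        p_fatigue = model.predict_proba(X_te)[:, 1]  # candidate vocal biomarker values
        results[name] = {
            "auc": roc_auc_score(y_te, p_fatigue),
            "weighted_f1": f1_score(y_te, model.predict(X_te), average="weighted"),
        }
    return results


# Usage on synthetic "CNN features": 400 clips x 600 features, signal in 20 features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 600))
X[:, :20] += 1.5 * y[:, None]
results = compare(build_candidates(n_components=50), X, y)  # 50 components for the toy data
best = max(results, key=lambda name: results[name]["auc"])
```

Because several recordings come from the same participant, a grouped split (for example scikit-learn's `GroupShuffleSplit` with participant IDs as groups) would keep each person's recordings on one side of the split; the plain stratified split above does not, and can therefore overstate performance.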
The selected models were well calibrated (mean Brier scores of 0.15, 0.12, 0.17, and 0.12 for Android female, Android male, iOS female, and iOS male users, respectively). From the model selected for each audio set, we derived the trained vocal biomarkers, which quantitatively represent the probability of being labeled as fatigued. As shown in Figure 3, the distributions of the vocal biomarkers differed significantly between the Fatigue and No Fatigue classes in our testing dataset (t-test P<.001).
In this study, we built an AI-based pipeline to develop vocal biomarkers for both genders and both types of smartphones (male/female, Android/iOS) that effectively distinguish fatigued from non-fatigued participants with COVID-19. We stratified the data to limit heterogeneity, which acts as contamination: it makes it difficult to build reliable and consistent classification models and results in poorer prediction performance. This contamination has two sources: first, significant gender differences in fatigability, since men and women have previously been shown to experience and report fatigue differently; and second, the different microphones built into the two types of smartphones used by participants (iOS and Android), which directly affect the quality of the recorded audio. Without a consistent microphone, machine learning algorithms tend to separate the audio formats rather than the fatigue status (see Supplementary Online Material 2 for more details).
With increasing interest in remote voice analysis as a non-invasive and powerful telemedicine tool, various studies have been carried out, mostly in neurological disorders (e.g., Parkinson's disease 11 and Alzheimer's disease 29) and mental health […]
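The two checks reported above, calibration via the Brier score and a two-sample t-test on the biomarker values, can be sketched with NumPy and SciPy as follows. The toy labels and probabilities are invented for illustration, the function names are hypothetical, and since the text does not state whether Student's or Welch's t-test was used, both are exposed through `equal_var`.

```python
import numpy as np
from scipy import stats


def brier_score(y_true, p_fatigue):
    """Mean squared difference between predicted P(Fatigue) and the 0/1 label."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_fatigue, dtype=float)
    return float(np.mean((p - y) ** 2))


def compare_biomarker(y_true, p_fatigue, equal_var=True):
    """Two-sample t-test of biomarker values, Fatigue (1) vs No Fatigue (0)."""
    y = np.asarray(y_true)
    p = np.asarray(p_fatigue, dtype=float)
    # equal_var=True gives Student's t-test; False gives Welch's variant.
    result = stats.ttest_ind(p[y == 1], p[y == 0], equal_var=equal_var)
    return float(result.statistic), float(result.pvalue)


# Toy example: three fatigued and three non-fatigued recordings.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
print(round(brier_score(y, p), 4))  # 0.0467
t_stat, p_value = compare_biomarker(y, p)
```

As a reference point, a model that always predicts 0.5 scores 0.25 on any binary labels, so lower values such as the reported 0.12 to 0.17 indicate probabilities that track the labels better than an uninformative guess.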
Human voice is produced by the flow of air from the lungs through the larynx, which causes the vocal folds to vibrate, generating a pulsating air stream 39. The process is controlled by laryngeal muscle activation 40 but involves the entire respiratory system, which provides the air pressure necessary for phonation. Decreased pulmonary function in COVID-19 patients can reduce the glottal airflow that is essential for normal voice production 41. Furthermore, increased fatigue may additionally disturb voice production by reducing laryngeal muscle tension, resulting in dysphonia, which appears in up to 49% of COVID-19 patients 41.
This study has several limitations. First, although our data were stratified by gender and smartphone device, the mix of languages might also result in different voice features and, subsequently, in different model performance. There is presently no comparable dataset with similar audio recordings for external validation of our findings, so more data should be collected to improve the transferability of our vocal biomarker to other populations. Second, our data labeling was based only on a qualitative self-reported fatigue status. A fatigue severity scale would allow a quantitative assessment of fatigue severity in a uniform and unbiased way across all participants. Finally, time series voice analysis for each participant was not included in the study. Further investigation including time series analysis could establish a personalized baseline for each participant, potentially enhancing the performance of our vocal biomarkers.
In this study, we demonstrated the association between fatigue and voice in people with COVID-19 and developed a fatigue vocal biomarker that can accurately predict the presence of fatigue. These findings suggest that vocal biomarkers, digitally incorporated into telemonitoring technologies, might be used to identify and remotely monitor this symptom in patients with COVID-19 as well as other chronic diseases.
We thank all participants who agreed to be involved in the study; the members who collaborated in the launch and monitoring of the Predi-COVID cohort, as well as its scientific committee; the IT team responsible for developing the application; and the nurses in charge of recruitment, data collection, and management in the field.
Elbéji and Fagherazzi had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: Fagherazzi, Zhang, Fischer. Drafting of the manuscript: Elbéji. Statistical analysis: Elbéji, Zhang, Higa, Fischer. Obtained funding: Fagherazzi. Administrative, technical, or material support: Fischer.
References
COVID-19) Dashboard
Psychological morbidities and fatigue in patients with confirmed COVID-19 during disease outbreak: prevalence and associated biopsychosocial risk factors
One-year outcomes and health care utilization in survivors of severe acute respiratory syndrome
Neuropsychological and neurophysiological correlates of fatigue in post-acute patients with neurological manifestations of COVID-19: Insights into a challenging symptom
Post-COVID-19 Fatigue: Potential Contributing Factors
COVID-19 pandemic and psychological fatigue in Turkey
Sex differences in human fatigability: mechanisms and insight to physiological responses
Applications of digital health for public health responses to COVID-19: a systematic scoping review of artificial intelligence, telehealth and related technologies
Precision Medicine for COVID-19: Phenotype Anarchy or Promise Realized?
Investigating voice as a biomarker: Deep phenotyping methods for early detection of Parkinson's disease
Longitudinal Speech Biomarkers for Automated Alzheimer's Detection
Differences in self-rated, perceived, and acoustic voice qualities between high- and low-fatigue groups
Speech during sustained operations
Automatic measurement of aspects of speech reflecting motor coordination
Fatigue estimation using voice analysis
Detection of COVID-19 from voice, cough and breathing patterns: Dataset and preliminary results
The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. Scientific Data
The voice of COVID-19: Acoustic correlates of infection in sustained vowels
Protocol for a prospective, longitudinal cohort of people with COVID-19 and their household members to study factors associated with disease severity: the Predi-COVID study
Universal Declaration of Human Rights | United Nations
Guidelines and recommendations for ensuring Good Epidemiological Practice (GEP): a guideline developed by the German Society for Epidemiology
Audio recordings of COVID-19 positive individuals from the prospective Predi-COVID cohort study with their fatigue status
Transfer Learning for Small Dataset
A survey of transfer learning
Very Deep Convolutional Networks for Large-Scale Image Recognition
A Review of Principal Component Analysis Algorithm for Dimensionality Reduction
Automatic speech analysis for the assessment of