key: cord-0187033-k2zxpd0u authors: Truong, Hoang Van; Sinh, Nguyen Thi Anh; Cuong, Nguyen Quoc; Phong, Nguyen Xuan title: Sound-Dr Dataset and Baseline System for Detecting Respiratory Anomaly date: 2022-01-12 journal: nan DOI: nan sha: fe38593d5fb7001070d461e25ecf9d352baad803 doc_id: 187033 cord_uid: k2zxpd0u

As the COVID-19 pandemic significantly affects every aspect of human life, there is an urgent need for high-quality datasets for further COVID-19 research. We therefore introduce the Sound-Dr dataset, which provides not only high-quality recordings of coughing, mouth breathing, and nose breathing, but also valuable related metadata for detecting respiratory illnesses. We propose a proof-of-concept system that is effective for detecting abnormalities in the respiratory sounds of patients. Our system has a promising processing time and good accuracy for real-time trials on mobile devices. The proposed dataset and system will serve as effective tools to assist physicians in diagnosing respiratory disorders.

The respiratory sounds of individuals with fever, asthma, tuberculosis, pneumonia, or COVID-19 differ from those of individuals without these conditions. A solid body of literature has shown the effectiveness of respiratory sounds in disease detection [1] - [4]. The demand for remote diagnosis, examination, and treatment has increased rapidly, especially as COVID-19 has deeply impacted global health systems. As of 12 February 2022, over 400 million confirmed cases and over 5.7 million deaths have been reported globally [5]. The disease has spread quickly across 200 countries, and the number of COVID-19 infections per day is consistently reported at an alarming rate. It is crucial that effective COVID-19 detection methods be accessible to people from all walks of life. Rapid antigen tests (ART) and polymerase chain reaction (PCR) tests have proven effective, yet they are costly and time-consuming.
A breathing or coughing test, which takes only 1 to 3 seconds, is faster and more reliable than conventional temperature measurement. The system and its data can be updated periodically, which can improve accuracy and reliability. Since the outbreak of the pandemic, the need for quick and economical health check methods has increased. Locations where people enter and exit, such as airports, may apply a breathing or coughing test. Locations with employees and customers, such as companies, factories, and supermarkets, may also apply it.

Several respiratory sound datasets exist for disease detection. One is the dataset of the 2017 International Conference on Biomedical Health Informatics (ICBHI) [6], in which each audio recording identifies the patient as healthy or as exhibiting one of the following respiratory diseases or conditions: COPD, bronchiectasis, asthma, upper and lower respiratory tract infection, pneumonia, and bronchiolitis. For COVID-19 detection specifically, two major respiratory sound datasets come from New York University [7] and the University of Cambridge [8]. These datasets, however, are not publicly available. Public datasets include Coswara [9] and Coughvid [10]. However, the data available for scientists to develop and improve effective methods remain limited.

Therefore, we built a system to collect respiratory sound data in the most efficient way. We named the resulting dataset Sound-Dr. The Sound-Dr dataset covers several conditions, such as fever, asthma, and COVID-19, as it is designed to support researchers in solving different problems related to respiratory diseases. It is also suitable for multi-label tasks in machine learning and for anomaly detection tasks. The Sound-Dr dataset contains three types of respiratory sounds (nose breathing, mouth breathing, and coughing), giving scientists more options for building a detection system or testing a model in more varied ways.
Besides the audio recordings, metadata on health-related habits (e.g., smoking, insomnia) is included to help build a better detection system. The collection procedure is designed to be efficient, for example by recording each cough individually to reduce the impact of noise. The Sound-Dr dataset aims to provide an additional respiratory dataset that broadens the distribution of available data (mainly with subjects in Vietnam), since existing data remain very limited and localized to certain areas. In addition, we built a baseline system on the Sound-Dr dataset that gives an overview of the initial accuracy obtainable on the dataset as well as the efficiency of the data collection.

The paper is organized as follows: Section I contains the related work and literature survey; Section II describes the provided dataset; Section III explains the pre-processing steps and our baseline system; Section III-C presents results and discussion; finally, Section IV concludes the paper with future scope and improvements.

Respiratory sounds can indicate the health status of a person. From the insights these sounds provide, machine learning algorithms can predict whether the respiratory health of a person is normal or abnormal, in particular with respect to COVID-19. A respiratory sound is a sound produced by the lungs during inhalation or exhalation. Doctors usually use stethoscopes to check the breathing of individuals. Listening to respiratory sounds is a critical component of the diagnostic procedure for a variety of diseases. Abnormal respiratory sounds are indicative of a lung or airway issue. COVID-19 detection requires high-quality respiratory data for researchers to develop rapid screening methods. To meet this demand, the Sound-Dr dataset provides not only high-quality coughing and breathing sounds but also metadata for researching illnesses or diseases related to the respiratory system.
The Sound-Dr dataset was collected during the peak of the COVID-19 pandemic in Vietnam, from August 2021 to October 2021, with support from FPT Software Company Limited [11]. To collect data, we created web-based and mobile-based applications with which users can easily interact and record the three sound types. For each audio type, users are requested to record at least three times, with a minimum duration of 5 seconds per turn. The default sample rate is 48,000 Hz, and no noise reduction method is used in the web-based or mobile-based applications, so that the true nature of the data is captured. Additionally, user metadata is collected via a survey form that includes personal information (e.g., age and gender), related respiratory illness symptoms, smoking status, and COVID-19 diagnosis.

We obtained a dataset of 3,930 sound recordings; the distribution of coughing, mouth breathing, and nose breathing is presented in Figure 1. There are 1,310 subjects in total, with the gender distribution shown in Figure 2 and the age distribution presented in Figure 4. More males (about 60%) than females (about 40%) participated in our program. In terms of age groups, subjects aged 20 to 40 are dominant. Regarding smoking status, Figure 3 indicates that 90% of the subjects are non-smokers.

Given the Sound-Dr dataset, we propose three main tasks: (I) detect COVID-19 negative or positive subjects, (II) detect subjects with and without related respiratory symptoms, and (III) detect healthy and unhealthy subjects (i.e., unhealthy subjects are COVID-19 positive or present related respiratory symptoms). For each task, the audio inputs of coughing, mouth breathing, and nose breathing are evaluated independently.
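The three task definitions above can be sketched as a small label-derivation step. This is only an illustrative sketch: the `Subject` record and its field names (`covid_positive`, `has_symptoms`) are hypothetical and do not reflect the actual metadata schema of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical per-subject metadata; field names are assumptions."""
    covid_positive: bool   # self-reported COVID-19 diagnosis
    has_symptoms: bool     # any related respiratory symptom reported


def task_labels(subject: Subject) -> dict:
    """Derive one binary label per task (1 = positive class)."""
    return {
        # Task I: COVID-19 positive vs. negative
        "task_1_covid": int(subject.covid_positive),
        # Task II: with vs. without related respiratory symptoms
        "task_2_symptoms": int(subject.has_symptoms),
        # Task III: unhealthy = COVID-19 positive OR symptomatic
        "task_3_unhealthy": int(subject.covid_positive or subject.has_symptoms),
    }
```

Under these assumptions, a symptomatic but COVID-19 negative subject is labelled negative for Task I and positive for Tasks II and III.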
Based on the metadata shown in Table I, the 1,310 subjects are separated into COVID-19 negative and COVID-19 positive subjects, subjects with and without symptoms, and healthy and unhealthy subjects for tasks I, II, and III respectively, as shown in Figure 5. To evaluate the Sound-Dr dataset on each defined task, we apply 5-fold cross validation, where the final result is the average over the 5 folds. The evaluation metrics in use are Accuracy (Acc), F1 score [12], and AUC [13].

Given the Sound-Dr dataset, we develop a deep-learning-based framework to explore it, referred to as the baseline. The baseline framework can be separated into two main steps: front-end feature extraction and back-end classification. The raw audio from one channel (mono) is first resampled to a sample rate of 16,000 Hz using the Librosa toolkit [14]. The resampled audio recordings are then fed into a pre-trained model to extract embedding features. In this paper, we use pre-trained models from both TRILL [15] and FRILL [16], which are recommended for downstream tasks on non-semantic speech signals. Using TRILL to extract features from cough sounds for detecting COVID-19 has also been proven effective [17]. While the pre-trained TRILL model is based on the ResNet architecture and has a large footprint, the pre-trained FRILL model is built on the MobileNet architecture and leverages knowledge distillation from the pre-trained TRILL model. As a result, the pre-trained FRILL model is suitable for real-time applications on edge devices (the pre-trained FRILL model is 32 times faster on a Pixel 1 smartphone and 40% of the size of TRILL, while remaining competitive with an average accuracy decrease of only 2%). Both pre-trained models are trained on one-second audio segments. This means that we obtain one embedding (i.e., a 2048-dimensional vector) for every second when feeding audio recordings of different lengths from the Sound-Dr dataset into the models.
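As a rough illustration of the sizes involved, the arithmetic below estimates how many per-second embeddings one recording yields after resampling from 48,000 Hz to 16,000 Hz. It is a sketch under stated assumptions, not a description of the actual TRILL or FRILL framing: the resampled length is assumed to be rounded up, and only complete one-second windows are assumed to produce an embedding. All function and constant names are hypothetical.

```python
SOURCE_RATE_HZ = 48_000   # default recording rate of the collection apps
TARGET_RATE_HZ = 16_000   # rate used before embedding extraction
EMBEDDING_DIM = 2048      # size of one per-second embedding


def resampled_length(n_samples: int,
                     source_rate: int = SOURCE_RATE_HZ,
                     target_rate: int = TARGET_RATE_HZ) -> int:
    """Sample count after resampling (assumption: rounded up)."""
    # Integer ceiling division avoids floating-point rounding issues.
    return (n_samples * target_rate + source_rate - 1) // source_rate


def embedding_shape(n_samples: int,
                    source_rate: int = SOURCE_RATE_HZ) -> tuple[int, int]:
    """Shape (seconds, EMBEDDING_DIM) of a recording's embedding matrix.

    Assumption: a trailing partial second does not yield an embedding.
    """
    n_resampled = resampled_length(n_samples, source_rate)
    return (n_resampled // TARGET_RATE_HZ, EMBEDDING_DIM)
```

For example, a recording of the minimum 5-second duration at 48,000 Hz has 240,000 samples, which become 80,000 samples at 16,000 Hz and, under these assumptions, a 5 x 2048 embedding matrix.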
Hence, we obtain multiple embeddings representing one audio recording. We therefore compute two statistical features, the mean and the standard deviation, across the time axis. We then concatenate these features to create one embedding (a 4096-dimensional vector) that represents each input audio recording. To classify the extracted embedding features into the groups defined in Section II-C, we use XGBClassifier [18]. To fine-tune the hyper-parameters of this classifier, we use the Optuna framework [19] with the Grid Search algorithm. All back-end classification models are implemented using the XGBoost library [20] for XGBClassifier and the Scikit-Learn toolkit [21] for the others.

We experiment with the task of COVID-19 detection based on the three collected sound types: Cough, Breathe mouth, and Breathe nose. The performance using Breathe mouth and Breathe nose is lower than that using the Cough sound data. The best performance using the Cough sound achieves 88.30 AUC, 74.14 F1, and 86.26 Accuracy. Although TRILL outperforms FRILL on Accuracy by about 0.3 percentage points (86.56 vs. 86.26 Acc), FRILL performs better on the F1 and AUC metrics by roughly 2 to 3 percentage points (74.14 vs. 71.34 F1, 88.30 vs. 86.56 AUC). Therefore, we use FRILL for our baseline model. This is satisfactory for real environments that need fast, accurate detection, especially on mobile devices.

In addition, we also experiment with abnormality detection in respiratory sounds by adjusting the labels, combining the COVID-19 Positive and Symptomatic statuses into an Abnormal label. Using XGBClassifier with the hyper-parameters shown in Table III, we achieve promising results of 82.68 AUC, 70.71 F1, and 79.01 Accuracy. The performance comparison is described in Table IV. This shows that our dataset has potential for anomaly detection in respiratory sounds. In the future, a model based on the Sound-Dr dataset could be built to help doctors diagnose diseases faster and more accurately.
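The pooling step described above (mean and standard deviation across the time axis, concatenated into a single fixed-size vector) can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: the function name `pool_embeddings` is hypothetical, and the population standard deviation is an assumption, since the text does not state which variant is used.

```python
from statistics import fmean, pstdev


def pool_embeddings(frames: list[list[float]]) -> list[float]:
    """Reduce per-second embeddings to one vector of twice the dimension.

    `frames` holds one embedding per second, all of equal length D.
    Returns D per-dimension means followed by D standard deviations.
    Assumption: population standard deviation (divide by N).
    """
    if not frames:
        raise ValueError("at least one embedding is required")
    # Transpose so that each tuple holds one dimension across time.
    columns = list(zip(*frames))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return means + stds
```

With 2048-dimensional per-second embeddings, the returned vector has 4096 entries regardless of the recording length, which is what allows recordings of different durations to be fed to the same classifier.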
High-quality respiratory sound data, which can be used to detect patient symptoms, is still limited. Thus, the Sound-Dr dataset is essential for researchers building health applications. The Sound-Dr dataset comprises 3,930 sound recordings from more than 1,300 subjects, collected with a procedure designed to limit the impact of noise. We also build a system to evaluate this collection method and to create the first baseline for future research to benchmark against. Based on the experimental results, the Sound-Dr dataset is collected in an effective way. We build a model using FRILL embeddings and an XGBoost classifier for potential real-life contexts that necessitate rapid and accurate detection. It also makes it easy for researchers to explore improvements and compare their performance with the baseline system. With the Sound-Dr dataset, we hope that researchers can build an Artificial Intelligence model that helps doctors diagnose diseases faster and more accurately.

ACKNOWLEDGMENT

This work is supported by the FPT Software AI Committee of FPT Software Company Limited [11] in Hanoi, Vietnam. FPT Software is a global technology and IT services provider headquartered in Vietnam. As a pioneer in digital transformation, the company delivers world-class services in Smart factories, Digital platforms, RPA, AI, IoT, Cloud, AR/VR, BPO, and more.
[1] Automatic cough classification for tuberculosis screening in a real-world environment
[2] Diagnosis of pneumonia from sounds collected using low cost cell phones
[3] Use of cough sounds for diagnosis and screening of pulmonary disease
[4] Quantified breathing patterns can be used as a physiological marker to monitor asthma
[5] WHO Coronavirus Disease (COVID-19) Dashboard
[6] An open access database for the evaluation of respiratory sound classification algorithms
[7] NYU Breathing Sounds for COVID-19
[8] Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data
[9] Coswara - a database of breathing, cough, and voice sounds for COVID-19 diagnosis
[10] The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms
[11] FPT Software Company Limited
[12] The truth of the F-measure
[13] The use of the area under the ROC curve in the evaluation of machine learning algorithms
[14] librosa: Audio and music signal analysis in Python
[15] Towards learning a universal non-semantic representation of speech
[16] FRILL: A Non-Semantic Speech Embedding for Mobile Devices
[17] A cough-based deep learning framework for detecting COVID-19
[18] Greedy function approximation: A gradient boosting machine
[19] Optuna: A next-generation hyperparameter optimization framework
[20] XGBoost: A scalable tree boosting system
[21] Scikit-learn: Machine learning in Python