key: cord-0673531-bdct30wl
authors: Chen, Xinru; Hu, Menghan; Zhai, Guangtao
title: Cough Detection Using Selected Informative Features from Audio Signals
date: 2021-08-07
journal: nan
DOI: nan
sha: ca27b28d8bf553214bf6b3a8426fb77bb5d88b28
doc_id: 673531
cord_uid: bdct30wl

Cough is a common symptom of respiratory and lung diseases. Cough detection is important for preventing, assessing and controlling epidemics such as COVID-19. This paper proposes a model to detect cough events from audio signals. The models are trained on a dataset that combines the ESC-50 dataset with self-recorded cough recordings. The test dataset contains cough recordings collected from inpatients of the respiratory disease department in Ruijin Hospital. In total, we build 15 cough detection models based on different numbers of features selected by the Random Frog, Uninformative Variable Elimination (UVE), and Variable Influence on Projection (VIP) algorithms, respectively. The optimal model is based on 20 features selected from Mel Frequency Cepstral Coefficients (MFCC) features by the UVE algorithm and classified with a linear two-class Support Vector Machine (SVM) classifier. The best cough detection model achieves an accuracy, recall, precision and F1-score of 94.9%, 97.1%, 93.1% and 0.95, respectively. Its excellent performance with a lower-dimensional feature vector shows its potential for deployment on mobile devices such as smartphones, thus making cough detection remote and non-contact.

Cough is one of the most common symptoms of respiratory and lung diseases such as asthma, pertussis and pneumonia. It is a body mechanism that clears the upper respiratory tract and ejects excess mucus and foreign particles from the respiratory system [1]. A cough event has distinct mechanical phases: the glottis closes after a deep inhalation and then opens while rapid expiratory flow occurs, producing a specific audio signal [2].

Because cough events convey vital information about the state of the respiratory system and the progression of the disease, various cough assessment devices have been developed to detect cough events and calculate cough frequency, intensity, impact, duration and other indicators [3]. In addition to manual assessment methods such as manual cough counters, objective cough assessment devices such as the Leicester Cough Monitor [4], the Hull Automatic Cough Counter [5] and the VitaloJak system [6] are available. Although these objective devices can be used as wearable cough detection systems with high accuracy, they are not suitable for large-scale cough screening applications.

Cough event detection for a large number of people plays an important role in epidemic prevention and control, epidemiological research, risk assessment of infectious diseases and other fields. For example, by July 30, 2021, the global COVID-19 pandemic had resulted in 196,553,009 confirmed cases and 4,200,412 deaths, according to the World Health Organization (WHO) [7]. Due to the spread of mutant strains of the coronavirus and the unbalanced vaccine supply, the global COVID-19 pandemic is facing a severe situation again. Cough may be the primary symptom of a patient with COVID-19 [8]. Therefore, large-scale and contactless cough detection applications are of great importance. Cough events are usually accompanied by specific sounds, so many researchers have used audio signals to build models for the remote detection of cough events. Alsabek reported the importance of Mel Frequency Cepstral Coefficients (MFCC) extraction for COVID-19 and non-COVID-19 samples in cough signal processing [9]. Monge-Álvarez reported the use of local Hu moments as a robust feature for cough detection from audio signals [10]. Al-Khassaweneh used the Wigner distribution and the wavelet transform to analyze cough signals for detecting asthma [11]. Pramono developed an algorithm based on features of the audio signals with a logistic regression model to identify cough events [12]. Laguarta developed an Artificial Intelligence (AI) speech processing architecture based on a Convolutional Neural Network (CNN) to screen for the cough symptom of COVID-19 [13]. Imran proposed a COVID-19 diagnosis system that combines machine learning and deep learning to analyze cough samples via an App [14].

Most of the studies mentioned above extract a large number of features and classify them with complex deep learning or machine learning models, which may be infeasible or time-consuming to run on remote mobile devices such as smartphones and may also increase the risk of overfitting. Cough detection is a relatively simple problem that does not require complicated technical solutions. Moreover, most studies do not include in their datasets cough samples recorded in settings where cough detection is actually required, such as hospital wards. Therefore, it is necessary to develop a cough detection model that uses a small number of features and a machine learning algorithm such as Support Vector Machine (SVM) while achieving optimal performance on inpatient cough samples. SVM, a well-known classifier, is widely used in various detection and identification techniques. Kumar presented a person authentication framework using an SVM classifier for person identification and verification [15]. Song proposed a sound-of-tapping technology based on SVM that classifies 7 tapping positions and identifies 6 media [16]. Many studies have used SVM as a classifier for cough detection. Bhateja reported a method for feature extraction and classification of cough sounds in noisy environments using SVM [17]. Vhaduri developed a cough and snore detection framework based on several machine learning classifiers, including SVM, which can be implemented on a smartphone [18].

In this paper, we compare the performance of models using different feature space optimization methods, including Random Frog, Uninformative Variable Elimination (UVE) and Variable Influence on Projection (VIP), and propose an optimal automatic cough detection model that uses 20 features selected from MFCC features by the UVE algorithm and classifies them with a linear two-class SVM classifier. Our model achieves 94.9% accuracy in distinguishing cough audio from non-cough audio.

In Section 2, we summarize the methods and the specific process of our cough detection model. Section 3 explains the dataset we used to train and test our model, describes the experimental results with performance metrics and discusses the potential and limitations of our study. Section 4 draws a conclusion.

We first extract Mel Frequency Cepstral Coefficients (MFCC) from the cough audio, resulting in a 5000 × 36 feature matrix. Then, we take the top Principal Component Analysis (PCA) projections of the MFCC features, which retain 95% of the information, and combine them into a feature vector of dimension 107. Combining the MFCC feature extraction method with PCA is expected to improve the accuracy of automatic speech recognition systems and to reduce the feature dimension [19].

1) MFCC: For automatic speech recognition, MFCC has been found to be a useful feature extraction method [19]. The Mel scale describes the nonlinear frequency perception of the human ear, and its relationship with frequency can be approximated by the following equation:
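A widely used form of this approximation, stated here as an assumption because variants with slightly different constants exist, maps a frequency $f$ in Hz to the Mel scale as:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$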

The Mel scale is intended to reflect audible changes and to conform more closely to the auditory characteristics of the human ear when the frequency changes. Cepstral analysis is then performed on the Mel spectrum of the cough audio to compute the cepstral coefficients, thus obtaining the MFCC features. The process of MFCC extraction is shown in Fig. 1(a).

2) PCA: PCA is a useful dimension reduction method for features with a high level of correlation. The PCA processing steps are shown in Fig. 1(b).

a) Arrange the $m$ samples of dimension $n$ into an $n \times m$ matrix $X$. Center each row of $X$ by subtracting the mean of that row.

b) Calculate the covariance matrix $C = \frac{1}{m} X X^{\top}$.

c) Calculate the eigenvalues and corresponding eigenvectors of the covariance matrix.

d) Form a feature vector. The eigenvectors are arranged as rows from top to bottom in descending order of their eigenvalues, and the first $k$ rows are taken to form the matrix $P$.

e) $Y = PX$ is the feature vector after dimensionality reduction to $k$ dimensions.
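The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and reading "95% of the information" as a 95% cumulative explained-variance ratio is an assumption.

```python
import numpy as np

def pca_project(X, k):
    """Project an n x m matrix (n features, m samples) onto its top-k principal axes."""
    # a) Center each row (feature) by subtracting that row's mean.
    Xc = X - X.mean(axis=1, keepdims=True)
    m = X.shape[1]
    # b) Covariance matrix C = (1/m) X X^T, of size n x n.
    C = (Xc @ Xc.T) / m
    # c) Eigen-decomposition; eigh handles symmetric matrices, eigenvalues ascending.
    eigvals, eigvecs = np.linalg.eigh(C)
    # d) Sort by descending eigenvalue and stack the first k eigenvectors as rows of P.
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]].T
    # e) Y = P X is the k x m reduced representation.
    return P @ Xc, eigvals[order]

def components_for_variance(eigvals, ratio=0.95):
    """Smallest k whose leading eigenvalues explain at least `ratio` of the variance
    (assumed interpretation of keeping 95% of the information)."""
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(cumulative, ratio) + 1)

# Toy usage on random data (not the paper's MFCC features).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 200))
Y_demo, eig_demo = pca_project(X_demo, k=2)
print(Y_demo.shape, components_for_variance(eig_demo))
```

Because the rows of $P$ are eigenvectors of $C$, the covariance of $Y$ is diagonal, which is a quick sanity check on any implementation.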

To select informative features, reduce the complexity of the model and the risk of overfitting, and make future practical application more effective at the same time, we use the following three feature selection algorithms: Random Frog, UVE algorithm and VIP algorithm.

1) Random Frog: Random Frog is an effective feature selection algorithm for high-dimensional features. It builds models iteratively on small subsets of features and outputs a selection probability for each feature, which can be used as a feature selection criterion [20]. Fig. 2 shows the key steps of the Random Frog algorithm. It consists of the following three main steps:

a) Initialize a variable subset $V_0$ containing $Q$ variables.

b) Based on the initial variable subset $V_0$, propose a candidate variable subset $V^*$ containing $Q^*$ variables. Choose $V^*$ as $V_1$ in place of $V_0$. Repeat this process until $N$ iterations are finished.

c) Compute the selection probability of each variable and use it as the criterion for selecting variables.

From the single 107 × 1 feature vector obtained from PCA, we select 10, 20, 30, 40 and 50 features, respectively, using the Random Frog algorithm.
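The three Random Frog steps can be illustrated with a heavily simplified sketch. Several choices below are assumptions made for brevity and are not taken from the paper: ordinary least squares stands in for the PLS model, a single hold-out RMSE stands in for cross-validation, the candidate is accepted outright when it is no worse and otherwise with a small probability controlled by `eta`, and all function and parameter names are hypothetical.

```python
import numpy as np

def holdout_fit(X, y, subset, n_train):
    """Least-squares fit on the training rows; returns |coefficients| and hold-out RMSE."""
    A_train, A_test = X[:n_train][:, subset], X[n_train:][:, subset]
    coef, *_ = np.linalg.lstsq(A_train, y[:n_train], rcond=None)
    rmse = np.sqrt(np.mean((A_test @ coef - y[n_train:]) ** 2))
    return np.abs(coef), rmse

def random_frog(X, y, q0=4, n_iter=1000, theta=0.5, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = n // 2
    current = rng.choice(p, size=q0, replace=False)   # a) initial subset V0 with Q variables
    counts = np.zeros(p)
    for _ in range(n_iter):
        q = len(current)
        # b) draw the candidate size Q* around Q, then build the candidate subset V*
        q_new = int(np.clip(np.rint(rng.normal(q, theta * q)), 2, p))
        coef, err_cur = holdout_fit(X, y, current, n_train)
        if q_new < q:      # shrink: keep the variables with the largest |coefficient|
            candidate = current[np.argsort(coef)[::-1][:q_new]]
        elif q_new > q:    # grow: add randomly chosen variables not yet in the subset
            outside = np.setdiff1d(np.arange(p), current)
            extra = rng.choice(outside, q_new - q, replace=False)
            candidate = np.concatenate([current, extra])
        else:
            candidate = current
        _, err_new = holdout_fit(X, y, candidate, n_train)
        # accept V* if it is no worse, otherwise only with a small probability
        if err_new <= err_cur or rng.random() < eta * err_cur / err_new:
            current = candidate
        counts[current] += 1
    return counts / n_iter   # c) selection probability of each variable

# Toy usage: only the first two of eight variables carry signal.
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((120, 8))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 0.1 * rng.standard_normal(120)
print(np.round(random_frog(X_demo, y_demo), 2))
```

Ranking the variables by the returned probabilities and keeping the top 10, 20, 30, 40 or 50 would then correspond to the selection described above.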

2) UVE Algorithm: UVE is a feature selection algorithm based on a stability analysis of Partial Least Squares (PLS) regression coefficients. It eliminates variables that carry no useful information from the PLS regression model, thus improving the speed, robustness to interference, and predictive stability of PLS models [21]. The main principle of the UVE algorithm is to add artificial random variables to the original variable matrix, calculate a selection criterion for both the original and the random variables, and keep the original variables whose criterion value is larger than that of the random variables. The selection criterion $c_j = b_j / s(b_j)$ is the ratio of the regression coefficient $b_j$ to its standard deviation $s(b_j)$ and measures the reliability of the PLS regression coefficient [22].

Using the UVE algorithm, we select 10, 20, 30 and 40 features, respectively, from the single 107 × 1 feature vector obtained from PCA.
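The UVE principle can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the coefficient spread $s(b_j)$ is estimated from leave-one-out refits, the cut-off is the largest absolute criterion among the artificial variables, the noise scale `1e-10` and the number of PLS components are arbitrary choices, and the function names are hypothetical. The PLS1 model is fitted with a plain NIPALS loop to keep the example self-contained.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model fitted with the NIPALS algorithm."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores
        p = Xk.T @ t / (t @ t)        # X loadings
        c = (yk @ t) / (t @ t)        # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def uve_select(X, y, n_components=2, noise_scale=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Append p artificial random variables, scaled down so they barely affect the model.
    X_aug = np.hstack([X, noise_scale * rng.standard_normal((n, p))])
    # Leave-one-out refits give a distribution of each coefficient b_j.
    B = np.array([
        pls1_coefficients(np.delete(X_aug, i, axis=0), np.delete(y, i), n_components)
        for i in range(n)
    ])
    c = B.mean(axis=0) / B.std(axis=0)     # criterion c_j = b_j / s(b_j)
    cutoff = np.abs(c[p:]).max()           # largest |c_j| among the random variables
    return np.flatnonzero(np.abs(c[:p]) > cutoff), c[:p]

# Toy usage: only the first two of six variables carry signal.
rng = np.random.default_rng(2)
X_demo = rng.standard_normal((200, 6))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 0.1 * rng.standard_normal(200)
selected, criterion = uve_select(X_demo, y_demo)
print(selected, np.round(criterion, 1))
```

To obtain a fixed number of features, as in the selection above, one could instead rank the original variables by $|c_j|$ and keep the top entries.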

3) VIP Algorithm: The VIP algorithm is used to identify the most relevant variables. VIP is a parameter that summarizes the cumulative influence of an individual X-variable in a PLS model. Equation 3 gives a detailed calculation of VIP:
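A commonly cited form of the VIP score for one X-variable is given here as an assumption, with $A$ (the number of PLS dimensions) and the subscript on $SS_a$ introduced for illustration:

$$\mathrm{VIP} = \sqrt{K \times \frac{\sum_{a=1}^{A} \left(W_a^2 \times SS_a\right)}{\sum_{a=1}^{A} SS_a}}$$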

where $a$ denotes the PLS dimension, $K$ is the total number of variables, $W_a^2$ is the squared PLS weight, and SS is the explained sum of squares [23]. Variables with VIP values above 1 are considered the most relevant, and variables with VIP values below 0.5 are considered irrelevant. From the single 107 × 1 feature vector obtained from PCA, we select the top 10, 20, 30, 40 and 50 features whose VIP values are larger than 1.

On the basis of the selected features, we use a classifier to separate cough samples from non-cough samples, thus detecting cough. In the current work, a linear two-class SVM classifier is used for the classification. The basic idea of SVM is to find the separating hyperplane that divides the training dataset correctly and maximizes the margin between the two classes. SVM constructs a hyperplane in a high-dimensional space for classification, and the training samples are treated as points in that space. The points nearest to the hyperplane are known as support vectors, and the distance between these vectors and the hyperplane is defined as the margin. The SVM model attempts to find the optimal hyperplane with the maximum margin.
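To make the maximum-margin idea concrete, the sketch below trains a linear two-class SVM by subgradient descent on the regularized hinge loss. This is only one simple way to fit such a classifier and is an assumption for illustration; the paper does not state which solver was used, and the function names, hyperparameters and toy data are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (X w + b))) for labels y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Label each sample by the side of the hyperplane w.x + b = 0 it falls on."""
    return np.where(X @ w + b >= 0, 1, -1)

# Toy usage: two well-separated 2-D clusters standing in for cough (+1) and non-cough (-1).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(2.0, 0.5, (40, 2)), rng.normal(-2.0, 0.5, (40, 2))])
y_demo = np.array([1] * 40 + [-1] * 40)
w_demo, b_demo = train_linear_svm(X_demo, y_demo)
print("training accuracy:", (predict(X_demo, w_demo, b_demo) == y_demo).mean())
```

The samples with margin below 1 at convergence play the role of the support vectors described above, since only they contribute to the gradient.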

Our dataset consists of cough samples and non-cough samples. The cough samples include cough sounds from the Environmental Sound Classification (ESC-50) dataset, self-recorded cough recordings and inpatient cough recordings from patients with respiratory diseases. The non-cough samples are chosen from labeled environmental recordings in the ESC-50 dataset, including animal sounds, natural sounds, human (non-speech) sounds, interior sounds and other exterior noises [24]. The inpatient cough recordings were collected from inpatients of the respiratory disease department in Ruijin Hospital, who suffered from respiratory disease and had symptoms of cough. The cough samples were recorded with a mobile microphone and include background noise. Every sample in our dataset is 5 seconds long.

In total, we obtained 335 cough samples and 335 non-cough samples, 670 samples altogether. For classification, we split the dataset into a training dataset and a test dataset. The training dataset contains 266 cough samples and 266 non-cough samples. The test dataset includes 69 cough samples and 69 non-cough samples. All the cough samples in the test dataset are cough recordings collected from inpatients with respiratory diseases.

We choose the best model for each feature selection algorithm introduced in Section 2 and plot the importance of the features in Fig. 3 (left column). The feature importance for Random Frog, UVE and VIP is expressed as the selection probability, the PLS regression coefficient and the variable influence on projection, respectively. The features whose importance is above the cut-off threshold drawn in Fig. 3 (left column) are chosen as informative features. Fig. 3 (right column) depicts the feature values of a typical cough audio and a typical non-cough audio after PCA processing. The informative features selected by the three feature selection algorithms are marked with green dots in the right column of Fig. 3 (a-c). In addition, Fig. 3 (right column) shows the differences between the feature values of the cough audio and the non-cough audio with red dotted lines, indicating that the selected informative features are more effective for classification than the original features. Six features are shared by the three models shown in Fig. 3 (right column) despite the different feature selection algorithms, which suggests that all three algorithms can select audio features that are informative for cough detection.

To evaluate the performance of the models, we use the confusion matrix in Table 1 and calculate accuracy, precision, sensitivity/recall, and F1-score on the test dataset:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{precision} = \frac{TP}{TP + FP}$$

$$\text{sensitivity/recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
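As a worked example, the four metrics can be computed directly from confusion-matrix counts, with precision taken as TP / (TP + FP). The counts below are hypothetical: they are not taken from the paper's tables but were chosen as one set of counts over 69 cough and 69 non-cough test samples that reproduces the percentages reported for the best model.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts (an assumption): 67 of 69 coughs detected, 5 false alarms among 69 non-coughs.
acc, prec, rec, f1 = classification_metrics(tp=67, tn=64, fp=5, fn=2)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.2f}")
# prints: accuracy=0.949 precision=0.931 recall=0.971 F1=0.95
```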

We first use the single 107 × 1 feature vector obtained from the PCA dimension reduction to train the SVM classifier. Then, we use 10, 20, 30, 40 and 50 features selected by Random Frog, UVE and VIP, respectively. In total, we build 15 cough detection models and evaluate their cough detection performance on the test dataset. The classification results of these models are presented in Table 2.

According to the results in Table 2, the model with 20 features selected by the UVE algorithm achieves the best performance, with accuracy, recall, precision and F1-score of 94.9%, 97.1%, 93.1%, and 0.95, respectively. The results indicate that our cough detection model can reach 94.9% accuracy in distinguishing cough events from non-cough events while using a lower-dimensional feature vector.

The model with 20 features selected by the UVE algorithm shows that classification performance can be optimized while reducing the dimension of the feature vector. Our model has low complexity, so it is well suited for deployment on edge devices such as smartphones. In addition, the use of audio combined with our low-complexity model minimizes the computational resources required at the edge. Therefore, it is well suited to long-term deployment in public places such as wards and subways to monitor the frequency of coughs, thus providing decision support for inspection and quarantine.

The performance of our cough detection model is limited by the following factors:

a) Inadequate ambient noise in the training samples: The training samples we used were recorded in a quiet environment using a stationary mobile microphone, while the test samples contain ambient noise. We will try recording cough and non-cough samples in a noisier and less controlled environment and changing the microphone position during recording, which is more consistent with actual cough detection on mobile devices.

b) Quantity of the training and test samples: Our cough dataset is not large enough. In ongoing work, we will enlarge our cough recording dataset and incorporate deep learning algorithms to optimize our cough detection model.

c) Feature extraction method: Our model only uses selected MFCC features. In future work, we can combine MFCC with linear predictive coding (LPC) coefficients and spectral features, such as spectral flatness and spectral centroid.

We have developed a cough detection model using an SVM classifier and selected informative features. To train our model, we used a cough and non-cough dataset that combines the ESC-50 dataset with cough samples recorded from people around the authors using a mobile microphone. The test dataset consists of cough recordings collected from inpatients of the respiratory disease department in Ruijin Hospital who had symptoms of cough. We first extract MFCC features from the audio in the dataset, take the top PCA projections of the MFCC features to reduce the dimension, and combine them into a single 107 × 1 feature vector. Then, we apply three feature selection algorithms, namely Random Frog, UVE, and VIP, and build several models with different numbers of features. Finally, we use a linear two-class SVM classifier to separate cough samples from non-cough samples. We have found that the model with 20 features selected by the UVE algorithm achieves the best performance, with 94.9% accuracy, 97.1% recall, 93.1% precision and an F1-score of 0.95.

In future work, we will enlarge our dataset, make our model more robust to noise and improve the performance of our models. We will also try to implement our cough detection model on customers' smartphones via an App.

[1] Prevalence, pathogenesis, and causes of chronic cough

[2] Global physiology and pathophysiology of cough: ACCP evidence-based clinical practice guidelines

[3] Methods of cough assessment

[4] The Leicester Cough Monitor: preliminary validation of an automated cough detection system in chronic cough

[5] The automatic recognition and counting of cough

[6] Automated cough detection: A novel approach

[7] WHO Coronavirus (COVID-19) Dashboard

[8] Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China

[9] Studying the similarity of COVID-19 sounds based on correlation analysis of MFCC

[10] Robust detection of audio-cough events using local Hu moments

[11] A signal processing approach for the diagnosis of asthma from cough sounds

[12] Automatic identification of cough events from acoustic signals

[13] COVID-19 artificial intelligence diagnosis using only cough recordings

[14] AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app

[15] A pervasive electroencephalography-based person authentication system for cloud environment

[16] Sound-of-tapping user interface technology with medium identification

[17] Pre-processing and classification of cough sounds in noisy environment using SVM

[18] Nocturnal cough and snore detection in noisy environments using smartphone-microphones

[19] Improvement of MFCC feature extraction accuracy using PCA in Indonesian speech recognition

[20] Random frog: An efficient reversible jump Markov chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification

[21] Evaluating photosynthetic pigment contents of maize using UVE-PLS based on continuous wavelet transform

[22] Elimination of uninformative variables for multivariate calibration

[23] Variable influence on projection (VIP) for orthogonal projections to latent structures (OPLS)

[24] ESC: Dataset for environmental sound classification

Fig. 3. Importance of features from the three feature selection algorithms (left column) and differences between the feature values of a typical cough audio and a typical non-cough audio (right column): (a) 40 features selected by Random Frog; (b) 20 features selected by the UVE algorithm; (c) 10 features selected by the VIP algorithm.