authors: Mukherjee, Himadri; Sreerama, Priyanka; Dhar, Ankita; Obaidullah, Sk. Md.; Roy, Kaushik; Mahmud, Mufti; Santosh, K.C. title: Automatic Lung Health Screening Using Respiratory Sounds date: 2021-01-11 journal: J Med Syst DOI: 10.1007/s10916-020-01681-9

Audio-based technologies have changed significantly over the years across several different fields, and healthcare is no exception. One such avenue is health screening based on respiratory sounds. In this paper, we developed a tool to detect respiratory sounds that come from patients carrying respiratory infections. Linear Predictive Cepstral Coefficient (LPCC)-based features were used to characterize such audio clips. With a Multilayer Perceptron (MLP)-based classifier, in our experiment, we achieved the highest accuracy of 99.22%, tested on a publicly available respiratory sounds dataset (ICBHI17; Rocha et al., Physiol. Meas. 40(3):035001) [20] of 6800+ clips. Our results outperformed other popular machine learning classifiers as well as existing works in the literature.

Respiratory diseases are the third leading cause of death worldwide. As rapid growth of respiratory diseases is witnessed around the world, the medical research field has gained interest in integrating audio signal analysis-based techniques. As in other application domains, audio signal analysis tools can potentially help in analyzing respiratory sounds to detect problems in the respiratory tract. Audio analysis aids in timely diagnosis of respiratory ailments in the early stages of a respiratory dysfunction.

Respiratory conditions are diagnosed through spirometry and lung auscultation. Even though spirometry is one of the most commonly available lung function tests, it depends on patient cooperation and is therefore error prone. Auscultation is a technique that involves listening to internal human body sounds with the aid of a stethoscope. Over several years, it has been an effective tool to analyze lung disorders and/or abnormalities. Such a procedure, however, requires trained physicians, and for various reasons (e.g., faulty instruments), false positives can occur. This opens the door to developing computerized respiratory sound analysis tools/techniques, where automation is integral.

Lung sounds are difficult to analyze and distinguish because they are non-stationary and non-linear signals. Automated analysis was made possible with the use of the electronic stethoscope. In 2017, the largest publicly available respiratory sound database was compiled, encouraging the development of algorithms that can identify common abnormal breath sounds (wheezes and crackles) from clinical and non-clinical settings.

Respiratory sounds are generally classified as normal or adventitious. Adventitious sounds, which can be crackles or wheezes, are superimposed on normal respiratory sounds. Crackles are discontinuous, explosive, non-musical sounds, typically shorter than 20 ms, that occur frequently in cardiorespiratory diseases associated with lung fibrosis (fine crackles) or chronic airway obstruction (coarse crackles). Wheezes are high-pitched sounds that last more than 100 ms. They are common in patients with obstructive airway diseases and indicate obstructive airway conditions, such as asthma and COPD.
The dataset contains respiratory cycles that were recorded and annotated by professionals as wheezes, crackles, both, or no abnormal sounds.

Rao et al. [19] discussed acoustic techniques for pulmonary analysis. They studied acoustic aspects of different lung diseases, covering different types of sounds, including internal and external sounds. Aykanat et al. [3] presented a convolutional network plus Mel frequency cepstral coefficient-support vector machine-based approach for lung sound classification. On a dataset of 17930 sounds from 1630 subjects, an accuracy of 86% (for healthy/pathological classification) was reported. Pramono et al. [18] classified normal respiratory sounds and wheezes on a dataset of 38 recordings. Of 425 events, 223 were wheezes and the rest were normal. They reported an AUC value of 0.8919 with MFCC-based features. Acharya et al. [1] presented a deep learning-based approach for lung sound classification. They reported an accuracy of 71.81% on the ICBHI17 dataset of 6800+ clips. Dokur [10] used machine learning approaches to distinguish respiratory sounds. In their experiments, nine different categories from 36 patients were used. An accuracy of 92% was reported by using a Multilayer Perceptron (MLP). Melbye et al. [14] studied the classification of lung sounds by 12 observers. They worked with 1 clip each from 10 adults and children and obtained Fleiss' kappa values of 0.62 and 0.59 for crackles and wheezes, respectively. In 17 of the 20 cases, the observers concluded the presence of at least 1 adventitious sound. Bahoura and Pelletier [4] used cepstral features to distinguish normal and wheezing sounds. They worked with 12 instances from each class and reported the highest true positive rate of 76.6% for wheezing sounds. They also reported 90.6% true positives for normal sounds with Fourier transform-based features. Ma et al. [13] developed a system to distinguish lung sounds using a ResNet-based approach. On the ICBHI17 dataset, an accuracy of 52.26% was reported. Emmanouilidou et al. [11] proposed a robust approach to identify lung sounds in the presence of noise. In their experiments, with 1K+ volunteers (over 250 hours of data), an accuracy of 86.7% was reported. To analyze lung sounds, Sen et al. [23] used Gaussian mixture model and support vector machine-based classifiers. Using 20 healthy and non-healthy subjects, they reported an accuracy of 85%. Demir et al. [9] used a CNN-based approach. On the ICBHI17 dataset, the highest accuracy of 83.2% was reported. Chen et al. [7] used an S-transform-based approach coupled with deep residual networks to classify lung sounds: crackle, wheeze, and normal. In their study, the reported accuracy was 98.79%. Kok et al. [12] employed multiple features, such as MFCC, DWT, and time domain metrics, to distinguish healthy and non-healthy sounds. In their study, they reported accuracy, specificity, and sensitivity values of 87.1%, 93.6%, and 86.8%, respectively, on the ICBHI17 dataset. Chambers et al. [6] developed a tool to identify healthy/non-healthy patients using respiratory sounds. They used several spectral, rhythm, SFX, and tonal features coupled with decision tree-based classification. In their study, they reported an accuracy of 85% on a dataset of 920 records. Altan et al. [2] developed a deep learning-based approach to detect chronic obstructive pulmonary disease. Their tool used the Hilbert-Huang transform on multi-channel lung sounds.
In their experiment, an accuracy of 93.67% was reported on a dataset of 600 sounds collected from 50 patients. Cohen and Landsberg [8] classified 7 different types of sounds using a linear predictive coefficient-based technique. In their experiments, out of 105 instances, 100 were classified correctly.

Even though a rich state-of-the-art literature exists for lung sound analysis, existing methods do not guarantee optimal performance. Moreover, non-healthy cases are composed of several issues/criteria, and distinguishing healthy sounds from non-healthy sounds is not trivial. Handcrafted feature-based systems are preferred over deep learning-based systems where computational resources are a concern. Further, prior to deeper analysis of non-healthy sounds, it is essential to distinguish healthy and non-healthy sounds. A hierarchical approach can help reduce the workload of medical experts in resource-constrained regions: after determining whether a person has a lung infection, true positive cases can be taken up for further treatment(s)/processing. In this paper, we developed an automated tool in which LPCC-based features are employed. LPCC-based features were chosen due to their ability to model a variety of audio signals [15, 16]. In our experiments, on the ICBHI17 dataset (of 6800+ clips), we achieved an accuracy of 99.22% using an MLP.

The remainder of the paper is organized as follows. "Dataset description" discusses the dataset. In "Proposed method: LPCC-based features and MLP", we describe the proposed tool. Experimental results are provided in "Results and analysis". We conclude the paper in "Conclusion".

To develop a robust system, it is important to ensure that the dataset mimics real-world problems. Our system was trained on a publicly available respiratory sound database [20], which is associated with the International Conference on Biomedical and Health Informatics (ICBHI). To collect data, disparate stethoscopes and microphones were used. The audios were recorded from the trachea and 6 other chest locations: left and right posterior, anterior, and lateral. The audios were collected in both clinical and non-clinical settings from adult participants of disparate ages. Participants encompassed patients with lower and upper respiratory tract infections, pneumonia, bronchiolitis, COPD, asthma, bronchiectasis, and cystic fibrosis.

The ICBHI database consists of 920 audio samples from 126 subjects. These are annotated by respiratory experts and used as a benchmark in the field. Each respiratory cycle in the dataset is annotated into one of 4 classes. The annotations basically cover 2 broad groups: healthy and non-healthy. The non-healthy category is further divided into wheeze and crackle, with some cycles having both issues. Among 6898 cycles totaling 5.5 hours, 1864 cycles have crackles while 886 have wheezes. There are 506 cycles which have both wheezes and crackles. While recording, the participants were seated. The acquisition of respiratory sounds was performed on adult and elderly patients. Many patients had COPD with comorbidities (e.g., heart failure, diabetes, and hypertension). Further, noise exists, such as the sound of the stethoscope rubbing against the patient's clothing and background talking. Such variety in the data made it challenging to identify problems in the respiratory sounds. One of the most challenging aspects of the audio clips was the presence of heartbeat sounds along with the breath sounds. No preprocessing was performed to remove the heartbeat sounds.
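To make the healthy/non-healthy grouping concrete, below is a minimal Python sketch that derives binary labels from the per-cycle annotations. It assumes the tab-separated per-recording annotation text files distributed with ICBHI17, where each row lists a cycle's start time, end time, and binary crackle/wheeze flags; the function name and file layout are stated assumptions, not the authors' code.

```python
import csv

def cycle_labels(annotation_path):
    """Map each ICBHI respiratory-cycle annotation to a binary label:
    0 = healthy (no crackle, no wheeze), 1 = non-healthy (either or both).
    Assumes tab-separated rows: start_time, end_time, crackles, wheezes."""
    labels = []
    with open(annotation_path) as f:
        for start, end, crackle, wheeze in csv.reader(f, delimiter="\t"):
            # A cycle is non-healthy when either adventitious flag is set.
            labels.append((float(start), float(end),
                           int(bool(int(crackle)) or bool(int(wheeze)))))
    return labels
```

Under this reading, a cycle counts as non-healthy when it contains crackles, wheezes, or both, which matches the two broad groups described above.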
For better understanding, visual representations of 200 audio clips from the healthy and non-healthy sounds are shown in Fig. 1. In Table 1, the complete dataset is summarized.

As an audio clip contains high deviations across its entire length, its analysis is not trivial. Therefore, each audio clip is broken down into smaller segments called frames to facilitate analysis. In our study, we divided each clip into frames consisting of 256 sample points with a 100-point overlap between them. These parameters were set empirically. The same 200 audio clips (as in Fig. 1) are shown in Fig. 2 after framing.

After framing the audio clips into shorter segments, it was observed that in various instances the starting and ending points were not aligned in a frame. These discontinuities/jitters lead to smearing of power across the frequency spectrum. This posed a problem in the form of spectral leakage during frequency domain analysis, which produced additional frequency components. To tackle this, the frames were subjected to a window function. The Hamming window was selected for this purpose due to its efficacy as reported in [16]. The same frames (Fig. 2) are presented in Fig. 3 after windowing. The Hamming window is mathematically expressed as

$$A(z) = 0.54 - 0.46\cos\left(\frac{2\pi z}{Z-1}\right), \quad 0 \leq z \leq Z-1,$$

where $A(z)$ is the Hamming window function, $z$ is a point within a frame, and $Z$ is the frame length.

Thereafter, we performed Linear Predictive Coefficient (LPC)-based analysis [15] on each frame. The previous $P$ samples are used to predict the current sample as

$$\hat{s}(r) = \sum_{i=1}^{P} p_i\, s(r-i),$$

where $p_1, p_2, \ldots, p_P$ are the LPCs or predictors. The error of this prediction, $E(r)$, bounded by the actual and predicted samples $s(r)$ and $\hat{s}(r)$, can be expressed as

$$E(r) = s(r) - \hat{s}(r).$$

The sum of squared errors (as shown below) is minimized to generate the unique predictors for an $x$-sized frame:

$$E = \sum_{r=1}^{x} \left( s(r) - \sum_{i=1}^{P} p_i\, s(r-i) \right)^2.$$

Thereafter, a recursive technique is used to compute the cepstral coefficients $C$, which is expressed as

$$C_0 = \log_e P,$$

$$C_r = p_r + \sum_{q=1}^{r-1} \frac{q}{r}\, C_q\, p_{r-q}, \quad \text{for } 1 < r \leq P, \text{ and}$$

$$C_r = \sum_{q=r-P}^{r-1} \frac{q}{r}\, C_q\, p_{r-q}, \quad \text{for } r > P.$$

Since clips in the dataset were of unequal lengths, the number of frames obtained varied. When features were extracted at the frame level, this produced feature vectors of different dimensions. To handle this, we performed two operations: a) grading and b) standard deviation measurement.

1. Firstly, the sum of LPCC coefficients in each of the frequency ranges (bands) across all the frames was computed. Based on the sum of these energy values, the bands were graded in ascending order. This sequence of band numbers was used as features, which helped in identifying the dominance of different bands for the clips from various categories.
2. Secondly, the standard deviation was computed for every band.

These two metrics were stacked to form the feature vector, which is independent of the clip length. Features of dimension 10, 20, 30, 40, and 50 were extracted for the 2 classes. The trend of the 30-dimensional feature values (best result) for the 2 classes is shown in Fig. 4.
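To summarize the feature pipeline, here is a minimal Python sketch covering framing (256-point frames, 100-point overlap), Hamming windowing, the LPC-to-LPCC recursion, and the length-independent aggregation. It assumes numpy and librosa; the function names are illustrative, and the grading step follows our reading of the description above (bands ranked in ascending order of their summed coefficient values), so treat it as a sketch rather than the authors' implementation.

```python
import numpy as np
import librosa

def frame_signal(y, frame_len=256, overlap=100):
    """Split a clip into overlapping frames (hop = frame_len - overlap = 156)."""
    hop = frame_len - overlap
    n_frames = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])

def lpcc(frame, order):
    """LPC via librosa, then the standard LPC-to-cepstrum recursion.
    The gain term C_0 = log_e P is omitted in this sketch."""
    a = librosa.lpc(frame.astype(np.float64), order=order)
    p = -a[1:]  # predictors p_1..p_P (librosa returns [1, a_1, ..., a_P])
    c = np.zeros(order)
    for r in range(1, order + 1):
        # C_r = p_r + sum_{q=1}^{r-1} (q/r) C_q p_{r-q}
        c[r - 1] = p[r - 1] + sum(
            (q / r) * c[q - 1] * p[r - q - 1] for q in range(1, r))
    return c

def clip_features(y, order=15):
    """Length-independent clip descriptor: band grades + per-band std."""
    frames = frame_signal(y) * np.hamming(256)         # Hamming-windowed frames
    C = np.stack([lpcc(f, order) for f in frames])     # (n_frames, order)
    grades = np.argsort(C.sum(axis=0))                 # bands ranked by summed value
    return np.concatenate([grades, C.std(axis=0)])     # 2 * order dimensions
```

With `order` set to 5, 10, 15, 20, or 25, this yields the 10- to 50-dimensional features discussed above; `order=15` gives the 30-dimensional variant that performed best.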
We employed an MLP classifier, a feed-forward artificial neural network, for classification [17]. Feed-forward neural networks are made up of an input layer, an output layer, and hidden layers. The MLP is a supervised learning algorithm trained to learn a function $f(\cdot): \mathbb{R}^n \rightarrow \mathbb{R}^o$, where $n$ and $o$ represent the input and output dimensions. For a given set of features $P = \{p_1, p_2, \ldots, p_n\}$ and target $x$, a non-linear function is learned for classification. The difference between an MLP and logistic regression lies in the existence of one or more non-linear (hidden) layers between the input and the output layer.

An MLP consists of three or more layers (an input layer, an output layer, and one or more hidden layers) of non-linearly activating neurons. The number of hidden layers can be increased according to the requirements of the task at hand. The initial layer is the input layer, which comprises a set of neurons $\{p_1, p_2, \ldots, p_n\}$ denoting the features. Each neuron of the hidden layer transforms the values from the previous layer using a weighted sum $w_1 p_1 + w_2 p_2 + \cdots + w_n p_n$. The activation function that represents the relationship between the input and output layers is non-linear in nature, which makes the model flexible in defining unpredictable relationships. The activation function can be expressed, for example, as the logistic sigmoid $g(a) = 1/(1 + e^{-a})$.

Accuracy alone is not enough to measure the performance of a system; it is also important to analyze the disparate misclassifications. Hence, to evaluate our tool, the following performance metrics are used: Precision, Accuracy, Sensitivity (Recall), Specificity, and Area under the ROC curve (AUC). They are computed as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP},$$

where $TP$, $TN$, $FP$, and $FN$ refer to true positives, true negatives, false positives, and false negatives, respectively. To avoid possible bias in evaluation, 5-fold cross validation was used.

The performance of the different features is provided in Table 2. It is observed that the best result was obtained with the 30-dimensional features, and its corresponding confusion matrix is provided in Table 3. Next, the momentum was varied from 0.1 to 0.5 with a step of 0.1, and the results are provided in Table 4. The best result was obtained for a momentum of 0.1, whose interclass confusions are provided in Table 5. As compared to the default scenario, there were 4 more misclassifications for the healthy cases (and 9 fewer misclassifications for the non-healthy cases). Finally, the learning rate was varied from 0.1 to 0.6 with a step of 0.1, whose results are provided in Table 6. In our experiment, the highest performance was obtained when a learning rate of 0.5 was selected. We present the confusion matrix for this setup in Table 7. It is observed that the number of misclassifications for both classes was reduced as compared to the initial setup.

The misclassified instances were analyzed, and it was found that many of them contained heartbeat sounds. Along with this, other unwanted artefacts, such as talking and movement of the probe, contributed to the misclassifications. The number of misclassified instances was reduced by almost 15.63% as compared to the original setup using default settings, and by nearly 8.47% as compared to the best momentum-tuned result. A deeper analysis of the misclassifications revealed that approximately 0.74% of the healthy cases were misclassified as non-healthy. In the case of non-healthy instances, approximately 0.83% of the clips were misclassified as healthy, which we call false negatives.

The different performance metrics were computed for the default setup, the best momentum, and the best learning rate (overall highest). These results are provided in Table 8. The ROC curves for these scenarios are shown in Fig. 5.
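The evaluation protocol above can be sketched with scikit-learn as follows. The paper does not name a specific implementation, so the MLPClassifier configuration (SGD solver with the tuned momentum and initial learning rate) is an assumption for illustration, and `X`, `y` stand for the stacked clip features and binary labels from the earlier sketches.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical inputs, e.g. built from the earlier sketch:
# X = np.vstack([clip_features(clip) for clip in clips])  # (n_clips, 30)
# y = np.array(labels)                                    # 0 = healthy, 1 = non-healthy

clf = MLPClassifier(solver="sgd", momentum=0.1, learning_rate_init=0.5,
                    max_iter=1000, random_state=0)

# 5-fold cross-validated predictions and class probabilities.
pred = cross_val_predict(clf, X, y, cv=5)
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# Metrics computed from the pooled confusion matrix.
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
sensitivity = tp / (tp + fn)   # recall
specificity = tn / (tn + fp)
auc         = roc_auc_score(y, proba)
```

Pooling the fold-wise predictions before computing the confusion matrix is one common convention; averaging per-fold metrics is another, and the two can differ slightly.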
The performance of several other classifiers was compared in order to establish the efficacy of the MLP. For comparison, the 30-dimensional feature set (best performance) was chosen. We experimented with BayesNet, SVM, and RNN classifiers, whose results are provided in Table 9. We also compared the performance of our system with the works reported by Kok et al. [12] and Chambers et al. [6]. The average accuracies of both systems, along with the proposed system, are provided in Table 10.

Table 10. Comparison of average accuracies (%):
  Kok et al. [12]: 87.10
  Chambers et al. [6]: 85.00
  Proposed technique: 99.22

In this paper, we developed a tool to detect respiratory sounds that come from patients carrying respiratory infections. We employed Linear Predictive Cepstral Coefficient (LPCC)-based features to characterize respiratory sounds. With a Multilayer Perceptron (MLP)-based classifier, in our experiment, we achieved the highest accuracy of 99.22% (AUC = 0.9993) on a publicly available dataset of 6800+ clips. Our results outperformed other popular machine learning classifiers as well as existing works in the literature. Not limiting ourselves to binary classification (healthy/non-healthy), our immediate plan is to classify disease types within the non-healthy category. This will help identify the nature and severity of infection. As we observed that COVID-19 could possibly be screened by analyzing respiratory sounds [5], we are now extending our experiments to COVID-19 [21, 22].

References:
[1] Deep neural network for respiratory sound classification in wearable devices enabled by patient specific model tuning
[2] Deep learning on computerized analysis of chronic obstructive pulmonary disease
[3] Classification of lung sounds using convolutional neural networks
[4] New parameters for respiratory sound classification
[5] Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data
[6] Automatic detection of patient with respiratory diseases using lung sound analysis
[7] Triple-classification of respiratory sounds using optimized S-transform and deep residual networks
[8] Analysis and automatic classification of breath sounds
[9] Classification of lung sounds with CNN model using parallel pooling structure
[10] Respiratory sound classification by using an incremental supervised neural network
[11] Computerized lung sound screening for pediatric auscultation in noisy field environments
[12] A novel method for automatic identification of respiratory disease from acoustic recordings
[13] LungRN+NL: An improved adventitious lung sound classification using non-local block ResNet neural network with mixup data augmentation
[14] Wheezes, crackles and rhonchi: simplifying description of lung sounds increases the agreement on their classification: a study of 12 physicians' classification of lung sounds from video recordings
[15] Linear predictive coefficients-based feature to identify top-seven spoken languages
[16] MISNA: A musical instrument segregation system from noisy audio with LPCC-S features and extreme learning
[17] Multilayer perceptron, fuzzy sets, classification
[18] Evaluation of features for classification of wheezes and normal respiratory sounds
[19] Acoustic methods for pulmonary diagnosis
[20] An open access database for the evaluation of respiratory sound classification algorithms
[21] AI-driven tools for coronavirus outbreak: Need of active learning and cross-population train/test models on multitudinal/multimodal data
[22] COVID-19 prediction models and unexploited data
[23] A comparison of SVM and GMM-based classifier configurations for diagnostic classification of pulmonary sounds