key: cord-1040030-qp7urgil authors: Balamurali, B T; Hee, Hwan Ing; Kapoor, Saumitra; Teoh, Oon Hoe; Teng, Sung Shin; Lee, Khai Pin; Herremans, Dorien; Chen, Jer Ming title: Deep Neural Network-Based Respiratory Pathology Classification Using Cough Sounds date: 2021-08-18 journal: Sensors (Basel) DOI: 10.3390/s21165555 sha: 727e92c9a22eb0a75703b08ce118fd6c54c75ce4 doc_id: 1040030 cord_uid: qp7urgil

Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy coughs and children with pathological coughs caused by conditions such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). To train a deep neural network model, we collected a new dataset of cough sounds, labelled with a clinician's diagnosis. The chosen model is a bidirectional long short-term memory network (BiLSTM) based on Mel-Frequency Cepstral Coefficient (MFCC) features. When trained to classify two classes of coughs (healthy or pathological, in general or belonging to a specific respiratory pathology), the model reaches an accuracy exceeding 84% when classifying coughs against the labels provided by the physicians' diagnoses. To classify a subject's respiratory pathology condition, the results of multiple cough epochs per subject were combined; the resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to discriminate among four classes of coughs, the overall accuracy drops: one class of pathological cough is often misclassified as another. Nevertheless, if one only requires that healthy coughs be classified as healthy and pathological coughs as having some kind of pathology, then the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space, comparing pathological and recovered coughs collected from the same subjects, revealed that pathological coughs, irrespective of the underlying condition, occupy the same feature space, making them harder to differentiate using MFCC features alone.

Cough is a prevalent clinical presentation in many childhood respiratory pathologies, including asthma, upper and lower respiratory tract infection (URTI and LRTI), atopy, rhinosinusitis and post-infectious cough [1-3]. Because of its wide range of aetiologies, the cause of a cough can be misdiagnosed and inappropriately treated [1]. Clinical differentiation of pathological respiratory conditions takes into consideration the history of the presenting respiratory symptoms as well as clinical signs such as pyrexia (i.e., raised body temperature), respiratory rate, shortness of breath and chest auscultation of pathognomonic breath sounds. In some cases, additional investigations such as chest radiographs, laboratory blood tests, bronchoscopy and spirometry are required to reach a definitive diagnosis. These investigations often require hospital visits and place demands on healthcare resources. Moreover, such visits may create a negative socioeconomic impact on the ill child and on his/her family (such as time away from work and childcare arrangements). Furthermore, some of these investigations, such as chest radiographs and blood tests, can result in more harm than benefit if performed indiscriminately. There is a growing interest in characterizing acoustic features to allow objective classification of cough sounds originating from different respiratory conditions.
Previous studies have looked at medical screening based on cough sounds [4-8]. Abaza et al. [4] analysed the characteristics of airflow and the sound of a healthy cough to train a classifier that distinguishes between healthy subjects and those with some kind of lung disease. Their model incorporates a reconstruction algorithm that uses principal component analysis, and obtained accuracies of 94% and 97% in identifying abnormal lung physiology in female and male subjects, respectively. Murata et al. [5] used time-expanded waveforms combined with spectrograms to differentiate between productive coughs (i.e., coughs producing phlegm) and non-productive coughs (i.e., dry coughs). Cough sound analysis has also been used to diagnose pneumonia [6], and Swarnkar et al. [7] used it to assess the severity of acute asthma. The latter reported that their model can identify children suffering from breathing difficulties involving acute asthma and can characterize the severity of airway constriction. In [9], tuberculosis (TB) screening was investigated using short-term spectral information extracted from cough sounds. They reported an accuracy of 78% when distinguishing between the coughs of TB-positive patients and those of a healthy control group. Furthermore, it was noted that the TB screening accuracy increased to 82% when clinical measurements were included along with the features extracted from the cough audio. The cough sounds used in some of the aforementioned investigations were carefully recorded in studio environments, whereas the database used in this investigation was collected using a smartphone in a real hospital setting (see Section 2). This type of ecological data collection (or unconstrained audio collection) is of more practical use to physicians, and may also help in developing a mobile phone app in the future that will be more robust when performing early diagnosis of respiratory tract infections in a real-life setting. Some studies do use a realistic cough sound database: a Gabor filterbank (GFB) [8] was used to classify cough sounds as 'dry' or 'productive', with a reported accuracy of more than 80% on acoustic cough data collected through a public telephone hotline. Another study reported a similar accuracy in classifying wet and dry cough sounds, though the data were collected using a smartphone [10]. Recently, this strategy of collecting cough sounds has become popular [11-13]. Such an audio-based strategy has profound implications for examining symptomatic cough sounds associated with COVID-19, for which cough is a primary symptom alongside fever and fatigue. Convolutional Neural Network (CNN)-based systems were trained to detect coughs and screen for COVID-19, with reported accuracies exceeding 90% in [14-16], while another study reported 75% accuracy [17]. Features (both handcrafted and transfer-learned) were extracted from a crowd-sourced database containing breathing and cough sounds [18] and used to train a support vector machine and ensemble classifiers to screen COVID-19-positive individuals from healthy controls, with a reported accuracy of around 80%. There is another line of research inquiry that focuses mainly on cough event detection (i.e., identifying the presence of cough events) in audio recordings [19-24]; however, in this investigation we manually segment the cough epochs, and thus a review of such studies is outside the scope of this report.
Having said that, with the advent of deep learning, good progress has been made in cough event detection from smartphone recordings, and incorporating such techniques at the preprocessing stage of a cough screening system could bypass the tedious manual segmentation process altogether [25-27]. This study aims to determine whether a predictive machine learning model, trained using acoustic features extracted from cough sounds, could be a useful classifier to differentiate common pathological cough sounds from healthy-voluntary coughs (i.e., cough sounds collected from healthy volunteers). The knowledge gained through such methods could support the early recognition and triage of medical care, as well as assist physicians with clinical management, which includes making a differential screening and monitoring health status in response to medical interventions. In the authors' earlier work, audio-based cough classification using machine learning was shown to be a potentially useful technique to assist in differentiating asthmatic cough sounds from healthy-voluntary cough sounds in children [28,29]. The current paper builds upon this previous work (which used a simple Gaussian Mixture Model-Universal Background Model (GMM-UBM) [28,29]) and uses the collected cough sound dataset to train a deep neural network (DNN) model that can differentiate between pathological and healthy subjects. The proposed deep neural network model is trained using acoustic features extracted from the cough sounds. Three different pathological conditions were considered in this investigation: asthma, upper respiratory tract infection (URTI) and lower respiratory tract infection (LRTI). The accuracy of the trained model is evaluated by comparing its predictions against the clinician's diagnosis. Subjects in this study were divided into two cohorts: a healthy cohort (without respiratory conditions) and a pathological cohort (with respiratory conditions, which included LRTI, URTI and asthma; LRTI encompassed a spectrum of respiratory diseases such as bronchiolitis, bronchitis, bronchopneumonia and pneumonia). Participants were recruited from KK Children's Hospital, Singapore. The inclusion criterion for the pathological cohort was the presence of a concomitant symptom of cough, while the inclusion criterion for the healthy cohort was the absence of active cough and active respiratory conditions. The pathological cohort was recruited from the Children's Emergency Department, Respiratory Ward, and Respiratory Clinic; their cough sounds were recorded during the initial presentation at the hospital. The healthy cohort was recruited from the Children's Surgical Unit; these healthy children were first screened by the anaesthetic team before being recruited for the study. A smartphone was used to record cough sounds from both pathological and healthy children. For both groups, the subjects were instructed to cough actively, which often resulted in multiple cough epochs per participant (on average 10 to 12). Recordings were collected at a sampling rate of 44.1 kHz in an unconstrained clinic setting, i.e., a hospital ambience with background noise such as background talking, beeping from monitoring devices, alarms, ambulance sirens, etc. The collected cough audio files were manually segmented into individual coughs (such that non-cough signal portions are negligible) to form the different entries in the dataset.
Characteristics of the resulting dataset are shown in Table 1. The working diagnosis for the aetiology of the cough was determined by the clinician based on the clinical history and physical examination; in some cases, investigations such as laboratory tests and chest X-rays were also used. Using the dataset described above, five different classification models based on deep neural networks were built. The first model (Healthy vs. Pathology (2-class) Model) was trained to classify whether each segmented cough is a healthy-voluntary cough or pathological; here, we consider all pathological coughs as one class, termed 'pathological cough'. The second set of models (three in total) were trained to classify between healthy-voluntary coughs and a particular respiratory pathology (i.e., one respiratory pathology at a time): the Healthy vs. LRTI Model was trained to predict whether a cough is healthy-voluntary or from a subject diagnosed with LRTI; the Healthy vs. URTI Model was trained to predict whether a cough is healthy-voluntary or from a subject diagnosed with URTI; and the Healthy vs. Asthma Model was trained to predict whether a cough is healthy-voluntary or from a subject diagnosed with asthma. The final classification model was trained to predict all four chosen classes: the Healthy vs. Pathology (4-class) Model classifies whether a cough is healthy-voluntary or associated with any of the three pathological conditions of LRTI, URTI, or asthma. An LSTM-based network was chosen as the classification model in this investigation. LSTM networks take sequence data as input and make predictions based on the sequence's dynamic characteristics by learning long-term dependencies between time steps. They are known to handle sequence data well owing to their memory mechanism [30]. Our choice of LSTM is motivated by the sequential nature of audio data and the network's ability to handle input audio features that vary in length [30,31], as is the case with the features extracted from the collected cough sounds (see Section 5.3). In this investigation, we used a four-layer neural network with two deep layers of bidirectional LSTMs (BiLSTMs) (see Figure 1). Each BiLSTM layer learns bidirectional long-term dependencies from the sequence data; these dependencies help the network understand the long-term dynamics present in the features and thus learn the complete time series [32,33]. We investigated different deep neural network types, such as fully connected deep neural networks, LSTMs and BiLSTMs, to identify the best classification model for our cough screening problem. In the end, BiLSTMs were chosen, as they were found to produce better results for the chosen feature sets (these network comparison results are not shown, as they are outside the scope of this paper; a similar outcome preferring BiLSTMs was reported in [33]). The first layer (input layer) has a dimension of 42 to match the size of the MFCC feature vectors corresponding to every audio frame (see Section 5.3). The second layer is a BiLSTM layer with 50 hidden units, followed by a dropout layer, which in turn is followed by another BiLSTM layer and a second dropout layer. The second BiLSTM layer also has 50 hidden units, and a 30% dropout rate was chosen for both dropout layers. Finally, depending on the classification objective, we used either two fully connected layers (for the 2-class classification problem) or four fully connected layers (for the 4-class classification problem).
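To make the layer stack concrete, a minimal sketch is given below, assuming TensorFlow/Keras (the paper does not state which framework was used); the Adam optimizer, the softmax output head, and the `build_bilstm` helper name are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the described BiLSTM stack, assuming TensorFlow/Keras.
# Layer sizes follow the text (42-dim MFCC input, two BiLSTM layers of 50
# hidden units each, 30% dropout after each); the output head and optimizer
# are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm(num_classes: int) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(None, 42)),        # variable-length sequences of 42-dim MFCC frames
        layers.Bidirectional(layers.LSTM(50, return_sequences=True)),
        layers.Dropout(0.3),
        layers.Bidirectional(layers.LSTM(50)), # final BiLSTM summarises the whole sequence
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model_2class = build_bilstm(num_classes=2)  # healthy vs. pathology
model_4class = build_bilstm(num_classes=4)  # healthy, LRTI, URTI, asthma
```

For the binary models, the sigmoid activation with cross-entropy loss reported in the next section is equivalent in effect to the two-unit softmax head used in this sketch.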
The networks were optimized to minimize the cross-entropy loss with sigmoid activation. This particular architecture was selected after multiple hyper-parameter optimization steps: we used grid search to find the optimal number of hidden units, the number of hidden layers, and the dropout rate. The combination reported in this paper reached the lowest training loss (in other words, the maximum training accuracy) while precluding overfitting of the classifier when trained for the multiple cough classification hypotheses. The collected dataset was randomly split (70-30%) into two non-overlapping parts: a training set and a test set. The resulting split sizes are shown in Table 2. We made sure that cough sounds belonging to the same person were either in the test set or in the training set, but not in both. Since the test data are not seen by the model during the training phase, the model's performance on them offers a good approximation of what can be expected in a real scenario (i.e., when the model is asked to make a prediction for an unseen cough). The general experimental methodology followed in this investigation is shown in Figure 2. We first trained our deep neural network models using features extracted from the training set, and then evaluated the models using the separate test set. The trained model is used to predict which class a cough sound belongs to, and this cough prediction is subsequently used to screen whether a subject is healthy or has a respiratory condition. This screening is based on the most frequent (mode) prediction outcome over all the cough sounds belonging to a particular subject. In what follows, we discuss how the data were pre-processed, which audio features were chosen for this investigation, and how the model was built. The segmented cough sounds were detrended to remove any linear trends, baseline shifts, or slow drifts, then normalized (to have a maximum sample value of one), and finally downsampled to 11.025 kHz from the original sampling rate of 44.1 kHz. The pre-processed audio signals were first segmented into frames of 100 ms, after which a Hamming window was applied, followed by the extraction of audio features. Mel-Frequency Cepstral Coefficients (MFCCs) were chosen for this investigation owing to their effectiveness in audio classification problems [34,35]. MFCCs are a set of features that focus on the perceptually relevant aspects of the audio spectrum; additionally, the coefficients can contain information about vocal tract characteristics [36,37]. In this investigation we used 14 MFCCs with their deltas and delta-deltas, resulting in a total of 42 coefficients (14 MFCCs, 14 deltas and 14 delta-deltas) for every audio frame. The results obtained using MFCCs thus serve as a baseline against which future investigations can be compared.
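As a rough illustration, the preprocessing and feature pipeline described above could be implemented as follows. This is a minimal sketch assuming the librosa and scipy libraries (the authors' tooling is not specified); the 50% frame overlap and the small delta window are assumptions, since the paper only states 100 ms Hamming-windowed frames.

```python
# Minimal sketch of the preprocessing and MFCC feature pipeline described
# above, assuming librosa/scipy; hop length and delta window are assumptions.
import librosa
import numpy as np
from scipy.signal import detrend

def extract_mfcc_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=44100)
    y = detrend(y)                               # remove linear trends / slow drifts
    y = y / np.max(np.abs(y))                    # normalize to a peak value of one
    y = librosa.resample(y, orig_sr=sr, target_sr=11025)
    frame = int(0.100 * 11025)                   # 100 ms frames
    mfcc = librosa.feature.mfcc(y=y, sr=11025, n_mfcc=14,
                                n_fft=frame, hop_length=frame // 2,
                                window="hamming")
    d1 = librosa.feature.delta(mfcc, width=3)            # deltas
    d2 = librosa.feature.delta(mfcc, width=3, order=2)   # delta-deltas
    return np.vstack([mfcc, d1, d2]).T           # shape: (num_frames, 42)
```

A small delta window (width=3) is used here so that even very short cough segments yield enough frames for the delta computation.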
The performance of the DNN models is measured by calculating the classification accuracy and is further analysed using the receiver operating characteristic (ROC) [38] and the confusion matrix [39]. The classification accuracy is calculated by comparing the predicted outputs with the actual outputs:

Accuracy = Number of correct predictions / Total number of predictions

The ROC is created by plotting the true positive rate (i.e., sensitivity, or recall: the ratio of true positives over the sum of true positives and false negatives) against the false positive rate (i.e., 100 − specificity, where specificity is the ratio of true negatives over the sum of false positives and true negatives) for various decision thresholds. A perfect model results in an ROC curve that passes close to the upper left corner, indicating a higher overall accuracy; this would result in an area under the ROC (AROC) equal to 1. The performance of a classifier was further analysed using confusion matrices, whereby the true and false positives and negatives are displayed for each class. For a good classifier, the resulting confusion matrix will have large numbers along the diagonal (i.e., values close to 100%); the percentage of misclassified data is reflected in the off-diagonal elements. From the original cough sounds, the power spectrum (i.e., the distribution of energy contained within the signal over various frequencies) was estimated. These frequencies were then grouped into five equal bins between 0 and f_s/2 (where f_s is the sampling frequency), and the spectral power present in each of these bins was calculated. The distribution of the power spectrum for 500 randomly chosen cough samples (of different respiratory conditions) is shown using a boxplot (see Figure 3). The median is shown as a red line; the bottom and top edges of each box indicate the 25th and 75th percentiles, respectively, and the likely range of variation (i.e., the inter-quartile range (IQR)) is given by the distance between them [40]. The median line for every bin (for both healthy and pathological coughs) does not appear to be centred inside the box, indicating that the power distribution in each bin is slightly skewed. The IQR is slightly larger in the spectral power bins of pathological coughs than in the healthy bins. Overall, there are no clear trends between the median values of the spectral bins for healthy and pathological coughs. The asthmatic spectral bins tend to have a slightly higher median value than the spectral bins of healthy coughs, while the opposite trend was found when comparing the spectral bins of LRTI and URTI against those of healthy coughs. We speculate that this may be because both of these conditions (LRTI and URTI) involve inflamed airway tissues, which may increase acoustic damping (especially at high frequencies); this postulate requires further investigation. In addition, the observed differences may be attributed to variability in subject characteristics between the groups, such as age and gender (see Table 1). The objective of this feature analysis is to understand whether cough sounds contain any subtle cues that distinguish between healthy and pathological subjects. The higher-dimensional MFCC features extracted from the various respiratory pathological coughs were compared against those of the healthy coughs after transforming them to a lower dimension using Principal Component Analysis (PCA) [41]. Such dimensionality reduction techniques often give some insight into the feature space of the chosen classes.
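A minimal sketch of this PCA-based inspection is given below, assuming scikit-learn and matplotlib; the `mfcc_frames.npy` and `frame_labels.npy` inputs are hypothetical stand-ins for the per-frame 42-dimensional MFCC vectors and their class labels.

```python
# Minimal sketch of the PCA feature-space visualisation, assuming
# scikit-learn and matplotlib; the input files are hypothetical.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

features = np.load("mfcc_frames.npy")   # shape: (num_frames, 42)
labels = np.load("frame_labels.npy")    # e.g., "healthy", "asthma", "URTI", "LRTI"

pca = PCA(n_components=3)
projected = pca.fit_transform(features)
print("explained variance:", pca.explained_variance_ratio_.sum())

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for cls in np.unique(labels):
    pts = projected[labels == cls]
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=2, label=cls)
ax.legend()
plt.show()
```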
The resulting visualisation of the first three PCA components (which correspond to the three largest eigenvalues and capture more than 95% of the variance, i.e., information, in this dataset) is shown in Figure 4. MFCCs extracted from 5000 audio frames from each of the categories were used for this visualisation; all of these audio frames were part of the training set used for training the BiLSTM network. No clear clusters are visible in the feature space (see Figure 4). This is true for all four investigated cases: features of healthy versus pathological cough signals, and features of healthy coughs compared to those of each individual respiratory pathology (see Figure 4a-d). This reflects anecdotal observations that clinicians themselves find it hard to distinguish these pathologies based on the cough sound alone. The objective of this longitudinal study is to understand the evolution of the MFCC feature space over time for the different classes of respiratory conditions. For this study, the cough sounds were collected and organised in a two-stage process. In the first stage, 51 subjects recruited from the hospital were asked to make multiple voluntary cough sounds (on average 10 to 12 coughs); there were 24 subjects with asthma, seven with URTI and 20 with LRTI. In the second stage, these 51 subjects were followed up upon recovery (approximately two weeks after hospital discharge) and voluntary cough sounds (on average 10 to 12) were again collected. It is important to note that the Stage 1 coughs were part of the cough dataset used for training the BiLSTM model, whereas the Stage 2 coughs were not used in any training process. The cough sounds were recorded as described in Section 2.2, and MFCCs were extracted from the coughs collected from these 51 subjects as described in Section 5.3. A total of 3810 frames were analysed as part of this longitudinal study: 1675 recovered, 746 LRTI, 399 URTI and 990 asthmatic. The dimensionality of the extracted MFCCs was then reduced using PCA for visualisation purposes (see Figure 5). Stage 1 coughs can be considered pathological, whereas Stage 2 (i.e., recovered) coughs can be considered to represent healthy-voluntary coughs. The evolution of the MFCC feature space can be explored here, since the coughs were collected from the same subjects over a period of time. As in Figure 4, no clear clusters are visible when analysing the evolution of the extracted features (see Figure 5). Additionally, it can be seen that the MFCC features extracted from Stage 1 coughs occupy largely the same feature space irrespective of the underlying respiratory condition (see Figure 5b). With no clear clusters visible in the feature space analyses discussed in Sections 6.2 and 6.3, our classification problem may require the introduction of non-linearity so as to uncover more complex, hidden relationships. This presents an additional motivation for choosing a deep neural network. The cough classification accuracy (i.e., the accuracy in classifying each cough segment) and the healthy-pathology classification accuracy (i.e., the accuracy in classifying entire cough epochs to a particular respiratory pathology) on our test set are shown in Table 3. The BiLSTM performed well when classifying pathological cough sounds versus healthy-voluntary cough sounds, with an accuracy of 84.5%.
Furthermore, when the respiratory pathology classification of a subject was made (by considering the entire cough epoch) based on the most frequent (mode) prediction outcome over that subject's coughs, the accuracy is even higher (91.2%). This is to be expected: if one assumes there are n coughs available per subject, then even though the model misclassifies individual cough sounds, the subject-level classification result will be wrong only when at least (n/2) + 1 of the n coughs belonging to that patient are misclassified; in other words, respiratory pathology classification of subjects is more robust (a short code sketch of this mode rule is given at the end of this section). Given an accuracy of 84.5% for individual cough prediction, such an outcome would be very rare. A confusion matrix was created to further analyse the results of this model (see Figure 6). The percentage of healthy-voluntary coughs misclassified as pathological coughs is higher than that of pathological coughs misclassified as healthy-voluntary coughs (23.8% misclassified compared to 7.1%; see Figure 6a). This higher healthy-voluntary cough misclassification rate in turn resulted in a relatively large number of healthy subjects misclassified as having a pathology (15.6% of subjects were misclassified; see Figure 6b). The receiver operating characteristic of this model is shown in Figure 7, along with the corresponding AROC value. The resulting AROC values are 0.84 for cough classification and 0.91 for respiratory pathology classification of subjects (see Table 4). The AROC is convincingly high, which means that the model delivers good separability between the two classes. Also shown in Figure 7 is the optimum threshold, located at the point nearest to (0, 1), which maximizes the sensitivity and specificity values (shown as a red cross). The resulting cough classification accuracy and the subject-level respiratory pathology classification accuracy when considering one respiratory pathology at a time are shown in Table 5. Again, the deep BiLSTM produced good results when differentiating healthy-voluntary coughs from those resulting from the various respiratory conditions, with a classification accuracy exceeding 85% for every investigated scenario. Respiratory pathology classification of subjects, as expected, results in even higher accuracy (exceeding 92% in every case). Confusion matrices were produced to further analyse the results from each of these models: Figures 8-10 show the confusion matrices for the Healthy vs. LRTI Model, the Healthy vs. URTI Model and the Healthy vs. Asthma Model, respectively. The performance of the Healthy vs. LRTI Model and the Healthy vs. Asthma Model in correctly separating healthy coughs from pathological coughs is comparable (see Figures 8a and 10a). The Healthy vs. URTI Model has a slightly larger number of misclassifications when predicting healthy coughs; however, its performance on pathological cough detection (URTI in this case) is better than that of the other two models (see Figure 9). When it comes to respiratory pathology classification of subjects based on entire cough epochs, as expected, the classification models resulted in a higher correct classification rate than individual cough classification (see Figures 8b, 9b and 10b). Receiver operating characteristics were created for all three models, both for cough and for pathology classification; the ROCs are shown in the corresponding figures, and the resulting AROC values are shown in Table 6.
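As promised above, the subject-level screening rule can be sketched in a few lines; `classify_subject` is an illustrative helper name, and the per-cough labels would come from the trained BiLSTM's predictions.

```python
# Minimal sketch of the subject-level screening rule described above:
# the subject's label is the mode of the per-cough predictions.
from collections import Counter
from typing import List

def classify_subject(cough_predictions: List[str]) -> str:
    """Return the most frequent (mode) per-cough prediction for one subject."""
    return Counter(cough_predictions).most_common(1)[0][0]

# Example: 10 coughs from one subject, 3 misclassified as healthy;
# the mode still yields the correct subject-level label.
preds = ["asthma"] * 7 + ["healthy"] * 3
print(classify_subject(preds))  # -> "asthma"
```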
The AROC values are convincingly higher for all the pathology screening results (exceeding 93%) than for the individual cough classification models; they support the findings from Table 5 and the corresponding confusion matrices. The resulting performance of the proposed model when trained to classify the different respiratory pathological coughs and healthy-voluntary coughs together (i.e., the 4-class model) is shown in Table 7. The subject-level respiratory pathology classification result for this four-class model, based on the most frequent (mode) prediction outcome over all cough epochs of a subject, is shown in Table 8. The overall classification accuracy, for both cough classification and each pathology classification, is lower than in the results shown in Tables 3 and 5. The classification accuracy for the healthy-voluntary cough class and the subsequent respiratory pathology classification is relatively high (71.2% and 84.4%, respectively); however, the classification accuracy for the pathological cough classes is relatively low, with the asthma class having the highest misclassification rate among the three investigated respiratory conditions. The confusion matrices are shown to further illuminate this classification result (see Figure 14a). It is interesting to note in the respiratory pathology classification results (see Figure 14b) that none of the subjects with LRTI or asthma is misclassified as healthy, and only one subject with URTI is misclassified as healthy (one out of the 24 URTI subjects tested, i.e., 4.2%). However, seven healthy subjects were misclassified as having some kind of respiratory problem (of these seven, two were misclassified as having URTI, two as having LRTI and three as having asthma). Among the three respiratory conditions, as mentioned earlier, asthma was the most misclassified pathology (15 out of the 24 subjects with asthma were misclassified as having LRTI). Even though there is a high misclassification rate among the three investigated respiratory conditions, in summary, this four-class classification model has a classification accuracy of 84.4% for correctly identifying healthy subjects and 95.8% for identifying subjects with respiratory issues (see Table 9). We expect that such a cough classification methodology should eventually be applied to support clinicians "in the field", if only as a simple triage or preliminary screening tool. However, explicit discussion of a smartphone-deployed application (app) is premature for the scope of the current paper. If allowed to speculate, we see two possible pathways towards implementation: (1) port the whole algorithm onto the smartphone and perform all the computational heavy lifting using the smartphone hardware to generate the prediction result; or (2) have the app simply collect audio data (via the onboard microphone) and communicate with a centralized server that performs the prediction and returns the results to the user. Both pathways have their operational considerations, such as the processing hardware available on the smartphone (the developer must consider the number of floating-point operations needed to make the prediction) and the availability and connection speed of the Internet (a consideration if remote deployment in rural communities is expected), among other issues.
For the current setup running on an NVIDIA TITAN Xp graphics card, it takes almost three hours to train a particular deep neural network model and less than half a second to perform the prediction for a particular cough sample (timings include the audio preprocessing and feature extraction steps). Given that a clinical usage scenario only needs to be "quasi-real-time" (a few seconds' delay is usually tolerated; clinicians are accustomed to waiting longer for other screening tests), the second approach seems prudent for contexts with a ready internet connection, so that the app would be lighter in terms of mobile phone hardware usage.

A classifier was developed based on a BiLSTM model trained using Mel-Frequency Cepstral Coefficient features that can differentiate the cough sounds of healthy children with no active respiratory pathology from those of children with active pathological respiratory conditions such as asthma, URTI and LRTI. Four classifiers were trained as part of this investigation. The resulting trained models, which classify cough sounds as healthy versus pathological in general, or as healthy versus belonging to LRTI, URTI or asthma, achieved classification accuracies exceeding 84% when predicting a clinician's diagnosis. When the respiratory pathology classification of a subject was performed using the mode of the prediction results across the multiple cough epochs from that subject, the resulting classification accuracy exceeded 91%. The classification accuracy of the model was compromised when trained to classify all four cough classes in one shot; however, most of the misclassification happened within the pathological classes, where one class of pathological cough was often misclassified as another pathology. If one disregards such within-pathology misclassification and only requires that a healthy cough come from a healthy subject and a pathological cough come from a subject with some kind of pathology, then the overall accuracy of the classifier is above 84%. This is a first step towards developing a highly efficient deep neural network model that can differentiate between different pathological cough sounds. Such a model could support physicians in creating a differential screening of respiratory conditions that present with cough, would thus add value to health status monitoring and triaging in medical care, and could potentially be deployed to support telemedicine in remote and developing communities.

Data Availability Statement: Please contact the authors to access the data used in this study.

Recommendations for the assessment and management of cough in children
The difficult coughing child: Prolonged acute cough in children
Cough during infancy and subsequent childhood asthma
Classification of voluntary cough sound and airflow patterns for detecting abnormal pulmonary function
Discrimination of productive and non-productive cough by sound analysis
Cough sound analysis can rapidly diagnose childhood pneumonia
Stratifying asthma severity in children using cough sound analytic technology
Classification of human cough signals using spectro-temporal Gabor filterbank features
Detection of tuberculosis by automatic cough sound analysis
A comprehensive approach for cough type detection
Coswara: A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis. arXiv 2020
Novel coronavirus cough database: NoCoCoDa
The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms
A Real-time Robot-based Auxiliary System for Risk Evaluation of COVID-19 Infection
AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app
COVID-19 Artificial Intelligence Diagnosis using only Cough Recordings
Cough against COVID: Evidence of COVID-19 signature in cough sounds
Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data
Audio signals encoding for cough classification using convolutional neural networks: A comparative study
The automatic recognition and counting of cough
Cough detection using fuzzy classification
DeepCough: A deep convolutional neural network in a wearable cough detection system
Private audio-based cough sensing for in-home pulmonary assessment using mobile devices
Cough detection algorithm for monitoring patient recovery from pulmonary tuberculosis
Accurate and privacy preserving cough sensing using a low-cost microphone
Automatic recognition, segmentation, and sex assignment of nocturnal asthmatic coughs and cough epochs in smartphone audio recordings: Observational field study
Robust detection of audio-cough events using local Hu moments
Development of Machine Learning for Asthmatic and Healthy Voluntary Cough Sounds: A Proof of Concept Study
Asthmatic versus healthy child classification based on cough and vocalised /a:/ sounds
Long short-term memory
Understanding the difficulty of training deep feedforward neural networks
Bidirectional recurrent neural networks
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
Towards robust audio spoofing detection: A detailed comparison of traditional and learned features
Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques
Speaker identification by combining various vocal tract and vocal source features
Receiver operating characteristics curves and related decision measures: A tutorial
Machine learning and its applications to biology
Variations of box plots
Principal component analysis

We thank Ariv K. (from SUTD) for helping with audio segmentation, and Dianna Sri Dewi and Foo Chuan Ping (from KK Women's and Children's Hospital, Singapore) for coordinating the recruitment of patients and research project administration. The authors declare no conflict of interest.