key: cord-1008457-gu7fdegr
authors: Islam, Rumana; Abdel-Raheem, Esam; Tarique, Mohammed
title: A study of using cough sounds and deep neural networks for the early detection of Covid-19
date: 2022-01-06
journal: Biomed Eng Adv
DOI: 10.1016/j.bea.2022.100025
sha: 8fc3c4af209a07c38217c3bcaab0b9b93de05c53
doc_id: 1008457
cord_uid: gu7fdegr

The current clinical diagnosis of COVID-19 requires person-to-person contact, needs variable time to produce results, and is expensive. It is even inaccessible to the general population in some developing countries due to insufficient healthcare facilities. Hence, a low-cost, quick, and easily accessible solution for COVID-19 diagnosis is vital. This paper presents a study that involves developing an algorithm for automated and noninvasive diagnosis of COVID-19 using cough sound samples and a deep neural network. Cough sounds provide essential information about the behavior of the glottis under different respiratory pathological conditions; hence, the characteristics of cough sounds can identify respiratory diseases like COVID-19. The proposed algorithm consists of three main steps: (a) extraction of acoustic features from the cough sound samples, (b) formation of a feature vector, and (c) classification of the cough sound samples using a deep neural network. The output of the proposed system provides a COVID-19 likelihood diagnosis. In this work, we consider three acoustic feature vectors, namely (a) time-domain, (b) frequency-domain, and (c) mixed-domain (i.e., a combination of features in both the time domain and the frequency domain). The performance of the proposed algorithm is evaluated using cough sound samples collected from healthy subjects and COVID-19 patients. The results show that the proposed algorithm automatically detects COVID-19 cough sound samples with an overall accuracy of 89.2%, 97.5%, and 93.8% using the time-domain, frequency-domain, and mixed-domain feature vectors, respectively. The proposed algorithm, coupled with its high accuracy, demonstrates that it can be used for quick identification or early screening of COVID-19. We also compare our results with those of some state-of-the-art works.

According to the global database maintained by Johns Hopkins University, more than 270 million cases of COVID-19 (and its variants) and 5.3 million deaths had been reported as of December 13, 2021 [1]. Social distancing, wearing masks, widespread testing, contact tracing, and massive vaccination are all recommended by the World Health Organization (WHO) to reduce the spread of this virus [2]. To date, the reverse transcription-polymerase chain reaction (RT-PCR) test is considered the gold standard for detecting the coronavirus [3]. However, the RT-PCR test requires person-to-person contact to administer, needs variable time to produce results, and remains unaffordable to much of the global population. It can also be unpleasant for children. Moreover, this test is not yet accessible to people living in remote areas, where medical facilities are scarce [4]. Alarmingly, physicians suspect that many people refuse the COVID-19 test for fear of stigma [5]. Governments worldwide have initiated massive free testing campaigns to stop the spread of this virus, and these campaigns cost billions of dollars per day at an average rate of $23 per test [6]. Hence, easily accessible, quick, and affordable testing is essential to limit the spread of the virus. COVID-19 detection methods based on human audio signals can play an important role here.
Researchers and clinicians have suggested using recordings of speech, breathing, and cough sounds to detect various diseases. The results published in the literature show that speech samples can help clinicians to detect several diseases, including asthma [7-10], Alzheimer's disease [11-13], Parkinson's disease [14-16], depression [17-19], schizophrenia [20-22], autism [23,24], head or neck cancer [25], and the emotional expressiveness of breast cancer patients [26]. A comprehensive survey of these works can be found in [27]. Among these diseases, respiratory diseases like asthma have some similarities to COVID-19. An extensive investigation of asthma detection using audio signal processing can be found in [7-10]. These works show that asthma causes swollen and inflamed vocal folds, which do not vibrate appropriately during voice generation. Hence, the voice samples of asthma patients differ from those of healthy (i.e., control) subjects. For example, it is shown in [7] that asthmatic subjects exhibit longer pauses between speech segments, produce fewer syllables per breath, and spend a more significant percentage of time in voiceless ventilatory activity than their healthy counterparts. Recently, researchers have suggested using cough sounds for the early detection of COVID-19. However, there are still some challenges, as cough is also a symptom of 30 other diseases [28,29]. Hence, it is very challenging to discriminate the cough sounds of COVID-19 patients from those of other patients. In [28], the authors considered three diseases: bronchitis, pertussis, and COVID-19. They investigated 247 normal cough samples and 296 pathological samples, and used a convolutional neural network (CNN) to implement a binary classifier and a multiclass classifier. The binary classifier discriminates pathological cough sounds from normal cough sounds, and the multiclass classifier categorizes the pathologies into one of the three pathology types. In a similar work [30], the authors considered bronchitis, bronchiolitis, and pertussis, using a CNN to discriminate among these pathologies. Various human audio samples, namely the sustained vowel "/a/", counting from 1 to 20, breathing, and cough samples, have been used in [31]. The authors considered nine acoustic voice features: spectral contrast, mel-frequency cepstral coefficients (MFCCs), spectral roll-off, spectral centroid, mean square energy, polynomial fit, zero-crossing rate, spectral bandwidth, and spectral flatness. They used a random forest (RF) algorithm to discriminate the COVID-19 samples from the control/healthy samples and achieved an accuracy of 66.74%. In [32], the authors used a large dataset (5,320 samples) selected from the MIT open voice COVID-19 cough dataset [33]. They extracted MFCC features from the cough samples and classified them using a CNN consisting of one Poisson biomarker layer and three pre-trained ResNet50s. The results showed that their proposed system achieved an accuracy of 97%. Cough and breathing sounds have also been used in [34]. In that work, the authors used eleven acoustic features: RMS energy, spectral centroid, roll-off frequencies, zero-crossing rate, MFCC, Δ-MFCC, Δ²-MFCC, tempo, duration, onsets, and period.
In addition, they used VGGish (a pre-trained CNN from Google) to classify the samples into COVID-positive/non-COVID, COVID-positive with cough/non-COVID with cough, and COVID-positive with cough/non-COVID asthma with cough. The proposed system achieved accuracies of 80%, 82%, and 80%, respectively, for these classification tasks. In [35], the authors used the Computational Paralinguistics Challenge (COMPARE) features [38] and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [37] to discriminate COVID-19 samples from healthy samples. These features were extracted using the OpenSMILE toolkit [38]. The voice samples were collected from five sentences uttered by the patients. The authors classified the COVID-19 patients into three categories, namely high, mid, and low. In that study, they used 260 samples, including 52 COVID-19 samples. The authors used a support vector machine (SVM) and achieved an accuracy of 69%. Three acoustic feature sets have been used in [39]. The first was the COMPARE acoustic feature set, extracted using the OpenSMILE software. The second was a combination of acoustic feature sets extracted by the freely available software tools PRAAT [40] and LIBROSA [41]. The third was an acoustic feature set consisting of 1024 embedded features extracted by a deep CNN. The samples used in the investigation comprised three sustained sounds ("/a/", "/s/", and "/z/"), cough sounds, six symptomatic questions, and counting from 50 to 80. The authors used the SVM with a radial basis function (RBF) kernel and the RF as classifiers. Experimental results showed an average accuracy of 80% in discriminating COVID-19-positive patients from COVID-19-negative patients based on the features extracted from the cough and vowel "/a/" recordings. They achieved an even higher accuracy (83%) by evaluating the six symptomatic questions. In [42], the authors used voice features, namely cepstral peak prominence (CPP), harmonic-to-noise ratio (HNR), first and second harmonics (H1H2), fundamental frequency and its standard deviation (F0SD), jitter, shimmer, and maximum phonation time (MPT), to discriminate the voice samples of COVID-19 patients from those of healthy subjects. The authors collected sustained vowel samples "/a/" from 70 healthy and 64 COVID-19 Persian-speaking subjects. They reported significantly higher F0SD, jitter, shimmer, H1H2, and voice break numbers in the COVID-19 patients than in the control/healthy group. The vowel "/ah/", the snoring consonant "/z/", cough sounds, and counting samples from 50 to 80 have been used in [43]. The authors used a recurrent neural network (RNN) based expert classifier in that work, applying three techniques (pre-training, bootstrapping, and regularization) to avoid the overfitting problem of the RNN. They also used the leave-one-speaker-out validation technique and achieved a recall of 78%. In a similar work [44], the authors used an RNN with long short-term memory (LSTM) to detect COVID-19 patients. In that investigation, the authors used several features, including the spectral centroid, spectral roll-off, zero-crossing rate, MFCCs, and Δ-MFCCs, from the cough sounds, breathing sounds, and voice samples of the COVID-19 patients. The authors used samples from 60 healthy subjects and 20 COVID-19 patients. To improve accuracy, they removed the silent parts of the samples using the PRAAT software.
As a result, the authors achieved accuracies of 98.2%, 97.0%, and 77.2% using breathing, cough, and voice samples, respectively. In [45], the authors used the MFCC features of cough, breathing, and voice sounds to discriminate COVID-19 patients from non-COVID-19 patients. The authors concluded that the MFCCs of cough and breathing sounds are similar for COVID-19 and non-COVID-19 patients, whereas the MFCCs of voice sounds are very distinct between the two groups. A cloud computing and artificial intelligence-based early detection system for COVID-19 patients has been presented in [46]. The authors used three voice features, namely HNR, jitter, and shimmer, and used the RBF algorithm as a classifier. The authors suggested that HNR, jitter, and shimmer can be used to differentiate between healthy subjects and asthma patients, and indicated that the same parameters can discriminate between healthy subjects and COVID-19 patients. Recurrence quantification measures on the MFCCs have been introduced in [47] to detect COVID-19 patients using the sustained vowel "/ah/" and cough sounds. The authors used several classifiers, namely decision trees, SVM, k-nearest-neighbor, RF, and XGBoost, and achieved the best results with the XGBoost classifier. That model achieved accuracies of 97% (with an F1 Score of 91%) and 99% (with an F1 Score of 89%) for coughs and sustained vowels, respectively. In [48], the authors used crowdsourced cough audio samples acquired on smartphones from around the world. They extracted three acoustic features from the cough sounds: MFCCs, the mel-frequency spectrum, and the spectrogram. The authors used an innovative ensemble classifier model (consisting of three networks) to discriminate COVID-19 patients from healthy subjects. The highest accuracy achieved was 77.1%. This work is a preliminary investigation of Artificial Intelligence's (AI's) capability to detect COVID-19 using acoustic features. The proposed algorithm has been developed based on the available data, which is limited. Rigorous testing of the algorithm with more data is required before deploying it in practice for COVID-19 prescreening. The main contributions of this paper are as follows:
- To develop a novel algorithm based on signal processing and a deep neural network (DNN).
- To compute the acoustic features and compare their uniqueness for the cough sound samples of control (i.e., healthy) subjects and COVID-19 patients.
- To form feature vectors in three domains (time-domain, frequency-domain, and mixed-domain) and to investigate the efficacy of these feature vectors.
- To achieve a high classification accuracy (compared to other related works) while avoiding an overwhelming computational burden on the system.
- To use a dropout strategy in the proposed algorithm to make the training process faster and to overcome the overfitting problem.
- To provide a detailed performance analysis of the proposed system in terms of the confusion matrix, Accuracy, Precision, Negative Predictive Value (NPV), and F1 Score.
The rest of the paper is organized as follows. The related background is presented in Section 2. The models, materials, and methods are explained in Section 3. Simulation results and discussions are presented in Section 4. The research applicability is explained in Section 5, and the paper is concluded in Section 6.
The human voice generation system mainly consists of the lungs, the larynx, and the articulators. Among them, the lungs are considered the power source of the voice generation system. Respiratory diseases prevent the lungs from working properly and hence affect the human voice generation system. Respiratory diseases can be classified into two main classes, namely (a) obstructive and (b) restrictive [49]. Obstructive lung diseases make the pulmonary airways narrow and affect a patient's ability to expel air from the lungs completely; hence, a significant amount of air remains in the lungs at all times. On the other hand, people suffering from restrictive lung diseases cannot fully expand their lungs to fill them with air. Some patients may suffer from a combination of both obstructive and restrictive respiratory diseases. Cough is a common symptom of obstructive, restrictive, and combined lung diseases. Hence, cough sounds are considered useful for detecting lung diseases caused by respiratory issues [50]. COVID-19 is also considered a respiratory disease. Like other respiratory diseases, COVID-19 can cause the lungs to fill with fluid and become inflamed. As a result, patients can suffer from breathing difficulty and may need hospital treatment in severe cases. Untreated COVID-19 can progress to acute respiratory distress syndrome (ARDS), a form of lung failure [51]. Although coughing is a common symptom of any respiratory illness, including COVID-19, recent studies suggest that the COVID-19 cough is characteristically dry, persistent, and hoarse at the earliest stage of coronavirus infection. Hence, the cough sound samples of COVID-19 patients differ from those of patients suffering from other respiratory diseases. Human cough samples contain three phases: the explosive phase, the intermediate phase, and the voiced phase [52], as shown in Fig. 1. These phases represent the glottal airflow variation in the vocal cords, and they differ depending on the pathological condition of the patient. Two segmented cough sound samples were randomly selected from the Virufy database [53] to investigate the differences between the cough sounds of a COVID-negative (i.e., healthy/control) subject and a COVID-positive patient. The cough samples of a healthy subject and a COVID-positive subject are shown in Fig. 2. This figure demonstrates that the healthy sample is similar to the typical human cough signal presented in Fig. 1. However, the cough sound sample of the COVID-19 patient varies significantly from the typical human cough sample. For example, both the intermediate and voiced phases are longer for the COVID-positive patient than for the healthy subject. Moreover, the signal amplitude during the voiced phase is higher for the COVID-positive patient than for the healthy subject. The amplitudes in the explosive phase also differ between these two cough sound samples, as depicted in Fig. 2. The differences mentioned above indicate that the cough sound can be a valuable tool to discriminate the COVID-positive patient from the healthy subject. The power spectral densities (PSDs) of these two samples are plotted in Fig. 3. It is observed in the figure that the healthy cough sound has prominent frequency components with continuously decreasing magnitudes. On the other hand, the COVID-positive cough sound does not contain very distinct frequency components.
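To illustrate how such a PSD comparison can be reproduced, the following minimal sketch uses SciPy's Welch estimator. The file names are hypothetical placeholders, and the recordings are assumed to be available locally.

```python
# Sketch: compare the power spectral densities (PSDs) of a healthy and a
# COVID-positive cough recording, as in the Fig. 3 analysis.
# The file names are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import welch

for label, path in [("healthy", "healthy_cough.wav"),
                    ("COVID-positive", "covid_cough.wav")]:
    fs, x = wavfile.read(path)          # sampling rate and samples
    if x.ndim > 1:                      # keep one channel if stereo
        x = x[:, 0]
    x = x.astype(np.float64)
    x /= np.max(np.abs(x)) + 1e-12      # peak-normalize the amplitude
    f, pxx = welch(x, fs=fs, nperseg=2048)  # Welch PSD estimate
    plt.semilogy(f, pxx, label=label)

plt.xlabel("Frequency (Hz)")
plt.ylabel("Power spectral density")
plt.legend()
plt.show()
```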
The proposed system model is presented in Fig. 4. It consists of four major steps: pre-processing, feature extraction, formation of feature vectors, and classification. The main functions of the pre-processing stage are audio segmentation and windowing. Afterward, the frames are formed. In the next step, the features are extracted from the framed samples. The extracted features are then grouped to form the feature vectors. Finally, the feature vectors are applied as the input to the classifier. The most crucial component of the proposed system is feature extraction (also called the data reduction procedure). It involves extracting features from the cough sound of interest. The main advantage of using features is that the analysis algorithm (i.e., the classifier) deals with small, transformed data instead of the voluminous original cough samples. In practice, acoustic features are extracted, and a feature vector is formed to represent the original data. However, the selection of features and the formation of an appropriate feature vector remain open issues in pattern recognition research. In this investigation, 33 acoustic features are considered to form three feature vectors. The acoustic features used in this work can be broadly classified into two major classes: time-domain and frequency-domain features. The cough sound samples are divided into small frames using a rectangular window, and the features are extracted from these frames. These features are explained in the following subsections.

In this investigation, we consider the following time-domain features: (i) short-term energy, (ii) zero-crossing rate, and (iii) entropy of energy [54]. The short-term energy of the $i$th frame is calculated by

$$E(i) = \sum_{n=1}^{N} |x_i(n)|^2, \qquad (1)$$

where $x_i(n)$ is the $i$th frame, with $N$ being the length of the frame. The energy expressed in (1) is normalized by the frame length as

$$E_{norm}(i) = \frac{1}{N} \sum_{n=1}^{N} |x_i(n)|^2.$$

The normalized energy content of the COVID-positive and healthy cough sounds is plotted in Fig. 5(a). This figure shows that the energy content of both samples is concentrated in a few frames and exhibits high variation over successive frames. However, the energy content of the COVID-positive patient is much higher than that of the healthy subject. This indicates that the cough sample of the COVID-positive patient contains weak phonemes and short periods of silence between two coughs; hence, the energy content also varies rapidly between successive frames.

The zero-crossing rate of a cough sound signal can be defined as the rate of sign changes of the signal over a frame. It is calculated by

$$Z(i) = \frac{1}{2N} \sum_{n=1}^{N} \left| \operatorname{sgn}[x_i(n)] - \operatorname{sgn}[x_i(n-1)] \right|,$$

where $\operatorname{sgn}[\cdot]$ is the sign function, defined by $\operatorname{sgn}[x_i(n)] = 1$ when $x_i(n) \geq 0$ and $\operatorname{sgn}[x_i(n)] = -1$ when $x_i(n) < 0$. The zero-crossing rates of the COVID-positive patient and the healthy subject are plotted in Fig. 5(b), which shows that the healthy cough sample has a higher zero-crossing rate than that of the COVID-positive patient. Since the zero-crossing rate measures the noisiness of a signal, it is higher for the unvoiced parts of the cough sound sample and lower for the voiced parts. As shown in Fig. 2, the voiced phase of the cough sample of the COVID-positive patient is longer than that of the healthy subject. Hence, the zero-crossing rate is lower for the COVID-positive patient than for the healthy subject, as depicted in Fig. 5(b).

The short-term entropy of energy can be interpreted as a measure of abrupt changes in the energy level of an audio signal. To compute it, we first divide each short-term frame into $K$ sub-frames of fixed duration. Then, for each sub-frame $j$, the energy $E_j$ is calculated using (1) and divided by the total energy $E_i$ of the short-term frame. The sub-frame energy values, treated as a sequence of probabilities, are thus defined as

$$e_j = \frac{E_j}{E_i}, \qquad j = 1, 2, \ldots, K,$$

where $E_i = \sum_{j=1}^{K} E_j$. At the final step, the entropy $H(i)$ is calculated from this sequence by

$$H(i) = -\sum_{j=1}^{K} e_j \log_2 e_j.$$

The short-term entropy of energy for the COVID-positive patient and the healthy subject is plotted in Fig. 5(c). The short-term entropy of energy for the COVID-positive patient is greater than that of the healthy subject for most of the frames. Since the energy content of the COVID-positive patient varies more abruptly than that of the healthy subject, the energy entropy tends to be higher for the COVID-positive patient after frame 20, as shown in Fig. 5(c).
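A minimal NumPy sketch of the three time-domain features above follows. The frame length, sub-frame count, and the synthetic stand-in signal are assumptions for illustration, not values from the paper.

```python
# Sketch of the time-domain features described above: short-term energy,
# zero-crossing rate, and entropy of energy, computed per rectangular frame.
import numpy as np

def frame_signal(x, frame_len):
    """Split a 1-D signal into consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    return x[:n_frames * frame_len].reshape(n_frames, frame_len)

def short_term_energy(frame):
    """Normalized short-term energy, Eqs. (1)-(2) above."""
    return np.sum(frame ** 2) / len(frame)

def zero_crossing_rate(frame):
    """Rate of sign changes across the frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1                    # treat exact zeros as positive
    return np.sum(np.abs(np.diff(signs))) / (2 * len(frame))

def energy_entropy(frame, n_subframes=10):
    """Entropy of the sub-frame energy distribution."""
    sub = frame_signal(frame, len(frame) // n_subframes)
    e = np.sum(sub ** 2, axis=1)
    p = e / (np.sum(e) + 1e-12)              # sub-frame energy "probabilities"
    return -np.sum(p * np.log2(p + 1e-12))

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)               # stand-in for a loaded cough signal
frames = frame_signal(x, frame_len=1024)     # e.g., ~21 ms frames at 48 kHz
features = np.array([[short_term_energy(f),
                      zero_crossing_rate(f),
                      energy_entropy(f)] for f in frames])
```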
The frequency-domain acoustic features are extracted from the discrete Fourier transform (DFT) of the signal. The DFT of the $i$th frame of an audio signal can be expressed as

$$X_i(k) = \sum_{n=1}^{N} x_i(n)\, e^{-j 2\pi (k-1)(n-1)/N}, \qquad k = 1, \ldots, N,$$

where $N$ is the size of the DFT and $X_i(k)$ is the $k$th DFT coefficient.

The spectral centroid provides a noise-robust estimate of the dominant frequency of the cough sound signal as it varies over time. It is also called the center of gravity of the spectrum. The value of the spectral centroid, $C_i$, of the $i$th audio frame is calculated by

$$C_i = \frac{\sum_{k=1}^{N} k\, |X_i(k)|}{\sum_{k=1}^{N} |X_i(k)|}.$$

The spectral centroids of the COVID-positive patient and the healthy subject are shown in Fig. 6(a). The figure shows that the spectral centroids of the healthy person's cough sound are higher than those of the COVID-19 cough sample until approximately frame number 50. In practice, the highest values correspond to the brightest sounds, while the presence of noise, silence, etc. yields lower values of the spectral centroid; this is noticeable for the COVID-positive patient, as opposed to the healthy person, over the range mentioned above. From approximately frames 50 to 80, the COVID-positive patient exhibits higher spectral centroid values. After frame number 80, both samples show insignificant spectral components.

The spectral entropy is a measure of irregularities in the frequency domain. The spectral entropy features are computed from the short-time Fourier transform (STFT) spectrum. Spectral entropy is widely used to detect the voiced regions of an acoustic signal; the flat distribution of silence or noise induces high entropy values. The spectral entropy is computed with the same method used to calculate the cough signal's energy entropy. First, the spectrum of the short-term frame is divided into $L$ sub-bands. The energy $E_f$ of the $f$th sub-band, $f = 0, \ldots, L-1$, is normalized by the total spectral energy:

$$n_f = \frac{E_f}{\sum_{f=0}^{L-1} E_f}, \qquad f = 0, \ldots, L-1.$$

Finally, the entropy of the normalized spectral energy is computed by

$$H_i = -\sum_{f=0}^{L-1} n_f \log_2 n_f.$$

The spectral entropies of the COVID-positive patient and the healthy person are shown in Fig. 6(b). This figure shows that the spectral entropy of the healthy person is higher than that of the COVID-positive patient for most of the frames. The reason is that the voiced part of the signal has lower spectral entropy than the unvoiced part.

The spectral flux measures the spectral change between two successive frames. It is computed as the squared difference between the normalized magnitudes of the spectra of the two successive short-term windows:

$$Fl_{i,i-1} = \sum_{k=1}^{N} \left( EN_i(k) - EN_{i-1}(k) \right)^2,$$

where $EN_i(k) = |X_i(k)| / \sum_{l=1}^{N} |X_i(l)|$ is the $k$th normalized DFT coefficient of the $i$th frame. A comparison of the spectral flux of the two cough samples indicates more rapid spectral alternation among phonemes in the healthy cough sample than in that of the COVID-positive patient.
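Continuing the earlier sketch, the spectral centroid, spectral entropy, and spectral flux can be computed from the magnitude DFT of each frame. The number of sub-bands is an assumption; the centroid is expressed in bin index rather than Hz for brevity.

```python
# Sketch of the spectral features described above, using the `frames`
# array produced in the time-domain sketch.
import numpy as np

def spectrum(frame):
    """Magnitude DFT of one frame (positive frequencies only)."""
    return np.abs(np.fft.rfft(frame))

def spectral_centroid(mag):
    """Center of gravity of the magnitude spectrum (in bin index)."""
    k = np.arange(1, len(mag) + 1)
    return np.sum(k * mag) / (np.sum(mag) + 1e-12)

def spectral_entropy(mag, n_bands=8):
    """Entropy of the normalized sub-band energy distribution."""
    band_len = len(mag) // n_bands
    e = np.array([np.sum(mag[b * band_len:(b + 1) * band_len] ** 2)
                  for b in range(n_bands)])
    p = e / (np.sum(e) + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

def spectral_flux(mag, prev_mag):
    """Squared difference between successive normalized spectra."""
    en = mag / (np.sum(mag) + 1e-12)
    en_prev = prev_mag / (np.sum(prev_mag) + 1e-12)
    return np.sum((en - en_prev) ** 2)

mags = [spectrum(f) for f in frames]
centroids = np.array([spectral_centroid(m) for m in mags])
entropies = np.array([spectral_entropy(m) for m in mags])
flux = np.array([spectral_flux(mags[i], mags[i - 1])
                 for i in range(1, len(mags))])
```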
The spectral roll-off is the frequency below which a certain percentage (usually around 90%) of the magnitude distribution of the spectrum is concentrated. Therefore, if the $m$th DFT coefficient corresponds to the spectral roll-off of the $i$th frame, it satisfies

$$\sum_{k=1}^{m} |X_i(k)| = C \sum_{k=1}^{N} |X_i(k)|,$$

where $C$ is the adopted percentage (a user parameter). The spectral roll-off frequency is usually normalized by dividing it by $N$, so that it takes values between 0 and 1. The spectral roll-offs of the cough samples of the healthy person and the COVID-positive patient are shown in Fig. 7(a). It can easily be observed that the cough sample of the healthy person shows a higher spectral roll-off value than that of the COVID-positive patient for most of the frames. This means that the cough sample of the healthy person has a wider spectrum than that of the COVID-positive patient.

We also include the MFCCs in the feature vector. The MFCCs have long been widely used in respiratory disease detection algorithms [55-57]. The main advantage of the MFCCs over other acoustic features is that they can completely characterize the shape of the vocal tract configuration. Once the vocal tract is accurately characterized, we can estimate an accurate representation of the phonemes being produced by it. The shape of the vocal tract manifests itself in the envelope of the short-time power spectrum, and the MFCCs accurately represent this envelope [58]. The following procedure is used to compute the MFCCs [59]. The voice sample $x[n]$ is first windowed with an analysis window $w[n]$, and the STFT $X(n, \omega_k)$ is computed by

$$X(n, \omega_k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega_k m},$$

where $\omega_k = \frac{2\pi k}{N}$, with $N$ being the DFT length. The magnitude of $X(n, \omega_k)$ is then weighted by a series of filter frequency responses whose center frequencies and bandwidths roughly match those of the auditory critical band filters, called mel-scale filters. The next step is to compute the energy of the STFT, weighted by each mel-scale filter frequency response. The energy for each speech frame at time $n$ and the $l$th mel-scale filter is given by

$$E_{mel}(n, l) = \frac{1}{A_l} \sum_{k=L_l}^{U_l} \left| V_l(\omega_k)\, X(n, \omega_k) \right|^2,$$

where $V_l(\omega)$ is the frequency response of the $l$th mel-scale filter, $L_l$ and $U_l$ are the lower and upper frequency indices, respectively, over which each filter is nonzero, and $A_l$ is defined as

$$A_l = \sum_{k=L_l}^{U_l} \left| V_l(\omega_k) \right|^2.$$

The cepstrum associated with $E_{mel}(n, l)$ is then computed for the speech frame at time $n$ by

$$C_{mel}[n, m] = \frac{1}{R} \sum_{l=1}^{R} \log\{ E_{mel}(n, l) \} \cos\!\left( \frac{2\pi}{R}\, l\, m \right),$$

where $R$ is the number of filters. In this work, we consider 13 MFCCs. The plots of the arbitrarily chosen 7th MFCC for both the healthy and COVID-positive cough samples are shown in Fig. 7(b). The figure shows that the magnitude of the 7th MFCC is higher for the COVID-positive cough sample than for the healthy cough sound for most of the frames.

The chroma vector used in this work is a 12-element representation of spectral energy. The chroma vector is computed by grouping the DFT coefficients of a short-term window into 12 bins, each representing one of the 12 equal-tempered pitch classes of semitone spacing. Each bin holds the mean of the log-magnitudes of the respective DFT coefficients, defined by

$$v_k = \frac{1}{|S_k|} \sum_{f \in S_k} \log |X_i(f)|, \qquad k = 0, 1, \ldots, 11,$$

where $S_k$ is the subset of frequencies corresponding to the DFT coefficients of the $k$th bin and $|S_k|$ is the cardinality of $S_k$. In the context of a short-term feature extraction procedure, the chroma vector is usually computed on a short-frame basis, resulting in a matrix $V$ with elements $v_{k,i}$, where the indices $k$ and $i$ represent pitch class and frame number, respectively. The chroma vector plots of the healthy and COVID-positive cough samples are shown in Fig. 7(c). The chroma vector of the healthy person shows one dominant coefficient, with the remaining coefficients of small magnitude. On the other hand, the chroma vector of the COVID-positive cough sample is noisier and has no dominant coefficient. In addition, the chroma vector of the COVID-positive patient's cough sample does not contain any zero coefficients.
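In practice, MFCC and chroma computations are typically delegated to a library. A minimal sketch using librosa [41] follows; it is an illustration, not the authors' exact pipeline, and the file name is a hypothetical placeholder.

```python
# Sketch: 13 MFCCs and a 12-bin chroma representation via librosa [41].
import librosa

path = "cough_sample.wav"                   # hypothetical file name
x, fs = librosa.load(path, sr=None)         # keep the native sample rate

# 13 MFCCs per frame, as used in this work.
mfcc = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=13)

# 12-bin chroma vector per frame from the STFT magnitudes.
chroma = librosa.feature.chroma_stft(y=x, sr=fs)

print(mfcc.shape, chroma.shape)             # (13, n_frames), (12, n_frames)
```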
The autocorrelation function of the $i$th frame is computed by

$$R_i(m) = \sum_{n=1}^{N} x_i(n)\, x_i(n-m),$$

where $R_i(m)$ is the correlation of the $i$th frame with itself at time lag $m$. The autocorrelation function is then normalized as

$$\Gamma_i(m) = \frac{R_i(m)}{\sqrt{\; \sum_{n=1}^{N-m} x_i^2(n) \; \sum_{n=m+1}^{N} x_i^2(n) \;}}.$$

Afterward, the harmonic ratio is calculated as the maximum value of $\Gamma_i(m)$:

$$HR_i = \max_{T_{min} \le m \le T_{max}} \Gamma_i(m),$$

where $T_{min}$ and $T_{max}$ are the minimum and maximum allowable values of the fundamental period. Here, $T_{max}$ is often defined by the user, whereas $T_{min}$ usually corresponds to the time lag at which the first zero crossing of $\Gamma_i(m)$ occurs. The harmonic ratios of the healthy and COVID-positive cough samples are shown in Fig. 7(d). The figure shows that the harmonic ratio of the healthy person's cough sample is higher for most of the frames. However, the harmonic ratio shows nonzero values for all analysis frames of the COVID-positive cough sample, whereas that of the healthy person has zero values for some of the analysis frames.
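A sketch of the normalized autocorrelation and harmonic ratio follows, reusing the framing from the earlier sketches. The lag bounds t_min and t_max stand in for the user-defined fundamental-period limits and are assumptions.

```python
# Sketch of the normalized autocorrelation and harmonic ratio described
# above, for one frame `f` (a 1-D NumPy array).
import numpy as np

def harmonic_ratio(f, t_min=20, t_max=500):
    """Maximum of the normalized autocorrelation over candidate lags.

    t_min and t_max are assumed minimum and maximum fundamental periods
    in samples (user parameters).
    """
    n = len(f)
    r = np.correlate(f, f, mode="full")[n - 1:]   # R(m) for lags m >= 0
    # Normalize each lag by the energies of the two overlapping segments.
    norm = np.sqrt(np.array([np.sum(f[:n - m] ** 2) * np.sum(f[m:] ** 2)
                             for m in range(n)])) + 1e-12
    r_norm = r / norm
    t_max = min(t_max, n - 1)
    return np.max(r_norm[t_min:t_max + 1])

hr = harmonic_ratio(frames[0])                    # `frames` from earlier
```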
In this work, the cough samples collected from the Virufy database [53] are used. Virufy is a volunteer-run organization that has built a global database for identifying COVID-19 patients using AI. The database contains both clinical and crowdsourced data. The clinical data is accurate because it was collected and processed at a hospital following a standard operating procedure (SOP), with qualified physicians monitoring the whole data collection process. The subjects were confirmed as healthy (i.e., COVID-negative) or as COVID-19 patients (i.e., COVID-positive) by the PCR test, and the data was labeled accordingly. The database also contains the patients' information, including age, gender, and medical history. The clinical dataset covers 16 patients, from whom Virufy provided 121 segmented cough samples. The Virufy database contains both the original cough audio recordings and segmented versions of the cough sounds. The segmented coughs were created by identifying periods of relative silence in the recordings and separating the cough samples at those silences. Segments with no coughing or with too much background noise were removed. The crowdsourced data maintained by Virufy is diverse and donated by patients from multiple countries; it is growing significantly in volume over time as more people contribute their cough samples. In this work, only the clinically collected, segmented cough samples are used, as they are more authentic than the crowdsourced data.

A DNN discriminates the COVID-19 cough sound samples from the healthy cough sound samples, as shown in Fig. 4. The DNN model presented in [60] is used and modified to implement our system. The DNN consists of three hidden layers, each with 20 nodes. The network has 500 input nodes for the matrix input and only one output node, as the decision is binary. The output node employs the softmax activation function, whereas the hidden nodes use the sigmoid function. One limitation of DNNs is that they are vulnerable to overfitting, and this problem worsens as the network includes more nodes. To solve the overfitting problem, we employ a dropout algorithm, which trains only a randomly selected subset of nodes rather than the entire network. Dropout effectively prevents overfitting because it continuously alters the participating nodes and weights during training. In this work, dropout ratios of 10% and 20% are used for the input and hidden layers, respectively.
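The described network can be sketched as follows in Keras. The paper's implementation follows the MATLAB model of [60], so this is a comparable architecture rather than the authors' code; the optimizer and loss are assumptions. Note that the paper specifies a softmax output on a single node; with one output node, a sigmoid is the practical binary equivalent.

```python
# Minimal Keras sketch of a comparable DNN: 500 inputs, three hidden
# layers of 20 sigmoid nodes, one binary output node, with 10%/20%
# dropout on the input/hidden layers as described above.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(500,)),
    layers.Dropout(0.10),                 # 10% dropout on the input layer
    layers.Dense(20, activation="sigmoid"),
    layers.Dropout(0.20),                 # 20% dropout on each hidden layer
    layers.Dense(20, activation="sigmoid"),
    layers.Dropout(0.20),
    layers.Dense(20, activation="sigmoid"),
    layers.Dropout(0.20),
    # Single output node; sigmoid is the practical stand-in for the
    # single-node softmax described in the text.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])      # optimizer and loss are assumptions
model.summary()
```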
For biomedical signal classification, findings are interpreted in the context of medical prognosis [61]. Therefore, for COVID-19 cough sound detection, we need to provide a clinical or diagnostic interpretation of the rule-based classifications made from the acoustic feature patterns. The following terminologies and performance parameters are used [55, 62]. A true positive (TP) occurs when the predicted test is positive for COVID and the subject is indeed COVID-positive; a true negative (TN) occurs when the predicted test is negative and the subject is COVID-negative. Sensitivity (or Recall) is defined by

$$Sensitivity = \frac{TP}{TP + FN}. \qquad (19)$$

Specificity is given by

$$Specificity = \frac{TN}{TN + FP}.$$

A false negative (FN) occurs when the test is negative for a subject who has COVID. The probability of this error, known as the false-negative fraction (FNF), is given by

$$FNF = \frac{FN}{FN + TP}.$$

A false positive (FP) occurs when the predicted result is COVID-positive but the individual is COVID-negative. The probability of this type of error, or false alarm, known as the false-positive fraction (FPF), is given by

$$FPF = \frac{FP}{FP + TN}.$$

Accuracy is simply the ratio of correctly predicted observations to the total number of observations:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}.$$

Precision, or positive predictive value (PPV), is the ratio of correctly predicted positive observations to the total predicted positive observations:

$$Precision = \frac{TP}{TP + FP}.$$

The F1 Score is the weighted average of Precision and Recall; it therefore takes both false positives and false negatives into account:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}.$$

The negative predictive value (NPV) represents the percentage of cases labeled as truly negative:

$$NPV = \frac{TN}{TN + FN}.$$

The samples are distributed into three parts: 70% for training the DNN, with the remaining 30% split between validation and testing in a ratio of 2:1. Five-fold cross-validation is used. The data samples and patient information are listed in Table 1. The proposed system's training, validation, and testing results with the three feature vectors are listed in Table 2. First, the time-domain feature vector, consisting of three acoustic features (zero-crossing rate, energy, and energy entropy), is used. The DNN (with five-fold cross-validation) is trained, and the system is tested with this feature vector. As shown in Table 2, it achieves an average training accuracy of 100%, a validation accuracy of 93.27%, and a testing accuracy of 89.20%. The confusion matrix for the time-domain feature vector is provided in Table 3. Based on the data presented in Table 3, the DNN correctly detects COVID-positive cough sound samples with an accuracy of 86.67% using the time-domain features, while it detects healthy cough samples with an accuracy of 91.67%.

The simulations are repeated using the frequency-domain feature vector, which, as mentioned before, consists of the spectral centroid, spectral entropy, spectral flux, spectral roll-off, MFCCs, and chroma vector. The training, validation, and testing results are also listed in Table 2. The data shows that the DNN achieves a training accuracy of 100%, a validation accuracy of 98.50%, and a testing accuracy of 97.50% with the frequency-domain feature vector. The testing accuracy of the frequency-domain feature vector is thus higher than that of the time-domain feature vector. The confusion matrix for the frequency-domain feature vector is presented in Table 4, which shows that the frequency-domain feature vector boosts the DNN's ability to detect COVID-positive cough sound samples to an accuracy of 95%. Moreover, the DNN detects healthy samples with an accuracy of 100%. Both figures are higher than those of the time-domain feature vector.

Lastly, the time-domain and frequency-domain features are combined to form a mixed-domain feature vector. The training, validation, and testing accuracies for the mixed-domain feature vector, listed in Table 2, are 100%, 96.37%, and 93.80%, respectively. The confusion matrix for the mixed-domain feature vector is presented in Table 5. The DNN detects COVID-positive cough sound samples with an accuracy of 93.34% and healthy cough sound samples with an accuracy of 94.17%.

The performance of the proposed system in terms of Accuracy, Precision, F1 Score, and NPV for the time-domain, frequency-domain, and mixed-domain feature vectors is listed in Table 6. This table shows that the proposed system achieves its highest accuracy, 97.5%, with the frequency-domain feature vector, and its lowest, 89.2%, with the time-domain feature vector. The other performance scores, including Precision, F1 Score, and NPV, are also highest for the frequency-domain feature vector.

A cough is regarded as a natural defense mechanism against some respiratory disorders, including COVID-19. Existing subjective clinical approaches to cough sound analysis are limited by the human audible hearing range [63]. As demonstrated in this study, exploring noninvasive diagnostic approaches with sample data recorded at a sampling rate well above the audible requirement (i.e., 48,000 Hz) can overcome this limitation. The non-stationary characteristics of cough sounds impose additional challenges for signal processing-based approaches. Moreover, cough patterns show variability across human subjects even under the same pathological state; cough features closely tied to intensity levels, as in the time domain, can therefore differ for identical pathologies. When pathology is involved, the cough sound is characterized by its fundamental frequency and significant harmonics: the restriction of the airways causes turbulence in the cough sound that constitutes these harmonics [52]. More realistically, a method that captures both time and frequency changes over the cough samples should associate with the investigated respiratory disorder, i.e., COVID-19, with greater accuracy. The superior diagnostic performance of the frequency-domain feature vector in Table 6 confirms that the cough features distributed in the frequency domain possess greater significance. Finally, the performance of the proposed system is compared with that of other related works available in the literature, as listed in Table 7.
The comparison table shows that the proposed system achieves a higher accuracy (97.5%, with the frequency-domain feature vector applied to cough sound samples) than [44]. The system achieves even higher accuracy with the time-domain and mixed-domain feature vectors than the works published in [31, 34, 35, 64]. Since the publicly available databases are restricted to COVID-positive and COVID-negative (i.e., healthy/control) cases, this study focuses on discriminating COVID-19 cough sounds from healthy cough sounds. However, the proposed algorithm could potentially differentiate pathological cough sounds into distinct pulmonary/respiratory diseases, including COVID-19, asthma, and bronchiectasis. The pathophysiology and acoustic properties of cough sounds provide significant information in the frequency domain that can characterize them for multi-class classification. Asthma causes the patient's airways to become inflamed and narrower; bronchiectasis, on the other hand, damages the airways and widens them abnormally. A few randomly selected cough sound samples of some respiratory disorders are investigated in [65]. The samples available in [66] are not sufficient to apply the proposed deep learning-based algorithm. One sample each of asthma and bronchiectasis cough sounds is shown in Fig. 8 to demonstrate their uniqueness in the time domain. The bronchiectasis cough sound has longer cough sequences than the asthma cough sound. Additionally, the bronchiectasis cough sound demonstrates more flow spikes than the asthmatic cough sound [52]; these flow spikes indicate more severe inflammation in bronchiectasis patients than in asthmatic patients. Comparing Fig. 2 and Fig. 8, it can be concluded that the explosive, intermediate, and voiced phases are very distinct in the COVID-19 cough sample, whereas these phases are hardly visible in the asthma and bronchiectasis cough sounds. Following the approach demonstrated in this study, some of the frequency-domain features of the COVID-19, asthma, and bronchiectasis cough samples are plotted in Fig. 9 to show their uniqueness. The spectral entropy of the bronchiectasis sample is much higher for most of the frames than those of the COVID-19 and asthma cough samples. The other features, including the spectral flux, MFCCs, and feature harmonics, are also non-identical across the three respiratory disorders. These distinct differences in the frequency-domain features indicate that the proposed algorithm could also be applied to differentiate COVID-19 from asthma and bronchiectasis cough samples, provided a sufficient number of samples is available for each class.

Fig. 9 The frequency-domain features of (a) spectral entropy, (b) spectral flux, (c) the 6th MFCC coefficient, and (d) feature harmonics for COVID-19, asthma, and bronchiectasis cough samples.

In this paper, a DNN-based study for the early detection of COVID-19 has been presented using cough sound samples. The study proposed a system that extracts acoustic features from the cough sound samples and forms three feature vectors. A rigorous, in-depth investigation has been provided to show that cough sound samples can be a valuable tool for discriminating COVID-19 patients from healthy subjects in a preliminary assessment, instead of using the RT-PCR test.
In this work, it has been shown that some acoustic features are unique to the cough sound samples of COVID-19 patients and hence can be used by a classifier like a DNN to discriminate them from healthy cough sound samples successfully. However, there has always been an argument about selecting the appropriate acoustic features for classification. The major challenges are (a) to decide whether to use a single feature (like the MFCC, spectrogram, etc.) or a feature vector, (b) to select the appropriate combination of acoustic features to form the feature vector, and (c) to choose the appropriate domain (i.e., time-domain, frequency-domain, or both). Three feature vectors have been investigated in this work to address this issue. It was shown and justified that the frequency-domain feature vector provides the highest accuracy compared to the time-domain and mixed-domain feature vectors. The performance of the proposed system has been compared with those of other existing state-of-the-art methods presented in the literature for the diagnosis of COVID-19. This accessible and noninvasive pre-diagnosis technique can enhance the screening of all COVID-positive cases, including asymptomatic and pre-symptomatic cases. Also, early diagnosis can help patients stay in touch with healthcare providers for a better prognosis and avoid the critical consequences of COVID-19. For future work, more focus will be given to detecting the progression level of COVID-19 patients using cough sound analysis. Furthermore, since some other respiratory diseases produce similar cough sounds, it is imperative to compare the cough features of COVID-19 patients with those of other respiratory diseases. We are currently seeking data to investigate these issues.

References
- Worldometer Coronavirus Cases
- COVID-19 technical guidance: Maintaining Essential Health Services and Systems
- Diagnosing COVID-19: The Disease and Tools for Detection
- World Bank and WHO: Half of the world lacks access to essential health services, 100 million still pushed into extreme poverty because of health expenses
- More than the virus, fear of stigma is stopping people from getting tested: Doctors, The New Indian Express
- Most Coronavirus Tests Cost About $100. Why Did One Cost $2,315?, The New York Times
- Speech segment durations produced by healthy and asthmatic subjects
- Analysis of acoustic features for speech sound-based classification of asthmatic and healthy subjects
- Speech Signal Analysis as an alternative to spirometry in asthma diagnosis: investigating the linear and polynomial correlation coefficients
- Assessment of chronic pulmonary disease patients using biomarkers from natural speech recorded by mobile devices
- The dissolution of language in Alzheimer's disease
- Aphasia in senile dementia of the Alzheimer type
- Voice Analysis for Detecting Parkinson's Disease using Genetic Algorithm and KNN
- Parametric quantitative acoustic analysis of conversation produced by speakers with dysarthria and healthy speakers
- Variability in fundamental frequency during speech in prodromal and incipient Parkinson's disease: A longitudinal case study
- Parkinson's Disease and Movement Disorders (eds), Current Clinical Practice
- Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression
- Acoustical properties of speech as indicators of depression and suicidal risk
- Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology
- Implications of normal brain development for the pathogenesis of schizophrenia
- An automated method to analyze language use in patients with schizophrenia and their first-degree relatives
- Clinical investigation of speech signal features among patients with schizophrenia
- Brief Report: Epidemiology of autism
- Speech and prosody characteristics of adolescents and adults with high-functioning autism and Asperger syndrome
- Automatic Speech Recognition Systems for the Evaluation of Voice and Speech Disorders in Head and Neck Cancer
- Emotional expression and emotional recognition in breast cancer survivors
- A survey on signal processing based pathological voice detection systems
- AI4COVID: AI-enabled preliminary diagnosis for COVID-19 from cough samples via an app
- Causes and Risk Factors of Cough
- Health Conditions Linked to Acute, Sub-Acute, or Chronic Coughs
- Can Machine Learning Be Used to Recognize and Diagnose Coughs?
- Coswara: A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis
- COVID-19 Artificial Intelligence Diagnosis Using Only Cough Recordings
- Hi Sigma, do I have the Coronavirus?: Call for a new artificial intelligence approach to support healthcare professionals dealing with the COVID-19 pandemic
- Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data
- An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety
- The INTERSPEECH 2014 Computational Paralinguistics Challenge: Cognitive and Physical Load
- The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing
- Artificial Intelligence enabled preliminary diagnosis for COVID-19 from voice cues and questionnaires
- PRAAT: doing phonetics by computer
- librosa: audio and music processing in Python
- Voice Quality Evaluation in Patients with COVID-19: An Acoustic Analysis
- SARS-CoV-2 Detection from Voice
- COVID-19 Detection System Using Recurrent Neural Networks
- Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC
- Voice Analysis Framework for Asthma-COVID-19 Early Diagnosis and Prediction: AI-based Mobile Cloud Computing Application
- Robust Detection of COVID-19 in Cough Sounds Using Recurrence Dynamics and Variable Markov Model
- Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough
- The Difference between Obstructive and Restrictive Lung Diseases
- Spirometry in the evaluation of pulmonary function
- COVID-19 Lung Damage
- Analysis of the Cough Sound: an Overview
- Introduction to Audio Analysis
- Investigating the potential of MFCC features in classifying respiratory diseases
- Automatic detection of patients with respiratory diseases using lung sound analysis
- Classification of lung sounds using convolutional neural network
- Production and Classification of Speech Sounds
- Theory and Applications of Digital Speech Processing
- MATLAB Deep Learning: With Machine Learning, Neural Networks and Artificial Intelligence
- Biomedical Signal Analysis
- Performance measures in evaluating machine learning-based bioinformatics predictors for classifications
- High Frequency Analysis of Cough Sounds in Pediatric Patients with Respiratory Diseases
- Detection of COVID-19 from voice, cough and breathing patterns: Dataset and preliminary results
- The description of cough sounds by healthcare professionals
- COVID-19-train-audio, available at: https://github.com/hernanmd/COVID-19-train-audio (not-covid19-coughs/PMID-16436200)

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.