key: cord-0927943-6tgjuxw1
authors: Zealouk, Ouissam; Satori, Hassan; Hamidi, Mohamed; Laaidi, Naouar; Salek, Amine; Satori, Khalid
title: Analysis of COVID-19 Resulting Cough using Formants and Automatic Speech Recognition System
date: 2021-06-15
journal: J Voice
DOI: 10.1016/j.jvoice.2021.05.015
sha: b25a7df0eee8ba9ee44c3ee67110d5b8689fe1b8
doc_id: 927943
cord_uid: 6tgjuxw1

As part of our contribution to research on the ongoing worldwide COVID-19 pandemic, we have studied the cough changes of infected people based on Hidden Markov Model (HMM) speech recognition classification, formant frequency and pitch analysis. In this paper, an HMM-based cough recognition system was implemented with 5 HMM states, 8 Gaussian mixtures (GMMs) and 13 basic Mel-Frequency Cepstral Coefficients (MFCCs), giving 39 dimensions in the overall feature vector. A comparison between the formant frequency and pitch values extracted from the coughs of COVID-19 infected people and of healthy ones is carried out to confirm our cough recognition system results. The experimental results show that the difference between the recognition rates of infected and non-infected people is 6.67%, while the formant variation between the coughs of infected and non-infected people is clearly observed for F1, F3 and F4 and is lower for F0 and F2.

The cough is a natural protective mechanism: it helps clear secretions from the respiratory tract and prevents noxious particles from entering the respiratory system. It is generally defined as a sudden expulsion of air accompanied by a typical sound. This sound is a characteristic that allows a cough to be identified and distinguished from other vocal manifestations [1]. Effective measurement of cough is needed in order to assess the severity of a particular patient's cough and the effectiveness of treatment. This assessment of cough intensity has so far mainly relied on subjective measures, such as cough reflex sensitivity, and on the patient's symptom perception, assessed through visual analog scores for cough, various cough symptom scales, and quality-of-life questionnaires [2]. The authors in [3] described a system that uses audio signals sampled at 8 kHz; data reduction is achieved by selecting 1-s segments that contain signals above an energy threshold, and the selected segments of the recording are then played back for the identification of cough sounds. Matos et al. [4] proposed an automatic system based on Hidden Markov Models to detect cough sounds in ambulatory recordings; their system achieved a success rate of approximately 82%. The studies in [5, 6] described cough sounds according to their waveforms, finding that the signal envelope appears to differ between patients with different diseases. The researchers in [7] developed an automatic speech recognition system to evaluate six different types of voice disorder and also calculated the first four formants (F1, F2, F3, and F4); the aim of their work was to classify the type of voice pathology and to compare distortion in terms of formants. In another study, an Automatic Speech Recognition (ASR) system was developed to transcribe speech signals from subjects with a speech disorder into equivalent text [8]. In other similar works [9, 10, 11], the authors evaluated the speech signals of smokers, measuring parameters such as pitch, the four formant frequencies and jitter.
Moreover, they employed ASR technology to develop a system that differentiates between smokers' and non-smokers' voices based on Mel-frequency cepstral coefficients (MFCCs) used to characterize the voices. Dubuisson et al. [12] analyzed normal and pathological voices by using the correlation between different kinds of acoustic descriptors, temporal and spectral. Temporal descriptors consist of energy, mean, standard deviation, and zero-crossing rate, whereas spectral descriptors contain delta, mean, different moments, spectral decrease, roll-off, etc. Their findings show that the correct classification rate was 94.7% for pathological voices and 89.5% for normal voices. Costa et al. [13] discriminated pathological voices affected by vocal fold edema using linear predictive coding (LPC)-based spectral analysis. Their findings show that the LPC-based cepstral technique is a good method to illustrate the changes produced in the vocal tract by vocal fold edema.

Recently, a new epidemic called COVID-19 appeared; among the most common symptoms at the onset of this disease were coughing and fever. This has led researchers to make great efforts to understand and combat the phenomenon from a medical and interdisciplinary point of view, including computer science and engineering, in terms of "digital health" solutions aimed at maximizing the use of the available means. In this work, we develop an open-source ASR system able to compare acoustic features of cough sounds produced by healthy and COVID-19 infected people, based on Mel-frequency cepstral coefficients and an HMM classifier. We also carry out a formant frequency and pitch-based analysis of the two kinds of cough. The first step is the automated recognition of the cough resulting from COVID-19; the second step is the confirmation of our results using voice analysis methods. Apart from the introduction in section 1, the paper is organized as follows. An overview of COVID-19 is presented in section 2. Section 3 gives a brief description of cough production. Section 4 introduces the techniques and methods employed in this study. The system architecture is described in section 5. Section 6 investigates the experimental results. We finish with a conclusion.

Coronaviruses (CoV) are a large family of viruses that cause illnesses ranging from the common cold to more serious diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) corresponds to a new strain that has not previously been identified in humans. In late December 2019, an outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections occurred in Wuhan, Hubei Province, China, and rapidly spread within China and beyond [14]. On February 11, 2020, the WHO officially named the disease caused by the novel coronavirus Coronavirus Disease 2019 (COVID-19), and declared the COVID-19 epidemic a pandemic on March 11, 2020 [15]. Since most COVID-19 patients have been diagnosed with pneumonia and characteristic CT scans, radiological examinations and laboratory analyses have become vital in early diagnosis and in evaluating the disease course [16]. The common symptoms of COVID-19 are fever, dry cough, fatigue and breathing difficulties. In the most serious cases, the infection may cause pneumonia, severe acute respiratory syndrome, kidney failure, and even death [17].
The cough is a defense reaction mechanism that clears the airways of irritants, particles and microbes by expelling air from the lungs through the epiglottis at high speed. A cough is generated in three stages: inhalation (breathing in), increased pressure in the throat and lungs with the vocal cords closed, and an explosive release of air when the vocal cords open, giving the cough its characteristic sound [18]. Cough is an important feature of more than 100 diseases and other medical conditions. As a reflex-generated perturbation of respiratory function, cough is an important symptom in many respiratory diseases or irritations. Cough can be produced by several mechanisms: the activation of specific receptors generates action potentials that are carried, in particular via the vagus nerve, to the nucleus of the solitary tract (NTS) in the brainstem, which has connections to neurons in the respiratory centers and in the cortical and subcortical cough-coordinating centers. Once the information has been integrated, the signal is transmitted through the efferent channels to all of the actors (muscles of the upper airways, accessory respiratory muscles, the phrenic muscle and the abdominal muscles), enabling the cough motor effort via effector motor neurons [19]. A schematic description of the cough reflex, with the location of the receptors, the afferent pathways, the nerve centres, the efferent pathways and the effectors, is shown in Fig. 1.

Fig. 1. Anatomical description of cough pathways [18]

4. Techniques and Methods

In this study, in the first case, the popular HMM statistical method used in machine learning systems was applied to classify COVID-19 and non-COVID-19 cough sounds. In the second case, the acoustic measurements of the cough sound, pitch and formants, are exploited to confirm the results obtained by our ASR. Pitch describes the perceived fundamental frequency (F0) of a sound and is one of the main auditory attributes of sounds, along with loudness and quality [20]. It is defined as the vibration rate of the vocal cords under the flow of air out of the glottis. Usually, pitch is ignored by ASR systems and is considered irrelevant to recognition tasks, although much of the information conveyed by pitch lies above the lexical and phonetic levels. In addition to providing necessary information on the nature of the excitation source of the vocal signal, the speech pitch contour can be exploited for speaker identification, emotional state recognition, voice activity detection and many other applications. Different pitch extraction methods have been described in the literature [21]. The autocorrelation approach is one of the most widely used time-domain methods to estimate the pitch period of a speech signal [22]. This approach is based on detecting the highest value of the autocorrelation function in the region of interest. For a known discrete signal {s(q), q = 0, 1, ..., Q − 1}, the autocorrelation function is generally defined as

R_s(m) = \sum_{q=0}^{Q-1-m} s(q)\, s(q+m), \quad m = 0, 1, \ldots, M_0 - 1,

where Q is the length of the analysed sequence and M_0 is the number of autocorrelation points we want to calculate. For pitch estimation, if s(q) is assumed to be a periodic sequence with period P, i.e. s(q) = s(q + P) for all q, then the autocorrelation function is also periodic, R_s(m) = R_s(m + P). Conversely, periodicity in the autocorrelation function indicates periodicity in the signal.
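To make the autocorrelation method concrete, the following minimal Python/NumPy sketch estimates F0 by picking the strongest autocorrelation peak within a plausible lag range. The function name, the synthetic test frame and the 75-500 Hz search band (chosen to match the Praat pitch settings used later in this paper) are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def autocorr_pitch(frame, fs, f0_min=75.0, f0_max=500.0):
    """Estimate F0 of a short voiced frame by locating the highest peak
    of the autocorrelation function R_s(m) in the plausible lag range."""
    frame = frame - np.mean(frame)                 # remove DC offset
    # Full autocorrelation; keep non-negative lags only (index m = lag m).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                     # smallest period considered
    lag_max = int(fs / f0_min)                     # largest period considered
    lag = lag_min + np.argmax(r[lag_min:lag_max])  # strongest periodicity
    return fs / lag                                # pitch in Hz

# Usage sketch: a synthetic periodic frame at 200 Hz, sampled at 16 kHz.
fs = 16000
t = np.arange(int(0.06 * fs)) / fs                 # ~60 ms voiced segment
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
print(autocorr_pitch(frame, fs))                   # prints approximately 200.0
```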
By changing the shape of the vocal tract, several shapes of an ideal tube are generated, which in turn change the preferred vibration frequencies. Each of the preferred resonant frequencies of the vocal tract (corresponding to a bump in the frequency response curve) is known as a formant [23]. The characteristic sound of a cough results from the vibrations of the vocal cords, of the mucosal folds above and below the glottis, and of the accumulated secretions. The variation in cough sounds is due to various factors, including the nature and amount of the secretions, anatomical differences and pathological changes in the larynx and the rest of the respiratory tract, and the strength of the cough. Cough vibrations also help dislodge secretions from the walls of the airways. There are several formants, each at a separate frequency, occurring at intervals of approximately 1000 Hz. At any point in time (as with spectra) there may be any number of formants; in the case of speech, most of the information relating to vowels is carried by the first four formants, called F1 (the first formant), F2 (the second), F3 (the third) and F4 (the fourth). By moving the body of the tongue and the lips, the positions of the formants can be changed [24].

PRAAT [25] is an open-source software package widely used by phoneticians and researchers for determining various phonetic features of speech. It is a flexible tool for the analysis and reconstruction of acoustic speech signals. It performs speech analysis, synthesis, manipulation, labeling and segmentation, and graphics, and has much other functionality [26]. Praat was used to record and analyse the wav files to obtain all the parameters presented in this work.

Speech processing is defined as the study of speech signals and their processing methods [27]. The signals are generally handled in a digital representation, so speech processing can be considered a special case of digital signal processing applied to speech. Automatic Speech Recognition (ASR) is one of the main research fields in speech processing. ASR is the procedure through which a sound is converted into a word sequence by an algorithm implemented as a computer program. The main role of an ASR system is to hypothesize the most probable discrete sequence of symbols, out of all valid sequences in a target language, from a given input acoustic speech vector [28]. In automatic speech recognition, the most common generative learning method is based on hidden Markov models combined with the Gaussian Mixture Model (GMM). This combination is exploited by conventional ASR systems to represent the sequential structure of speech signals. Typically, a Gaussian mixture is used by each HMM state to model the spectral representation of the sound wave. The GMM-HMM model is parameterized by λ = (A, B, μ), where μ is the state prior probability vector, A = (a_ij) is the transition probability matrix, and B = {b_1, ..., b_n} is the set in which b_j represents the GMM of state j. A state is generally related to a phone sub-segment in speech [29, 30].
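The cough models in this study were trained with CMU Sphinx (see section 5); purely as an illustration of the λ = (A, B, μ) parameterization just described, the sketch below builds a comparable 5-state, 8-mixture, left-to-right GMM-HMM with the hmmlearn library. The library choice, iteration count and placeholder features are our own assumptions, not the authors' toolchain.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

N_STATES, N_MIX, N_FEATS = 5, 8, 39   # matching the system described above

def make_left_to_right_hmm():
    """One GMM-HMM per cough class: 5 emitting states, 8 diagonal-covariance
    Gaussian mixtures per state, left-to-right state transitions."""
    model = GMMHMM(n_components=N_STATES, n_mix=N_MIX,
                   covariance_type="diag", n_iter=20,
                   init_params="mcw")   # startprob/transmat are fixed below
    model.startprob_ = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
    # Left-to-right topology: each state may self-loop or advance one step;
    # the final state absorbs (zeros elsewhere stay zero during EM).
    trans = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES):
        trans[i, i] = 0.5
        trans[i, min(i + 1, N_STATES - 1)] += 0.5
    model.transmat_ = trans
    return model

# Training sketch: X stacks the 39-dim feature frames of all coughs of one
# class; `lengths` gives the number of frames in each cough.
X = np.random.randn(300, N_FEATS)      # placeholder features for illustration
lengths = [100, 100, 100]
model = make_left_to_right_hmm().fit(X, lengths)
print(model.score(X[:100]))            # log-likelihood of one cough under the model
```

At recognition time, one such model per class is scored against the test cough, and the class with the highest log-likelihood is selected.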
In this study, an HMM-based ASR system was implemented to evaluate the difference between the coughs of healthy and COVID-19 infected people. The system is divided into three phases according to their functions. The first is the training phase, whose function is to create knowledge about the coughs and their types to be used in the system. The second is the HMM model bank, which organizes the system knowledge produced by the first step. Finally, there is the recognition phase, whose function is to match the features against the trained model of each class.

The parameters of the system were a 25 ms Hamming window with a step size of 10 ms, 13 MFCC coefficients with a cepstral liftering length of 22 and 26 filter bank channels, and a pre-emphasis coefficient of 0.97. In addition, the first four formants (F1, F2, F3 and F4) were extracted to analyse and compare the formant frequencies of both groups (normal subjects and COVID-19 patients). These formants were manually measured using spectrograms, automatic formant track detection and spectra, with the analysis parameters set as follows: maximum number of formants, 5; maximum formant frequency, 6000 Hz; analysis window, 0.025 s. Values were taken in a central and stable part of the cough. The pitch (F0) was extracted using the command 'Get Pitch', with the analysis parameters set to a pitch floor of 75 Hz and a pitch ceiling of 500 Hz. The cough sound is divided into three phases (see Fig. 3): the first is an explosive expiration due to the sudden opening of the glottis, the second is an intermediate phase with attenuation of the cough sounds, and the third is the voiced phase due to the closing of the vocal cords [31]. We relied especially on phase three for measuring these parameters.

Our study includes the compilation of a data corpus of coughing sounds recorded in a controlled environment. To record the coughs, we used a microphone and a laptop with 4 GB of RAM and a 1.2 GHz Intel Core i5 CPU; the operating system used in our experiments was Ubuntu 14.04 LTS. The microphone was placed at a distance of 20 cm from the mouth of the subjects; the actual distance could vary from 10 cm to 30 cm due to the subjects' movement. We kept the sampling rate at Fs = 16 kHz with 16-bit resolution to obtain the best sound quality. The database used in our system covers 10 people divided into two categories: the first consists of 5 healthy people and the second of 5 people infected with COVID-19 (for more detail see Table 1). For the healthy subjects, we recorded the coughs in our laboratory; in the case of infected people, we recorded the sound data in the quarantine rooms. All healthy subjects were free of any respiratory disease according to personal history and a basic examination. At this stage, we faced several difficulties, most notably reaching affected people during the onset of symptoms, as well as recording natural coughs from volunteer patients. The coughs were extracted from the recordings by detecting bursts of audio energy delimited by silence, and then manually validating and adjusting the start and end times of each detected region. This resulted in 10 segments of audio per person, with each segment corresponding to one complete cough. The audio recording of each cough was saved as a ".wav" file.

The next step involved the generation of acoustic models using Sphinxbase and Sphinxtrain. We exploited the lexicon, language model, filler dictionary, phone list, transcription and fileids files, and the wave audio data. The generated model includes the information needed to estimate the probabilities of the recordings; Fig. 4 summarizes the cough acoustic model preparation step.
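A minimal sketch of the front-end configuration described above (25 ms Hamming window, 10 ms step, 13 MFCCs from 26 filter banks, pre-emphasis 0.97, cepstral lifter 22, plus delta and delta-delta coefficients for a 39-dimensional vector). We use the python_speech_features package here as a stand-in for the Sphinx front-end; the file name is hypothetical.

```python
import numpy as np
import scipy.io.wavfile as wav
from python_speech_features import mfcc, delta

rate, signal = wav.read("cough_01.wav")     # hypothetical 16 kHz recording

# 13 static MFCCs with the front-end settings reported above.
static = mfcc(signal, samplerate=rate,
              winlen=0.025, winstep=0.01,   # 25 ms window, 10 ms step
              numcep=13, nfilt=26,          # 13 cepstra from 26 filter banks
              preemph=0.97, ceplifter=22,   # pre-emphasis and liftering
              winfunc=np.hamming)           # Hamming analysis window

# Append first- and second-order derivatives -> 39-dim feature vectors.
d1 = delta(static, 2)
d2 = delta(d1, 2)
features = np.hstack([static, d1, d2])
print(features.shape)                       # (n_frames, 39)
```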
The lexicon provides the correspondence between the words in the transcription file and the phonemes listed in the .phone file. The dictionary provides pronunciations for each word present in the language model; it contains the words we want to train followed by their pronunciations, separating words into sub-word unit sequences. Our dictionary includes the proposed symbolic representations of the cough sound. The pronunciation dictionary plays the role of an intermediary between the language and acoustic models. The language model determines the words used in a speech application, where each word must be listed in the lexicon file. A language model specifies a set of constraints on the word sequences accepted in a given language [32]. These constraints can be expressed, for example, by grammar rules or via statistics on each word estimated from training speech data. The binary language model is used to convert the language model file into N-gram form. The training and testing transcription files contain the coughing sounds organized in sequence, with capital letters, punctuation and tagging symbols marking the beginning and end of each sentence, followed by the name of the cough corpus file. The .fileids files contain the paths of the sound files: for each recording file we generate a line with the file name and its path in the control file, as presented in Fig. 5. In the filler file, we list the silence events as "words". In our work, this file includes the entries shown in Fig. 6, which are explained as follows: <s>, the silence at the beginning of an utterance; </s>, the silence at the end of an utterance; <sil>, silence inside the utterance.
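To make the file layout concrete, below is a hypothetical fragment of the corpus files described above, following the usual SphinxTrain conventions; the file and speaker names are our own, and the comment lines are annotations for readability, not part of the file formats.

```
# cough.filler -- filler dictionary: silence "words" mapped to the SIL phone
<s>     SIL
</s>    SIL
<sil>   SIL

# cough_train.transcription -- one line per recording, ending with its file id
<s> COUGH </s> (speaker01_cough01)

# cough_train.fileids -- path (without extension) of each audio file
speaker01/speaker01_cough01
```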
The formants of the cough are then analyzed to determine their values; these are expected to prove helpful in subsequent speech processing tasks such as COVID-19 cough recognition and classification.

In this section, we describe our cough recognition system, which allows us to illustrate the difference between the coughs of healthy people and the coughs of infected ones. The designed system is based on Mel-frequency cepstral coefficients in the feature extraction phase, modeled by Gaussian Mixture Models, and the hidden Markov model is used as the classifier for cough classification. The experiments were conducted on the cough sounds in the training and testing phases, where each cough sound was modeled by five HMM states. The state transitions were left-to-right, and Gaussian Mixture Models were used to model the observation probability density functions, with 8 mixtures per state. All training and recognition experiments were implemented with CMU Sphinx. The training was performed using the coughs of healthy people, while testing was performed with both healthy coughs and the coughs of people with COVID-19. For the first experiment, the system was trained using the cough sounds of healthy people (5 speakers) and tested with the coughs of healthy subjects (3 speakers). In the second experiment, the system was trained with the same data as in the first experiment (5 speakers) and tested with the cough sounds of COVID-19 infected people (3 patients). The recognition rate of each experiment was recorded. Fig. 7 illustrates the cough recognition rates of the two experiments: for the first, the system accuracy is 93.33%, and for the second, the recognition rate is 86.66%. The difference between the recognition rates obtained in the two experiments is 6.67%. Both experiments show that it is possible to observe the difference between healthy people and COVID-19 patients. The small observed difference may be caused by other factors, such as the influence of the coronavirus on the vocal cords or glottis; the database size can also play an important role in the obtained results.

Fig. 7. Cough recognition rates of non-infected and infected people.

The aim of this part of the experiments was to perform and evaluate the acoustic analysis of the extracted pitch (F0) values, as well as the measurements of the formant frequencies F1, F2, F3, and F4, for the two types of cough. For the calculation of these frequencies (in Hz), we computed the average of ten coughs per person and concentrated on phase three, the voiced phase, whose duration was about 60 ms. Fig. 8 presents the extracted features of male coughing sounds based on the average of three patients and three healthy people, while Fig. 9 shows the same features based on female coughing data with two patients and two healthy people. Based on the overall measurement results, in the case of males the average pitch (F0) is lower by 21 Hz for healthy people compared to patients, while the opposite holds for women, where the value for healthy people is higher than for patients by approximately 6 Hz. Further comparisons between the two groups can be seen in Figures 8 and 9.

The median is generally considered to be the best representative of the central location of the data: the more skewed the distribution, the greater the difference between the median and the mean, and the more weight should be placed on using the median rather than the mean. For the F0 values, the mean is higher than the median for females, with a difference of 12 Hz for infected and 11 Hz for non-infected subjects. For males, the difference between mean and median is 3 Hz for healthy people and 0 Hz for patients. For the extracted F3 values, the median is lower than the mean for females, with a difference of 100 Hz for patients and 16 Hz for healthy subjects, whereas the opposite is observed for males, with differences of 72 Hz and 42 Hz for patients and healthy subjects, respectively. For the F4 values, differences between the mean and median are observed in all experimental sets: for females the calculated difference is 37 Hz for patients and 17 Hz for healthy subjects, while for males the differences are 102 Hz for patients and 46 Hz for healthy subjects. In the F4 case, the mean is higher than the median for all sets except the healthy male set, where the opposite holds.

The coughing sound gives information about the pathophysiological mechanisms of coughing through several parameters, as well as about the structural nature of the tissue. The pitch of the vibration is determined mainly by the degree of stretch of the vocal cords, by their approximation to one another and by the mass of their edges. In the literature [33, 34], the pitch (F0) values reported by different authors range from 300 to 700 Hz under normal conditions, whereas in the cough sounds of bronchitis the bands between 500 and 1200 Hz are the most expressive. Based on our findings, for healthy people we are in agreement with [33], but for infected subjects we observed a difference between the values of COVID-19 infected people and those of other diseases such as bronchitis; perhaps the glottis behaves differently in COVID-19 than in other pathological conditions.
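As a reproducibility aid, the following sketch shows how pitch and formant values like those discussed above can be extracted programmatically with parselmouth, a Python interface to Praat, using the analysis settings reported earlier (pitch floor 75 Hz, ceiling 500 Hz; at most five formants up to 6000 Hz with a 25 ms window). The file name and the choice of measurement time are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("cough_voiced_phase.wav")  # hypothetical phase-3 clip

# Pitch with the settings used in this study: floor 75 Hz, ceiling 500 Hz.
pitch = snd.to_pitch(pitch_floor=75.0, pitch_ceiling=500.0)
f0 = pitch.selected_array["frequency"]
f0_mean = np.mean(f0[f0 > 0])                      # ignore unvoiced frames

# Formants: at most 5 formants up to 6000 Hz, 25 ms analysis window.
formant = snd.to_formant_burg(max_number_of_formants=5.0,
                              maximum_formant=6000.0,
                              window_length=0.025)
t_mid = snd.duration / 2                           # a stable central point
f1_to_f4 = [call(formant, "Get value at time", n, t_mid, "Hertz", "Linear")
            for n in (1, 2, 3, 4)]
print(f0_mean, f1_to_f4)
```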
Moreover, the values of the four formants obtained for females in both groups (healthy and COVID-19 infected) are lower than those obtained for men, with the exception of F4 for patients, whereas for vowels the four formants of females are usually higher than those of men, as mentioned in [35, 36]. Through our findings, we noticed a difference between healthy and COVID-19 cough sounds in their physical sound features, especially in F3 and F4. These formants change in relation to the dimensions of the vocal tract cavities, and a reduction of these cavities would lead to increased frequencies. On the other hand, we cannot compare our COVID-19 cough formant values with published results because, at present, we could not find results of this type in the literature. In general, it was interesting to find out how the cough sound changes under the pathological conditions caused by COVID-19.

In this paper, we have presented an HMM-based automatic speech recognition and formant-based analysis of cough sounds exploiting a spectrogram technique. Agreement between the recognition outputs and the formant analysis can be inferred from these results. The overall accuracy of the cough recognition system was 93.33% for healthy coughs and 86.66% for COVID-19 coughs, a difference of 6.67%. In addition, the pitch and formant analysis shows that, in the case of females, the F0, F1, F3 and F4 values are higher for healthy subjects by 6 Hz, 88 Hz, 44 Hz and 138 Hz, respectively, in contrast to F2, which is higher for infected people. In the case of males, lower values are observed for F0 and F2 for healthy people, with differences of 21 Hz and 20 Hz, respectively, whereas F1, F3 and F4 are lower for patients by 100 Hz, 250 Hz and 279 Hz. The obtained results show an agreement between the conclusions drawn from the cough recognition results and the formant analysis. To the best of our knowledge, this is the first work that examines the accuracy of ASR and sound parameters for the coughs of people with COVID-19.

References

[1] Influence of simulated mucus on cough sounds in cats
[2] Assessment and measurement of cough: the value of new tools
[3] Methods of recording and analysing cough sounds
[4] Detection of cough signals in continuous audio recordings using hidden Markov models
[5] The origin of cough sounds
[6] Information obtained from tussigrams and the possibilities of their application in medical practice
[7] Formant analysis in dysphonic patients and automatic Arabic digit speech recognition
[8] PEAKS - A system for the automatic evaluation of voice and speech disorders
[9] Voice comparison between smokers and non-smokers using HMM speech recognition system
[10] Vocal parameters analysis of smoker using Amazigh language
[11] Towards the Objective Speech Assessment of Smoking Status based on Voice Features: A Review of the Literature
[12] On the use of the correlation between acoustic descriptors for the normal/pathological voices discrimination
[13] Parametric cepstral analysis for pathological voice assessment
[14] Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges
[15] Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial
[16] Coronavirus disease 2019 (COVID-19): a perspective from China. Radiology
[17] Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study. The American Journal of Gastroenterology
[18] Central nervous mechanisms of cough
[19] Prevalence, pathogenesis, and causes of chronic cough
[20] Pitch extraction and fundamental frequency: History and current techniques
[21] Direct time domain fundamental frequency estimation of speech in noisy conditions
[22] On the use of autocorrelation analysis for pitch detection
[23] Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system
[24] Comparative analysis of Arabic vowels using formants and an automatic speech recognition system
[25] Praat, Version 6.1.03, 64-bit
[26] Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV)
[27] Speech coding effect on Amazigh alphabet speech recognition performance
[28] Spoken language processing: A guide to theory, algorithm, and system development
[29] A review on automatic speech recognition architecture and approaches
[30] Amazigh digits through interactive speech recognition system in noisy environment
[31] Theory and application of audio-based assessment of cough
[32] Amazigh Digits Speech Recognition System Under Noise Car Environment
[33] Analysis of the cough sound: an overview
[34] Clinical methods for the study of cough
[35] Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies
[36] Study of the characteristic parameters of the normal voices of Argentinian speakers