Title: Studying the Similarity of COVID-19 Sounds based on Correlation Analysis of MFCC
Authors: Mohamed Bader, Ismail Shahin, Abdelfatah Hassan
Date: 2020-10-17

Abstract

Recently, formidable work has been carried out by the people working on the front lines, such as in hospitals, clinics, and labs, alongside researchers and scientists, in the fight against the COVID-19 pandemic. Due to the rapid spread of the virus, artificial intelligence has come to play a considerable role in the health sector by applying the fundamentals of Automatic Speech Recognition (ASR) and deep learning algorithms. In this paper, we illustrate the importance of speech signal processing by extracting the Mel-Frequency Cepstral Coefficients (MFCCs) of COVID-19 and non-COVID-19 samples and finding their relationship using Pearson correlation coefficients. Our results show high similarity in MFCCs between different COVID-19 cough and breathing sounds, while the MFCCs of voice are more robust between COVID-19 and non-COVID-19 samples. Our results are preliminary, and they suggest the possibility of excluding the voices of COVID-19 patients from further processing in diagnosing the disease.

I. INTRODUCTION

Since the outbreak of COVID-19 was declared a global pandemic by the World Health Organization (WHO) in March 2020, the virus has threatened the lives of almost all human beings on earth. At the time of writing, there are more than 11 million confirmed cases of COVID-19 infection in more than 200 countries, and according to WHO statistics, 528,101 people have passed away after being infected by the virus [1]. The new coronavirus is attracting widespread interest due to its fast dissemination among people and its severe impact on people with weak immune systems. The WHO has specified the principal symptoms of COVID-19 as high body temperature, muscle aches, and difficulties in breathing and coughing. The situation is aggravated by the limited ability to control the spread of the virus and the lack of access to screening and testing [2]. Due to that, an innovative, effective, and modern solution is ready to be implemented and integrated into the health sector.

Lately, Artificial Intelligence (AI) has been widely implemented in the digital health sector. AI has many applications in the field of speech and audio analysis; it could be employed in the screening and early detection of infected people, which could help control and reduce the number of infections. It has been stated that there is a relationship between the COVID-19 symptoms and a person's speech signal; thus, it may be possible to determine whether a person is infected by performing speech analysis. In addition, the use of AI in designing chatbots and programmed software would support whoever is suffering from the fallout of lockdown and quarantine.

This paper is organized as follows: Section II covers the literature review of ASR uses in digital health. Section III explains our methodology. Section IV discusses the experimental results. Finally, Section V gives the concluding remarks of this work.
II. LITERATURE REVIEW

AI can be implemented in the digital health sector for the diagnosis and early detection of the symptoms of COVID-19 based on the analysis of cough, breath, and voice. In addition, it can be utilized to track the mental state of patients who might be suffering from the aftermath of lockdowns and quarantine [3]. The severity of patients' symptoms and their physical and mental state can be assessed by analyzing their speech recordings. Moreover, low-cost and reliable health-state detection software can be established by monitoring and analyzing sleep quality, severity of illness, fatigue, and anxiety [4].

The health condition of human beings and their mental state can be inferred from the analysis of sound features. Diseases related to the respiratory system can be detected using machine learning analysis. Moreover, coughs can be differentiated to determine the type of disease by evaluating acoustic features with several classifiers, such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) [5]. Besides, Voice Activity Detection (VAD) has been performed based on Mel-Frequency Cepstral Coefficient (MFCC) similarity, computed by collecting the correlation coefficients; it has been shown that measuring the similarity of audio in a noisy background using MFCC is the most suitable method compared to the other features that were evaluated [6]. An algorithm for matching patterns and for recognizing and differentiating speech has been developed based on MFCCs as the extracted features and the principle of collecting the correlation coefficients [7]. Also, a continuous or heavy cough can be considered a sign of some kind of respiratory disease; thus, the condition and health state of patients can be tracked by implementing automatic cough detection that evaluates acoustic features using deep learning algorithms. Extracted features such as the Short-Time Fourier Transform (STFT), Mel Filter-Bank (MFB), and MFCC have been evaluated using classifiers such as CNN and LSTM, and the differentiation among coughs has been made based upon these acoustic features [8].

In this paper, we propose and analyze the evaluation of MFCC acoustic features and the correlation analysis of these features, extracted from infected patients and healthy individuals, to illustrate whether there is a relationship between them. We also provide an initial hypothesis on which symptoms should be considered best for tracking, monitoring, and diagnosing.
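As a concrete illustration of this idea, the sketch below extracts MFCCs from two recordings and compares them with a Pearson correlation coefficient. It is a minimal example assuming the librosa and scipy packages and hypothetical file names; it is not the authors' exact pipeline, which uses PRAAT for pre-processing and its own correlation analysis.

```python
import librosa
from scipy.stats import pearsonr

def mean_mfcc(path, n_mfcc=13):
    """Load a recording and return its time-averaged MFCC vector."""
    y, sr = librosa.load(path, sr=None)                    # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc) # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                               # average across frames

# Hypothetical file names for one COVID-19 and one healthy cough sample.
covid_vec = mean_mfcc("covid_cough.wav")
healthy_vec = mean_mfcc("healthy_cough.wav")

r, p = pearsonr(covid_vec, healthy_vec)
print(f"Pearson correlation of MFCC vectors: r = {r:.3f} (p = {p:.3f})")
```

Averaging the MFCC matrix over time yields one vector per recording, so recordings of different lengths can still be compared with a single correlation coefficient.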
III. METHODOLOGY

Data Collection

In this work, collecting data is the first step. Presently, data gathering is underway globally from both infected patients and healthy individuals. Our dataset comprises 7 healthy speakers (4 male and 3 female) and 7 COVID-19 infected patients (5 male and 2 female). The data of the COVID-19 infected patients were collected from a hospital in Sharjah, United Arab Emirates. Each speaker was asked to cough four times, to take a deep breath, and to count from one to ten. Furthermore, the patients had to sit with their heads upright in a relaxed manner while recording their speech signals. As a result, three recordings per speaker were acquired within the data collection session using a mobile phone, which can affect the quality of the sound. The data of the healthy speakers were also collected in the United Arab Emirates using the same mechanism.

The total number of samples used in this study was 42 ((7 healthy speakers × 3 recordings) + (7 infected speakers × 3 recordings)). Due to the inconvenience caused by the epidemic, we were only able to obtain a dataset categorized simply as healthy or COVID-19 samples.

Speech Pre-processing

Speech signal pre-processing is a significant step after capturing the database. To increase the performance of our analysis, we need to pre-process our recordings by isolating the captured sound from silence [9]. Assume that x(n) is a speech signal before pre-processing, s(n) is the clean speech signal, and d(n) denotes the noise. Then our speech signal before pre-processing can be represented by the following expression:

x(n) = s(n) + d(n)    (1)

PRAAT is the program we used for pre-processing. It is a software tool designed to interpret and modify voice signals. It has several features; however, we used it only to eliminate the silence portions at the beginning and at the end of the captured recordings.

Feature Extraction

Our speech signals carry a set of information, and determining this information is an essential task; thus, the efficiency of this phase is vital for our analysis. In this work, the features extracted for the best parametric representation of the acoustic signals are the Mel-Frequency Cepstral Coefficients (MFCCs) [9]. Various features can be extracted from the collected recordings; however, we focused only on extracting MFCCs, since they are considered the most important features for distinguishing COVID-19 from non-COVID-19 sounds [5]. MFCC is also commonly used in speaker and emotion recognition [11], [12], [13], [14], [15]. Furthermore, MFCC relies on the characteristics of human hearing, which does not perceive frequencies above 1 kHz linearly. Therefore, signals are expressed on the Mel scale, which uses linearly spaced filters below 1000 Hz and logarithmically spaced filters above 1000 Hz [16], [17]. The computation steps of MFCC are clarified in Fig. 1. The Mel frequency representation is computed using the following six steps [6], [18], [19]:

1. Pre-emphasis: In this step, the speech signal is passed through a high-pass filter. The pre-emphasis phase can be represented by

y(n) = x(n) - a * x(n - 1)

where x(n) is the input speech signal, y(n) is the output signal, and the value of a is approximately between 0.9 and 1.0. The purpose of this step is to increase the energy of the signal at higher frequencies. A minimal code sketch of this step and the next is given after this list.

2. Framing: This is the process of dividing the speech signal into frames of N samples, with a length in the range of 20 ms to 40 ms. Adjoining frames are separated by M samples (M < N).
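As a rough illustration of steps 1 and 2, the following NumPy sketch applies pre-emphasis and splits the result into overlapping frames. The filter coefficient a = 0.97 and the 25 ms frame / 10 ms shift settings are common defaults assumed here for illustration, not values specified in this paper.

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """Step 1: y(n) = x(n) - a * x(n - 1); boosts energy at higher frequencies."""
    return np.append(x[0], x[1:] - a * x[:-1])

def frame_signal(y, sr, frame_ms=25, shift_ms=10):
    """Step 2: split y into overlapping frames of N samples, M samples apart (M < N)."""
    N = int(sr * frame_ms / 1000)                 # samples per frame
    M = int(sr * shift_ms / 1000)                 # frame shift
    n_frames = 1 + max(0, (len(y) - N) // M)      # number of full frames
    idx = np.arange(N)[None, :] + M * np.arange(n_frames)[:, None]
    return y[idx]

# Example on one second of synthetic audio at 16 kHz.
sr = 16000
x = np.random.randn(sr)
frames = frame_signal(pre_emphasis(x), sr)
print(frames.shape)  # (98, 400) for these settings: 98 frames of 400 samples
```

Each row of the resulting matrix is one frame, ready for the windowing and spectral-analysis steps that follow.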