key: cord-0819141-7v5wpu88 authors: Mahmoudi, Mohammad Reza; Baleanu, Dumitru; Mansor, Zulkefli; Tuan, Bui Anh; Pho, Kim-Hung title: Fuzzy Clustering method to Compare the Spread Rate of Covid-19 in the High Risks Countries date: 2020-08-22 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110230 sha: b1d780ae69e2d7f71a5d53ad3b74dd510bcc9b26 doc_id: 819141 cord_uid: 7v5wpu88

The number of confirmed cases of the new coronavirus (Covid-19) is increasing daily in different countries. To determine policies and plans, the study of the relations between the distributions of the spread of this virus in different countries is critical. In this work, the distributions of the spread of Covid-19 in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran were compared and clustered using the fuzzy clustering technique. First, the time series of the Covid-19 datasets in the selected countries were considered. Then, the relation between the spread of Covid-19 and population size was studied using Pearson correlation. The effect of population size was eliminated by rescaling the Covid-19 datasets based on the population size of the USA. Finally, the rescaled Covid-19 datasets of the countries were clustered using fuzzy clustering. The results of the Pearson correlation indicated that there were positive and significant correlations between total confirmed cases, total dead cases, and the population size of the countries. The clustering results indicated that the distribution of spreading in Spain and Italy was approximately similar and differed from that in the other countries.

Coronaviruses are a large group of viruses that affect the respiratory and neurological systems [1] [2] [3]. In 2003 and 2012, two types of these viruses, called SARS coronavirus (SARS-CoV) and MERS coronavirus (MERS-CoV), were observed in some countries [4]. In the last months of 2019, a new type of these viruses, called Covid-19 (2019-nCoV), was reported in Wuhan city in China [5] [6] [7] [8].
Reports show that it has been observed in more than 220 countries (up to 18 April 2020). From January to 18 April 2020, the spread rate of Covid-19 has increased daily in different countries, especially in the United States of America [9], Spain [10], Italy [11] [12] [13] [14], Germany [15], the United Kingdom [16] [17] [18] [19], France [11] [20] [21] [22], Iran [23], and many others. The spread of Covid-19 poses many dangers and consequently needs strict, special policies and plans. Therefore, the study of the relations between the distributions of the spread of this virus in different countries is critical. In this work, the distributions of the spread of Covid-19 in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran are compared and clustered using the fuzzy clustering technique. First, we consider the time series of the Covid-19 datasets in the selected countries. Then, the correlations between these time series are computed. Finally, the observed time series are rescaled and categorized using the fuzzy clustering technique. The main novelties of the current research can be summarized as follows:
1. The relation between the spread of Covid-19 and population size is studied.
2. The Covid-19 datasets are rescaled based on the population size of the USA.
3. The rescaled Covid-19 datasets of the countries with high spread risk are clustered using fuzzy clustering.

This section discusses data collection and data analysis techniques. The first subsection deals with the characteristics of the research dataset. Then the methods used to analyze the dataset are described.

The dataset of this work contains all confirmed and dead Covid-19 cases in the high-risk countries, including the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran, from 22 February 2020 up to 18 April 2020, based on WHO statistics. Table 1 summarizes descriptive statistics of the considered dataset.
As can be observed, the United States of America, Spain, Italy, Germany, France, the United Kingdom, and Iran have the highest means of daily confirmed cases, in that order. Also, the United States of America, Italy, Spain, France, the United Kingdom, Iran, and Germany have the highest means of daily dead cases, in that order. Figure 1 also shows the plots of daily confirmed cases, dead cases, cumulative confirmed cases, and cumulative dead cases in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran from 22 February 2020 up to 18 April 2020.

To study the relations between total confirmed cases, total dead cases, and the population size of the countries, the Pearson correlation coefficient is used. The results are reported in Table 2. The results indicate that there are positive and significant (p-value lower than 0.05) correlations between total confirmed cases, total dead cases, and the population size of the countries. Therefore, because the number of cases depends on the size of the population, comparing the countries based on the raw number of confirmed cases or dead cases is not scientifically sound. To solve this problem, the effect of population size should be eliminated. We used rescaled data as follows:

$$\text{Rescaled confirmed cases} = \frac{\text{Population of USA}}{\text{Population of the country}} \times \text{Confirmed cases}$$

and

$$\text{Rescaled dead cases} = \frac{\text{Population of USA}}{\text{Population of the country}} \times \text{Dead cases}$$

Figure 2 shows the plots of the rescaled data for daily confirmed cases, dead cases, cumulative confirmed cases, and cumulative dead cases in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran from 22 February 2020 up to 18 April 2020. Table 3 summarizes descriptive statistics of the rescaled dataset. As can be observed, Spain, Italy, the United States of America, Germany, the United Kingdom, France, and Iran have the highest means of rescaled daily confirmed cases, in that order. Also, Spain, Italy, France, the United Kingdom, the United States of America, Iran, and Germany have the highest means of rescaled daily dead cases, in that order.

Clustering [24] is a major task in data mining.
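It has many applications, such as image processing, diagnosis systems, classification, missing value management and imputation, optimization, bioinformatics, and machine learning [25]. Recently, inspired by classifier ensembles, the clustering ensemble [26] has emerged. But these methods use hard clustering as the base clustering algorithm. Recently, soft clustering algorithms [27] have become popular, and it has been shown that these methods are superior to traditional hard clustering algorithms [28] [29] [30]. We use the terms soft clustering and fuzzy clustering interchangeably. In soft clustering, each data point belongs to all clusters, although with different membership values. It is worth mentioning that the membership values of a data point over all clusters should sum to one. The fuzzy C-means (FCM) clustering algorithm [30] can arguably be considered the most popular soft clustering algorithm.

Given a set of records $X = \{\vec{x}_1, \dots, \vec{x}_N\}$, a set of fuzzy clusters defined by centroids $C = \{\vec{c}_1, \dots, \vec{c}_K\}$, along with a membership matrix $U = [u_{ki}]$, a soft clustering algorithm intends to divide $X$ into partitions $\pi_1, \dots, \pi_K$, where $\pi_k$ is obtained according to equation (1) (ties are broken randomly).

$$\pi_k = \Big\{ \vec{x}_i \in X : \sum_{j} (x_{ij} - c_{kj})^2 \le \sum_{j} (x_{ij} - c_{lj})^2 \ \text{ for all } l = 1, \dots, K \Big\} \qquad (1)$$

where $x_{ij}$ is the $j$th dimension of the $i$th record and $c_{kj}$ is the $j$th dimension of the $k$th fuzzy cluster centroid. The centers and the membership matrix are optimal if they minimize the error function presented in equation (2),

$$J(U, C) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ki}^{m} \, \lVert \vec{x}_i - \vec{c}_k \rVert^2 \qquad (2)$$

subject to the constraints $\sum_{k=1}^{K} u_{ki} = 1$ for $i = 1, \dots, N$, where $m > 1$ is the fuzzification exponent. Matrix $U$ is of size $K \times N$, and its column vectors are denoted by $\vec{u}_i$. To solve the constrained problem in equation (2), we employ a set of Lagrange multipliers $\lambda_i$ for the constraints $\sum_{k=1}^{K} u_{ki} = 1$, and then minimize the final (constraint-free) error function presented in equation (3).

$$J_{\lambda}(U, C) = \sum_{i=1}^{N} \sum_{k=1}^{K} u_{ki}^{m} \, \lVert \vec{x}_i - \vec{c}_k \rVert^2 - \sum_{i=1}^{N} \lambda_i \Big( \sum_{k=1}^{K} u_{ki} - 1 \Big) \qquad (3)$$

For a fixed membership matrix $U$, the optimal $C$ can be obtained by setting $\partial J_{\lambda} / \partial \vec{c}_k = 0$. Equation (4) presents $\partial J_{\lambda} / \partial \vec{c}_k$.

$$\frac{\partial J_{\lambda}}{\partial \vec{c}_k} = -2 \sum_{i=1}^{N} u_{ki}^{m} \, (\vec{x}_i - \vec{c}_k) \qquad (4)$$

Setting equation (4) to zero and solving with respect to $\vec{c}_k$ gives equation (5).

$$\vec{c}_k = \frac{\sum_{i=1}^{N} u_{ki}^{m} \, \vec{x}_i}{\sum_{i=1}^{N} u_{ki}^{m}} \qquad (5)$$

For a fixed cluster center matrix $C$, we compute the optimal $U$ by setting $\partial J_{\lambda} / \partial u_{ki} = 0$ and $\partial J_{\lambda} / \partial \lambda_i = 0$. Equation (6) presents $\partial J_{\lambda} / \partial u_{ki}$, and equation (7) presents $\partial J_{\lambda} / \partial \lambda_i$.

$$\frac{\partial J_{\lambda}}{\partial u_{ki}} = m \, u_{ki}^{m-1} \, \lVert \vec{x}_i - \vec{c}_k \rVert^2 - \lambda_i \qquad (6)$$

$$\frac{\partial J_{\lambda}}{\partial \lambda_i} = 1 - \sum_{k=1}^{K} u_{ki} \qquad (7)$$

Setting equation (6) to zero, we reach equation (8).

$$u_{ki}^{m-1} = \frac{\lambda_i}{m \, \lVert \vec{x}_i - \vec{c}_k \rVert^2} \qquad (8)$$

Solving equation (8) with respect to $u_{ki}$, we obtain equation (9).

$$u_{ki} = \left( \frac{\lambda_i}{m \, \lVert \vec{x}_i - \vec{c}_k \rVert^2} \right)^{\frac{1}{m-1}} \qquad (9)$$

Substituting this expression into equation (7), set to zero, results in a new equation, and solving the resultant equation in terms of $\lambda_i$ yields equation (10).

$$\lambda_i = m \left[ \sum_{l=1}^{K} \left( \frac{1}{\lVert \vec{x}_i - \vec{c}_l \rVert^2} \right)^{\frac{1}{m-1}} \right]^{-(m-1)} \qquad (10)$$

If we substitute equation (10) into equation (9), we reach the new $u_{ki}$ given by equation (11).

$$u_{ki} = \frac{1}{\sum_{l=1}^{K} \left( \dfrac{\lVert \vec{x}_i - \vec{c}_k \rVert^2}{\lVert \vec{x}_i - \vec{c}_l \rVert^2} \right)^{\frac{1}{m-1}}} \qquad (11)$$

To compare and classify the distributions of the spread of Covid-19 in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran, the fuzzy clustering technique is applied to the rescaled Covid-19 datasets, including confirmed cases, dead cases, cumulative confirmed cases, and cumulative dead cases. As can be seen in Figure 2 and Table 3, because of the effect of population size, the rescaled datasets differ from the original datasets and are scientifically sound choices for comparing different countries. The results of fuzzy clustering are reported in the next subsections. To determine the number of clusters, the Kaiser index was used: the number of clusters was taken as the number of eigenvalues of the correlation matrix that are greater than 1.

Table 4 and Figures 3 and 4 provide the results of the fuzzy clustering technique for the rescaled confirmed cases. As can be observed in Table 4 and Figures 3 and 4, the rescaled numbers of confirmed cases in the considered countries can be divided into three clusters. Table 4 shows the membership probabilities of each country in each cluster. For each country, the maximum membership probability is shown in bold. Based on these values, the first cluster consists of Spain (with probability 1.00). The second cluster consists of the United States of America and the United Kingdom (with probabilities 0.86 and 0.85, respectively). The third cluster consists of Italy, Germany, France, and Iran (with probabilities 0.46, 0.86, 0.78, and 0.73, respectively).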
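The alternating updates of equations (5) and (11), followed by the maximum-membership assignment used to read the tables, can be sketched in Python as follows. This is only an illustrative sketch and not the authors' implementation: the fuzzification exponent m = 2, the random initialization, the stopping rule, the helper names `rescale_to_reference`, `fuzzy_c_means`, and `hard_assignment`, and the toy data are all assumptions introduced here. In the setting of this paper, each record would be one country's rescaled time series.

```python
import numpy as np


def rescale_to_reference(counts, population, reference_population):
    """Scale case counts to the population size of a reference country."""
    return np.asarray(counts, dtype=float) * (reference_population / population)


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-9, seed=0):
    """Alternate the centroid and membership updates of fuzzy C-means.

    X has shape (n_records, n_features). Returns the membership matrix U
    of shape (n_clusters, n_records) and the centroids C from the last
    centroid update. m, n_iter, tol and seed are assumed defaults.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; every column is normalized to sum to one.
    U = rng.random((n_clusters, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroid update: membership-weighted mean of the records.
        C = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared Euclidean distances, shape (n_clusters, n_records).
        d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: normalized inverse distances.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:
            break
    return U, C


def hard_assignment(U):
    """Assign every record to the cluster with its largest membership."""
    return U.argmax(axis=0)


# Made-up toy records (not Covid-19 data): two well-separated groups.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
U, C = fuzzy_c_means(toy, n_clusters=2)
print(np.round(U, 2))       # membership of each record in each cluster
print(hard_assignment(U))   # cluster index with the largest membership
```

Each column of the returned matrix plays the role of one row of membership probabilities in the tables, and the bolded entry corresponds to the index returned by `hard_assignment`.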
In other words, the United States of America and the United Kingdom are statistically similar; Italy, Germany, France, and Iran are statistically similar; and Spain is significantly different from them.

Table 5 and Figures 5 and 6 provide the results of the fuzzy clustering technique for the rescaled dead cases. As can be observed in Table 5 and Figures 5 and 6, the rescaled numbers of dead cases in the considered countries can be divided into three clusters. Table 5 shows the membership probabilities of each country in each cluster. For each country, the maximum membership probability is shown in bold. Based on these values, the first cluster consists of the United Kingdom and France (with probabilities 0.75 and 0.86, respectively). The second cluster consists of the United States of America, Germany, and Iran (with probabilities 0.77, 0.99, and 0.95, respectively). The third cluster consists of Spain and Italy (with probabilities 0.89 and 0.73, respectively). In other words, the United Kingdom and France are statistically similar; the United States of America, Germany, and Iran are statistically similar; and Spain and Italy are statistically similar.

Table 6 and Figures 7 and 8 provide the results of the fuzzy clustering technique for the rescaled cumulative confirmed cases. As can be observed in Table 6 and Figures 7 and 8, the rescaled numbers of cumulative confirmed cases in the considered countries can be divided into three clusters. Table 6 shows the membership probabilities of each country in each cluster. For each country, the maximum membership probability is shown in bold. Based on these values, the first cluster consists of the United States of America, Germany, and France (with probabilities 0.90, 0.95, and 0.97, respectively). The second cluster consists of the United Kingdom and Iran (with probabilities 0.58 and 0.96, respectively). The third cluster consists of Spain and Italy (with probabilities 0.94 and 0.82, respectively).
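Each of the analyses above uses three clusters, a number chosen with the Kaiser index (the count of correlation-matrix eigenvalues greater than 1). A minimal sketch of that rule is given below. It assumes that the correlation matrix is the Pearson correlation matrix between the countries' time series, which the text does not state explicitly; the function name `kaiser_number_of_clusters` and the toy series are hypothetical and only illustrate the counting step.

```python
import numpy as np


def kaiser_number_of_clusters(series):
    """Count the eigenvalues of the correlation matrix that exceed 1.

    series has shape (n_time_points, n_countries); each column is assumed
    to be one country's time series.
    """
    data = np.asarray(series, dtype=float)
    # Pearson correlation between columns (countries).
    R = np.corrcoef(data, rowvar=False)
    # The correlation matrix is symmetric, so eigvalsh is appropriate.
    eigenvalues = np.linalg.eigvalsh(R)
    return int((eigenvalues > 1.0).sum()), eigenvalues


# Made-up toy series (not Covid-19 data): columns 0-1 move together and
# columns 2-3 move together, so two eigenvalues exceed 1.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
toy_series = np.column_stack([a, 2 * a, b, 3 * b])
n_clusters, eigenvalues = kaiser_number_of_clusters(toy_series)
print(n_clusters, np.round(eigenvalues, 3))
```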
In other words, the United States of America, Germany, and France are statistically similar; the United Kingdom and Iran are statistically similar; and Spain and Italy are statistically similar.

Table 7 and Figures 9 and 10 provide the results of the fuzzy clustering technique for the rescaled cumulative dead cases. As can be observed in Table 7 and Figures 9 and 10, the rescaled numbers of cumulative dead cases in the considered countries can be divided into three clusters. Table 7 shows the membership probabilities of each country in each cluster. For each country, the maximum membership probability is shown in bold. Based on these values, the first cluster consists of Spain and Italy (with probabilities 0.97 and 0.97, respectively). The second cluster consists of the United Kingdom and France (with probabilities 0.89 and 0.94, respectively). The third cluster consists of the United States of America, Germany, and Iran (with probabilities 0.96, 0.98, and 0.98, respectively). In other words, Spain and Italy are statistically similar; the United Kingdom and France are statistically similar; and the United States of America, Germany, and Iran are statistically similar.

To consider the policies and plans to manage the spread of Covid

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Funding: No funding. The authors declare no conflict of interest.

Emerging coronaviruses: genome structure, replication, and pathogenesis Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor Review of bats and SARS. Emerging infectious diseases Transmission scenarios for Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and how to tell them apart. Euro surveillance: bulletin Europeen sur les maladies transmissibles Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.
The Lancet Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China Clinical characteristics of coronavirus disease 2019 in China Active Monitoring of Persons Exposed to Patients with Confirmed COVID-19 -United States The resilience of the Spanish health system against the COVID-19 pandemic. The Lancet Public Health Analysis and forecast of COVID-19 spreading in China, Italy and France COVID-19 and Italy: what next Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy COVID-19 in Italy: momentous decisions and many uncertainties. The Lancet Global Health Transmission of 2019-nCoV infection from an asymptomatic contact in Germany Covid-19: UK starts social distancing after new model points to 260 000 potential deaths Novel coronavirus disease (Covid-19): the first two patients in the UK with person to person transmission Coronavirus disease 2019 (covid-19): a guide for UK GPs Covid-19 and the stiff upper lip-the pandemic response in the united kingdom Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label nonrandomized clinical trial First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures Rapid viral diagnosis and ambulatory management of suspected COVID-19 cases presenting at the infectious diseases referral hospital COVID-19 battle during the toughest sanctions against Iran Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification Diversity based cluster weighting in cluster ensemble: an information theory approach Clustering ensemble selection considering quality and diversity An Ensemble of Locally Reliable Cluster Solutions Reliability-Based Fuzzy Clustering Ensemble. 
Fuzzy Sets and Systems Elite fuzzy clustering ensemble based on clustering diversity and quality measures A fuzzy clustering ensemble based on cluster clustering and iterative fusion of base clusters Large Sample Inference on the Ratio of Two Independent Binomial Proportions Inference on the Ratio of Means in Two Independent Populations Inferrence on the Ratio of Variances of Two Independent Populations Inferrence on the Ratio of Correlations of Two Independent Populations On the Ratio of Two Independent Skewnesses Testing the Difference between Two Independent Time Series Models A new method to compare the spectral densities of two independent periodically correlated time series Testing the difference between spectral densities of two independent periodically correlated (cyclostationary) time series models Testing the Difference between Two Independent Regression Models Testing the Equality of Two Independent Regression Models On Comparing Two Dependent Linear and Nonlinear Regression Models On Comparing and Classifying Several Independent Linear and Non-Linear Regression Models with Symmetric Errors Modeling caffeine adsorption by multi-walled carbon nanotubes using multiple polynomial regression with interaction effects Evaluation of changes in RDIst index effected by different Potential Evapotranspiration calculation methods On the Probable Error of a Coefficient of Correlation Deduced from a Chebyshev cardinal wavelets for nonlinear stochastic differential equations driven with variable-order fractional Brownian motion A New Method to Detect Periodically Correlated Structure Periodically Correlated Modeling by Means of the Periodograms Asymptotic Distributions On the Asymptotic Distribution for the Periodograms of Almost Periodically Correlated (Cyclostationary) Processes Goodness of fit test for almost cyclostationary processes On comparing and clustering the spectral densities of several almost cyclostationary processes Testing the equality of the spectral 
densities of several uncorrelated almost cyclostationary processes A novel method to detect almost cyclostationary structure Fuzzy clustering to classify several regression models with fractional Brownian motion errors A comprehensive numerical study of space-time fractional bioheat equation using fractional-order Legendre functions An efficient neuroevolution approach for heart disease detection Parsimonious Evolutionary-based Model Development for Detecting Artery Disease Neuroevolution-based Autonomous Robot Navigation: A Comparative Study A benchmark of recent population-based metaheuristic algorithms for multi-layer neural network training Evolving artificial neural networks using butterfly optimization algorithm for data classification Time series modelling to forecast the confirmed and recovered cases of COVID-19 Diagnosis and clustering of power transformer winding fault types by cross-correlation and clustering analysis of FRA results