key: cord-0925594-w8gjfomz authors: Behnood, Ali; Golafshani, Emadaldin Mohammadi; Hosseini, Seyedeh Mohaddeseh title: Determinants of the infection rate of the COVID-19 in the U.S. using ANFIS and virus optimization algorithm (VOA) date: 2020-06-25 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110051 sha: 2b66ed26516699b02d14bcf0d59671fb611f12ca doc_id: 925594 cord_uid: w8gjfomz

Recently, a novel coronavirus disease (COVID-19) has become a serious global public health hazard. Infectious disease outbreaks such as COVID-19 can also significantly affect the sustainable development of urban areas. Several factors, such as population density and climatology parameters, could potentially affect the spread of COVID-19. In this study, a combination of the virus optimization algorithm (VOA) and the adaptive network-based fuzzy inference system (ANFIS) was used to investigate the effects of various climate-related factors and population density on the spread of COVID-19. For this purpose, data on the climate-related factors and on the confirmed COVID-19 cases across U.S. counties were used. The results show that the population density variable had the most significant impact on the performance of the developed models, which indicates the importance of social distancing in reducing the infection rate and spread rate of COVID-19. Among the climatology parameters, an increase in the maximum temperature was found to reduce the infection rate. Average temperature, minimum temperature, precipitation, and average wind speed were not found to significantly affect the spread of COVID-19, while an increase in relative humidity was found to slightly increase the infection rate. These findings suggest that a reduced infection rate could be expected over the summer season. However, the models developed in this study were based on limited, one-month data.
Future investigations could benefit from more comprehensive data covering a wider range of the input variables.

The rapid spread of the novel coronavirus disease (i.e., COVID-19), which started in late December 2019, has become a serious global issue [1] [2] [3] [4]. As of April 28, 2020, official reports indicated more than 3,000,000 infected cases and over 217,000 confirmed deaths attributed to COVID-19 complications. In addition, COVID-19 has spread to 210 countries worldwide. Official statistics show that, with more than 1,000,000 infected cases and over 59,000 confirmed deaths, the USA is one of the countries where the rapid spread of COVID-19 has seriously threatened people's lives. Several factors could potentially affect the spread and transmission rates of viruses, including population density and climatology parameters (e.g., wind speed, humidity, precipitation, and temperature) [5] [6] [7] [8] [9] [10]. The sustainable development of urban areas requires investigating the effects of these factors on the transmission rates of viruses so that residential areas can be spatially organized efficiently. Different climate conditions have been reported to affect the transmission rates of viruses differently. The transmission rate of some viruses, such as HIV, is not affected by climate parameters: HIV never leaves the host's internal environment, since it is transmitted through sexual intercourse, blood transfusions, or from mother to child during pregnancy or breastfeeding. For the flu virus, dry and cold climates have been found to favor its spread, while temperatures above 30 °C halt its transmission [11]. With regard to MERS-CoV, the virus was reported to be most widespread between April and August, when high temperatures dominate [12].
Moreover, a high ultraviolet index, low relative humidity, and low wind speeds were favorable conditions for the spread of MERS-CoV [12]. Regarding COVID-19, a lower spread rate has been attributed to warm and humid climate conditions in China [5]. However, a warm and humid climate does not seem to completely stop the spread of COVID-19 [6]. An accurate model of the climatology-related determinants of the spread of COVID-19 can therefore support the sustainable development of urban areas. Machine learning (ML) techniques, owing to their strong knowledge-processing capabilities, have been shown to provide accurate models in many fields of science and engineering [13] [14] [15]. ML techniques have also been widely used to develop models for predicting outbreaks [16, 17]. Table 1 provides some examples of previous studies that used ML techniques to predict disease outbreaks.

Table 1. Examples of previous studies on using ML techniques for the prediction of disease outbreaks (disease; ML technique; main finding; reference).
- Norovirus; …; climate-related factors such as temperature, wind, salinity, and rainfall were identified as determinants of norovirus outbreaks, and the depth of water in an oyster bed was the most significant factor in the developed model; [23]
- African swine fever; random forests; precipitation and the driest month had the most significant effects on the outbreak of African swine fever; [24]
- Influenza; neural network, random forests, support vector machine; the random forests time series provided a better statistical fit than the support vector machine and the artificial neural network in modeling weekly influenza-like illness; [25]
- COVID-19; genetic programming; the predictive models based on genetic programming provided high accuracy in determining the factors that affect the infection rate of COVID-19; [26]

In this study, the adaptive network-based fuzzy inference system (ANFIS) and the virus optimization algorithm (VOA) were used to investigate the effects of climate-related factors on the spread of COVID-19.
For this purpose, a dataset containing information on the spread of COVID-19 across U.S. counties was used. A sensitivity analysis was also performed to identify the factors that most significantly affect the spread of COVID-19. The data used in this research to study the climate-related determinants of the spread of COVID-19 in the U.S. were collected from various sources. The distribution of confirmed COVID-19 cases across the country was provided by USAFacts (2020). Information on the average, maximum, and minimum temperatures and on precipitation was obtained from NOAA (2020); for these variables, data for the month of March were used. Data on the average annual humidity, average annual wind speed, and population were collected from USA.com (2020). The population density, one of the input variables, is the number of people per square mile. The only output variable in this study was the infection rate, defined as the number of confirmed infected cases divided by the number of days of infection. Counties with fewer than 10 confirmed infected cases were removed from the analysis to reduce the errors related to random effects in these counties. Overall, a total of 1657 counties were used to model the spread of COVID-19. The descriptive statistics of the input and output variables are given in Table 2. To further demonstrate the distribution of the climatology variables in the gathered database, Fig. 2 shows the infection rate, as the indicator of the COVID-19 outbreak, versus the seven input variables. As depicted in this figure, there is a direct relationship between the infection rate and the population density. However, the infection rate varies over a wide range for a given population density.
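The preprocessing rules described above (infection rate as cases over days of infection, density as people per square mile, and the 10-case cutoff) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `County` record and its field names are hypothetical, and reading "days of infection" as the inclusive number of days from a county's first confirmed case to the cut-off date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass
class County:
    # Hypothetical record layout (not the format of the USAFacts/NOAA/USA.com data).
    name: str
    population: int
    area_sq_mi: float
    first_case: date  # date of the first confirmed case (assumed field)
    cases: int        # cumulative confirmed cases at the cut-off date


def infection_rate(county: County, cutoff: date) -> float:
    """Confirmed cases divided by days of infection.

    Assumption: days of infection are counted inclusively from the first
    confirmed case to the cut-off date, so the denominator is never zero.
    """
    days = (cutoff - county.first_case).days + 1
    return county.cases / days


def population_density(county: County) -> float:
    """Number of people per square mile."""
    return county.population / county.area_sq_mi


def build_dataset(
    counties: Iterable[County], cutoff: date, min_cases: int = 10
) -> List[Tuple[str, float, float]]:
    """Return (name, density, infection rate) rows, dropping counties
    with fewer than `min_cases` confirmed cases."""
    rows = []
    for county in counties:
        if county.cases < min_cases:
            continue  # too few cases: dominated by random effects
        rows.append(
            (county.name, population_density(county), infection_rate(county, cutoff))
        )
    return rows
```

For example, a hypothetical county with 310 cases whose first case was reported on March 1 has 31 days of infection at a March 31 cut-off, and therefore an infection rate of 10 cases per day under this reading.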
For the other six input variables, there is no apparent trend with the infection rate, which makes modeling the COVID-19 outbreak more difficult.

Fig. 2. The relationship between the infection rate and the climatology variables.

The pairwise relationships between the input variables are depicted in Fig. 3. As expected, the correlations among the average, minimum, and maximum temperatures are high.

To introduce the proposed machine learning method, the virus optimization algorithm (VOA) is described first, followed by the main concepts of the adaptive neuro-fuzzy inference system (ANFIS). Finally, the combination of ANFIS and VOA is explained as the proposed model for determining the influence of the climatology factors on the COVID-19 outbreak.

A virus consists of genetic material enclosed in a protein coat, and many viruses also have an outer envelope. VOA is a population-based optimization algorithm in which each virus that attacks a host cell is a candidate solution to the optimization problem [30]. VOA has three main phases: initialization, replication, and maintenance. In the initialization phase, the initial viruses are randomly generated, evaluated, and sorted from best to worst. All created viruses are then classified into strong viruses (SVs) and common viruses (CVs), where the SVs are the best viruses and the CVs are the remaining, worse ones. Next, the replication phase produces new viruses by perturbing the SVs and CVs using the following equations:

$$v_{ij}^{new} = v_{ij}^{SV} \pm \frac{\mathrm{rand}()}{Int}\, v_{ij}^{SV}$$

$$v_{ij}^{new} = v_{ij}^{CV} \pm \mathrm{rand}()\, v_{ij}^{CV}$$

where $v_{ij}^{new}$, $v_{ij}^{CV}$, and $v_{ij}^{SV}$ are the jth dimension of the ith new virus, common virus, and strong virus, respectively; rand() is a random number between zero and one; and Int is the intensity parameter, which is set to one at the beginning of the algorithm. If the average performance of all viruses in the current replication is worse than that in the previous replication, Int is increased by one.

Fig. 1. The pseudo-code of the VOA.
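The three VOA phases described above can be sketched as a small minimizer. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the mapping of NIV, NSV, SVGR, and CVGR onto the parameters below is assumed, the "±" in the replication rule is realized as an independent random sign per dimension, and the maintenance phase is simplified to keeping the best `max_pop` viruses.

```python
import random
from typing import Callable, List, Sequence, Tuple


def voa_minimize(
    cost: Callable[[Sequence[float]], float],
    bounds: List[Tuple[float, float]],
    n_init: int = 50,     # assumed meaning of NIV: number of initial viruses
    n_strong: int = 10,   # assumed meaning of NSV: number of strong viruses
    sv_growth: int = 2,   # assumed meaning of SVGR: offspring per strong virus
    cv_growth: int = 8,   # assumed meaning of CVGR: offspring per common virus
    max_iter: int = 100,  # number of replications (hypothetical budget)
    max_pop: int = 100,   # simplified maintenance: population cap (assumption)
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def replicate(parent: List[float], step: float) -> List[float]:
        # v_new = v +/- step * rand() * v, with an independent sign per dimension
        return clip([v + rng.choice((-1.0, 1.0)) * step * rng.random() * v
                     for v in parent])

    def scored(xs: List[List[float]]) -> List[Tuple[float, List[float]]]:
        return [(cost(x), x) for x in xs]

    # Initialization: random viruses, evaluated and sorted best (lowest cost) first
    pop = scored([[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)])
    pop.sort(key=lambda t: t[0])
    intensity = 1
    prev_mean = None

    for _ in range(max_iter):
        strong = [x for _, x in pop[:n_strong]]
        common = [x for _, x in pop[n_strong:]]
        # Replication: strong viruses take steps that shrink as intensity grows
        children = [replicate(s, 1.0 / intensity) for s in strong for _ in range(sv_growth)]
        # Common viruses take full-size rand() steps
        children += [replicate(c, 1.0) for c in common for _ in range(cv_growth)]
        pop = sorted(pop + scored(children), key=lambda t: t[0])

        # Raise intensity when the average cost of this replication got worse
        mean_cost = sum(c for c, _ in pop) / len(pop)
        if prev_mean is not None and mean_cost > prev_mean:
            intensity += 1
        prev_mean = mean_cost

        # Maintenance (simplified): keep only the best max_pop viruses
        pop = pop[:max_pop]

    best_cost, best = pop[0]
    return best, best_cost
```

Because the update is multiplicative, a coordinate that reaches exactly zero can no longer move; a practical implementation might add a small additive perturbation, which this sketch does not do.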
ANFIS is an ML method that embeds a fuzzy inference system in an adaptive network structure [31]. It is an extension of the Takagi-Sugeno-Kang (TSK) fuzzy system [32], which captures the relationship between the input and output variables of a system using If-Then fuzzy rules. Each fuzzy rule consists of an antecedent part and a consequent part: the former is expressed in terms of fuzzy inputs, and the latter is a linear combination of the crisp input variables. The fuzzy inputs in the antecedent part of a rule are combined with the AND logical operator. The kth fuzzy rule ($R^k$) of a system with n input variables is as follows:

$$R^k:\ \text{If } x_1 \text{ is } A_1^k \text{ AND } x_2 \text{ is } A_2^k \text{ AND } \dots \text{ AND } x_n \text{ is } A_n^k, \text{ then } y^k = \sum_{i=1}^{n} p_i^k x_i + r^k$$

where $x_i$ and $y^k$ are the ith input variable and the output of the kth fuzzy rule, respectively; $A_i^k$ is the membership function of the ith input variable in the kth rule; $p_i^k$ are the regression coefficients in the consequent part; and $r^k$ is its bias.

To measure how well the developed ANFIS models map the climatology variables to the infection rate, the root mean squared error (RMSE), the mean absolute error (MAE), the correlation coefficient (R), and the coefficient of determination (R²) were used:

$$RMSE = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(OO_i - PO_i\right)^2}$$

$$MAE = \frac{1}{n_c}\sum_{i=1}^{n_c}\left|OO_i - PO_i\right|$$

$$R = \frac{\sum_{i=1}^{n_c}\left(OO_i - \overline{OO}\right)\left(PO_i - \overline{PO}\right)}{\sqrt{\sum_{i=1}^{n_c}\left(OO_i - \overline{OO}\right)^2 \sum_{i=1}^{n_c}\left(PO_i - \overline{PO}\right)^2}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n_c}\left(OO_i - PO_i\right)^2}{\sum_{i=1}^{n_c}\left(OO_i - \overline{OO}\right)^2}$$

where $OO_i$ and $PO_i$ are the observed and predicted infection rates of the ith county, respectively; $\overline{OO}$ and $\overline{PO}$ are their averages; and $n_c$ is the number of counties. A more accurate model has lower RMSE and MAE values and R and R² values closer to one.

In this study, curve fitting was carried out on the available COVID-19 outbreak data for 1657 U.S. counties using three developed ANFIS models. To run the developed ANFIS-VOA models, MNVR, NIV, NSV, CVGR, and SVGR were set to 5000, 50, 10, 8, and 2, respectively; the first two parameters were determined by trial and error, and the rest were selected based on the values obtained in a previous study [35]. The statistical indicators of all developed models are shown in Table 3.
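To make the rule structure and the indicators concrete, the following sketch evaluates a first-order TSK rule base (the inference step that ANFIS tunes) and computes RMSE, MAE, R, and R². Gaussian membership functions and the product as the AND operator are common ANFIS choices but are assumptions here, as is the 1 − SSE/SST form of R²; the rule parameters in the example are toy values, not fitted ones.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A rule is (centers, sigmas, p, r): Gaussian membership parameters for each
# antecedent input, plus linear consequent coefficients p and bias r.
Rule = Tuple[Sequence[float], Sequence[float], Sequence[float], float]


def gaussian_mf(x: float, center: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def tsk_predict(x: Sequence[float], rules: List[Rule]) -> float:
    """First-order TSK output: firing-strength-weighted average of rule outputs."""
    num = den = 0.0
    for centers, sigmas, p, r in rules:
        w = 1.0
        for xi, c, s in zip(x, centers, sigmas):
            w *= gaussian_mf(xi, c, s)  # AND realized as a product (assumption)
        y_k = sum(pi * xi for pi, xi in zip(p, x)) + r  # y^k = sum_i p_i^k x_i + r^k
        num += w * y_k
        den += w
    return num / den


def indicators(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, Pearson R, and R^2 = 1 - SSE/SST."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sst = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R": cov / math.sqrt(sst * spp),
        "R2": 1.0 - sse / sst,
    }
```

With two equally fired rules whose consequents give 5 and 10, `tsk_predict` returns their average, 7.5; in a trained ANFIS, the membership and consequent parameters would be the quantities optimized (here, by VOA).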
To compare the results of the developed models, a linear regression (LR) model was also used. As inferred from Table 3, the ANFIS-VOA-II model performs better than the other ANFIS and LR models. Moreover, the LR model is by far the worst model, and the classic ANFIS model performs worse than the ANFIS-VOA models. In terms of RMSE, the ANFIS-VOA-II model is 18.73%, 26.68%, and 47.76% better than the ANFIS-VOA-I, classic ANFIS, and LR models, respectively. The higher R² value of the ANFIS-VOA-II model compared with the other developed models indicates that it explains a larger share of the variation in the infection rate using the input variables considered in this study. The correlations between the observed and predicted infection rates of all developed ANFIS models exceed 0.7, indicating strong correlations; among all models, ANFIS-VOA-II ranks best and LR ranks worst. The MAE of the ANFIS-VOA-II model is 7.3337, which is 18.96%, 18.63%, and 39.84% lower than the MAEs of the ANFIS-VOA-I, classic ANFIS, and LR models, respectively. To obtain the relative importance of each input variable, a parametric study was performed: the change in the infection rate was measured when one variable was varied from its lowest to its highest value while the other variables were fixed at their average values. The changes in the infection rate for all input variables were then normalized and expressed as percentages to obtain their relative importance. Fig. 8 illustrates the relative importance of all input variables using the ANFIS-VOA-II model. As revealed in this figure, population density, with a relative importance of 62%, is by far the most critical variable. Maximum temperature, with a relative importance of almost one-third that of population density, ranks second, and humidity, with a relative importance of about one-ninth that of population density, ranks third.
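The parametric study described above can be sketched as a one-at-a-time sensitivity function, with `model` standing in for a trained predictor such as ANFIS-VOA-II. Reading "the change in the infection rate" as the absolute difference between the predictions at a variable's minimum and maximum is an assumption; for a non-monotonic model, the range of predictions over a sweep of intermediate values would be a reasonable alternative.

```python
from typing import Callable, Dict, List, Sequence


def relative_importance(
    model: Callable[[Sequence[float]], float],
    data: List[Sequence[float]],
    names: List[str],
) -> Dict[str, float]:
    """One-at-a-time relative importance, in percent.

    Each variable is moved from its minimum to its maximum observed value
    while the other variables are held at their means; the absolute change
    in the prediction is then normalized so all importances sum to 100.
    """
    columns = list(zip(*data))
    means = [sum(col) / len(col) for col in columns]
    changes = []
    for j, col in enumerate(columns):
        low, high = list(means), list(means)
        low[j], high[j] = min(col), max(col)
        changes.append(abs(model(high) - model(low)))
    total = sum(changes)
    return {name: 100.0 * c / total for name, c in zip(names, changes)}
```

For a toy linear model y = 3x₀ + x₁ on data where x₀ spans 0 to 2 and x₁ spans 0 to 4, the prediction changes by 6 and 4, giving relative importances of 60% and 40%.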
The other five climatology variables each have a relative importance of less than 10%; the sum of their relative importance is still 2% less than that of the maximum temperature and is about 30% of that of the population density. Moreover, precipitation and average temperature are the two lowest-ranked climatology variables.

Population density showed the most significant effect on the infection rate. This finding highlights the importance of social distancing in reducing the infection rate. Among the climate parameters, maximum temperature was found to have the most significant effect on the infection rate: an increase in the maximum temperature reduced the infection rate. Average temperature, minimum temperature, precipitation, and average wind speed were not found to significantly affect the spread of COVID-19, while an increase in relative humidity was found to slightly increase the infection rate. These findings suggest that a reduced infection rate could be expected over the summer season. However, the models developed in this study were based on limited, one-month data. Future investigations could benefit from more comprehensive data covering a wider range of the input variables.
Declaration of Interest Statement: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
[1] The impact of COVID-19 pandemic upon stability and sequential irregularity of equity and cryptocurrency markets
[2] Modeling COVID-19 epidemic in Heilongjiang Province
[3] Forecast and evaluation of COVID-19 spreading in USA with reduced-space Gaussian process regression
[4] Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil
[5] High temperature and high humidity reduce the transmission of COVID
[6] Investigation of effective climatology parameters on COVID-19 outbreak in Iran
[7] Development of an assessment method for investigating the impact of climate and urban parameters in confirmed cases of COVID-19: A new challenge in sustainable development
[8] Investigating a serious challenge in the sustainable development process: Analysis of confirmed cases of COVID-19 (new type of coronavirus) through a binary classification using artificial intelligence and regression analysis
[9] Predicting virus emergence amid evolutionary noise
[10] Association between ambient temperature and COVID-19 infection in 122 cities from China
[11] Influenza virus transmission is dependent on relative humidity and temperature
[12] Climate factors and incidence of Middle East respiratory syndrome coronavirus
[13] Estimation of the compressive strength of concretes containing ground granulated blast furnace slag using hybridized multiobjective ANN and salp swarm algorithm
[14] Machine learning study of the mechanical properties of concretes containing waste foundry sand
[15] Estimation of the dynamic modulus of asphalt concretes using random forests algorithm
[16] A predictive management tool for blackfly outbreaks on the Orange River
[17] Predicting antigenic variants of H1N1 influenza virus based on epidemics and pandemics using a stacking model
[18] Data mining techniques for predicting dengue outbreak in geospatial domain using weather parameters for
[19] Spatiotemporal dengue fever hotspots associated with climatic factors in Taiwan including outbreak predictions based on machine-learning
[20] Development of artificial intelligence approach to forecasting oyster norovirus outbreaks along Gulf of Mexico coast
[21] Development of genetic programming-based model for predicting oyster norovirus outbreak risks
[22] Environmental indicators of oyster norovirus outbreaks in coastal waters
[23] Modeling and prediction of oyster norovirus outbreaks along Gulf of Mexico Coast
[24] Prediction for global African swine fever outbreaks based on a combination of random forest algorithms and meteorological data
[25] Comparative evaluation of time series models for predicting influenza outbreaks: application of influenza-like illness data from sentinel sites of healthcare centers in Iran
[26] Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming
[27] Coronavirus locations: COVID-19 map by county and state
[28] Climate at a Glance (National Center for Environmental Information)
[29] Your local guide to cities, towns, neighborhoods, states, counties, metro areas, zip codes, area codes, and schools in USA 2020
[30] A novel metaheuristic for continuous optimization problems: Virus optimization algorithm
[31] Adaptive-Network-Based Fuzzy Inference System
[32] Fuzzy identification of systems and its applications to modeling and control
[33] Fuzzy model identification based on cluster estimation
[34] Pattern Recognition with Fuzzy Objective Function Algorithms
[35] Predicting the compressive strength of normal and High-Performance Concretes using ANN and ANFIS hybridized with Grey Wolf Optimizer