key: cord-0890907-l5jn0uch
authors: Zivkovic, Miodrag; Bacanin, Nebojsa; Venkatachalam, K.; Nayyar, Anand; Djordjevic, Aleksandar; Strumberger, Ivana; Al-Turjman, Fadi
title: COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach
date: 2020-12-30
journal: Sustain Cities Soc
DOI: 10.1016/j.scs.2020.102669
sha: 9b59288167e8e66a69c2683c3d7514f51a6020a9
doc_id: 890907
cord_uid: l5jn0uch

The main objective of this paper is to further improve current time-series prediction (forecasting) algorithms based on hybrids between machine learning and nature-inspired algorithms. After the recent COVID-19 outbreak, almost all countries were forced to impose strict measures and regulations in order to control the spread of the virus. Predicting the number of new cases is crucial when evaluating which measures should be implemented. The improved forecasting approach was therefore used to predict the number of COVID-19 cases. The proposed prediction model is a hybrid of machine learning, an adaptive neuro-fuzzy inference system, and enhanced beetle antennae search swarm intelligence metaheuristics. The enhanced beetle antennae search is used to determine the parameters of the adaptive neuro-fuzzy inference system and to improve the overall performance of the prediction model. First, an enhanced beetle antennae search algorithm was implemented that overcomes deficiencies of the original version. The enhanced algorithm was tested and validated against a wider set of benchmark functions and substantially outperformed the original implementation. Afterwards, the proposed hybrid method for COVID-19 case prediction was evaluated using the World Health Organization's official data on the COVID-19 outbreak in China. The proposed method was compared against several existing state-of-the-art approaches tested on the same datasets. The proposed CESBAS-ANFIS achieved a [Formula: see text] score of 0.9763, which is relatively high compared to the [Formula: see text] value of 0.9645 achieved by FPASSA-ANFIS. To further evaluate the robustness of the proposed method, it was also validated against two different datasets of weekly confirmed influenza cases in China and the USA.
Simulation results and the comparative analysis show that the proposed hybrid method managed to outscore other sophisticated approaches that were tested on the same datasets and proved to be a useful tool for time-series prediction.

The recently discovered coronavirus SARS-CoV-2, which causes the disease known as COVID-19 (Coronaviridae Study Group of the International et al., 2020), is a novel respiratory virus that was initially detected in humans in December 2019 in Wuhan, China (Chan et al., 2020; Yadav & Saxena, 2020). Since then, the virus has spread worldwide, affecting more than 200 countries, with the number of reported cases rising to over 13 million and the number of deaths rising to 570,000 as of mid-July 2020. Since it is a novel virus, scientists and epidemiologists still have a lot to learn about it. The first estimations from the World Health Organization (WHO) were that the novel coronavirus is extremely contagious and dangerous (World Health Organization et al., 2020). Within the first three months of the pandemic, the virus had spread over all continents and reached almost every country in the world. Officials in most countries declared a state of emergency and enforced social distancing regulations and many other control measures in order to contain the spread of the virus and minimize the number of deaths (Sohrabi et al., 2020; Spinelli & Pellino, 2020). The main goal was to limit the number of infected individuals so that the health system would not be overwhelmed by people with serious respiratory illness requiring intensive care in hospitals. As a result, airports, schools, faculties, public transport, and many businesses were shut down across the world, and people were encouraged to practice social distancing and work from home if possible. Nevertheless, some countries were affected by the virus much more severely than others, which unfortunately translated into a greater number of deaths.

The officials had widely utilized various epidemiological models (Morens, Taubenberger, Harvey, & Memoli, 2010; Rypdal & Sugihara, 2019; Scarpino & Petri, 2019) to try and estimate the outbreak, identify and estimate the peak of the epidemic as early as possible, and to try predicting the number of potential deaths. Based on these prediction models, the officials decided what measures must be taken in order to control the outbreak, suggested new policies, and also assessed the effectiveness of the measures that were already in place. Therefore, the accuracy of the outbreak prediction model which is being used is critical in order to obtain relevant insight into the possible spread of the virus and death toll of the disease.

COVID-19 is not the first coronavirus disease that has threatened humanity in the past twenty years. The first such outbreak was SARS in 2003, followed by the MERS outbreak in 2012. In the past two decades, there were several other disease outbreaks around the world, including Ebola, swine flu (H1N1), the previously mentioned SARS and MERS, and the most recent Zika virus. These outbreaks led to the development of novel and advanced epidemiological models, which were able to predict the outbreaks with high accuracy. Unfortunately, the COVID-19 pandemic has shown a non-linear and very complex nature, as shown in Ivanov (2020).

The novel coronavirus outbreak has also differed considerably from previous outbreaks, which has put in doubt the practical ability of the existing models to deliver accurate predictions. The COVID-19 outbreak still has multiple unknown variables that influence the spread of the virus: the complex and varying behavior of the population in different countries, the different approaches of governments and officials when applying measures to contain the spread, and declared states of emergency, to name a few. These unknown variables have drastically decreased the performance of current models (Scarpino & Petri, 2019). Some of the more recent models have included the influence of social distancing, quarantine, and curfew in their outbreak prediction, e.g. Zhan et al. (2019) and Rypdal and Sugihara (2019).

The overview of the recent literature which considers the prediction of the virus spread shows a significant amount of research currently going on about this hot topic. Most of the recent research focuses on the estimation of the number of infected people, serious cases (infected individuals who must be taken care of in intensive care units), and fatalities. This kind of research is extremely important for the current outbreak, although the virus has already shown some signs of slowing down in some countries especially in Europe and eastern Asia, where the outbreak control measures have already been relaxed to some extent.

The virus is currently raging in North and South America, with the number of reported infections showing that India and Russia particularly have also been affected a lot, so one can safely say that we are still far away from the global pandemic suppression. This research is also important for the future, as no one is certain whether or not there will be a second wave later this year/or next year, and if there is a second wave, would it be more or less dangerous and lethal than the first wave. Secondly, this research can help in predicting the outbreak of some completely new disease in the following years.

The majority of the recent papers deal with prediction models. Outbreak prediction with a machine learning approach was discussed in Ardabili et al. (2020). That study investigates a wide range of machine learning models and outlines two that have shown promising results: the multi-layered perceptron (MLP) and the adaptive network-based fuzzy inference system (ANFIS). Its conclusion suggests that machine learning can be used effectively to model the outbreak of the disease. This approach was later exemplified on the case of Hungary (Pinter, Felde, Mosavi, Ghamisi, & Gloaguen, 2020), in order to demonstrate the potential of the machine learning approach and to set a path for future research. The research presented in Suzuki and Suzuki (2020) utilizes a machine learning approach to estimate the number of reported cases in each province of South Korea, employing a combination of XGBoost and MultiOutputRegressor as the machine learning model. An alternative machine learning approach was taken in Liu et al. (2020), combining disease estimates from mechanistic models with digital traces in order to reliably forecast COVID-19 activity in the Chinese provinces in near real time. More precisely, the proposed method was able to produce stable and accurate forecasts 2 days ahead of the current time. This was done by combining inputs from official health reports from the Chinese Center for Disease Control and Prevention, COVID-19-related internet search activity, news and media activity, and daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model.

In the modern world, a large number of cities have developed into smart cities, with significant application of the Internet of Things (IoT) (Silva, Khan, & Han, 2018). The rapid growth of the urban population, together with the urbanization process, required new ways to handle this growth with minimal consequences for the environment and for the lifestyle and comfort of the citizens. Smart cities evolved through intensive application of IoT, and today city operations are supported in an intelligent manner while minimizing human interaction. The concept of the smart and sustainable city is founded on a large number of interdisciplinary sciences working together to achieve an ecologically and technologically advanced environment (Bibri & Krogstie, 2017). These cities are almost exclusively found in technologically advanced and rich countries. However, even in the most developed countries, it was obvious that no one was prepared for a global pandemic of COVID-19 proportions. Lockdowns and curfews were necessary in the smart cities. Additionally, schools, restaurants, and small businesses were closed for months. As a result, various studies have focused on social distancing rules and on sufficient ventilation in buildings, in order to bring the lifestyle of the citizens as close to normal as possible while preventing virus transmission (Sun & Zhai, 2020). The lessons learned during COVID-19 will have a huge impact on architecture and urbanism, which will never be the same after the pandemic due to the fear of infection (Megahed & Ghoneim, 2020). Additionally, smart cities will incorporate new technology to address threats like COVID-19 in the future. The research presented in this paper aims to show a sophisticated mechanism for predicting the number of new cases of infection on a city level.
The simulations conducted in this paper were based on the available dataset from China and other datasets were used for the purpose of the comparative analysis. The proposed method can be easily implemented in smart cities to address any future pandemic situation and help predicting the number of the confirmed cases, which will help in deciding which measures need to be taken and at what time to save lives.

The basic research question, as well as the motivation behind the research proposed in this paper, can be defined as follows: Is it possible to further improve current time-series prediction (forecasting) algorithms based on hybrids between machine learning and nature-inspired algorithms?

To accomplish this goal, an enhanced version of the recently developed beetle antennae search (BAS) algorithm has been adopted for updating parameters of the adaptive neuro-fuzzy inference system (ANFIS) machine learning method. Antecedent and conclusion ANFIS parameters have been taken into the consideration, while the type of membership function was not subject to the optimization process.

Moreover, during practical simulations with the original BAS algorithm, that belongs to the family of swarm intelligence, some deficiencies can be observed. Therefore, for the purpose of this research, first, the basic BAS algorithm has been improved and then both algorithms were tested on the standard set of unconstrained benchmarks to validate enhancements. Next, a framework based on the ANFIS trained by the improved BAS algorithm has been employed to create a prediction model for the virus outbreak anticipation. The main goal of this research is to enhance the prediction accuracy of new cases of COVID-19. The secondary objective is to try to improve the original BAS algorithm by addressing its deficiencies.

Due to the fact that COVID-19 is the most important and urgent global challenge that human beings face, the proposed method has been tested on the COVID-19 dataset from China. Moreover, to show that the proposed method can be successfully applied in predicting time-series of any other disease, additional experiments have been performed with the dataset of weekly confirmed cases of influenza. One additional experiment has been performed to predict if climate and environment variables, such as population density, can have an effect on the infection rate.

The rest of the paper is organized as follows: Section 2 gives an overview of the ANFIS and swarm intelligence metaheuristics application in solving various NP-hard problems, Section 3 provides insights of ANFIS method. In Section 4, details of the original and improved BAS algorithm have been given, as well as of the proposed ANFIS framework implementation. Section 5 presents the simulation results and discussion, while Section 6 provides a conclusion and final remarks of this research along with the recommendations for the future work.

This section presents a literature overview of the ANFIS model and swarm intelligence applications in solving various real-life problems. This section also discusses how swarm intelligence can be applied to the optimization of the ANFIS model parameters.

The ANFIS belongs to the group of artificial intelligence techniques, and it merges artificial neural networks with fuzzy inference systems. Its structure allows it to be used for modeling a large number of systems from various application domains. Neuro-fuzzy systems have been applied to solving many real-world problems. ANFIS was introduced by Jang in 1993 (Jang, 1993), and it is considered to be one of the most popular neuro-fuzzy systems. It combines the characteristics and advantages of both artificial neural networks and fuzzy inference systems, therefore providing a firm background for problem identification and modeling. As can be seen from the available literature, ANFIS has been widely used in time series forecasting. Some of the fields where ANFIS has been successfully applied include traffic control, medical systems, economic data, image processing, feature extraction, and forecasting (Karaboga & Kaya, 2019a).

By reviewing the most recent ANFIS publications it can be seen that many successful ANFIS implementations exist. For example, in Naderloo et al. (2012) , ANFIS was applied to predict crop yield, based on the different energy inputs. The ANFIS was also used to estimate the relative viscosity of nanofluids (Baghban, Jalali, Shafiee, Ahmadi, & Chau, 2019) . In Harandizadeh, Toufigh, and Toufigh (2019) , ANFIS approach was used to estimate the bearing capacity of piles, with promising results. Another recent publication shows an approach that employs a hybrid ANFIS model for forest fire probability prediction (Jaafari, Zenner, Panahi, & Shahabi, 2019) .

The ANFIS was also used numerous times for disease diagnosis and spread forecasting. The application of neuro-fuzzy systems in forecasting Measles cases in Ethiopia was discussed in Uyar, Ilhan, Iseri, and Ilhan (2019) . Other applications of ANFIS for disease forecasts include: Hepatitis C virus epidemic (Khodaei-mehr, Tangestanizadeh, Vatankhah, & Sharifi, 2018) , tuberculosis (Mohammed, Ahmed, Al-Mousawi, Azeez, et al., 2018; Uçar, Karahoca, & Karahoca, 2013) , and finally, COVID-19. The COVID-19 related applications include forecasting the confirmed cases of the COVID-19 in China (Al-qaness, Ewees, Fan, & Aziz, 2020), outbreak prediction (Ardabili et al., 2020) and prediction case-study on the state of Hungary (Pinter et al., 2020) . Since the COVID-19 is a relatively new challenge, only a few studies have been found from this domain.

One of the greatest issues and challenges in machine learning algorithms is to establish optimal or near-optimal values of its parameters for tackling a specific problem. Unfortunately, there is no universal rule and to solve each specific problem, a different set of parameters values should be determined. Establishing optimal or near-optimal values of these parameters is an NP-hard task and for its solving metaheuristics approaches could be applied.

No polynomial-time algorithm is known for NP-hard problems, so large instances cannot be solved exactly in a reasonable amount of time with traditional (deterministic) methods alone. NP-hard problems have considerable practical importance and belong to the domain of the theory of computational complexity, which plays a central role in modern computer science. Practical NP-hard problems can be found in machine learning, cloud computing, wireless sensor networks, software and hardware design, and operations research, to name a few. To solve this kind of problem in a reasonable amount of time, a stochastic approach can be applied.

Metaheuristics are a form of stochastic approach, and their goal is to find an approximate solution that is good enough (not necessarily the best solution) within a reasonable time (Strumberger, . Recently, metaheuristic algorithms have been utilized in solving a large number of NP-hard problems (Strumberger, Minovic, Tuba, & Bacanin, 2019). One of the most prominent families of metaheuristics is bio-inspired (nature-inspired) algorithms. In general, bio-inspired metaheuristics can be divided into two large distinctive groups. The first group of algorithms is known under the name evolutionary algorithms (EA), while swarm intelligence algorithms represent the second group.

The EA mimics the process of natural selection, which can be defined as the survival of the most fit. The most fit individuals are selected for breeding to produce offspring for the next generation. Therefore, natural selection starts by selecting the most fit individuals from a given population. These individuals breed and produce the offspring which are added to the next generation and which will inherit the beneficial characteristics from the parents. The offspring is assumed to be better than its parents, resulting in a better chance for survival. As the process iterates, in the end, it will result in a generation of the most fit individuals. This logic can be directly applied to a search problem. A set of solutions for a given problem can be observed and the best solutions can be selected.

The most important example of the evolutionary algorithms is a genetic algorithm (GA) (Goldberg, 1989) . The GA was used to solve numerous NP-hard real-life problems in the past, including scheduling and load balancing in the cloud computing (Wang, Liu, Chen, Xu, & Dai, 2014; Zhan, Zhang, Ying-Lin, Gong, & Zhang, 2014) , designing convolutional neural networks (Baldominos, Saez, & Isasi, 2018; Suganuma, Shirakawa, & Nagao, 2017) , feature selection for machine learning (Kim, Street, & Menczer, 2000; Xue, Zhang, Browne, & Yao, 2015) , image processing (Bochinski, Senst, & Sikora, 2017; Nickolay, Schneider, & Jacob, 1997 ) and so on.

The second group of bio-inspired algorithms, swarm intelligence, was inspired by the social behavior expressed by the group of otherwise simple and primitive individuals: bees, ants, moths, fireflies, dragonflies, bats, fish, etc. These individuals in swarms exhibit coordinated and highly intelligent actions, without any dedicated central unit which will organize and coordinate all other individuals. This characteristic of the swarms was used as an inspiration for swarm intelligence algorithms (Yang, 2014) .

One of the first algorithms which was introduced in the domain of swarm intelligence is the particle swarm optimization algorithm (PSO) (Kennedy & Eberhart, 1995) . The PSO performs the search by simulating the behavior of the flocks of birds and fish. This algorithm was successfully applied in solving numerous practical problems, including scheduling problems in the cloud computing (Kumar & Sharma, 2018) . Another important representative of swarm algorithms with numerous applications for different NP-hard problems is artificial bee colony (ABC). The ABC metaheuristic has been tested against benchmark problems (Bacanin & Tuba, 2012) , and had also been applied to practical NP-hard problems, as it can be seen from Kulkarni, Desai, and Kulkarni (2016) , Tuba and Bacanin (2014) , and Cheng, Qu, and Xu (2017) .

Another well-known swarm algorithm is the bat algorithm (BA) (Yang, 2010), with numerous applications in a wide range of domains, e.g. solving the workflow scheduling problem in cloud computing (Sagnika, Bilgaiyan, & Mishra, 2018). Another popular swarm metaheuristic is cuckoo search (CS) (Gandomi, Yang, & Alavi, 2013), which has also been successfully applied to numerous problems, such as cloud computing (Agarwal & Srivastava, 2018) and neural network training. Ant colony optimization (ACO) was one of the first swarm algorithms, and it has proven to be one of the most efficient approaches, as stated in Tuba (2011, 2013). The firefly algorithm (FA), inspired by the behavior of fireflies and their lighting properties, has been extensively used in solving several NP-hard problems, in modified and hybridized versions (Tuba & Nebojsa, 2014).

There are also numerous other novel swarm intelligence algorithms, and important representatives are monarch butterfly optimization (MBO) and the moth search (MS) algorithm. MBO was initially proposed by Wang and Deb in 2015 (Wang, Deb, & Cui, 2015), and was applied to numerous practical NP-hard problems with promising results, including the wireless sensor network localization problem, cloud computing optimization problems, and many others. The MS algorithm, on the other hand, was proposed in 2016 by Wang (2016). It was inspired by the behavior of moths, more precisely their phototaxis and Lévy flights. MS has proven to be one of the best algorithms for global optimization benchmark problems, and also showed promising results for solving some real-life NP-hard problems, such as the drone placement problem (Strumberger, Sarac, Markovic, & Bacanin, 2018) and the localization problem in wireless sensor networks. Besides the already mentioned algorithms, many other swarm algorithms exist, such as elephant herding optimization (EHO) (Correia, Beko, Cruz, & Tomic, 2018; Strumberger, Bacanin, Beko, Tomic, & Tuba, 2017; Strumberger, Beko, Tuba, Minovic, & Bacanin, 2018; Strumberger, Minovic et al., 2019), the tree growth algorithm (TGA), brain storm optimization (BSO) (Tuba, Strumberger, Bacanin, Zivkovic, & Tuba, 2018), and many others.

As already mentioned, the ANFIS has shown some very promising results for prediction model development in a wide spectrum of different domains. In order to achieve good prediction, the training process of ANFIS is crucial. Nevertheless, the quality and precision of the model can be improved drastically by optimizing the model parameters. There are numerous optimization methodologies available; however, the most promising approach is the application of swarm intelligence metaheuristics to tune the parameters and outputs of the ANFIS. In Karaboga and Kaya (2019b), the ABC algorithm was applied for ANFIS optimization, in order to estimate the number of foreign visitors coming to Turkey. The same authors proposed training ANFIS with an adaptive and hybrid ABC algorithm, as shown in Karaboga and Kaya (2019c). Another recent paper (Mir, Kamyab, Lariche, Bemani, & Baghban, 2018) discusses ANFIS paired with PSO with the goal of estimating gas density based on the pressure, temperature, molecular weight, and other important gas parameters. The proposed ANFIS-PSO model was more accurate when compared to other gas prediction models. In Al-qaness et al. (2020), an approach was proposed that enhances the flower pollination algorithm (FPA) with the salp swarm algorithm (SSA) and uses the result to improve the ANFIS. SSA is applied to overcome FPA flaws, such as getting trapped in local optima.

The hybridized ANFIS approach was utilized recently in numerous applications in the domain of sustainability. The work presented in Seifi, Ehteram, Singh, and Mosavi (2020) deals with six metaheuristic approaches used to hybridize the artificial neural network (ANN) and ANFIS with the goal of predicting the monthly groundwater level. The authors concluded that the approach where ANFIS was hybridized with the grasshopper optimization algorithm (GOA), called ANFIS-GOA, showed superior performance and enhanced the ANFIS accuracy drastically. The ANFIS-PSO approach was used in Adedeji, Akinlabi, Madushele, and Olatunji (2020) to predict the potential power output of wind turbines. The proposed ANFIS-PSO was compared to the standalone ANFIS and provided better forecast accuracy, at the cost of a higher computational time. The research presented in Xu, Huang, Li et al. (2020) proposed the ANFIS hybridized with vibration particle swarm optimization (VPSO). ANFIS-VPSO was then utilized to optimize the reasoning system in the milling process and reduce the energy consumption, while improving the efficiency of the tools. In Yaseen et al. (2019), the authors evaluated three different algorithms, namely PSO, GA, and differential evolution (DE), and integrated them with the ANFIS with the goal of predicting rainfall time series. The presented results showed that all three hybridized approaches, ANFIS-PSO, ANFIS-GA, and ANFIS-DE, performed better than the conventional ANFIS. Hybrid ANFIS-PSO and ANFIS-DE were analyzed and compared in Dormishi, Ataei, Khaloo Kakaie, Mikaeil, and Shaffiee Haghshenas (2019) with the goal of predicting and optimizing the performance of a gang saw in the process of cutting carbonate rocks. The obtained results showed that ANFIS-PSO performed better than both ANFIS-DE and the conventional ANFIS.
Hybridized ANFIS approach was also analyzed in Bemani, Baghban, and Mosavi (2020) , where authors compared and evaluated ANFIS coupled with five different evolutionary algorithms for predicting the diffusivity coefficient of carbon dioxide. ANFIS was hybridized with PSO, GA, ACO, DE and backpropagation (BP) algorithms. Obtained results showed that the hybrid ANFIS-PSO outperforms all other approaches. Finally, ANFIS-VOA (virus optimization algorithm) approach was utilized in Behnood, Golafshani, and Hosseini (2020) for predicting the COVID-19 infection rate by observing various climate-related variables. Different hybridized ANFIS implementations for various research problems are shown in Table 1 .

Neuro-fuzzy systems are widely used today to model various real-life problems. They have gained popularity in the scientific community because they efficiently combine the advantages of fuzzy logic and artificial neural networks. The artificial neural network component provides the learning ability, while the fuzzy logic component provides interpretable, rule-based reasoning. By using these two approaches together, it is possible to eliminate the drawbacks of the individual components, and neuro-fuzzy systems have proven to have superior features. The ANFIS, which was originally developed by Jang in 1993 (Jang, 1993), belongs to the group of neuro-fuzzy systems. It is based on the Takagi-Sugeno inference model (Angelov & Filev, 2004; Johansen, Shorten, & Murray-Smith, 2000), which generates the mappings between the inputs and the outputs by obtaining and applying IF-THEN rules. To achieve this goal, the ANFIS model has to be trained. The error is the difference between the output produced during training and the actual output of the observed system. Based on this error, the parameters of the ANFIS model are repeatedly updated to achieve the optimum structure of the model. Fig. 1 shows one example of an ANFIS structure, which consists of two inputs, one output, and five layers in total. The neural network architecture used in ANFIS always consists of five layers: fuzzification (layer one), fuzzy inference system (layers two and three), defuzzification (layer four), and aggregation (layer five).

On layer one, every node is adaptive, with one parametric activation function. Membership functions use the values of the inputs to obtain fuzzy clusters. Different membership functions can be utilized to calculate the membership values; some of the most commonly used include the generalized bell, trapezoidal, triangular, Gaussian, and sigmoid functions. The calculated membership values are within the range [0, 1]. Three parameters set the form of the utilized membership function, and they are tuned during ANFIS training. These parameters are often referred to as antecedent parameters. The output is the membership degree of the input values that satisfy the membership functions. For example, the generalized bell membership function is given by Eqs. (1) and (2).
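As a hedged illustration, the generalized bell function and the layer-one output are commonly written as follows in Jang's standard ANFIS formulation. The symbols $a_i$, $b_i$, $c_i$ (width, slope, and center) and $O_{1,i}$ are assumed notation, not taken from this text:

```latex
\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}},
\qquad
O_{1,i} = \mu_{A_i}(x)
```

Under this form, the membership degree equals 1 at $x = c_i$ and 0.5 at $x = c_i \pm a_i$.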

On layer two, each node is a fixed node, and its output is the product of its incoming signals. Typically, it applies the fuzzy AND operation.
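For a two-input structure, and assuming membership functions named $\mu_{A_i}$ and $\mu_{B_i}$ for inputs $x$ and $y$ and a firing strength $w_i$ for rule $i$ (notation assumed from the standard ANFIS formulation, not taken from this text), the product rule can be sketched as:

```latex
O_{2,i} = w_i = \mu_{A_i}(x) \cdot \mu_{B_i}(y), \qquad i = 1, 2
```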

On layer three, each node is fixed, and it computes the normalized firing strength for each rule from the firing strengths output by layer two. The normalized firing strength of a rule is computed as the ratio of that rule's firing strength to the sum of all firing strengths, as represented in Eq. (4):
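Assuming $w_i$ denotes the firing strength of rule $i$ and $\bar{w}_i$ its normalized value (assumed notation, following the standard ANFIS formulation), this ratio is usually written as:

```latex
O_{3,i} = \bar{w}_i = \frac{w_i}{\sum_{j} w_j}
```

By construction, the normalized firing strengths sum to 1.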

On layer four, which is known as the defuzzification layer, each node is adaptive. Here, the output for each rule is calculated by multiplying the normalized firing strength from the previous layer by a first-order polynomial. The parameters of this polynomial are known as the conclusion parameters, and they are tuned during the training of the ANFIS model. The output for every rule is computed by using Eq. (5):
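For two inputs $x$ and $y$, and assuming the conclusion parameters of rule $i$ are named $p_i$, $q_i$, $r_i$ (assumed notation from the standard Takagi-Sugeno formulation, not taken from this text), the rule output can be sketched as:

```latex
O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right)
```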

On layer five, every node is fixed and adds all incoming values. Therefore, the final output of ANFIS is the sum of the outputs of each rule from layer four, and it can be computed using Eq. (6):
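Assuming $\bar{w}_i$ is the normalized firing strength, $w_i$ the raw firing strength, and $f_i$ the first-order polynomial of rule $i$ (assumed notation, following the standard ANFIS formulation), the aggregation is commonly written as:

```latex
O_{5} = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i}
```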

The training process of ANFIS in practice refers to the optimization of the parameters used in the model. ANFIS parameters include the number of inputs to the system, the type and number of membership functions used in the model, and the total number of rules. Together with the antecedent and conclusion parameters, these form the set of parameters that can be optimized. In this paper, an enhanced BAS algorithm has been employed to perform the optimization. However, only the antecedent and conclusion parameters have been taken into account, while a generalized bell membership function has been chosen. Therefore, the type of membership function was not subject to the optimization process.
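To make the optimization target concrete, the following is a minimal sketch of how a metaheuristic could see the ANFIS: a flat parameter vector is unpacked into antecedent and conclusion parameters, the five layers are evaluated, and an error is returned. The function names, the grid-style rule base (one rule per combination of membership functions), the parameter layout, and the choice of RMSE as the objective are all assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import product


def gbell(x, a, b, c):
    """Generalized bell membership degree in (0, 1]; a = width, b = slope, c = center."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def anfis_forward(inputs, antecedents, consequents):
    """Five-layer Takagi-Sugeno forward pass (illustrative sketch).

    inputs      : list of n input values
    antecedents : antecedents[i][j] = (a, b, c) for membership function j of input i
    consequents : one (p_1, ..., p_n, r) tuple per rule, with rules ordered as
                  itertools.product over the membership functions of each input
    """
    # Layer 1: membership degrees of every input in each of its fuzzy sets
    mu = [[gbell(x, *abc) for abc in mfs] for x, mfs in zip(inputs, antecedents)]
    # Layer 2: firing strength of each rule as a product (fuzzy AND)
    w = [math.prod(combo) for combo in product(*mu)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times a first-order polynomial of the inputs
    rule_out = [
        wb * (sum(p * x for p, x in zip(coef[:-1], inputs)) + coef[-1])
        for wb, coef in zip(w_bar, consequents)
    ]
    # Layer 5: sum of all rule outputs
    return sum(rule_out)


def unpack(theta, n_inputs, n_mfs):
    """Split a flat vector into antecedent and conclusion parameters (assumed layout)."""
    n_ante = n_inputs * n_mfs * 3
    antecedents = [
        [tuple(theta[3 * (i * n_mfs + j): 3 * (i * n_mfs + j) + 3]) for j in range(n_mfs)]
        for i in range(n_inputs)
    ]
    width = n_inputs + 1
    rest = theta[n_ante:]
    consequents = [tuple(rest[k:k + width]) for k in range(0, len(rest), width)]
    return antecedents, consequents


def rmse_fitness(theta, samples, n_inputs, n_mfs):
    """Root-mean-square error over (inputs, target) pairs; lower is better."""
    antecedents, consequents = unpack(theta, n_inputs, n_mfs)
    sq = [(anfis_forward(x, antecedents, consequents) - y) ** 2 for x, y in samples]
    return math.sqrt(sum(sq) / len(sq))
```

With two inputs and two membership functions per input, this layout yields a 24-element vector (12 antecedent and 12 conclusion parameters), which is the candidate solution an optimizer would move through the search space.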

In this section, the method that was proposed in this research will be described. First, an overview of the original BAS will be outlined. Then, insights into basic BAS's deficiencies will be discussed and the improved BAS algorithm that overcomes those deficiencies will be explained. Lastly, a developed ANFIS-based framework that utilizes enhanced BAS metaheuristics for the training will be shown.

The BAS algorithm is a novel bionic algorithm introduced by Jiang and Li (2017). The algorithm was inspired by the behavior of longhorn beetles, more precisely by the process of detecting and searching for food. Longhorn beetles have two antennae which they use to detect the concentration of food smell. If a higher smell concentration is detected by the left antenna, the beetle will fly to the left. Similarly, if the smell concentration detected by the right antenna is higher, it will fly to the right. By doing so, the beetle is able to find food in an unknown environment. The BAS algorithm mimics this process and can perform optimization without prior knowledge of the particular form of the function or its gradient. It also requires only one individual, which greatly lowers the computational complexity of the algorithm. This algorithm can be utilized to enhance the calculation efficiency of the back propagation (BP) algorithm in neural networks and help it find the global optimal solution with a higher probability, by determining the hyperparameters in an intelligent manner.

This relatively novel metaheuristic has already shown promising results on real-life optimization problems. It was applied in Xu, Huang, and Ma (2020) to improve the BP neural network model for predicting gas explosion pressures. It was also used to solve other optimization problems, such as path planning for mobile robots with collision-free capability (Wu, Lin, Jin, Chen, Li, & Chen, 2020), intelligent fault diagnosis of wind turbine rolling bearings (Wang, Yao, Cai, & Zhang, 2020), etc.

The BAS algorithm considers the position of the beetle as a vector $x^t$ at time instant $t$ ($t = 1, 2, \ldots$) and defines the concentration of odor at position $x$ by the fitness function $f(x)$. The maximum value of the fitness function $f(x)$ marks the source of the odor. Next, the BAS algorithm utilizes two rules inspired by the beetle using its antennae to search and explore an unknown environment in a random fashion. First, the searching behavior of the beetle in a random direction can be modeled by Eq. (7):

where $\mathrm{rnd}(\cdot)$ stands for the random function, while $k$ represents the dimension of the position. Afterwards, the searching behaviors of the right and left antennae, respectively, can be modeled by Eqs. (8) and (9):

where $x_r$ and $x_l$ mark the positions located on the right and left side of the searching area, respectively. The sensing range of the antennae is marked with $d$. It corresponds to the exploitation ability and must be large enough at the beginning to cover an adequate searching area and to allow the search to jump out of local minima, and it then attenuates as time elapses.
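The random direction and the two antenna positions described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes that the random direction is a vector of uniform random numbers normalized to unit length, and that the two antennae sit at a distance equal to the sensing range on either side of the beetle along that direction. The function names are hypothetical.

```python
import math
import random


def random_direction(k, rng):
    """Random unit vector of dimension k (normalised uniform random vector)."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def antennae_positions(x, b, d):
    """Right and left antenna positions at sensing range d along direction b."""
    x_right = [xi + d * bi for xi, bi in zip(x, b)]
    x_left = [xi - d * bi for xi, bi in zip(x, b)]
    return x_right, x_left


rng = random.Random(0)
b = random_direction(3, rng)
x_r, x_l = antennae_positions([0.0, 0.0, 0.0], b, d=2.0)
```

Because the direction has unit length, the two probe points are always separated by twice the sensing range, which is why a large initial range lets the beetle compare regions that are far apart.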

The detecting behavior is formulated by the iterative model, which associates the detection of odor by considering the searching behavior, as described by Eq. (10).

where $\delta$ represents the step size of each iteration, and $\mathrm{sign}(\cdot)$ is the sign function. The searching parameters, namely the antenna length $d$ and the step size $\delta$, are updated according to the rules given by Eqs. (11) and (12).

The pseudo-code of the original BAS is presented in Algorithm 1 below.
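A minimal Python sketch of the procedure described above is given here for orientation. It is written for minimization, so the beetle steps towards the antenna with the lower cost; for the maximization view used in the text the sign of the step is flipped. The update constants 0.95 and 0.01 are the values quoted for the original BAS later in this paper, while the initial antenna length, the initial step size and the function name `bas_minimise` are assumptions made for the example.

```python
import math
import random


def bas_minimise(cost, x0, iterations=200, d=2.0, delta=1.0, seed=0):
    """Single-beetle BAS sketch that minimises `cost` (illustrative)."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), cost(x)
    for _ in range(iterations):
        # Random unit direction
        raw = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        b = [v / norm for v in raw]
        # Probe the cost at both antennae
        f_right = cost([xi + d * bi for xi, bi in zip(x, b)])
        f_left = cost([xi - d * bi for xi, bi in zip(x, b)])
        # Step towards the antenna with the lower cost
        diff = f_right - f_left
        sign = (diff > 0) - (diff < 0)
        x = [xi - delta * bi * sign for xi, bi in zip(x, b)]
        f_x = cost(x)
        if f_x < best_f:
            best_x, best_f = list(x), f_x
        # Shrink the antenna length and the step size
        d = 0.95 * d + 0.01
        delta = 0.95 * delta
    return best_x, best_f


def sphere(x):
    return sum(v * v for v in x)


start = [3.0, -2.0, 1.0]
best_x, best_f = bas_minimise(sphere, start)
```

The beetle always moves, even to a worse position, so the best position seen so far is tracked separately.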

Similarly to Wu et al. (2020) and Xie, Chu, Zheng, and Liu (2019), empirical simulations lead to the conclusion that some components of the original BAS metaheuristics could be enhanced. The main drawback of the basic BAS is premature convergence, which implies that the search process may be trapped in local optima. In some runs, the diversity of the population in early iterations is not at a satisfactory level and the whole population, due to the stochastic nature of the search, may converge to sub-optimal solutions. This scenario arises when the initial pseudo-random individuals are deployed in parts of the search space that are far from the optimum regions. In each iteration, the original BAS algorithm performs a search around the current solution (Eqs. (8)-(10)). The exploration and exploitation processes, as well as the balance between them, are controlled by the antenna length ($d$) and the step size ($\delta$). These parameters are updated in each iteration by using Eqs. (11) and (12) for $d$ and $\delta$, respectively. However, it is very hard to establish an appropriate balance between exploration and exploitation by adjusting the values of these parameters alone; in most cases the balance is tilted in favor of exploitation, because the search process is guided by the position of the current solution. Simulations with 500 runs have been performed on standard unconstrained benchmark instances, and in about 25% of the runs on average the original BAS could not converge properly, which leads to unacceptable mean values. Therefore, the original BAS could be improved by establishing stronger exploration in early iterations. At this stage of the algorithm's execution, it is important to have stronger diversification, so that the algorithm can find the part of the search region where the optimum solution resides.

To efficiently address observed deficiencies of the basic BAS, two mechanisms have been incorporated:

1. inspired by the approach presented in C., G., H., and T. (2019), the Cauchy mutation operator has been adopted in the original BAS to improve solution diversity, and
2. to control whether the Cauchy perturbation (mutation) operator will be executed or not, a mechanism similar to the one used in the ABC algorithm has been implemented (Karaboga & Akay, 2011).

First, the goal is to improve the exploration ability and solution diversity of the original BAS in the early phases of execution by incorporating the Cauchy mutation operator. The Cauchy distribution is utilized to conduct Cauchy variation on solutions that fail to improve over a given number of consecutive iterations (this number is an additional control parameter of the improved BAS, which will be explained later). The basic idea behind this approach is that some solutions may be trapped in a local extreme, hence an external intervention is required (Cao, Iosifidis, Chen, & Gabbouj, 2018), so that the search can be redirected towards exploration (global search) (El-Ela, El-Sehiemy, & Abbas, 2018).

For a one-dimensional random variable that follows the Cauchy distribution centered at zero, the density function is defined as in C. et al. (2019):

where a scale parameter equal to 1 gives the standard Cauchy distribution. Many implementations of evolutionary algorithms use Cauchy and Gaussian mutation operators. However, since the peak of the Cauchy distribution at the origin is lower than that of the Gaussian distribution and its tails approach the axis more slowly at both ends, the Cauchy distribution generates random numbers far from the origin with a higher probability, which can substantially help the algorithm avoid falling into sub-optimal domains (C. et al., 2019). In the proposed improved BAS implementation, for each solution, a step vector whose length equals the number of the solution's parameters is generated in the following way:

where the two bounds denote the maximum and minimum value of the given parameter, respectively, the threshold is the Cauchy mutation probability, and the Cauchy variation expression is a draw from the standard Cauchy distribution with parameters (0, 1). The new solution in the current iteration is then generated by using Eq. (15):

The global exploration ability of the Cauchy operator is needed in the early iterations of the algorithm's run. However, in the later phases, when the search has converged to the optimal domain, this mechanism is no longer useful. To control this behavior, a new parameter has been added: the Cauchy mutation invocation. If the current iteration does not exceed the value of this parameter, the Cauchy mutation may be triggered; otherwise, the new solution is created as in the original BAS algorithm, according to Eq. (10).
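The exploration ability of the Cauchy operator comes from its heavy tails, which can be checked numerically. The sketch below compares the standard Cauchy and standard Gaussian distributions using closed-form expressions from the Python standard library; the helper names are hypothetical and the cut-off of 3 is chosen only for illustration.

```python
import math


def cauchy_pdf(x, scale=1.0):
    """Density of a zero-centred Cauchy distribution with the given scale."""
    return scale / (math.pi * (scale * scale + x * x))


def gauss_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy_tail(a):
    """P(|X| > a) for the standard Cauchy distribution."""
    return 1.0 - 2.0 * math.atan(a) / math.pi


def gauss_tail(a):
    """P(|X| > a) for the standard normal distribution."""
    return math.erfc(a / math.sqrt(2.0))


peak_cauchy, peak_gauss = cauchy_pdf(0.0), gauss_pdf(0.0)
tail_cauchy, tail_gauss = cauchy_tail(3.0), gauss_tail(3.0)
```

The Cauchy peak at the origin is lower than the Gaussian one, while the probability of drawing a value further than 3 from the origin is roughly 0.2 for the Cauchy distribution against roughly 0.003 for the Gaussian, so long jumps are far more frequent.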

Moreover, in the early phases of the algorithm's execution, the Cauchy mutation operator (Eq. (15)) is applied only to solutions that have not improved over a number of consecutive iterations. This number, the consecutive iterations stagnation, is another control parameter of the improved BAS algorithm. To incorporate this behavior, each solution in the population carries an additional attribute, the not improved counter. In each iteration, if the solution is not improved by the standard BAS equation, the counter is incremented by one. Finally, when the counter reaches the consecutive iterations stagnation value and the Cauchy mutation invocation condition holds, the Cauchy operator is triggered. As can be seen from Eq. (14), the Cauchy variation expression is applied only to those parameters for which the Cauchy mutation probability condition is satisfied. This allows for greater control over the global search. If the operator were applied to all parameters, the exploration would be too strong, and the balance would be tilted in favor of a global search that generates solutions of lower quality.
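The triggering logic and the per-parameter mutation can be sketched as below. This is one plausible reading of the description, not the exact form of Eqs. (14) and (15): the scaling of the Cauchy draw by the parameter range and the clipping to the bounds are assumptions, and all function and argument names are hypothetical.

```python
import math
import random


def standard_cauchy(rng):
    """Draw from the standard Cauchy distribution by inverse transform sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))


def should_mutate(iteration, not_improved, invocation_limit, stagnation_limit):
    """Mutate only in early iterations and only when the stagnation limit is reached."""
    return iteration <= invocation_limit and not_improved == stagnation_limit


def cauchy_mutate(x, lower, upper, probability, rng):
    """Perturb a random subset of the parameters with Cauchy noise."""
    mutated = []
    for xi, lo, hi in zip(x, lower, upper):
        if rng.random() <= probability:
            xi = xi + (hi - lo) * standard_cauchy(rng)  # assumed scaling by the range
            xi = min(max(xi, lo), hi)                   # assumed clipping to the bounds
        mutated.append(xi)
    return mutated


rng = random.Random(1)
lower, upper = [-5.0] * 4, [5.0] * 4
candidate = cauchy_mutate([0.0, 1.0, -1.0, 2.0], lower, upper, 0.5, rng)
```

Mutating only the parameters selected by the probability check keeps part of the current solution intact, which is the control over the global search that the text refers to.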

Moreover, in order to establish a better adjustable balance between exploitation and exploration, an attenuation coefficient has been employed for the step size and the antenna length (sensing diameter), along with the minimum antenna length (minimum sensing diameter) $d_0$. A similar approach was taken in Xie et al. (2019) and Wu et al. (2020). The expressions of the original BAS for calculating these two parameters (Eqs. (11) and (12)) are replaced with the following ones:

Motivated by the nature of the modifications, the proposed approach is named Cauchy exploration strategy BAS (CESBAS). By introducing the Cauchy mutation and three additional control parameters (the Cauchy mutation probability, the Cauchy mutation invocation and the consecutive iterations stagnation), the proposed CESBAS outperforms the basic BAS, as shown in Section 5. It must be noted that the optimal values of the parameters used in this manuscript were determined empirically, by conducting simulations with a trial and error approach.
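The effect of an attenuation coefficient combined with a minimum antenna length can be illustrated with a short sketch. It assumes that the updates keep the shape of the original BAS rules, with the antenna length multiplied by the coefficient and increased by the minimum length, and the step size multiplied by the coefficient; the exact expressions of the paper are not reproduced here. The values 0.95 and 0.01 are the ones reported in the experimental setup, while the starting values and the name `attenuate` are hypothetical.

```python
def attenuate(d, delta, eta=0.95, d0=0.01):
    """One update of the antenna length d and the step size delta (assumed form)."""
    return eta * d + d0, eta * delta


d, delta = 2.0, 1.0
for _ in range(200):
    d, delta = attenuate(d, delta)
```

Under this assumption the antenna length does not shrink to zero but settles near d0 / (1 - eta) = 0.2, whereas the step size decays geometrically, so late iterations perform a fine local search with a sensing range that stays usable.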

The pseudo-code of the proposed CESBAS metaheuristics is given in Algorithm 2. The pseudo-code refers to the total number of solutions in the population, the fitness function, the individual solutions in each iteration, and the best solution together with its fitness.
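For orientation, the overall loop can be sketched in Python under several stated assumptions. The sketch is written for minimization, uses the control parameter values reported in the experimental setup (0.95, 0.01, 0.5, 5 and 80) as defaults, and assumes the initial antenna length and step size, the range-scaled Cauchy step and the clipping to the bounds. It is not the authors' Algorithm 2, and every identifier is hypothetical.

```python
import math
import random


def _clip(v, lo, hi):
    return min(max(v, lo), hi)


def cesbas_minimise(cost, lower, upper, pop_size=1, iterations=200, eta=0.95, d0=0.01,
                    mutation_prob=0.5, stagnation=5, invocation=80, seed=0):
    """CESBAS-style search that minimises `cost` (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(pop_size)]
    fit = [cost(x) for x in pop]
    stalled = [0] * pop_size                 # not-improved counter per solution
    best_f = min(fit)
    best_x = list(pop[fit.index(best_f)])
    d, delta = 2.0, 1.0                      # assumed initial antenna length and step size
    for t in range(1, iterations + 1):
        for i, x in enumerate(pop):
            if t <= invocation and stalled[i] == stagnation:
                # Cauchy mutation on the parameters selected by the mutation probability
                cand = [_clip(xi + (hi - lo) * math.tan(math.pi * (rng.random() - 0.5)), lo, hi)
                        if rng.random() <= mutation_prob else xi
                        for xi, lo, hi in zip(x, lower, upper)]
                stalled[i] = 0
            else:
                # Standard BAS move along a random unit direction
                raw = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
                norm = math.sqrt(sum(v * v for v in raw)) or 1.0
                b = [v / norm for v in raw]
                diff = (cost([xi + d * bi for xi, bi in zip(x, b)])
                        - cost([xi - d * bi for xi, bi in zip(x, b)]))
                sign = (diff > 0) - (diff < 0)
                cand = [_clip(xi - delta * bi * sign, lo, hi)
                        for xi, bi, lo, hi in zip(x, b, lower, upper)]
            f_cand = cost(cand)
            stalled[i] = 0 if f_cand < fit[i] else stalled[i] + 1
            pop[i], fit[i] = cand, f_cand
            if f_cand < best_f:
                best_x, best_f = list(cand), f_cand
        # Attenuate the antenna length and the step size
        d, delta = eta * d + d0, eta * delta
    return best_x, best_f


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = cesbas_minimise(sphere, [-5.0] * 2, [5.0] * 2, pop_size=3)
```

The only structural difference from the basic loop is the branch at the top of the inner loop: a stagnating solution in an early iteration receives a Cauchy perturbation instead of a regular antenna-guided step.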

Since the ANFIS parameters have a significant influence on the overall ANFIS system performance, and finding an optimum combination of the parameters' values represents an NP-hard optimization problem, swarm algorithms can be applied to improve ANFIS time series forecasting. The goal of the proposed hybrid method is to enhance ANFIS performance by determining its parameters via the CESBAS metaheuristics approach. The hybrid approach was named CESBAS-ANFIS.

Table 2 (rows recoverable from the extracted text; the remaining entries are missing):
f1 (label inferred): name and formula not recoverable; $f(0, 0) = 0$; both variables in $[-5, 5]$
f2: Rastrigin; $f(x) = \sum_{i=1}^{n} [x_i^2 - 10\cos(2\pi x_i) + 10]$; $f(0, \ldots, 0) = 0$; $-5.12 \le x_i \le 5.12$
f3: Sum Squares; $f(x) = \sum_{i=1}^{n} i x_i^2$; $f(0) = 0$; $x_i \in [-10, 10]$
f4: Sphere; formula and optimum not recoverable; upper bound of the domain 100

The process of training ANFIS refers to the optimization of its structure and parameters for a specific problem. The number of inputs and rules, along with the type and number of membership functions, determine the total number of parameters in the ANFIS structure. In the proposed approach, the total number of parameters that should be optimized is the sum of the antecedent and conclusion parameters. In the devised hybrid CESBAS-ANFIS method, a similar strategy has been used as in Al-qaness et al. (2020). The proposed method is based on the classic ANFIS model and employs five layers (Fig. 1). Input variables are provided in Layer 1, while Layer 5 generates the forecasted values. The best weights between layers 4 and 5 are determined by the CESBAS approach in the ANFIS training process. At the beginning of execution, CESBAS-ANFIS prepares the input data by formatting it in time series form. As in Al-qaness et al. (2020), the autocorrelation function (ACF) has been used for this purpose, as a means to find patterns in the data. Variables with an ACF value greater than 0.2 have been considered. To train and evaluate the model, the train-test split approach has been used, with 75% of the data set used for training, while the remaining 25% was used for testing. Moreover, the fuzzy c-means (FCM) method was used for the ANFIS model construction. The ANFIS parameters are then trained by the CESBAS metaheuristics. The best solution (ANFIS structure) generated by CESBAS is then returned to the ANFIS and the test phase is performed with this solution. Each CESBAS solution represents one ANFIS structure. The length of each solution is the sum of the antecedent and conclusion parameters. The type of membership function was not considered.
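The data preparation steps mentioned above, lag selection by autocorrelation and the 75%/25% split, can be sketched as follows. The sketch uses the common sample autocorrelation estimator and assumes that the split is chronological (no shuffling), which is not stated explicitly in the text. The series is synthetic and all names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def select_lags(series, max_lag, threshold=0.2):
    """Lags whose autocorrelation exceeds the threshold."""
    return [k for k in range(1, max_lag + 1) if autocorrelation(series, k) > threshold]


def chronological_split(samples, train_fraction=0.75):
    """Split without shuffling so that the test part follows the training part."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


series = [float(v) for v in range(1, 21)]   # synthetic, steadily growing series
lags = select_lags(series, max_lag=4)
train, test = chronological_split(series)
```

For a steadily growing series such as cumulative case counts, the first few lags are strongly autocorrelated, so all of them pass the 0.2 threshold in this toy example.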

To calculate the fitness of each solution (potential ANFIS structure) in the training phase of CESBAS, the mean square error (MSE) metric is used:

where $\hat{y}$ and $y$ represent the predicted and the actual data for each observation, respectively, and the squared differences are averaged over the total number of observations. The fitness of each solution from the population is then calculated by utilizing the following expression:

It should be noted that only the generalized bell function has been considered as the membership function in the conducted simulations since, as mentioned earlier, the type of membership function was not a variable of the optimization process. The flow chart of the proposed CESBAS-ANFIS is shown in Fig. 2.
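As a rough illustration of what a single solution encodes and how it is scored, the sketch below counts the antecedent and conclusion parameters and computes the MSE. It assumes one generalized bell function (three parameters) per input in each rule and a first-order polynomial per rule; the number of rules used in the example is hypothetical, and the mapping from MSE to the final fitness value is not reproduced here.

```python
def parameter_count(n_inputs, n_rules):
    """Antecedent plus conclusion parameters for bell MFs and first-order rules."""
    antecedent = n_inputs * n_rules * 3        # (a, b, c) per input and rule
    conclusion = n_rules * (n_inputs + 1)      # slopes plus bias per rule
    return antecedent + conclusion


def mse(predicted, actual):
    """Mean square error between predicted and actual observations."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


length = parameter_count(n_inputs=4, n_rules=3)
error = mse([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

With four inputs and three rules, a solution vector in this reading would hold 51 real values, each of which the metaheuristic adjusts while the MSE on the training data guides the search.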

The experimental section is divided into two parts. In the first part, results obtained on standard unconstrained (bound-constrained) benchmarks are shown and analyzed, with the goal of validating the proposed CESBAS on a wider range of benchmark instances. Since the original BAS was also tested on these benchmarks, a comparative analysis with the original BAS has been performed, as well as with one other enhanced BAS implementation and one improved PSO approach, for which the results were retrieved from the literature (Xie et al., 2019).

In the second part of the simulation section, results for predicting COVID-19 cases are shown on one practical study, and a comparative analysis is performed with other approaches that were tested on the same datasets and under similar experimental conditions (Al-qaness et al., 2020). The BAS and CESBAS approaches have been implemented in Python. For testing the ANFIS-BAS and ANFIS-CESBAS hybrid methods, the anfis 0.3.1 module for Python has been utilized. Moreover, for the visualization of results, the following Python data science libraries have been utilized: scipy, pandas, pyplot and seaborn. Since each fitness function evaluation requires training and testing ANFIS with the available dataset, it consumes substantial computational resources. Therefore, all simulations were performed on a computer platform with 6 NVIDIA GTX 1080 GPUs, an Intel® Core™ i7-8700K CPU and 32 GB of RAM, running the Windows 10 x64 operating system.

Before validating CESBAS on the practical problem of COVID-19 outbreak prediction, experiments have been conducted on six well-known unconstrained benchmark instances with 20 dimensions. The formulations of the benchmark functions that were utilized in the simulations are given in Table 2.
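Three of the benchmark functions named in Table 2 (Rastrigin, Sum Squares and Sphere) can be written in Python as follows. The standard textbook definitions are assumed here, and the functions are evaluated at the origin of a 20-dimensional space, where each of them attains its global minimum of zero.

```python
import math


def rastrigin(x):
    """Rastrigin function: highly multimodal, global minimum 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def sum_squares(x):
    """Sum Squares function: each squared coordinate is weighted by its index."""
    return sum(i * v * v for i, v in enumerate(x, start=1))


def sphere(x):
    """Sphere function: plain sum of squared coordinates."""
    return sum(v * v for v in x)


origin = [0.0] * 20
values = [f(origin) for f in (rastrigin, sum_squares, sphere)]
```

Sum Squares and Sphere are unimodal, which is consistent with their description as the easier instances, while Rastrigin has many regularly spaced local minima that make premature convergence likely.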

Since BAS belongs to the group of relatively novel metaheuristics, the authors' survey of the recent computer science literature identified only one paper that provides simulation results of the original BAS on standard unconstrained benchmarks (Xie et al., 2019). Therefore, for comparative analysis purposes, the same benchmarks that were utilized in that paper have been used. Moreover, for the sake of an objective comparative analysis, simulations were performed under the same conditions as in Xie et al. (2019). In that paper, an improved BAS was proposed, and both the original and improved versions were tested in 50 independent runs with 200 iterations per run and with only one solution in the population. The improved BAS was compared with the original BAS, as well as with the linear decreasing weight PSO (LDWPSO). The LDWPSO algorithm was tested with 3 solutions in the population, since in the original and improved BAS three function evaluations are performed in each iteration for each solution (left and right antenna and centroid). For more details regarding the specific parameter setup of the original BAS, the improved BAS and LDWPSO, please refer to Xie et al. (2019).

The specific parameters of the CESBAS were set as follows: the attenuation coefficient and the minimum antenna length $d_0$ were set to 0.95 and 0.01, respectively, the Cauchy mutation probability was set to 0.5, the consecutive iterations stagnation to 5, while the Cauchy mutation invocation was set to 80. The values of all parameters were determined empirically by conducting simulations. It must be pointed out that the same values for the attenuation coefficient and $d_0$ were used as in the original BAS (Eqs. (11) and (12)) (Jiang & Li, 2017). The simulation environment parameters of the CESBAS approach for unconstrained benchmarks are summarized in Table 3.

In order to better present the search history and how the CESBAS algorithm performs the search, a 2D Gaussian KDE (kernel density estimation) plot and a surface plot of the 2D Gaussian KDE have been generated for all six unconstrained benchmark functions after 100 iterations. The 2D Gaussian KDE plots are given in Fig. 3, while the surface plots are shown in Fig. 4. Result quality and convergence speed, in terms of the number of iterations, were taken as criteria for comparison, with the mean and standard deviation calculated over 50 independent runs. Results for the basic BAS, the improved BAS and LDWPSO were retrieved from Xie et al. (2019). [...] algorithm is the improved BAS, which was proposed in Xie et al. (2019).

In the comparison of result quality, CESBAS outperforms the improved BAS in four out of six tests. For the relatively easy benchmarks f3 and f4, both algorithms managed to obtain the global optimum in all runs. Moreover, in the f1, f2 and f5 instances CESBAS also achieved a better standard deviation of results than the improved BAS, while in the f3, f4 and f6 tests both metaheuristics showed the same performance.

As noted before, in Table 4, convergence time is shown in terms of the number of iterations that the algorithms took to converge to the best solution in each run. In the case of the simpler functions f3 and f4, the improved BAS exhibits better convergence speed than the proposed CESBAS for both indicators, mean and standard deviation. However, for all remaining functions, CESBAS managed to outperform the improved BAS. The LDWPSO was proven to be the worst approach [...] achieve the right part of the search space, which led to worse mean values. Even for the simple f3 and f4 benchmarks, BAS failed to achieve the optimum. The same can be stated for convergence speed. The proposed CESBAS managed to significantly improve the result quality as well as the convergence speed of the original BAS algorithm and obtained significantly better results in all tests for all criteria and performance metrics. A visual comparative analysis of result quality between CESBAS and the basic BAS is given in Figs. 5-7. Swarm plot diagrams of the best obtained results for 50 independent runs are shown in Fig. 5. Each point in the diagram represents the result of one run. Similarly, Figs. 6 and 7 show box plot (box and whiskers) diagrams and histograms, respectively, of the same data set. From the presented figures it is clear that in some runs the basic BAS did not converge to the optimum region, and these results are distributed far away from the median value.

The visual representation and comparative analysis between the proposed CESBAS and the original BAS was performed while taking into account convergence speed criteria and is presented in Fig. 8 . On the given graphs, convergence speed averaged over 50 independent runs is shown. From the given figure it is obvious that CESBAS shows much better convergence than original BAS metaheuristics.

As noted above, in the second part of simulations, CESBAS was validated against an important and current challenge of predicting COVID-19 cases by using the dataset from China. In this subsection, performance metrics that are used for testing the proposed CESBAS-ANFIS method are shown first. Then, the employed dataset and control parameter setup are shown, and finally comparative analysis with other state-of-the-art approaches that were tested on the same dataset and under similar experimental conditions is presented. For more details about the proposed CESBAS-ANFIS hybrid approach, please refer to Section 4.3. 

The quality and performance of the proposed CESBAS-ANFIS approach have been evaluated by utilizing standard metrics for regression: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared relative error (RMSRE) and coefficient of determination ($R^2$). Mathematical expressions of these metrics are provided in the following few paragraphs.

The RMSE is calculated as given in Eq. (20).

where $\hat{y}$ and $y$ stand for the predicted and actual values, respectively, and the error is averaged over the data sample size (number of observations). The MAE, MAPE, RMSRE and $R^2$ indicators are calculated by Eqs. (21), (22), (23) and (24), respectively:

where $\bar{y}$ denotes the average value of $y$. Smaller values of RMSE, RMSRE, MAE and MAPE indicate better performance of the proposed approach, while a higher value of $R^2$ indicates a better correlation and thus better quality of results.
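The five indicators can be computed with a few lines of Python. The sketch below uses their common textbook definitions, with MAPE expressed in percent and the coefficient of determination computed as one minus the ratio of the residual to the total sum of squares; these may differ in detail from the expressions used in the paper. The function name and the sample numbers are made up.

```python
import math


def regression_metrics(predicted, actual):
    """RMSE, MAE, MAPE (percent), RMSRE and R2 for paired observations."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mean_actual = sum(actual) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmsre = math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, actual)) / n)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "RMSRE": rmsre, "R2": r2}


metrics = regression_metrics([110.0, 190.0, 300.0], [100.0, 200.0, 300.0])
```

The relative metrics (MAPE and RMSRE) divide by the actual values, so they are only meaningful when no actual value is zero, which holds for cumulative case counts.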

To test the performance of the proposed method, a certain time period of the COVID-19 dataset from China was used. Moreover, the performance of the proposed ANFIS-CESBAS was also compared with the hybrid between the flower pollination algorithm and the salp swarm algorithm (FPASSA), which was originally tested on the same problem (Al-qaness et al., 2020). In that paper, the FPASSA was also used for updating ANFIS parameters (FPASSA-ANFIS). Additionally, a comparative analysis was conducted with other state-of-the-art swarm algorithms used for updating ANFIS parameters, as well as with the original BAS algorithm.

For the sake of performing all of the aforementioned comparisons, in this experiment the same dataset was taken and simulated under similar experimental conditions as in Al-qaness et al. (2020). A larger dataset could have been taken; however, in that scenario it would not be possible to compare the performance of the proposed method with other algorithms, because only a few methods whose results are published in state-of-the-art journals were implemented and evaluated for COVID-19 cases prediction. Since an implementation of the original BAS algorithm for this problem does not exist in the literature, BAS-ANFIS has also been implemented during this research in order to evaluate the improvements of CESBAS over the original BAS for training ANFIS parameters. The dataset that was employed in the experiments was retrieved from the World Health Organization (WHO) by merging reports of daily confirmed cases in China from January 21, 2020, till February 18, 2020. The data was captured from the following URL: www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/. This dataset is shown in Table 5, while its visual representation is provided in Fig. 9. Also, as in Al-qaness et al. (2020), 25% of the data was utilized for testing and the remaining 75% was used for training. In order to allow readers to see more clearly how the proposed CESBAS algorithm works, and to see the discrepancies between predicted and actual cases, the results which CESBAS obtained are shown along with the actual confirmed cases in Table 6. It can be noticed from Table 6 that the proposed algorithm was not able to predict the large surge of new cases between 16.2.2020 and 17.2.2020, leading to slightly worse metrics. This is expected behavior, as any machine learning algorithm for time series prediction is subject to an error caused by external unpredictable factors.
The accuracy of the prediction depends on two factors, reducible and irreducible error, as shown in Eq. (25):

where $E(Y - \hat{Y})^2$ denotes the average of the squared distance between the actual and predicted value of $Y$, $[f(X) - \hat{f}(X)]^2$ represents the reducible error, while $\mathrm{Var}(\epsilon)$ denotes the irreducible error. The irreducible error cannot be reduced, no matter how well the prediction is performed (James, Witten, Hastie, & Tibshirani, 2014). The reason for this is that the prediction depends on some unmeasured variables which are useful, but unknown.

In order to establish a better analysis of CESBAS-ANFIS performance, additional simulations have been conducted by using two datasets of weekly confirmed influenza cases, as in Al-qaness et al. (2020). The data for the first dataset (influenza dataset 1, IDS1) was retrieved from the Centers for Disease Control and Prevention (CDC) and covers the period between the fourteenth week of 2015 and the sixth week of 2020 (Center for Disease Control and Prevention (CDS), 2020). The second dataset (influenza dataset 2, IDS2) was captured from the WHO website and comprises the data of confirmed influenza cases in China from week one of 2016 until week eight of 2020 (World Health Organization (WHO), 2020). [...] Table 3. Algorithms are executed in 30 independent runs and the best values are noted in the comparative analysis table.

In both experiments, with the COVID-19 and influenza datasets, previous values of the time series have been considered as independent variables, while the predicted numbers of new COVID-19 and influenza cases are considered as dependent variables. The time-series dataset was prepared in the following way: the data is arranged into four inputs holding the confirmed cases of the last four consecutive days, which are used for predicting the next day's confirmed cases. There is no fixed prediction horizon, as in every following iteration the algorithm takes the predicted values for calculating the next value, and so on. This methodology is visualized in Fig. 10.
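The windowing scheme and the recursive use of predictions can be sketched as below. The sketch assumes a window of four consecutive observations, as described above, and replaces the trained ANFIS with a trivial stand-in model that continues the last observed increment; the stand-in, the sample numbers and all names are hypothetical.

```python
def make_windows(series, width=4):
    """Pair each run of `width` consecutive values with the value that follows it."""
    inputs = [series[i:i + width] for i in range(len(series) - width)]
    targets = [series[i + width] for i in range(len(series) - width)]
    return inputs, targets


def recursive_forecast(model, history, steps, width=4):
    """Feed each prediction back as an input for the next step."""
    window = list(history[-width:])
    forecast = []
    for _ in range(steps):
        nxt = model(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]
    return forecast


def naive_model(window):
    """Stand-in for a trained predictor: continue the last observed increment."""
    return window[-1] + (window[-1] - window[-2])


series = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
inputs, targets = make_windows(series)
future = recursive_forecast(naive_model, series, steps=3)
```

Since each forecast becomes an input of the next one, errors can accumulate over longer horizons, which should be kept in mind when multi-week predictions are interpreted.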

In Al-qaness et al. (2020), besides the proposed FPASSA-ANFIS, results of other bio-inspired algorithms that were also used for updating ANFIS parameters were shown: ANFIS-GA, ANFIS-PSO, ANFIS-ABC and ANFIS-FPA. Additionally, standard machine learning algorithms and methods were also included in the analysis: artificial neural network (ANN), k-nearest neighbors (KNN), support vector regression (SVR) and the bare bones ANFIS. All these approaches were included in the comparative analysis along with the original BAS. The comparative analysis is given in Table 7. In the presented table, the results of the best run are recorded for each method along with the performance metrics. The comparative analysis also includes computation time; however, this metric cannot be objectively compared, since the approaches in this paper have been tested on a different computation platform than the algorithms shown in Al-qaness et al. (2020). It can be noted that the authors of Al-qaness et al. (2020) have not provided details of the computation platform that was used in their simulations.

In the provided comparative analysis (Table 7), the best obtained results for each performance metric are marked in bold. The results show that the proposed CESBAS-ANFIS outperforms all other approaches included in the comparative analysis, establishing the best results for all performance indicators that were taken into consideration. The hybrid FPASSA-ANFIS approach shows relatively good performance as the second-best method included in the analysis; however, its results are still lower than those of the proposed CESBAS-ANFIS. For example, FPASSA-ANFIS obtains an $R^2$ score of 0.9645, while CESBAS-ANFIS achieves an $R^2$ of 0.9763. Also, from the presented results it can be seen that the basic BAS (BAS-ANFIS) shows relatively modest performance when compared to other metaheuristics and performs similarly to ABC. BAS-ANFIS and ABC-ANFIS obtain similar results for all metrics, with a slight advantage for the BAS metaheuristics. Compared with the ''pure'' machine learning approaches (ANN, KNN and SVR), BAS-ANFIS performs better. It is also interesting to notice that the bare bones ANFIS obtains better quality of results than ABC-ANFIS and BAS-ANFIS. [...] manages to predict total cases in China with significantly greater accuracy than BAS-ANFIS.

Since both approaches were executed in 30 independent runs, the results have been ranked for a more detailed comparative analysis: the run with rank 1 obtains the best result, and the run with rank 30 obtains the worst result. Based on this data, a visual comparative analysis between CESBAS-ANFIS and BAS-ANFIS has been generated for the RMSE and MAE indicators by using bar charts. This comparison is given in Fig. 12. Moreover, to show the distribution of result quality over the 30 runs, swarm plot diagrams of the best results obtained in each run have also been generated for the RMSE and MAE metrics. This analysis is provided in Fig. 13.

From the presented swarm plot comparative analysis it can be clearly seen that in some runs BAS-ANFIS (labeled with b in the figure) misses the right part of the search space, and those results can be considered outliers. In this example, in five runs BAS-ANFIS underperforms and generates results of low quality (high RMSE and MAE values). This behavior is a consequence of the poor exploitation-exploration trade-off. However, CESBAS-ANFIS manages to generate satisfactory results in all runs, and there is no single run in which it misses the right part of the search space. In conclusion, this practical example also shows that CESBAS overcomes the drawbacks of the original BAS algorithm.

[...] January 21, 2020 - February 18, 2020 and November 10, 2020 - December 10, 2020.

When analyzing COVID-19 data that is available on the Internet, one more reliable source has been found on the web site Our World in Data (URL: https://ourworldindata.org/coronavirus-source-data). It can be observed that there are slight discrepancies between the data from this web site and the data provided by the WHO reports for the observed period of time in China. In the WHO data, there has been a substantial increase in the number of new cases between February 12, 2020 and February 13, 2020 (from 44k to 59k). However, according to Our World in Data, an even larger increase in new cases happened between February 16, 2020 and February 17, 2020 (from 51k to 70k). This could potentially influence the accuracy of prediction, hence the authors decided to test CESBAS-ANFIS and BAS-ANFIS by using this dataset as well. This approach was taken to evaluate the robustness of the CESBAS-ANFIS and BAS-ANFIS frameworks and to reveal whether significant changes in the performance metric values would be seen. This dataset is visually represented in Fig. 14. Experiments have been performed under the same conditions as those used for the WHO data, and the results for CESBAS-ANFIS and BAS-ANFIS are presented in Table 8.

The results presented in Table 7 (data from the WHO) and Table 8 (data from the website Our World in Data) are just slightly different, which is expected. Therefore, it can be concluded that neither method, CESBAS-ANFIS nor BAS-ANFIS, is sensitive to this change of data source. A visual representation of the results obtained by CESBAS-ANFIS and BAS-ANFIS for the dataset retrieved from Our World in Data is shown in Fig. 15.

Finally, the authors have taken recent data on confirmed cases in China from the Our World in Data source, covering the previous month (from October 10, 2020 till November 9, 2020), and trained the proposed CESBAS-ANFIS model. The goal was to predict the number of potential cases in the following thirty-day period (from November 10, 2020 until December 10, 2020). The results of the prediction are shown in Table 9.

In order to further evaluate the proposed CESBAS-ANFIS method, an additional experiment was conducted by including climate-related variables. This experiment utilized the same experimental setup as given in Behnood et al. (2020). That study utilized various climate factors to predict the speed of the COVID-19 spread in the USA, with data obtained from various sources. The observed factors included the average temperature, minimum temperature, maximum temperature, precipitation, humidity, wind speed and population density. In that study, a hybridized ANFIS-VOA approach was utilized to predict the infection rate based on the aforementioned inputs. More details can be found in Behnood et al. (2020). The results of the conducted simulations are shown in Table 10. The results for the linear regression, stand-alone ANFIS, ANFIS-VOA-I and ANFIS-VOA-II were obtained from the referenced paper (Behnood et al., 2020).

From the presented results, it can be seen that ANFIS-BAS slightly outperforms the ANFIS-VOA-I approach, while it is still behind the ANFIS-VOA-II method. The proposed ANFIS-CESBAS method, however, slightly outperforms ANFIS-VOA-II and all other evaluated methods. The infection rate trends under changes in the most important climate input variables are given in Fig. 16. The infection rate rises drastically as the population density increases, which could justify the need for social distancing. The infection rate also drops as the average temperature increases, and drops slightly as the wind speed increases. Finally, the infection rate tends to increase with humidity.

Finally, the comparative analysis has been performed with approaches presented in Al-qaness et al. (2020) for confirmed influenza datasets. The description of this dataset, as well as control parameters used in simulations, are given in Section 5.2.2.

Based on the results presented in Table 11, the proposed CESBAS-ANFIS achieves better average performance than all other approaches included in the analysis. In simulations with the IDS1 dataset, only FPASSA-ANFIS obtained a better MAPE value than CESBAS-ANFIS, while both approaches perform the same in terms of the R² metric. In simulations with the IDS2 dataset, however, the second-best approach proved to be PSO-ANFIS, which outperformed CESBAS-ANFIS on the MAE and RMSRE performance indicators. For the other metrics, CESBAS-ANFIS showed better results.
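For reference, the performance indicators named in this comparison can be computed as in the following minimal sketch. It assumes the commonly used textbook definitions (with R² taken as the coefficient of determination); the exact formulas used in the compared studies may differ in detail, and the function names and sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmsre(actual, predicted):
    """Root mean squared relative error (actual values must be non-zero)."""
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only, not data from the paper.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
for name, metric in [("RMSE", rmse), ("MAE", mae), ("MAPE", mape),
                     ("RMSRE", rmsre), ("R2", r2)]:
    print(name, metric(actual, predicted))
```

Lower values are better for RMSE, MAE, MAPE and RMSRE, while an R² closer to 1 indicates a better fit. This is why a method can win on one indicator and lose on another, as observed for the IDS2 dataset.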

As in the COVID-19 prediction simulations, the original BAS exhibited performance similar to that of ABC. Also, as in the previous test, the proposed CESBAS clearly outperformed the original BAS by establishing a better balance between intensification and diversification.

In this manuscript, a novel method has been proposed to predict new COVID-19 cases by employing a hybrid of a machine learning method, the adaptive neuro-fuzzy inference system (ANFIS), and an enhanced beetle antennae search (BAS) swarm intelligence metaheuristic. Since one of the greatest challenges in any machine learning approach is optimizing and adjusting the parameters for a specific practical problem, the enhanced BAS algorithm was used for this task. The proposed Cauchy exploration strategy BAS (CESBAS) was tested on a standard set of unconstrained benchmarks and proved to be a robust metaheuristic that significantly outperformed all other approaches, including the original BAS.

Additionally, the CESBAS algorithm was incorporated to update the ANFIS parameters, and the resulting hybrid was tested on a practical problem: the prediction of new COVID-19 cases. The COVID-19 case study was chosen because it is currently the most important challenge humanity is facing; however, the method can be generalized and applied to any time-series prediction. Simulation results and comparative analysis showed that the proposed hybrid method outperformed other sophisticated approaches tested on the same datasets and proved to be a useful tool for time-series prediction. The primary contribution of this paper is the improved prediction accuracy for the number of new confirmed disease cases in the COVID-19 case study. The ongoing COVID-19 outbreak has shown a complex nature, and the promising results of this research can provide an alternative approach to disease outbreak modeling, which the authorities can use to decide which measures should be taken and when to implement them. The prediction accuracy of the proposed model also suggests that, in the case of a disease outbreak, machine learning models can be used together with traditional epidemiological models to predict the number of new confirmed cases.

The secondary contribution of this research is the enhancement of the original BAS algorithm. The proposed CESBAS proved to be a very efficient metaheuristic that can be adapted for solving other real-life NP-hard challenges. The main challenge with the proposed approach lies in the testing process, because every change in the control parameters of the metaheuristic requires a new set of simulation runs. Since training ANFIS is an extremely resource-intensive and time-consuming process, graphical processing units and the CUDA platform would be needed to provide interested researchers with timely results. In this version of the algorithm, everything was done locally and offline: the datasets were saved as CSV files, which were then loaded into the Python environment using the pandas library. As part of future work, the current solution could be moved online by means of a RESTful web service that exposes the appropriate endpoints, making the service available to other researchers.
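The offline loading step described above can be illustrated with a minimal sketch. The authors used pandas; the version below swaps in Python's standard-library `csv` module so that it runs without extra dependencies. The column names (`date`, `total_cases`), the helper names and the sample rows are hypothetical placeholders, not the authors' data or code.

```python
import csv
import io
from datetime import date

# Placeholder rows for illustration only; these are NOT real case counts.
SAMPLE_CSV = """date,total_cases
2020-10-10,100
2020-10-11,104
2020-10-12,109
2020-10-13,109
"""


def load_cumulative(handle):
    """Read (date, cumulative cases) pairs from an open CSV file, sorted by date."""
    rows = [(date.fromisoformat(row["date"]), int(row["total_cases"]))
            for row in csv.DictReader(handle)]
    return sorted(rows)


def select_period(series, first_day, last_day):
    """Keep only the observations between first_day and last_day, inclusive."""
    return [(day, cases) for day, cases in series if first_day <= day <= last_day]


# In practice the handle would come from open("some_file.csv", newline="").
series = load_cumulative(io.StringIO(SAMPLE_CSV))
training_window = select_period(series, date(2020, 10, 11), date(2020, 10, 12))
print(training_window)
```

Selecting a date window in this way mirrors the setup in which one period is used for training and the following period is reserved for prediction.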

Future research in this domain should include additional modifications and improvements of the original BAS algorithm. Future work can focus on hybridizing BAS with other machine learning methods for classification as well as for regression. Additionally, the basic and enhanced BAS versions could be adapted for solving various NP-hard challenges (WSN localization and energy consumption problems, cloud computing scheduling, portfolio optimization, etc.), since this metaheuristic has shown promising potential in this domain.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Wind turbine power output very short-term forecast: A comparative study of data clustering techniques in a PSO-ANFIS model

A cuckoo search algorithm-based task scheduling in cloud computing

Advances in computer and computational sciences

Optimization method for forecasting confirmed cases of COVID-19 in China

An approach to online identification of Takagi-Sugeno fuzzy models

Covid-19 outbreak prediction with machine learning

Artificial bee colony (ABC) algorithm for constrained optimization improved with genetic operators

Firefly algorithm for cardinality constrained mean-variance portfolio optimization problem with entropy diversity constraint

Developing an ANFIS-based swarm concept model for estimating the relative viscosity of nanofluids

Evolutionary convolutional neural networks: An application to handwriting recognition

Determinants of the infection rate of the COVID-19 in the US using ANFIS and virus optimization algorithm (VOA)

Estimating CO2-brine diffusivity using hybrid models of ANFIS and evolutionary algorithms

Smart sustainable cities of the future: An extensive interdisciplinary literature review

Hyper-parameter optimization for convolutional neural network committees based on evolutionary algorithms

Hierarchical resource scheduling method using improved cuckoo search algorithm for internet of things. Peer-to-Peer Networking and Applications

Generalized multi-view embedding for visual recognition and cross-modal retrieval

Centers for Disease Control and Prevention (CDC) (2020). Influenza dataset

A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster

Artificial bee colony algorithm-based multiple-source localization method for wireless sensor network

The species severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2

Elephant herding optimization for energy-based localization

Performance evaluation of gang saw using hybrid ANFIS-DE and hybrid ANFIS-PSO algorithms

Optimal placement and sizing of distributed generation and capacitor banks in distribution systems using water cycle algorithm

Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems

Genetic algorithms in search, optimization and machine learning

Application of improved ANFIS approaches to estimate bearing capacity of piles

Predicting the impacts of epidemic outbreaks on global supply chains: A simulation-based analysis on the coronavirus outbreak (COVID-19/SARS-CoV-2) case

Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability

An introduction to statistical learning: With applications in R

ANFIS: adaptive-network-based fuzzy inference system

BAS: beetle antennae search algorithm for optimization problems

On the interpretation and identification of dynamic Takagi-Sugeno fuzzy models

An ant colony optimization algorithm with improved pheromone correction strategy for the minimum weight vertex cover problem

Ant colony optimization algorithm with pheromone correction strategy for the minimum connected dominating set problem

A modified artificial bee colony (ABC) algorithm for constrained optimization problems

Adaptive network based fuzzy inference system (ANFIS) training approaches: a comprehensive survey

Estimation of number of foreign visitors with ANFIS by using ABC algorithm

Training ANFIS by using an adaptive and hybrid artificial bee colony algorithm (aABC) for the identification of nonlinear static systems

Particle swarm optimization

ANFIS-Based optimal control of hepatitis C virus epidemic

Feature selection in unsupervised learning via evolutionary search

Multistage localization in wireless sensor networks using artificial bee colony algorithm

PSO-COGENT: Cost and energy efficient scheduling in cloud environment with deadline constraint

A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using internet searches, news alerts, and estimates from mechanistic models

Antivirus-built environment: Lessons learned from Covid-19 pandemic

Applying ANFIS-PSO algorithm as a novel accurate approach for prediction of gas density

Seasonal behavior and forecasting trends of tuberculosis incidence in Holy Kerbala

The 1918 influenza pandemic: lessons for 2009 and the future

Application of ANFIS to predict crop yield based on different energy inputs

Parameter optimisation of an image processing system using evolutionary algorithms

COVID-19 pandemic prediction for Hungary; a hybrid machine learning approach

Inter-outbreak stability reflects the size of the susceptible pool and forecasts magnitudes of seasonal epidemics

Workflow scheduling in cloud computing environment using bat algorithm

On the predictability of infectious disease outbreaks

Modeling and uncertainty analysis of groundwater level using six evolutionary optimization algorithms hybridized with ANFIS

Towards sustainable smart cities: A review of trends, architectures, components, and open challenges in smart cities

World health organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19)

COVID-19 pandemic: perspectives on an unfolding crisis

Static drone placement by elephant herding optimization algorithm

Enhanced firefly algorithm for constrained numerical optimization, IEEE congress on evolutionary computation

Resource scheduling in cloud computing based on a hybridized whale optimization algorithm

Elephant herding optimization algorithm for wireless sensor network localization problem

Performance of elephant herding optimization and tree growth algorithm adapted for node localization in wireless sensor networks

Moth search algorithm for drone placement problem

Modified monarch butterfly optimization algorithm for RFID network planning

Monarch butterfly optimization algorithm for localization in wireless sensor networks

Wireless sensor network localization problem by hybridized moth search algorithm

Modified and hybridized monarch butterfly algorithms for multi-objective optimization

Cloudlet scheduling by hybridized monarch butterfly optimization algorithm

Dynamic tree growth algorithm for load scheduling in cloud environments

Dynamic search tree growth algorithm for global optimization

A genetic programming approach to designing convolutional neural network architectures

The efficacy of social distance and ventilation effectiveness in preventing COVID-19 transmission

Machine learning model estimating number of COVID-19 infection cases over coming 24 days in every province of South Korea (XGBoost and MultiOutputRegressor)

Cuckoo search and bat algorithm applied to training feed-forward neural networks

Artificial bee colony algorithm hybridized with firefly metaheuristic for cardinality constrained mean-variance portfolio problem

Multilevel image thresholding by fireworks algorithm

Improved seeker optimization algorithm hybridized with firefly algorithm for constrained optimization problems

Bare bones fireworks algorithm for capacitated p-median problem

Cooperative clustering algorithm based on brain storm optimization and K-means

Tuberculosis disease diagnosis by using adaptive neuro fuzzy inference system and rough sets

Forecasting measles cases in Ethiopia using neuro-fuzzy systems

Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems

Monarch butterfly optimization. Neural Computing and Applications

Load balancing task scheduling based on genetic algorithm in cloud computing

Mahalanobis semi-supervised mapping and beetle antennae search based support vector machine for wind turbine rolling bearings fault diagnosis

Coronavirus disease 2019 (COVID-19): situation report, 72. World Health Organization

World Health Organization (WHO) (2020)

A new fallback beetle antennae search algorithm for path planning of mobile robots with collision-free capability

Ship predictive collision avoidance method based on an improved beetle antennae search algorithm. Ocean Engineering, 192, Article 106542

A novel intelligent reasoning system to estimate energy consumption and optimize cutting parameters toward sustainable machining

A beetle antennae search improved BP neural network model for predicting multi-factor-based gas explosion pressures

A survey on evolutionary computation approaches to feature selection

Transmission cycle of SARS-CoV and SARS-CoV-2

A new metaheuristic bat-inspired algorithm

Swarm intelligence based algorithms: a critical analysis

Novel hybrid data-intelligence model for forecasting monthly rainfall with uncertainty analysis

Real-time forecasting of hand-foot-and-mouth disease outbreaks using the integrating compartment model and assimilation filtering

Load balance aware genetic algorithm for task scheduling in cloud computing

Conditioning optimization of extreme learning machine by multitask beetle antennae swarm algorithm

The paper is supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Grant No. III-44006.