key: cord-0772549-9wnqr5ex authors: Gaurav, Akshat; Gupta, Brij B.; Panigrahi, Prabin Kumar title: A Novel Approach for DDoS Attacks Detection in COVID-19 Scenario for Small Entrepreneurs date: 2022-02-03 journal: Technol Forecast Soc Change DOI: 10.1016/j.techfore.2022.121554 sha: ed2c3ee8eaccbbb629eef4f447fb7eae9ed00835 doc_id: 772549 cord_uid: 9wnqr5ex
The COVID-19 pandemic has altered the way business is done. Because most customers now prefer to do business online, many companies are shifting their business models, which attracts cyber attackers who launch several kinds of cyberattacks against commercial companies simultaneously. The most common and damaging of these is the DDoS attack, which disables the victim's online resources. While large businesses can afford defensive measures against DDoS attacks, the situation is different for new entrepreneurs, whose lack of security resources restricts their ability to ward off such attacks. Here, we highlight the problems that prospective entrepreneurs should be aware of before entering the business, and then present a filtering mechanism, the subject of our research, that efficiently identifies DDoS attacks in the COVID-19 scenario. The suggested approach employs statistical and machine learning techniques to discriminate between DDoS attack traffic and regular communication. Our suggested framework is cost-effective and identifies DDoS attack traffic with an accuracy of 92.8%. The European Commission, the EU's executive arm, defines micro, small, and medium-sized enterprises (SMEs) (Maroufkhani et al., 2020; Gupta et al., 2013; Che and Zhang, 2019). This definition also applies outside the geographical area under the EU's authority (Commission, 1996). In 1992, in response to a request from the Council of Industry, the European Commission proposed restricting the definitions of small and medium-sized enterprises that it used (Berisha and Pula, 2015).
April 1996 saw the first proposal that laid the groundwork for a distinct definition of SMEs; differences between the Community and national definitions may nevertheless lead to discrepancies (Berisha and Pula, 2015). Table 1 summarizes the classification of small and medium-sized entrepreneurs according to the European Union and the World Bank.
Table 1: Classification of Small and Medium Entrepreneurs (Berisha and Pula, 2015).
COVID-19 is one of the world's most serious disasters, having claimed about 4,498,451 lives and displaced millions of people in many nations (COVID-19; Sedik et al., 2021; Alowibdi et al., 2021). It has had a significant impact on human life and has pushed company owners to migrate to online platforms (Pashchenko, 2021; Papadopoulos et al., 2020; Alowibdi et al., 2021; Rahman et al., 2021). While most large business owners have the knowledge and resources needed to implement defense techniques against various cyber-attacks, small and medium-sized entrepreneurs lack the knowledge and tools needed to protect their online business platforms from cyber-attacks (Millaire et al., 2017; Carías et al., 2021; Yeniman Yildirim et al., 2011). DDoS attacks (Bhushan and Gupta, 2019; Dahiya and Gupta, 2021; Chhabra et al., 2013) are among the most damaging cyber-attacks, since they make the online business platform inaccessible to its customers, making it impossible for them to do business. DDoS attacks therefore cause economic losses that add to the overall losses of small and medium-sized companies. Figure 1 represents a DDoS attack. India was the country most affected by DDoS attacks (Azure, 2021). With their limited budgets, small commercial businesses find it very difficult to identify and prevent DDoS attacks. Apart from DDoS attacks, small and medium-sized entrepreneurs must also contend with flash crowds.
A flash crowd occurs when many people simultaneously attempt to visit one specific website, which degrades the performance of that portal. If small and medium-sized businesses inadvertently filter out flash crowd traffic, this damages their reputation and increases their losses. For example, an online shopping business in Australia recently had to pay a significant amount in refunds because it was unable to handle a large volume of traffic generated by its consumers. Because the fundamental features of DDoS attacks and flash crowds are almost identical, it is difficult to distinguish between the two scenarios. Many defensive techniques are available against DDoS attacks, but none of them fully resolves the issue, for reasons including the following:
• Because many open-source attack tools are available on the internet, anybody can use them to launch an attack.
• DDoS attacks almost always use spoofed IP packets, making it almost impossible to pinpoint the source of the attack. Additionally, the duration of an attack has decreased to approximately 4 minutes in recent years, so the affected machine may crash before any protection solution can detect the attack. As a result, obtaining comprehensive information on DDoS attacks is very challenging.
• The lack of a standard benchmark for DDoS defense filters in the computer industry makes it impossible to compare defensive products directly with their counterparts on the market.
• Numerous new technologies are entering the market, including cloud computing, fog computing, industrial computing systems, and the Internet of Things, so it is difficult to adapt traditional defensive techniques to these settings.
The previous subsection explains the details of small entrepreneurs and the effect of DDoS attacks on them. In this context, we propose a DDoS detection approach for small entrepreneurs.
The main contributions of our proposed approach are as follows:
• Our proposed approach is reactive; hence, it is more accurate and its response time is shorter than those of other recent DDoS detection techniques.
• Our proposed approach uses statistical and machine learning techniques to detect DDoS attacks.
• Our proposed approach is economical, so small entrepreneurs can use it efficiently.
The rest of the paper is organized as follows: Section 2 reviews the latest work on DDoS and flash crowd detection. Section 3 gives the motivation for our proposed approach. Section 4 explains the components of our proposed approach, and Section 5 presents the simulation results. Finally, Section 6 concludes the paper.
Researchers have proposed different security techniques (Masud et al., 2020; Nguyen et al., 2021; Gupta et al., 2021b; Rahman et al., 2021; Wang et al., 2020) for the identification and detection of DDoS attacks. In this section, we review some of these techniques. The authors in (Mishra et al., 2021) suggested a low-cost defensive method against DDoS attacks based on differences in entropy between DDoS attack traffic and normal traffic. The authors also suggested a method for mitigating the attack's intensity. The suggested approach has three benefits over other current techniques: (i) a high detection rate, (ii) a low false-positive rate, and (iii) the capacity to mitigate attacks. Another study provides a method for addressing authentication and security problems associated with smart vessels in maritime transport, authenticating devices and detecting different cyberattacks such as DDoS. The suggested method employs an identity-based technique to authenticate smart vessel access; however, it is applicable to maritime transport only.
The authors (Khan and Quadri, 2020) outline the issues that a prospective entrepreneur should consider before entering the autonomous vehicle business, followed by innovative ideas that may help overcome these hurdles. The proposals and guidelines are developed using the cybersecurity principles of confidentiality, integrity, and availability. The authors (Zhou et al., 2021a) present a secure data sharing method and a cyber-attack detection approach using identity-based encryption (IBE) and deep learning algorithms. The suggested system uses identity-based encryption to handle access control for the VANET's smart cars, guaranteeing that no personally identifiable information is released, and uses deep learning to evaluate network anomalies and block malicious packets. Attacks on a virtual machine monitor (VMM) that regulates the VMs may be identified by examining packet transmission data. Building on this, the authors (Shidaganti et al., 2020) suggest a method to halt such identified attacks at their source and analyze the proposed solutions for a few distinct kinds of attacks. They propose selective cloud egress filtering (SCEF), which includes modules for dealing with identified threats; if an attack is identified, SCEF notifies the VMM of which VMs are involved, allowing for targeted remediation. The goal of Zhang et al. (2020) is to provide a broad framework for understanding the features of vulnerabilities in information systems, such as which category a particular vulnerability belongs to, the possible dangers it presents, and the critical indicators for resolving it. The authors also gather data on actual vulnerabilities discovered in companies' information systems from a major vulnerability reporting site. Four layers of features are extracted: word, phrase, subject, and record.
The experimental findings demonstrate that the framework helps characterize the modes and patterns associated with different kinds of vulnerabilities. The authors in (Cvitić et al., 2021) proposed a boosting-based DDoS detection technique for IoT devices. The authors (Tewari and Gupta, 2020) proposed using RFID tags for mutual authentication in IoT devices. The author (A. Dahiya, 2021) proposed a game theory-based security mechanism for the COVID-19 scenario. The authors (Dahiya and Gupta, 2020) suggested a method for mitigating DDoS attacks via a multi-attribute-based auction. They propose a reputation-based detection method in which a user's marginal utility is used to determine their reputation, together with two distinct payment schemes for normal and fraudulent users. A greedy resource allocation strategy is used to distribute resources properly among authorised users, and malicious users who manipulate their offer to get the greatest share of restricted resources are penalised under the differential payment system. The authors (Dahiya and Gupta, 2021) offer a DDoS detection method based on Bayesian game theory. The service provider and legitimate users are assumed to monitor the network for an extended period of time and accumulate probabilistic information about whether or not another user is malicious. They use this probabilistic information to adjust their behaviour in response to the presence of malicious users on the network. Taking these assumptions into account, the authors offer a Bayesian pricing and auction method for reaching Bayesian Nash equilibrium points in a variety of situations in which probabilistic knowledge benefits genuine consumers and service providers. They also offer a reputation evaluation and updating system that takes payment and participation characteristics into account when determining a user's trustworthiness. Apart from the approaches explained above, researchers have proposed many other techniques for increasing security and privacy (Zhou et al., 2019, 2020, 2021c). However, those approaches are not efficient in the context of small and medium enterprises. A comparison of some important approaches is given in Table 2.
Table 2: Comparison of some important approaches
(Bhushan and Gupta, 2018) | Low | High | SDN based
(Zhou et al., 2021a) | Moderate | High | IBE based
- | Moderate | High | IBS signature based
(Tewari and Gupta, 2020) | Low | Moderate | RFID tags
(Cvitić et al., 2021) | Low | Moderate | Boosting based
The preceding discussion shows the potential damage caused by a DDoS attack. Two distinct attack techniques are used to inflict damage on small entrepreneurs: bandwidth depletion attacks and resource depletion attacks. In a bandwidth depletion attack, the attacker floods the available bandwidth of the small or medium entrepreneur's network with malicious packets; in a resource depletion attack, the attacker attempts to exhaust all available resources on the small or medium enterprise's network. Researchers have spent significant effort on methods for detecting and mitigating DDoS attacks. As mentioned before, the features of a flash crowd are similar to those of a DDoS attack; however, since flash crowd traffic is produced by legitimate users, blocking it may cause economic loss or reputational damage for small and medium-sized entrepreneurs. Additionally, attackers may attempt to mimic the features of a flash crowd to evade detection filters. Numerous methods have been developed to distinguish DDoS attacks from other types of traffic, including flash crowds.
These approaches identify flash crowd situations using information theory-based methodologies. We found that combining information theory-based techniques with static methods such as packet scoring enhances their efficacy, since individual packet analysis becomes feasible. We therefore concentrated our efforts on creating a filtering approach that uses both information theory and static techniques, and we used machine learning techniques to identify different attack traffic quickly and accurately. This article provides a detection technique that uses entropy to find DDoS attacks for small and medium entrepreneurs. Our proposed method has two phases: the entropy calculation phase and the machine learning phase. The entropy calculation phase determines the entropy of incoming packets, and the second phase uses machine learning models to determine whether traffic is malicious or legitimate based on its entropy value. This section explains the entropy calculation phase of our proposed approach.
Definition 1. Suppose X is a discrete random variable that can take n possible values $x_1, x_2, \dots, x_n$. The probability of occurrence of each value is given by
$$p(x_i) = P(X = x_i), \qquad \sum_{i=1}^{n} p(x_i) = 1, \qquad (1)$$
and the entropy of X is $\hat{H}(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$.
Theorem 1. The entropy value $\hat{H}(X)$ is stationary; that is, it has the same value over two distinct time periods.
Proof: Most DDoS attackers use tools to launch their attacks against small and medium enterprises. To spoof the destination address of attack packets, these tools use preset software that runs in the background. Consequently, all of the spoofed addresses can be represented by a linear function f, and the cluster probability is
$$p(C_i) = f\big(p(x_i)\big), \qquad x_i \in \{x_1, x_2, \dots, x_n\},$$
where $C_i$ denotes the i-th address cluster. Thus, the cluster entropy associated with DDoS attack traffic may be described by a stationary stochastic process:
$$\hat{H}(C) = -\sum_{i=1}^{n} p(C_i)\log p(C_i) = -\sum_{i=1}^{n} f\big(p(x_i)\big)\log f\big(p(x_i)\big)$$
Entropy is also a concave function of the probability distribution. Proof: Using Jensen's inequality, the DDoS attack traffic can be represented by a monotonically increasing function, whereas the flash crowd probability distribution is modeled as a monotonically increasing concave function. Let $P_1 = \{p_{11}, p_{12}, \dots, p_{1n}\}$ denote the cluster probabilities during a flash crowd and $P_2 = \{p_{21}, p_{22}, \dots, p_{2n}\}$ the cluster probabilities during a DDoS attack. Packets are more randomly distributed during a DDoS attack than during a flash crowd. Therefore, from the definition of entropy,
$$\hat{H}_2(X) > \hat{H}_1(X),$$
where $\hat{H}_2(X)$ represents the entropy during a DDoS attack and $\hat{H}_1(X)$ represents the entropy during a flash crowd.
The entropy calculation phase thus consists of computing the entropy of the incoming traffic and organizing the dataset for analysis in the next step. In the machine learning phase, we analyze the dataset to determine which machine learning technique is most successful at differentiating DDoS attack traffic from normal traffic. We employed six of the most frequently used machine learning techniques, detailed below.
The support vector machine (SVM) (Zhang et al., 2004), developed by Vapnik and colleagues (Cortes and Vapnik, 1995; Vapnik, 1995), is a machine learning approach used in regression (Smola and Schölkopf, 2004; Schölkopf et al., 2000) and pattern recognition (Schölkopf et al., 2000; Burges, 1998). An SVM uses a kernel mapping to transform the data from the input space into a high-dimensional feature space in which they are linearly separable (Scholkopf et al., 1999). The decision function of an SVM is determined by the support vectors (SVs) and their weights, and it uses a kernel chosen a priori, such as a Gaussian or polynomial kernel (Scholkopf et al., 1999; Smola et al., 1998).
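To make the role of support vectors and kernels concrete, the following Python sketch evaluates a kernel SVM decision function with either a Gaussian or a polynomial kernel. The support vectors, weights, bias, and the kernel parameters `gamma`, `degree`, and `coef0` are hypothetical illustrative values, not a model trained on the paper's dataset; in practice they come from solving the SVM optimization problem.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(x: Vector, z: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x: Vector, z: Vector, degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel: (x . z + coef0) ** degree."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + coef0) ** degree


def svm_decision(x: Vector, support_vectors: Sequence[Vector],
                 weights: Sequence[float], bias: float, kernel: Kernel) -> float:
    """Decision value sum_i weights[i] * K(sv_i, x) + bias.

    Only the support vectors enter the sum; each weight folds together the
    Lagrange multiplier and the class label of its support vector.
    A positive value is read as the positive class (here: attack).
    """
    return sum(w * kernel(sv, x) for sv, w in zip(support_vectors, weights)) + bias


if __name__ == "__main__":
    # Hypothetical 1-D feature: normalised entropy of a time frame.
    svs = [[0.2], [0.9]]   # one "normal" and one "attack" support vector
    ws = [-1.0, 1.0]       # negative weight -> normal, positive weight -> attack
    for frame in ([0.25], [0.85]):
        score = svm_decision(frame, svs, ws, bias=0.0, kernel=gaussian_kernel)
        print(frame, "attack" if score > 0 else "normal", round(score, 3))
```

Swapping `gaussian_kernel` for `polynomial_kernel` changes only the similarity measure; the decision function keeps the same weighted-sum form over the support vectors.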
Suppose $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$ represent the input and output variables. The input variable is mapped to the output variable by the linear estimate function
$$f(a) = W \cdot \phi(a) + c,$$
where W is the weight vector, $\phi$ is the non-linear mapping, and c is a constant. We now have to find the optimal f with the lowest error. Different ways to find the optimal f with the least error have been suggested (Cortes and Vapnik, 1995; Vapnik, 1995; Zhang et al., 2004); in the ε-insensitive formulation, f is obtained by minimizing
$$\frac{1}{2}\lVert W \rVert^2 + K \sum_{i=1}^{N} L_{\varepsilon}\big(b_i, f(a_i)\big), \qquad (8)$$
where K is a constant and ε is a small positive number. The last term in Equation 8 reduces to the ε-insensitive loss (Zhang et al., 2004)
$$L_{\varepsilon}\big(b, f(a)\big) = \max\big(0, \lvert b - f(a) \rvert - \varepsilon\big).$$
Using the Lagrange multiplier technique, these two equations reduce to
$$f(a) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*})\, \mathcal{K}(a_i, a) + c,$$
where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers and $\mathcal{K}$ is the SV kernel.
Logistic regression (LR) uses a linear equation to classify the data points,
$$z = w^{\top}x + w_0, \qquad (11)$$
and the sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}} \qquad (12)$$
to limit the output of the linear equation to the interval (0, 1):
$$\hat{y} = \sigma(w^{\top}x + w_0). \qquad (13)$$
The decision tree (DT) method repeatedly divides the dataset according to a criterion that optimizes data separation, producing a tree-like structure. A decision tree consists of just two kinds of components, decision nodes and leaf nodes, as represented in Figure 3. A decision node has multiple branches, whereas a leaf node represents the final output of a decision. When building a decision tree, the most difficult problem is determining which attributes to use at the decision nodes and leaf nodes. Here, the information gain method is used to make this selection: it partitions the dataset and selects the most appropriate attributes for the decision and leaf nodes, using entropy as the distinguishing factor (Algorithm 1):
$$IG(D, f) = E(D) - \sum_{v} w_v\, E(f_v),$$
where E(D) is the entropy of the dataset, $w_v$ is the weight (the fraction of samples for which feature f takes value v), and $E(f_v)$ is the entropy of the corresponding partition of the feature.
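The information gain criterion described above can be sketched in a few lines of Python. This is the standard ID3-style computation with base-2 logarithms, offered as an illustration rather than the authors' exact implementation; the feature names and toy labels in the usage example are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy E(D) = -sum_i p_i * log2(p_i) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(D, f) = E(D) - sum_v w_v * E(D_v).

    D_v holds the labels of the samples whose feature value is v, and the
    weight w_v = |D_v| / |D| is the fraction of samples in that partition.
    """
    partitions = defaultdict(list)
    for value, label in zip(feature_values, labels):
        partitions[value].append(label)
    n = len(labels)
    weighted_entropy = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted_entropy


if __name__ == "__main__":
    labels = ["attack", "attack", "normal", "normal"]
    high_entropy = [1, 1, 0, 0]  # hypothetical feature that separates the classes
    noisy_flag = [1, 0, 1, 0]    # hypothetical feature unrelated to the classes
    print(information_gain(high_entropy, labels))  # perfect split: maximal gain
    print(information_gain(noisy_flag, labels))    # no information about the class
```

A tree builder would place the feature with the highest gain at the current decision node and recurse on each partition until the partitions are pure enough to become leaf nodes.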
Algorithm 1: Decision Tree
Input: Dataset (D)
Output: Decision tree
Start
  Select the root node from dataset D
  Create a queue containing the root element (A)
  while not all elements of D have been analysed do
    for each element in D do
      Use information gain to decide whether the node is a decision node or a leaf node
    end
  end
End
Random forest (RF) is a classification and regression technique based on ensemble learning that is particularly well suited to problems requiring data to be sorted into classes. The algorithm was developed by Breiman and Cutler (Breiman and Cutler, 2007). In RF, prediction is accomplished using decision trees: during the training phase, multiple decision trees are built, and for class prediction the classes voted for by all individual trees are combined (Figure 4). The class with the largest margin function (MF) (Liu et al., 2012) is taken as the output. The margin function is
$$mg(X, Y) = \operatorname{av}_k I\big(h_k(X) = Y\big) - \max_{j \neq Y} \operatorname{av}_k I\big(h_k(X) = j\big),$$
where $h_k$ is the k-th tree, $I(\cdot)$ is the indicator function, and $\operatorname{av}_k$ averages over all trees.
Gradient boosting is one of the most effective machine learning algorithms. Machine learning models are usually considered to have two basic kinds of error: bias and variance. As a boosting method, gradient boosting is used to reduce the model's bias error (Natekin and Knoll, 2013). Gradient boosting can be viewed as a numerical optimization procedure whose objective is to build an additive model that minimizes a loss function: at each step, it adds a decision tree that decreases the loss function. Gradient boosting performs better if the contribution of each new decision tree is scaled down at each iteration by a shrinkage parameter λ, referred to as the learning rate (Touzani et al., 2018). Shrinkage is based on the principle that many small steps yield higher accuracy than a few large ones.
Algorithm 2 explains this procedure.
Algorithm 2: Gradient Boosting (Natekin and Knoll, 2013)
Input: Dataset (D)
Output: Gradient boosting model
Start
  Set the initial estimate $\hat{f}_0$ as a constant
  Set the number of iterations to k
  Set the loss function $\Psi(y, f(x))$
  Set the shrinkage parameter λ
  Set the base learner model $h(x, \theta)$
Multinomial Naive Bayes (MNB) is a probabilistic learning method often employed in natural language processing (NLP). It applies Bayes' theorem to predict the tag of a sample from its independent variables: it calculates the likelihood of each tag for the sample and outputs the label with the greatest probability.
This subsection explains the algorithm applied at the gateway router of small and medium enterprises. Because our suggested method runs on the router, it can detect DDoS attacks rapidly and efficiently. Our suggested method is reactive, in the sense that it detects and mitigates attack traffic in real time. Our suggested strategy proceeds in the following stages:
• The incoming traffic is examined in each time frame.
• The attribute values of the packets are extracted and aggregated for each time frame.
• The entropy value of the packets in the time frame is computed.
• The entropy value is fed into the trained machine learning model, which predicts whether the incoming traffic is caused by a DDoS attack or is normal traffic.
Algorithm 3 gives the details of our proposed approach, and the attributes used in the algorithm are explained in Table 3.
The simulation is carried out using OMNET++. In this scenario, the attack packets overwhelm the victim with a huge amount of bogus traffic. The attacker node generates packets every second, while the genuine nodes emit packets every five seconds. The whole simulation runs for 100 seconds, during which all log data is gathered.
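The per-time-frame detection stages listed above can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 3: it assumes each packet has been reduced to an (arrival time, attribute value) pair, where the attribute (for example a source address) is a hypothetical choice because the exact attributes are those listed in Table 3. The trained model is replaced by a hypothetical logistic-regression-style scorer whose `weight` and `bias` are made-up values, not trained parameters.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (arrival time in seconds, attribute value such as a source address)
Packet = Tuple[float, str]


def frame_entropies(packets: Iterable[Packet], frame_length: float) -> Dict[int, float]:
    """Aggregate packets into consecutive time frames and return the Shannon
    entropy (in bits) of the attribute values observed in each frame."""
    frames: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in packets:
        frames[int(timestamp // frame_length)].append(value)
    entropies = {}
    for index, values in frames.items():
        n = len(values)
        entropies[index] = -sum((c / n) * math.log2(c / n) for c in Counter(values).values())
    return entropies


def logistic_model(weight: float, bias: float) -> Callable[[float], str]:
    """Hypothetical stand-in for a trained LR classifier:
    sigmoid(weight * h + bias) >= 0.5 is labelled 'ddos'."""
    def predict(h: float) -> str:
        probability = 1.0 / (1.0 + math.exp(-(weight * h + bias)))
        return "ddos" if probability >= 0.5 else "normal"
    return predict


def detect(packets: Iterable[Packet], frame_length: float,
           model: Callable[[float], str]) -> Dict[int, str]:
    """Label every time frame by feeding its entropy value to the model."""
    entropies = frame_entropies(packets, frame_length)
    return {index: model(h) for index, h in sorted(entropies.items())}


if __name__ == "__main__":
    # Frame 0: 8 packets from 2 addresses; frame 1: 16 packets from 16 distinct addresses.
    normal = [(i * 0.1, f"10.0.0.{i % 2}") for i in range(8)]
    burst = [(1.0 + i * 0.05, f"198.51.100.{i}") for i in range(16)]
    model = logistic_model(weight=2.0, bias=-6.0)  # hypothetical parameters
    print(detect(normal + burst, frame_length=1.0, model=model))
```

High entropy is mapped to the attack class here only to mirror the claim above that DDoS traffic has higher entropy than flash crowd traffic; in a real deployment both the frame length and the classifier would come from the trained setup rather than these placeholder values.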
Because our suggested method is not protocol-specific, we simulate it using a generic routing protocol. Since the dataset contains a large number of null values, they are removed during the preparation step. Figures 5a and 5b illustrate the change in entropy between the DDoS attack period and the non-attack period. Finally, the dataset is partitioned into a training set and a testing set, so that the proposed technique can be evaluated on both.
This subsection compares the six machine learning techniques used to analyze the dataset prepared in the previous subsection. We calculate the following statistical parameters for the comparison:
• Precision: the proportion of genuine packets among the packets that the method classifies as genuine.
• Recall: the proportion of valid packets that are not rejected by the suggested method.
• Accuracy: the proportion of all packets that the suggested strategy classifies correctly.
• F1 score: the harmonic mean of precision and recall, which summarizes the effectiveness of the suggested strategy.
These are computed as
$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$
where TP is the number of true positives, FP of false positives, TN of true negatives, and FN of false negatives.
A confusion matrix (Figure 6) is often used to assess the performance of the proposed model. It compares the actual target values with the predictions produced by the machine learning model, giving a comprehensive picture of how well the classification model performs and of the kinds of errors it makes consistently. Precision, accuracy, recall, and F1 score can all be computed from the confusion matrix. This subsection analyses several machine learning methods and determines which one is most suitable for our suggested strategy. First, we compute the accuracy, precision, recall, and F1 score for each of the six machine learning methods; their values are shown in Table 4.
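The four metrics follow directly from the confusion-matrix counts. The sketch below computes them from lists of true and predicted labels; which label counts as the positive class is a parameter (the definitions above treat genuine packets as positives), and the counts in the usage note are illustrative, not the paper's results.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # positives correctly predicted as positive
    fp: int  # negatives wrongly predicted as positive
    tn: int  # negatives correctly predicted as negative
    fn: int  # positives wrongly predicted as negative

    @classmethod
    def from_labels(cls, y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                    positive: Hashable) -> "ConfusionMatrix":
        """Count the four outcomes for a chosen positive label."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == positive and p == positive for t, p in pairs)
        fp = sum(t != positive and p == positive for t, p in pairs)
        tn = sum(t != positive and p != positive for t, p in pairs)
        fn = sum(t == positive and p != positive for t, p in pairs)
        return cls(tp, fp, tn, fn)

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # Return 0.0 instead of raising when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    def accuracy(self) -> float:
        return self._ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        return self._ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)
```

For example, `ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)` gives a precision of 0.8, a recall of 40/45 (about 0.889), an accuracy of 0.85, and an F1 score of 80/95 (about 0.842).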
To make the results easier to interpret, Figure 7 presents these statistical parameters graphically. Figure 7 and Table 4 show that the LR-based technique has the highest recall and accuracy. Hence, the LR-based technique can be used with our proposed approach to identify DDoS attacks.
Since the COVID-19 pandemic, the way people work has changed dramatically. Because most consumers now work online and prefer online sales and purchases, company owners are migrating to online platforms. This transition makes it easier for hackers to steal users' sensitive information or disrupt the web platform's regular operations. Cyberattackers often use DDoS attacks because they are simple to deploy and can fully consume the target system's resources. The objective of a DDoS attack is to bring the victim's system to a halt or to deplete its processing capacity. When a flash crowd is present, that is, when real users generate large volumes of traffic, a DDoS attack becomes more difficult to detect. Identifying DDoS attacks efficiently and accurately has therefore long been a major research challenge, and the similarities between DDoS attacks and flash crowds make the two very difficult to distinguish. In this context, this article presents a method that uses entropy and machine learning to identify DDoS attacks effectively and distinguish them from flash crowds for small and medium-sized entrepreneurs. The dataset was generated using the OMNET++ discrete event simulator and used to train six machine learning algorithms, whose efficacy was evaluated using accuracy, precision, recall, and F1 score. On this dataset, LR outperformed the other models (DT, SVM, MNB, RF, and GB) in terms of accuracy. In the future, we plan to test the approach further on a variety of datasets.
Game Theory for Cyber Security during COVID-19 Pandemic: A Holistic Approach
Coronavirus pandemic (COVID-19): Emotional toll analysis on Twitter
Azure DDoS Protection: 2021 Q1 and Q2 DDoS attack trends
Defining small and medium enterprises: a critical review
Detecting DDoS attack using software defined network (SDN) in cloud computing environment
Distributed denial of service (DDoS) attack mitigation in software defined network (SDN)-based cloud computing environment
Random forests: classification description. Department of Statistics
A tutorial on support vector machines for pattern recognition
Cyber resilience self-assessment tool (CR-SAT) for SMEs
Contextual determinants of e-entrepreneurship: Opportunities and challenges
A novel solution to handle DDoS attack in MANET
Commission recommendation of 3 April 1996 concerning the definition of small and medium-sized enterprises
Support-vector networks
Coronavirus disease (COVID-19): World Health Organization
Boosting-based DDoS detection in internet of things systems
Multi attribute auction based incentivized solution against DDoS attacks
A reputation score policy and Bayesian game theory based incentivized mechanism for DDoS attacks mitigation and cyber defense
A comprehensive survey on DDoS attacks and recent defense mechanisms. Handbook of Research on Intrusion Detection Systems
Identity-based authentication mechanism for secure information sharing in the maritime transport system
Blockchain-assisted secure fine-grained searchable encryption for a cloud-based healthcare cyber-physical system
The usage and adoption of cloud computing by small and medium businesses
Augmenting cybersecurity in autonomous vehicles: Innovative recommendations for aspiring entrepreneurs
New machine learning algorithm: Random forest
Big data analytics adoption: Determinants and performances among small to medium-sized enterprises
A lightweight and robust secure key establishment protocol for internet of medical things in COVID-19 patients care
What all cyber criminals know: Small & midsize businesses with little or no cybersecurity are ideal targets
Defense mechanisms against DDoS attack based on entropy in SDN-cloud using POX controller
Gradient boosting machines, a tutorial. Frontiers in Neurorobotics 7
Secure blockchain enabled cyber-physical systems in healthcare using deep belief network with ResNet model
The use of digital technologies by small and medium enterprises during COVID-19: Implications for theory and practice
A multimodal, multimedia point-of-care deep learning framework for COVID-19 diagnosis
Input space versus feature space in kernel-based methods
New support vector algorithms
Efficient deep learning approach for augmented detection of coronavirus disease
SCEF: A model for prevention of DDoS attacks from the cloud
A tutorial on support vector regression
The connection between regularization operators and support vector kernels
Secure timestamp-based mutual authentication protocol for IoT devices using RFID tags
Gradient boosting machine for modeling the energy consumption of commercial buildings
The nature of statistical learning theory
Visual saliency guided complex image retrieval
Factors influencing information security management in small- and medium-sized enterprises: A case study from Turkey
Wavelet support vector machine
A general framework to understand vulnerabilities in information systems
A fine-grained access control and security approach for intelligent vehicular transport in 6G communication system
Residual visualization-guided explainable copy-relationship learning for image copy detection in social networks. Knowledge-Based Systems 228
Coverless image steganography using partial-duplicate image retrieval
Coverless information hiding based on probability graph learning for secure communication in IoT environment
Region-level visual consistency verification for large-scale partial-duplicate image search