key: cord-0220514-cb5epsmf authors: Podder, Prajoy; Bharati, Subrato; Mondal, M. Rubaiyat Hossain; Paul, Pinto Kumar; Kose, Utku title: Artificial Neural Network for Cybersecurity: A Comprehensive Review date: 2021-06-20 journal: nan DOI: nan sha: cc38538654c2dc300dbe269f00ee9698699d7891 doc_id: 220514 cord_uid: cb5epsmf
Cybersecurity is a rapidly emerging field that protects systems, networks, and data from digital attacks. With the growing scale of the Internet and the evolution of cyber attacks, developing novel cybersecurity tools has become important, particularly for Internet of things (IoT) networks. This paper provides a systematic review of the application of deep learning (DL) approaches to cybersecurity. It gives a short description of the DL methods used in cybersecurity, including deep belief networks, generative adversarial networks, recurrent neural networks, and others. Next, we illustrate the differences between shallow learning and DL. Moreover, we discuss the cyber-attacks currently prevailing in IoT and other networks, and the effectiveness of DL methods in managing these attacks. This paper also describes studies in terms of the DL technique used, the cybersecurity application, and the source of the datasets. Next, we discuss the feasibility of DL systems for malware detection and classification, intrusion detection, and other frequent cybersecurity tasks, including file type identification, spam detection, and network traffic identification. Our review indicates that a restricted Boltzmann machine (RBM) obtains a high classification accuracy of 99.72% on a custom dataset, while long short-term memory (LSTM) achieves an accuracy of 99.80% on the KDD Cup 99 dataset. Finally, this article discusses the importance of cybersecurity for reliable and practicable IoT-driven healthcare systems.
Cybersecurity is the complete set of techniques and technologies responsible for defending networks, software, and data from attacks [1, 2]. Cyber defense mechanisms operate at the network, host, application, and data levels. Cybersecurity tools such as firewalls, intrusion detection systems, and intrusion prevention systems are always active at each end to identify security breaches and stop attacks [3, 4]. Nevertheless, as the number of Internet-connected systems grows, the risk of attacks increases day by day. With the realization of Internet of things (IoT) networks, cybersecurity is becoming more important than ever. Computer networks, including IoT, are vulnerable to many security threats. Attacks that follow a known pattern can be managed easily. However, attackers are developing zero-day exploits, in which the attack takes place as soon as a weakness in the system is discovered. Such an attack has no previous record and can damage the computer system before the problem is solved. Moreover, the system must be defended not only from external threats but also from insider threats, such as misuse of authorized access by an individual who is, or claims to be, part of the organization. The main challenge is finding the indicators of a compromised system across the attack lifecycle, since they may give meaningful signs of a future attack. This can be a difficult job because of the massive quantities of data generated continuously by many cyber-enabled devices. Data science exploits the extensive range of data produced by cyber defense systems, including security information and event management (SIEM) systems, which sometimes overwhelm security specialists with event warnings; it identifies patterns and related events and detects abnormal behaviour to improve cybersecurity. Hybrid detection in security combines anomaly detection and misuse detection.
Such a system is mainly used to decrease the false-positive rate for unknown attacks and to increase the detection rate of known intrusions. Most DL approaches are hybrid methods [5, 6]. Previous reviews, e.g., those in [7] [8] [9], have illustrated applications of machine learning (ML) to cyber-related problems, but they did not focus on deep learning (DL) methods. Some works illustrate DL approaches for cybersecurity, but these works have some limitations with respect to cybersecurity applications [10, 11]. This paper reviews cybersecurity using DL. Moreover, DL methods in cybersecurity and the differences between DL and shallow learning are broadly discussed, and the results of different DL methods are reported. The rest of the paper is organized as follows. Section II discusses the differences between DL and ML. Section III introduces different DL methods in the context of cybersecurity. DL and shallow learning are compared in Section IV. The performance results of different DL methods are reported in Section V. Finally, the paper concludes in Section VI. Both ML and DL are subsets of artificial intelligence (AI). The differences between ML and DL include the following: a) Data dependencies: DL models do not perform better than traditional ML models on small data volumes, because DL models need a large amount of data to learn its underlying patterns well. On the other hand, traditional ML algorithms rely on established rules [14]. b) Hardware dependencies: Graphics processing units (GPUs) can be considered essential hardware for training DL models properly. Because DL models require a large number of matrix operations, GPUs are used mainly to perform these operations efficiently. On the other hand, traditional ML algorithms do not usually require high-performance machines with GPUs [18].
c) Feature processing: Feature processing is the procedure of putting domain knowledge into a feature extractor in order to reduce the complexity of the data. It usually makes patterns more apparent, so ML and DL algorithms work better. However, this stage is time-consuming and requires specialized knowledge. The performance of most ML models relies on the accuracy of the extracted features (e.g., pixel values, textures, shapes, and locations). A main difference between traditional ML and DL algorithms is that DL attempts to learn high-level features directly from the data [17]. Accordingly, DL reduces the effort of designing a feature extractor for every problem. d) Execution time: Training a DL model requires a long execution time because the model has many parameters. In contrast, training an ML model needs less execution time (from seconds to a few hours). At the testing stage, however, the situation is reversed: DL models need a very short testing time compared with some ML models. This section illustrates different types of DL methods used in cybersecurity. Deep belief networks (DBNs) were introduced in a seminal paper by Geoffrey Hinton. DBNs are a class of deep neural networks (DNNs). A DBN is composed of several layers of hidden causal variables, with connections between the layers but no connections between units within each layer [12]. It combines probability and statistics with ML and neural networks. Figure 1 shows different types of DBN. An autoencoder is an unsupervised method in which the input is given as a vector and the network attempts to produce an output identical to that input vector. By encoding the input into a hidden representation of a different dimensionality and then reconstructing it, one can generate a lower- or higher-dimensional representation of the data.
Data encoding (i.e., feature compression) is performed when the hidden layers of the network have a small dimension. A denoising autoencoder can play an important role in eliminating noise by reconstructing the original input from a noisy input. Figure 2 illustrates a basic autoencoder. A recurrent neural network (RNN) is a class of neural networks in which the connections between nodes form a directed graph, as shown in Figure 3. This gives the network an internal state that allows it to exhibit dynamic sequential behavior. RNNs use their internal memory to process arbitrary sequences of inputs, and the loops in the network let signals travel both forward and backward [13] [14] [15]. Typically, RNNs are difficult to train because of the vanishing gradient problem. However, improvements in architecture and training have produced various RNNs that are simpler to train. Long short-term memory (LSTM), an improved RNN, was first introduced by Hochreiter and Schmidhuber in 1997 [16]. LSTM has brought major changes to speech recognition and has set records, outperforming some traditional models in certain speech applications. It was introduced to solve the short-term memory problem of RNNs. LSTM units pass their state on to the next time step. The structure within a unit that accumulates information is called a memory cell. A convolutional neural network (CNN) is a class of deep neural network that processes and analyzes visual imagery. If a colored or grayscale image is the input, the image is stored as a 2D array of pixels. CNNs are also applied to audio spectrograms represented as 2D arrays. A CNN model contains three kinds of layers: convolution layers, pooling layers, and classification layers [15, 17]. An illustration of a CNN is shown in Figure 4.
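To make the memory-cell description concrete, here is a minimal sketch of a single LSTM time step in plain Python. It follows the standard gate formulation (input, forget, and output gates plus a candidate cell state); the helper names (`lstm_step`, `init_params`, `run_sequence`), the weight layout, and the small random initialisation are illustrative assumptions, not the setup of any reviewed study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Multiply a matrix W (list of rows) by a vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(n_in, n_hidden, seed=0):
    # Hypothetical initialisation: small uniform weights, zero biases.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
    return {k: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for k in ("i", "f", "o", "g")}


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state h and memory cell c."""
    def pre_activation(gate):
        W, U, b = params[gate]
        return [p + q + r for p, q, r in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in pre_activation("i")]    # how much new content to write
    f = [sigmoid(z) for z in pre_activation("f")]    # how much old memory to keep
    o = [sigmoid(z) for z in pre_activation("o")]    # how much memory to expose
    g = [math.tanh(z) for z in pre_activation("g")]  # candidate memory content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_sequence(xs, params, n_hidden):
    # Feed a sequence of feature vectors through the cell, carrying (h, c) forward.
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

In an intrusion detection setting, each `x` might be a per-event or per-connection feature vector and the final `h` would feed a classifier; that pipeline is hypothetical here. The forget gate `f` controls how much of the previous memory cell survives each step, which is what lets the cell carry information further than a plain RNN.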
GANs are used in unsupervised ML, where two neural networks compete against each other in a zero-sum game. GANs were introduced in the work of Goodfellow et al. Figure 5 shows the block diagram of a GAN. The generator takes input data and produces output data with features similar to those of the real data. The discriminator then analyzes its input and decides whether it is real or fake [18]. GANs have a wide range of applications, including optical flow estimation [98], caption generation [97], image enhancement [96], and DCGAN for Facebook [99]. Recursive neural networks apply the same set of weights recursively and take a number of inputs. First, two inputs are fed into the model together; the output of that node is then used as an input to the following node. This type of model is used in many natural language processing and image segmentation tasks. This section provides a brief comparison between DL and shallow learning algorithms. DL has multiple layers, as shown in Figure 6. In DL, a deep network has several hidden layers, while shallow neural networks typically have one hidden layer. The neuron layers are linked by adaptive weights, and adjacent layers are generally connected to each other. There are two kinds of shallow network architecture: supervised and unsupervised. In supervised learning, the labels are known while learning a task. Moreover, feature extraction is performed separately. A DL model, in contrast, derives higher-level features from the raw input with the help of its multiple hidden layers. Figure 7 illustrates a deep neural network. There are several levels between the input layer and the output layer; levels closer to the output are considered higher levels, and levels closer to the input lower levels. Higher-level concepts are defined from lower-level concepts, so feature extraction can be obtained from the first few layers of a DL network.
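As a rough illustration of the shallow-versus-deep distinction, the sketch below builds fully connected networks from a list of layer sizes: one hidden layer gives a shallow network, several give a deep one. The 41 inputs match the number of features in a KDD Cup 99 connection record; the hidden sizes, ReLU activations, softmax output, and all function names are hypothetical choices for illustration, and the networks are untrained.

```python
import math
import random


def dense(rng, n_in, n_out):
    # One fully connected layer: weight matrix (n_out x n_in) and zero biases.
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_mlp(layer_sizes, seed=0):
    # layer_sizes = [inputs, hidden..., outputs]; a single hidden size means "shallow".
    rng = random.Random(seed)
    return [dense(rng, a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(layers, x):
    # Each hidden layer re-represents the output of the layer below it.
    for k, (W, b) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
        x = z if k == len(layers) - 1 else [max(0.0, v) for v in z]  # ReLU on hidden layers
    # Softmax turns the output layer into class probabilities (e.g. benign vs. attack).
    m = max(x)
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]


def n_params(layers):
    return sum(len(W) * len(W[0]) + len(b) for W, b in layers)


shallow = build_mlp([41, 64, 2])        # one hidden layer
deep = build_mlp([41, 64, 32, 16, 2])   # three hidden layers
```

Here `n_params(shallow)` is 2818 and `n_params(deep)` is 5330: the deep network stacks successive re-representations of the input, at the cost of more parameters to train. Because nothing is trained, the sketch says nothing about which network would classify better.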
There are three types of DL architecture: unsupervised, hybrid, and supervised. In shallow neural networks, advanced feature extraction is performed separately because they have only one hidden layer, whereas deep networks are capable of learning features themselves. However, DL methods need great computational power, often several GPUs, and training DL models takes a long time [19]. DL can also take a long time to analyze and extract relevant information from huge amounts of data, particularly when the data are not properly structured. Table 1 summarizes various DL methods applied by researchers for malware detection and classification; most researchers use the restricted Boltzmann machine (RBM). Table 2 summarizes various DL methods applied to intrusion detection; most researchers use autoencoders and RNNs. Table 3 summarizes the DL methods used to detect other types of cyber-attacks. The KDD Cup 99 dataset, created for the KDD challenge in 1999, is one of the most commonly used datasets for detecting various types of intrusions. KDD stands for Knowledge Discovery and Data Mining. The dataset contains more than four million network traffic records and twenty-two different types of attacks, which can be grouped into four families: denial-of-service (DoS), remote-to-local (R2L, e.g., password guessing), user-to-root (U2R), and probing. The other datasets used in various research papers for classifying threats are described briefly in Table 4. Several performance metrics are depicted in Figure 8. DL models have shown significant improvements over traditional ML-based, signature-based, and rule-based methods in addressing cybersecurity problems. Table 5 shows the performance results achieved by different DL models, reported in terms of precision, false negative rate (FNR), classification accuracy, F1-score, true positive rate (TPR), and other metrics. We have reviewed 85 papers.
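The metrics named above are all derived from the four counts of a binary confusion matrix. The following minimal sketch, assuming that "attack" is the positive class, shows how they relate; the function name `detection_metrics` and the example counts are hypothetical and do not come from any reviewed study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Binary detection metrics, treating 'attack' as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real attacks
    tpr = tp / (tp + fn) if tp + fn else 0.0        # detection rate (recall)
    fnr = fn / (tp + fn) if tp + fn else 0.0        # missed attacks (1 - TPR when attacks exist)
    fpr = fp / (fp + tn) if fp + tn else 0.0        # false alarms on benign traffic
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"accuracy": accuracy, "precision": precision, "tpr": tpr,
            "fnr": fnr, "fpr": fpr, "f1": f1}


# Hypothetical counts for 1,000 test connections.
good = detection_metrics(tp=90, fp=10, tn=880, fn=20)
# A degenerate detector that labels every connection benign.
lazy = detection_metrics(tp=0, fp=0, tn=890, fn=110)
```

For `good`, accuracy is 0.97 and F1 is about 0.857. The `lazy` case shows why accuracy alone can mislead on imbalanced traffic: it still reaches 0.89 accuracy while detecting no attacks at all (TPR of 0, FNR of 1).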
The review shows that most researchers have focused on malware detection and classification and on detecting various types of network intrusion. Cyber-physical autonomous systems that are not only sensor-based but also communication-enabled (e.g., automotive systems) and behavioral biometrics (e.g., signature dynamics) are growing areas for DL security applications. As we become more reliant on network-connected devices, we will see an increase in the number of cyber-physical and computational systems, each with its own set of attack vectors owing to its unique baseline. RBMs were the most frequently used DL technique for malware and intrusion detection. RNNs were another popular solution, applied to the widest range of cybersecurity challenges (i.e., network intrusions, cyber-physical intrusions, malware, host intrusions, and malicious domain names). The heavy use of RBMs and autoencoders, around 50%, is most likely owing to a scarcity of labeled data: these models are pre-trained on unlabeled data and then fine-tuned using a small quantity of labeled data. RNNs are likely popular because many cybersecurity tasks and data can be treated as time-series problems, which suits RNNs. Conclusions on the success of any approach are difficult to draw, since different studies use different datasets and metrics. Certain tendencies, however, are notable. Performance varied greatly across security domains. Across studies employing a variety of techniques, the detection of malicious domains produced by domain generation algorithms (DGAs) seems to give the most consistent results, with TPRs ranging from 1% to 1.5% and accuracy values ranging from 0.9959 to 0.9969, equivalent to 96.01 to 99.86%. Network intrusion detection techniques, on the other hand, have a performance range of 92.33 to 100 percent with a TPR of 1.58 to 2.3 percent and an accuracy range of 44 to 99 percent.
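The pre-train-then-fine-tune pattern described above usually starts from an RBM trained without labels. Below is a minimal sketch of a binary RBM trained with one-step contrastive divergence (CD-1), a widely used training rule for RBMs. The class and method names, the learning rate, the toy binary feature vectors, and the use of mean-field reconstructions in the negative phase are assumptions for illustration, not the configuration of the reviewed studies.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v); these can also serve as learned features.
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        # p(v_i = 1 | h)
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)                                # mean-field reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(len(v0)):
            self.a[i] += lr * (v0[i] - v1[i])
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for j in range(len(ph0)):
            self.b[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        # Mean squared error between inputs and their deterministic reconstructions.
        err = 0.0
        for v in data:
            r = self.visible_probs(self.hidden_probs(v))
            err += sum((x - y) ** 2 for x, y in zip(v, r))
        return err / (len(data) * len(data[0]))


# Hypothetical unlabeled binary feature vectors with two recurring patterns.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4
rbm = RBM(n_visible=6, n_hidden=4)
before = rbm.reconstruction_error(data)
for epoch in range(500):
    for v in data:
        rbm.cd1_update(v)
after = rbm.reconstruction_error(data)
```

No labels are used above; the reconstruction error simply falls as the RBM captures the structure of the inputs. In a DBN-style pipeline, `rbm.hidden_probs(v)` would act as the learned features, and the stacked layers would then be fine-tuned with the small labeled set, a supervised stage not shown here.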
A high classification accuracy of 99.72% is reported for RBM on a custom dataset [34], while an accuracy of 99.80% is achieved by LSTM on the KDD Cup 99 dataset [66]. Historically, the ability to detect network intrusions has depended strongly on the type and number of attacks carried out. Another crucial factor influencing overall performance was the ratio of benign to malicious data in the training set. This problem stems from the difficulty of legally obtaining malicious material. Because authentic data can be difficult to obtain, data are often generated using virus simulations and reverse engineering. The use of any new tool, especially DL tools, is widely met with skepticism because such tools are ultimately black boxes. As a result, when errors occur, determining the cause is often impossible, and unlike DL applications in sectors such as marketing, cybersecurity missteps carry larger costs and hazards. A cybersecurity analyst may waste time analyzing false alarms, or an automated intrusion response may erroneously restrict access to critical services. Furthermore, a DL tool can completely miss a cyber-attack. Another barrier to adoption is that many currently available systems focus on a single threat, such as virus detection. Researchers should investigate methods for generalizing or combining multiple DL approaches in order to cover a broader range of attack vectors and provide a more comprehensive solution. Multiple DL detection techniques should be used concurrently, and the information gathered by the various techniques may also be used to improve local performance. Cybersecurity has become an important issue for IoT, since IoT can contribute to managing pandemics, particularly the novel coronavirus disease (COVID-19). One example of the use of IoT for COVID-19 is to mitigate the spread of the causative virus.
This can be done through temperature screening, contact tracing, and several other means. IoT can be used to detect early cases of infection, trace contacts, and then isolate suspected patients. Note that IoT-driven healthcare systems and IoT-driven COVID-19 diagnosis systems are emerging techniques that can be useful to patients and doctors. Another example is supporting the new lifestyle during COVID-19, including home offices, distance learning, and fitness training at home. These activities allow businesses, educational institutions, and government offices to keep running without risking people's health. Another use case of IoT is addressing equipment-management issues, such as controlling medical inventory and tracking tagged nebulizers, oxygen cylinders, and other medical equipment. For tackling a pandemic, IoT can be used along with other technologies such as near field communication, radio frequency identification, WiFi, light fidelity, and sensor networks. These technologies rely on small portable devices with low computational power and limited battery life. As a result, ensuring cybersecurity for small IoT devices is more challenging than for traditional computers, servers, smartphones, and laptops. Cyber attacks evolve rapidly, so it is difficult to incorporate security measures into IoT devices quickly enough. Unless cyber attacks are mitigated, IoT cannot be used effectively in controlling pandemics. Security threats such as phishing, spamming, ransomware, and distributed DoS [137] [138] [139] [140] [141] [142] [143] may affect the reliability of IoT-driven healthcare and COVID-19 diagnosis [132] [133] [134] [135] [136] systems. Hence, understanding the possible security threats and finding appropriate mitigation techniques is essential in the context of IoT and other networking scenarios. This paper focuses on the use of DL to improve security systems.
As malicious attacks against cyber system networks advance, cyber defenders need to become more advanced as well. Cybersecurity personnel should be able to recognize and use novel signatures to identify novel attacks. DL approaches to cybersecurity applications offer a smart opportunity to identify novel malware variants and zero-day attacks. In this review, we have described the applications of DL systems to different types of cybersecurity attacks. These attacks mainly target application software, networks, data, and host systems. This paper also shows that standard datasets are very important for advancing DL in the cybersecurity domain. The paper aims to provide a complete review of DL methods and of the need for DL in cybersecurity, and to encourage future research on DL in cybersecurity. Finally, this article discusses IoT use-case scenarios in the context of COVID-19 and highlights the importance of cybersecurity for IoT devices.
Currently, he is an Associate Professor at Suleyman Demirel University, Turkey. He has more than 100 publications, including articles, authored and edited books, proceedings, and reports. He is also on the editorial boards of many scientific journals and serves as one of the editors of the Biomedical and Robotics Healthcare book series by CRC Press. His research interests include artificial intelligence, machine ethics, artificial intelligence safety, optimization, chaos theory, distance education, e-learning, computer education, and computer science.
A Survey of Data Mining and Machine Learning Methods for Cyber Security
Evaluating computer intrusion detection systems: A survey of common practices
Virtualization layer security challenges and intrusion detection/prevention systems in cloud computing: A comprehensive review
A survey of techniques for internet traffic classification using machine learning
Towards an energy-efficient anomaly-based intrusion detection engine for embedded systems
An overview of anomaly detection techniques: Existing solutions and latest technological trends
An overview of IP flow-based intrusion detection
The use of computational intelligence in intrusion detection systems: A review
Machine learning techniques applied to cybersecurity
Threats and countermeasures of cyber security in direct and remote vehicle communication systems
Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security. arXiv 2018
Hierarchical recurrent neural networks for long-term dependencies
Sequence to sequence learning with neural networks
Long short-term memory
Deep convolutional neural networks for LVCSR
Generative adversarial nets
Deep learning: Methods and applications. Found. Trends Signal Process
Network intrusion detection through stacking dilated convolutional autoencoders
Droid-sec: Deep learning in android malware detection
Android malware characterization and detection using deep learning
Malware classification with recurrent networks
Deep learning for classification of malware system call sequences
Malware detection with deep neural network using process behavior
Droiddelver: An android malware detection system using deep belief network based on API call blocks
DeepFlow: Deep learning-based malware detection by mining Android application for abnormal usage of sensitive data
DeepAM: A heterogeneous deep learning framework for intelligent malware detection
Deep neural network based malware detection using two dimensional binary program features
A toolkit for detecting and analyzing malicious software
Application of Deep Belief Networks for opcode based malware detection
Deep android malware detection
DL4MD: A deep learning framework for intelligent malware detection
Combining restricted Boltzmann machine and one side perceptron for malware detection
Hybrid analysis for detection of malware
Efficient dynamic malware analysis based on network behavior using deep learning
Detector: A robust and scalable approach toward detecting malware-infected devices
Deep learning for secure mobile edge computing
Deep learning based cryptographic primitive classification
Large-scale malware classification using random projections and neural networks
Malware Classification with Deep Convolutional Neural Networks
A multi-task learning model for malware classification with useful file access pattern from API call sequence
Deep learning for automatic malware signature generation and classification
MtNet: A multi-task neural network for dynamic malware classification
Adversarial perturbations against deep neural networks for malware classification. arXiv 2016
Botnet detection in the Internet of things using deep learning approaches
An intrusion detection model based on deep belief networks
Cyberattack detection in mobile cloud computing: A deep learning approach
Toward an online anomaly intrusion detection system based on deep learning
Intrusion detection using deep belief networks
Comparison deep learning method to traditional methods using for network intrusion detection
A hybrid malicious code detection method based on deep learning
Network intrusion detection for cyber security using unsupervised deep learning approaches
Deep and machine learning approaches, for anomaly-based intrusion detection of imbalanced network traffic
Kitsune: An ensemble of autoencoders for online network intrusion detection
Malware traffic classification using convolutional neural network for representation learning
A deep learning approach for intrusion detection using recurrent neural networks
Deep Learning Based Artificial Neural Network Approach for Intrusion Detection
Deep learning approach for network intrusion detection in software defined networking
Deep Learning Based Intrusion Detection System for Internet of Things
Deep learning: The frontier for distributed attack detection in Fog-to-Things computing
A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks
Deep Learning-Based Feature Selection for Intrusion Detection System in Transport Layer
Applying long short-term memory recurrent neural networks to intrusion detection
Applying recurrent neural network to intrusion detection with hessian free optimization
LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems
Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection
An intellectual intrusion detection system model for attacks classification using RNN
Distributed attack detection scheme using deep learning approach for Internet of things
Leveraging LSTM Networks for Attack Detection in Fog-to-Things Communications
Semi-Supervised Deep Neural Network for Network Intrusion Detection
Long short term memory recurrent neural network classifier for intrusion detection
An effective intrusion detection classifier using long short-term memory with gradient descent optimization
A neural network architecture combining gated recurrent unit (GRU) and support vector machine (SVM) for intrusion detection in network traffic data
Deep learning for classification of malware system call sequences
eXpose: A character-level convolutional neural network with embeddings for detecting malicious urls, file paths and registry keys
End-to-end encrypted traffic classification with one-dimensional convolution neural networks
Malware traffic classification using convolutional neural network for representation learning
An intrusion detection method based on DBN in ad hoc networks
Intrusion detection using deep belief network and probabilistic neural network. In Proc
A signal processing approach for cyber data classification with deep neural networks
Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning. arXiv 2017
The Applications of Deep Learning on Traffic Identification
Apply stacked auto-encoder to spam detection
Deep Belief Networks for Spam Filtering. In Tools with Artificial Intelligence
Cloud-based cyber-physical intrusion detection for vehicles using Deep Learning
Application of recurrent neural networks for user verification based on keystroke dynamics
Classification for DGA-Based Malicious Domain Names with Deep Learning Architectures
DeepDGA: Adversarially-tuned domain generation and detection
Predicting domain generation algorithms with long short-term memory networks
Automatic Detection of Malware-Generated Domains with Recurrent Neural Models
DGA Botnet Detection Using Supervised Learning Methods
Inline DGA detection with deep networks
A LSTM based framework for handling multiclass imbalance in DGA botnet detection
An Analysis of Recurrent Neural Networks for Botnet Detection Behavior
Photo-realistic single image super-resolution using a generative adversarial network. arXiv 2016
Generative adversarial text to image synthesis. arXiv 2016
Learning optical flow with convolutional networks
Unsupervised representation learning with deep convolutional generative adversarial networks
Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset
Improving detection of Wi-Fi impersonation by fully unsupervised deep learning
A self-adaptive deep learning-based system for anomaly detection in 5G networks
An empirical comparison of botnet detection methods
Bambenek Consulting-Master Feeds. Available online
DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket
Microsoft Malware Classification
Generating Test Data for Insider Threat Detectors
Bridging the gap: A pragmatic approach to generating insider threat data
KDD Cup 99
A detailed analysis of the KDD CUP 99 data set
Dissecting android malware: Characterization and evolution
DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket
Intrusion Detection Evaluation Dataset
A Malware Classification Method Based on Basic Block and CNN
LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems
Data analytics for novel coronavirus disease
Hybrid deep learning for detecting lung diseases from X-ray images
Applications and Challenges of Cloud Integrated IoMT
Effect of fault tolerance in the field of cloud computing
Fault tolerance in cloud computing-an algorithmic approach
Review on the security threats of internet of things
Optimized NASNet for Diagnosis of COVID-19 from Lung CT Images
IoT Driven Healthcare Monitoring System. Fog, Edge, and Pervasive Computing in Intelligent IoT Driven Applications
Forecasting the Spread of COVID-19 and ICU Requirements
Application of Machine Learning for the Diagnosis of COVID-19
Artificial neural network based breast cancer screening: a comprehensive review
Machine Learning to Predict COVID-19 and ICU Requirement