Submitted 28 August 2020. Accepted 10 November 2020. Published 14 December 2020.
Corresponding author: Fawaz Mahiuob Mohammed Mokbal, fawaz@emails.bjut.edu.cn
Academic editor: Shuihua Wang
Additional Information and Declarations can be found on page 17.
DOI 10.7717/peerj-cs.328
Copyright 2020 Mokbal et al. Distributed under Creative Commons CC-BY 4.0. OPEN ACCESS

Data augmentation-based conditional Wasserstein generative adversarial network-gradient penalty for XSS attack detection system

Fawaz Mahiuob Mohammed Mokbal (1,2,*), Dan Wang (1,*), Xiaoxi Wang (3) and Lihua Fu (1)
1 College of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing, China
2 Faculty of Computer Science, ILMA University, Karachi, Pakistan
3 State Grid Management College, Beijing, China
* These authors contributed equally to this work.

ABSTRACT
The rapid growth of the worldwide web and the accompanying opportunities offered by web applications in various aspects of life have attracted the attention of organizations, governments, and individuals. Consequently, web applications have increasingly become the target of cyberattacks. Notably, cross-site scripting (XSS) attacks on web applications are increasing and have become the critical focus of information security experts' reports. Machine learning (ML) techniques have advanced significantly and shown impressive results in cybersecurity. However, XSS training datasets are often limited and significantly unbalanced, which does not meet the requirements of well-developed ML algorithms and potentially limits the efficiency of the detection system. Furthermore, XSS attacks have multiple payload vectors that execute in different ways, so many real threats pass through the detection system undetected. In this study, we propose a conditional Wasserstein generative adversarial network with a gradient penalty to enhance the XSS detection system in a low-resource data environment. The proposed method integrates a conditional generative adversarial network and a Wasserstein generative adversarial network with a gradient penalty to generate the necessary minority-class data in a directed manner, which strengthens the detection system against unbalanced data. The proposed method generates synthetic samples of the minority class whose distribution is identical to that of real XSS attack scenarios. The augmented data were used to train a new boosting model, which was subsequently evaluated on a real test dataset. Experiments on two unbalanced XSS attack datasets demonstrate that the proposed model generates valid and reliable samples. Furthermore, the samples were indistinguishable from real XSS data and significantly enhanced the detection of XSS attacks compared with state-of-the-art methods.

Subjects: Artificial Intelligence, Computer Networks and Communications, Data Mining and Machine Learning, Security and Privacy, World Wide Web and Web Science
Keywords: Data augmentation, Conditional-Wasserstein generative adversarial net, Imbalanced dataset, XSS attack, Web applications security

How to cite this article: Mokbal FMM, Wang D, Wang X, Fu L. 2020. Data augmentation-based conditional Wasserstein generative adversarial network-gradient penalty for XSS attack detection system. PeerJ Comput. Sci. 6:e328 DOI 10.7717/peerj-cs.328.
INTRODUCTION
Over the last decade, the worldwide web has grown exponentially, and web applications are increasingly being deployed to provide sustainable and accessible services to the public. These have attracted the attention of governments, companies, and individuals. Similarly, cyberattacks on web applications are increasing, consequently increasing the risks faced by web applications and their users. The cross-site scripting (XSS) attack is one of the most prevalent and fastest-growing attacks on web applications. Successful XSS attacks lead to various degrees of consequences for users, governments, and businesses. For the user, XSS attacks can be used to steal sensitive information such as user credentials and session tokens, or to impersonate the user and carry out actions the user is authorized to perform. For businesses and governments, XSS attacks can be used to change the appearance or behavior of target websites and to steal confidential information. These authorities may face dire consequences, including loss of reputation, legal battles, and financial losses (Deepa & Thilagam, 2016).

Cybercriminals exploit security vulnerabilities within web applications that are often caused by several factors, including the level of application programmers' experience in security and vulnerabilities inherited from open-source and third-party packages. These security vulnerabilities can allow cybercriminals to inject malicious content into the trusted HTML pages displayed to end-users (Sarmah, Bhattacharyya & Kalita, 2018). State-of-the-art XSS attack detection systems are applied on the server-side, the client-side, or both. The analysis methods used to distinguish between malignant and benign payloads can be static, dynamic, or hybrid (Sarmah, Bhattacharyya & Kalita, 2018). However, these methods have limitations, such as a low detection rate (DR) and high false positive (FP)/negative rates, and they are often not scalable over time (Mitropoulos & Spinellis, 2017). Therefore, they are inefficient, especially against the emerging techniques and evolving forms of XSS payloads developed continuously by cybercriminals (Lekies et al., 2017; Zhou & Wang, 2019).

In 2019, XSS became the most widespread attack vector. Approximately 40% of cyberattacks have been attributed to XSS attacks, according to Precise Security research (Precise Security, 2020), a share that is expected to increase significantly in the future. Furthermore, the overall number of new XSS vulnerabilities in 2019 (2,023) increased by 79.20% compared with that in 2017 (1,129), as per the National Vulnerability Database (National Institute of Standards and Technology, 2020). Additionally, there are various reports and warnings from information security experts, such as the Industrial Control Systems Vulnerabilities Statistics (Andreeva et al., 2016).

Many studies in the related literature used the FP rate as the metric to measure model accuracy instead of the DR, which reveals the effect of unbalanced data and can be expensive in the cybersecurity domain (Elkan, 2001). Technically, the DR represents the effective detection of attacks and is a critical factor in detection systems.
When the DR is not clearly reported, it raises concerns about the effectiveness of the cybersecurity system. Consequently, a growing number of major risks remain unidentified by various tools and models (Deepa & Thilagam, 2016; Lekies et al., 2017).

Existing machine learning (ML) techniques have proven to be highly efficient in handling security challenges. The algorithms are trained using data of previously known behaviors (supervised learning), and each class of behavior is recognized as either anomalous or legitimate. However, web pages are a mixture of multiple languages, such as JavaScript and HTML, which were formerly unstandardized, enabling the use of various coding techniques that are susceptible to attacks. Therefore, XSS attacks have peculiar and irregular characteristics; further, the volume of labeled XSS attack data with up-to-date cases is limited and highly unbalanced (Obimbo, Ali & Mohamed, 2017; Nunan et al., 2012; Mokbal et al., 2019). Consequently, applying most standard ML algorithms directly to XSS data is unsuitable and challenging compared with other well-developed, clean data domains (Vluymans, 2019; Mokbal et al., 2019). To the best of our knowledge, the problem of limited and unbalanced XSS cyberattack data for ML-based detection has not been addressed in the literature and is worth studying.

The detection system is invariably affected by the class imbalance problem. Specifically, ML algorithms focus on maximizing accuracy, which technically means that all misclassification errors are handled equally and uniformly, implying that the algorithms do not handle unbalanced datasets well even if the data are accurate and clean (Vluymans, 2019). In such a setting, a learning algorithm may discard instances of the minority class in the dataset. The attack samples are often in the minority class and are handled as noise, while only samples of the majority class are recognized (Buda, Maki & Mazurowski, 2018). Therefore, the design of an ML-based model should consider the dataset's class weights and the evaluation criteria (Vluymans, 2019).

The traditional methods for addressing the challenges of limited and unbalanced data are oversampling the minority class or undersampling the majority class. Yet, each method has its limitations: oversampling can lead to overfitting, whereas undersampling may discard useful data, which subsequently leads to loss of information (Vluymans, 2019). To mitigate the challenges of a limited and highly unbalanced XSS attack dataset, we propose a data augmentation method based on the conditional GAN and the Wasserstein GAN with a gradient penalty (C-WGAN-GP). Our proposed method oversamples the minority class using a more robust generative approach, rebalancing the training dataset by adding identical and valid samples to the minority class. Samples are generated from the minority class's overall distribution learned by the C-WGAN-GP generative network, instead of from local information as the traditional methods do. The generative adversarial network (GAN) (Goodfellow et al., 2014) is considered a potential solution to the challenges described above.
It is a type of deep generative model that aims to learn the joint probability distribution of samples and labels from training data, which can then be used for several applications such as predictor training, classification, and data augmentation (Pan et al., 2019).

The main contributions of this study can be summarized as follows:
• We propose WGAN-based adversarial training conditioned on the minority class (attack labels) to generate valid samples that are indistinguishable from real XSS attack data. To preserve the features covering the full data space and enable the generator to learn the distribution of the original data space, we pass the upper and lower bounds of the data space to the conditional generator. Furthermore, the augmented data are not added to the real training data arbitrarily; the process is performed only if a generated sample x̃ satisfies the critic. This ensures that the added samples are identical to the real data and improve the training data.
• We further propose a boosting classification model using the XGBoost algorithm trained on the augmented training dataset generated by C-WGAN-GP, which significantly improves attack detection efficiency.
• The proposed method is evaluated on two large, real, unbalanced XSS attack datasets. Experiments show that our augmentation framework generates valid samples that are indistinguishable from real XSS data and outperforms state-of-the-art methods in XSS attack detection. Although the proposed framework is presented formally for XSS attack detection, it can be generalized and extended to other application areas.

The rest of this study is organized as follows: In 'Related Work', we review the most related literature. In 'Proposal and Experimental Methodology', we introduce the model design and the experimental methodology. We present the results and discussion in 'Results and Discussion'. 'Conclusions' concludes the study and outlines future work.

RELATED WORK
Web applications have become part of our everyday lives and have achieved significant success and substantial financial gains for organizations; consequently, ML-based XSS attack detection has gained much attention from the research community. However, there are challenges in using ML-based methods, including finding or designing an adequate, accurate, and balanced dataset for ML algorithms. Unfortunately, there is no public and standard dataset intended for this purpose (Obimbo, Ali & Mohamed, 2017; Nunan et al., 2012; Mokbal et al., 2019), so researchers create their own datasets based on their requirements and orientation.

Rathore, Sharma & Park (2017) proposed a classifier model against XSS attacks for social sites using their own dataset, which consists of 100 samples collected from XSSed, Alexa, and Elgg. They applied multiple algorithms and achieved the best results with the random forest algorithm. However, the dataset used to train the algorithm is small, possibly selective, and may not reflect real attacks. Moreover, the DR score of 0.949 is considered inadequate. Wang et al. (2017) proposed a hybrid analysis method to detect malicious web pages by extracting and using three sets of features: URL, HTML, and JavaScript. The reported DR was 88.20%, implying that the method fails to detect 11.80% of real threats.
Another research work (Wang, Cai & Wei, 2016) proposed a deep learning model (a stacked denoising autoencoder) to detect malicious code, using sparse random projections to reduce the dimensionality of the data. Despite the model's complexity, the DR score was 0.9480, which is inadequate for detecting malicious attacks; moreover, the model has a high FP rate of 4.20% and a high computational cost. Wang et al. (2014) used an ensemble learning method (ADTree and AdaBoost) to detect XSS attacks. However, the DR score of 0.941 is inadequate, with a high FP rate of 4.20%. Mokbal et al. (2019) proposed a scheme based on dynamic feature extraction and a deep neural network to detect XSS attacks. Using their own dataset, they achieved an estimated DR of 98.35%. However, the model is a deep neural network, which has a potentially high computational cost.

Multiple studies (López et al., 2013; Haixiang et al., 2017) have thoroughly investigated the problem of unbalanced data. The problem can be mitigated at two levels: first, at the model level, by modifying existing algorithms to focus more on the minority class, such as embedding cost-sensitive methods within an ensemble learning algorithm; and second, at the data level, by preprocessing the data before they are fed into the algorithm (López et al., 2013). The data-level approach uses either undersampling or oversampling. Undersampling mitigates class imbalance by randomly removing some samples of the majority class from the training dataset, whereas oversampling mitigates it by duplicating some minority class samples in the training dataset. However, these methods can result in the loss of important information and overfitting, respectively. Kovács et al. (2002) proposed the synthetic minority oversampling technique (SMOTE) as an oversampling method for the minority class. However, this method generates an equal number of synthetic samples for each real sample of the minority class without considering neighboring samples. Consequently, the overlap between classes increases, with the potential generation of noisy samples (Vluymans, 2019). Adaptive versions of SMOTE have been proposed, including borderline SMOTE and DBSMOTE. Borderline SMOTE (Han, Wang & Mao, 2005) concentrates synthetic samples along the borderline between classes. The cluster-based DBSMOTE algorithm (Bunkhumpornpat, Sinapiromsaran & Lursinsap, 2012) assembles data samples into clusters using DBSCAN clustering, and adaptive synthetic sampling (He et al., 2008) adapts the number of synthetic samples generated for each minority sample. However, these methods are based on local information instead of the overall minority class distribution (Douzas & Bacao, 2018). An illustrative usage of two of these baselines is sketched below.
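As context for the baselines used later in the experiments, SMOTE and ADASYN are available in the imbalanced-learn Python package; the following is a minimal, illustrative sketch (the feature matrix and class counts are made-up placeholders, not data from this study):

```python
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN

# Toy unbalanced data standing in for an XSS feature matrix scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = np.array([1] * 20 + [0] * 980)   # 20 attack samples vs. 980 benign samples

# Rebalance the minority (attack) class with the two traditional oversamplers.
X_smote, y_smote = SMOTE(random_state=0).fit_resample(X, y)
X_adasyn, y_adasyn = ADASYN(random_state=0).fit_resample(X, y)
print(np.bincount(y_smote), np.bincount(y_adasyn))
```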
PROPOSAL AND EXPERIMENTAL METHODOLOGY
This section presents the different generative networks, including GAN, CGAN, WGAN, and WGAN-GP, in addition to our proposed model. The model architecture, the experimental methodology design, the XGBoost attack detector, and the datasets are also presented.

GANs
The GAN was recently introduced as a novel approach to training a generative model and has achieved success in different fields, including image and natural language processing (Goodfellow et al., 2014). The network comprises two adversarial models: the generative model G, which learns the distribution of the data, and the discriminator D, which estimates the probability that a sample came from the real training data rather than from G. Both models compete to outsmart each other, and G and D can be nonlinear mapping functions such as multilayer perceptrons. The generator G learns the distribution pg over the data x and constructs a mapping function from a noise space of uniform dimension pz(z) to the data space as G(z, θg). The discriminator D(x, θd) returns a single scalar that estimates the probability that an instance x came from the real data distribution rather than from pg.

Both G and D are trained together, such that the parameters of G are adjusted to minimize log(1 − D(G(z))) and the parameters of D are adjusted to maximize log D(x), analogously to a two-player minimax game with the value function V(G, D) in Eq. (1):

$$\min_G \max_D V(G,D) = \mathbb{E}_{x \sim p_r}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \quad (1)$$

A CGAN extends the GAN by adding a conditioning space y to both G and D to control data generation. The additional space y can be supplied from the real data (class labels) or from other sources (Mirza & Osindero, 2014). The training phases of CGAN and GAN are similar, and the minimax objective function of D and G is shown in Eq. (2):

$$\min_G \max_D V(G,D) = \mathbb{E}_{x \sim p_r}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z \mid y) \mid y))] \quad (2)$$

where pr is the real data distribution and pg is the CGAN model distribution implicitly defined by x̃ = G(z|y), z ∼ p(z), y ∼ p(y), with the label y and the noise z combined as input to the hidden layer. The CGAN and GAN use the Jensen–Shannon (JS) divergence shown in Eq. (3) to measure the generated samples:

$$JS(p_r, p_g) = \tfrac{1}{2} KL(p_r \,\|\, p_m) + \tfrac{1}{2} KL(p_g \,\|\, p_m), \qquad p_m = (p_r + p_g)/2 \quad (3)$$

where KL is the Kullback–Leibler divergence.

However, both GAN and CGAN suffer from unstable training (vanishing gradients) and mode collapse (Pan et al., 2019). To overcome these problems, the WGAN optimizes the original GAN objective using the Wasserstein-1 distance, also known as the earth-mover distance (EMD), instead of the JS divergence (Arjovsky, Chintala & Bottou, 2017). The EMD measures the distance between the real data distribution and the distribution of the generative model, as in Eq. (4):

$$W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} \mathbb{E}_{(x,y) \sim \gamma}\big[\, \|x - y\| \,\big] \quad (4)$$

where Π(pr, pg) represents the set of all possible joint distributions γ(x, y) of pr and pg, the real and generated data distributions, respectively. For each feasible joint distribution γ, a real instance x and a generated instance y can be sampled and their distance ||x − y|| computed; the expected value E(x,y)∼γ[||x − y||] of this distance under the joint distribution γ can then be calculated. The value function of the WGAN is obtained using the Kantorovich–Rubinstein duality (Villani, 2009), as shown in Eq. (5):

$$\min_G \max_{D \in \mathcal{F}} V(G,D) = \mathbb{E}_{x \sim p_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] \quad (5)$$

where F is the set of k-Lipschitz functions satisfying |F(x) − F(y)| ≤ k||x − y|| (with k = 1 for the 1-Lipschitz constraint), and pg is the model distribution implicitly defined by x̃ = G(z), z ∼ p(z). The value function is minimized with respect to G, and the discriminator, called a critic, is trained to approximate W(pr, pg). Nevertheless, the WGAN still suffers from gradient vanishing or gradient explosion because of the weight clipping in the discriminator. A gradient penalty (GP) term was therefore added to the critic's loss function to achieve training stability (Gulrajani et al., 2017).
The new objective value function is adjusted as shown in Eq. (6):

$$\min_G \max_D V(G,D) = \mathbb{E}_{x \sim p_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] - \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\big] \quad (6)$$

where x̂ = εx + (1 − ε)x̃ is a convex combination of the real data distribution pr(x) and the model data distribution pg(z), ε ∼ Uniform[0, 1], and λ is the gradient penalty coefficient.

C-WGAN-GP model
This study proposes data augmentation based on a GAN that takes real samples as inputs and outputs adversarial samples. The learning algorithm of our proposed model is based on CGAN (Mirza & Osindero, 2014) and WGAN-GP (Gulrajani et al., 2017), where both networks are integrated. Precisely, we use the WGAN-GP optimization approach to optimize a CGAN. The integrated generative network is called C-WGAN-GP. Our goal is to generate synthetic samples of the attack (minority) class whose distribution is identical to that of real XSS attack scenarios. The primary idea is to learn the joint probability distribution over samples x and labels y from the training data and to perform data augmentation only if a generated sample x̃ satisfies the critic. The problem of unbalanced data can then be mitigated by using the augmented data in the classification task, thereby improving the robustness and performance of XSS attack detection on unbalanced data.

A well-trained generator, with the joint distributions (x, y) of the real distribution pr and the generated distribution pg optimized using the GP, should be able to generate samples (x̃, y) within the tolerated latent space that are identical to the original data (x, y), therefore providing valuable information to the detector as additional training data. To ensure that only useful instances are added to augment the training dataset, only generated cases that satisfy the critic are added to the original data. The y labels of the minority class, which are XSS attacks in our case, are used as the conditional parameter. Passing the upper and lower bounds of the real data space to the generator provides it with additional auxiliary information to define the latent space, and the latent space establishes the scope of samples in the data variance. Therefore, the generator using the auxiliary latent space generates samples within the tolerated latent space that are identical to the real data. Consequently, the discriminator distinguishes the synthetic samples as real within the small feedback loops needed to train the generator, reducing the computational cost while providing high-quality generated data.

In the discriminator D, pr and pg are linked with y in a joint hidden layer representation, whereas in the generator G, y is combined with p(z) in the same manner. The objective minimax function of models D and G is shown in Eq. (7), whereas Eqs. (8) and (9) represent the loss functions minimized by D and G, respectively:

$$\min_G \max_{D \in \mathcal{F}} V(G,D) = \mathbb{E}_{x \sim p_r}[D(x \mid y)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x} \mid y)] - \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x} \mid y)\|_2 - 1)^2\big] \quad (7)$$

$$\min L(D) = \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x} \mid y)] - \mathbb{E}_{x \sim p_r}[D(x \mid y)] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x} \mid y)\|_2 - 1)^2\big] \quad (8)$$

$$\min L(G) = -\mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x} \mid y)] \quad (9)$$
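To make the training objective concrete, the sketch below expresses Eqs. (7)–(9) with TensorFlow's GradientTape. It is a minimal illustration under our own assumptions: `generator` and `critic` are assumed to be Keras models taking [features, label] as a pair of inputs (one possible architecture is sketched under 'Generative model design' below), and `gp_weight` plays the role of λ; this is not the authors' released code.

```python
import tensorflow as tf

def gradient_penalty(critic, real_x, fake_x, y):
    """Gradient penalty term of Eqs. (7)-(8): (||grad D(x_hat|y)||_2 - 1)^2."""
    eps = tf.random.uniform([tf.shape(real_x)[0], 1], 0.0, 1.0)
    x_hat = eps * real_x + (1.0 - eps) * fake_x      # convex combination of real and fake
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        d_hat = critic([x_hat, y])
    grads = tape.gradient(d_hat, x_hat)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1) + 1e-12)
    return tf.reduce_mean((norm - 1.0) ** 2)

def critic_loss(critic, generator, real_x, y, z, gp_weight=10.0):
    """Eq. (8): E[D(x_fake|y)] - E[D(x_real|y)] + lambda * GP."""
    fake_x = generator([z, y])
    gp = gradient_penalty(critic, real_x, fake_x, y)
    return (tf.reduce_mean(critic([fake_x, y]))
            - tf.reduce_mean(critic([real_x, y]))
            + gp_weight * gp)

def generator_loss(critic, generator, y, z):
    """Eq. (9): -E[D(G(z|y)|y)]."""
    fake_x = generator([z, y])
    return -tf.reduce_mean(critic([fake_x, y]))
```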
Generative model design
Generative networks have gained popularity for image data; however, we are interested in numeric (tabular) datasets. Therefore, the technique used is similar but differs in design and implementation; in particular, our model does not use convolutional layers.

In the generator model G, the concatenated input layer has size (z + c), where z is the noise vector whose size is set within the range of the batch size and the data dimension, and c is the dimension of the conditioning variable, which equals 1. The model has three hidden neural network layers with 128, 256, and 512 units, respectively. The output layer is equal to z, and the concatenation layer equals the input layer (z + c). The discriminator (critic) D uses the same architecture but with the hidden layers in descending order of 512, 256, and 128 units, respectively; its third (output) layer is a single unit with a linear activation function. The batch size and number of epochs for the network are 128 and 4,000, respectively. The activation function used in both the generator and the discriminator is the rectified linear unit (ReLU). The C-WGAN-GP is fitted using the Adam optimizer with the α, β1, and β2 parameters calibrated to 1e-4, 0.5, and 0.99, respectively, where α is the learning rate and β1 and β2 are the exponential decay rates for the first-moment and second-moment estimates, respectively. The gradient penalty coefficient λ of the C-WGAN-GP is set to 10. The number of training steps k of D per iteration is tuned to 4, whereas k of G is tuned to 1. A conditional critic neural network is trained to approximate the EMD, using the minority class for mode control, for up to 4,000 training steps. Note that these parameters were determined empirically; performance degrades significantly when they are changed. The remaining hyperparameters not mentioned here are consistent with those originally reported. During the testing phase, the generated samples added to the real dataset are only those approved by the critic. Algorithm 1 presents the generative approach for XSS attack data. Note that the other generator network architectures are similar, with negligible differences that may be necessary for implementation.
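As one plausible reading of this description, the Keras sketch below builds the two multilayer perceptrons with the stated layer sizes and the reported Adam settings. The helper names, the noise dimension, and the choice of outputting the data dimension from the generator are our assumptions, not the authors' released code:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, optimizers

def build_generator(noise_dim, data_dim, cond_dim=1):
    """MLP generator: concatenated (noise, label) -> 128 -> 256 -> 512 -> feature vector."""
    z = layers.Input(shape=(noise_dim,))
    y = layers.Input(shape=(cond_dim,))
    h = layers.Concatenate()([z, y])
    for units in (128, 256, 512):
        h = layers.Dense(units, activation="relu")(h)
    out = layers.Dense(data_dim)(h)            # synthetic feature vector
    return Model([z, y], out, name="generator")

def build_critic(data_dim, cond_dim=1):
    """MLP critic: concatenated (sample, label) -> 512 -> 256 -> 128 -> linear score."""
    x = layers.Input(shape=(data_dim,))
    y = layers.Input(shape=(cond_dim,))
    h = layers.Concatenate()([x, y])
    for units in (512, 256, 128):
        h = layers.Dense(units, activation="relu")(h)
    score = layers.Dense(1)(h)                 # linear output (no sigmoid) for the EMD estimate
    return Model([x, y], score, name="critic")

# Adam settings reported above: lr = 1e-4, beta_1 = 0.5, beta_2 = 0.99.
critic_opt = optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.99)
gen_opt = optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.99)

# Example instantiation; noise_dim = 64 is an assumed value (the paper ties it to
# the batch size and data dimension), and 77 matches the CICIDS2017 feature count.
generator = build_generator(noise_dim=64, data_dim=77)
critic = build_critic(data_dim=77)
```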
Box 1. The C-WGAN-GP algorithm.

Algorithm 1: C-WGAN-GP
Required: batch size m = 128, n_critic = 4, gradient penalty coefficient λ = 10, and Adam optimizer parameters lr = 0.0001, β1 = 0.5, β2 = 0.99
1:  ► initialize the critic and generator parameters
2:  set θc = 0, θg = 0
3:  while θ has not converged do
4:      ► execute n_critic training steps for the discriminator (critic)
5:      for t = 1, ..., n_critic do
6:          for i = 1, ..., m do
7:              sample a batch {(xi, yi)}, i = 1..m ~ D of real data with corresponding labels yi
8:              sample noise {zi}, i = 1..m ~ pz(z), and a random number ε ~ U[0, 1]
9:              ► generate fake samples x̃i corresponding to the real labels yi
10:             x̃i ← G(zi | yi, θg)
11:             ► compute the gradient penalty GP and the loss for the critic
12:             x̂i ← ε·xi + (1 − ε)·x̃i
13:             GP(θc) ← (1/m) Σi [ (||∇x̂ F(x̂i | yi, θc)||2 − 1)² ]
14:             L(θc) ← ∇θc [ (1/m) Σi F(x̃i | yi, θc) − (1/m) Σi F(xi | yi, θc) ] + λ·GP(θc)
15:         end for
16:         θc ← Adam(L(θc), θc, α, β1, β2)
17:     end for
18:     ► execute a single generator (G) training step
19:     sample a batch of real labels {yi}, i = 1..m ~ D
20:     sample noise {zi}, i = 1..m ~ pz(z)
21:     ► compute the gradient with respect to the G parameters
22:     L(θg) ← ∇θg [ (1/m) Σi F(G(zi | yi, θg) | yi, θc) ]
23:     θg ← Adam(−L(θg), θg, α, β1, β2)
24: end while
25: load the best-step weights for the generative network
26: data_generated ← [ ]
27: ► generate candidate samples x̃i with the trained generator
28: for each generated sample x̃i corresponding to a real label yi do
29:     if x̃i satisfies the critic then
30:         data_generated.append(x̃i)
31: end for
32: join data_generated to the real training dataset
33: train a new XGBoost model on the augmented dataset

Experimental methodology
This study proposes the generative model C-WGAN-GP as an oversampling solution for the minority class to address unbalanced XSS attack data. We first trained the detector (XGBoost) on the real training dataset without augmented data and tested it on the test dataset, recording the results for comparison. Subsequently, we trained each of the GAN models on the real training dataset to generate synthetic data, retrained the detector using the augmented data, and tested it again on the test dataset, recording the results of each model for comparison. The traditional oversampling methods were trained and tested in the same way. The performance of the C-WGAN-GP generator was evaluated in two directions. First, we assessed the C-WGAN-GP generative adversarial network against the other four GANs. Second, we compared our C-WGAN-GP model with two traditional oversampling methods, SMOTE (Kovács et al., 2002) and adaptive synthetic sampling (ADASYN) (He et al., 2008). The systemic flowchart of the proposal is shown in Fig. 1.

Figure 1: Systemic flowchart of C-WGAN-GP. (Full-size DOI: 10.7717/peerjcs.328/fig-1)

Detector
We use an external model to evaluate the quality of the data generated by our proposed method and the other methods. The XGBoost boosting model was used in all experiments to assess the effect of the augmented data on XSS attack detection performance. XGBoost is a state-of-the-art boosting algorithm that is simple to apply and interpret, is highly efficient, does not require advanced data preparation, and offers many advanced functions (Chen & Guestrin, 2016). The algorithm's learning rate, tree depth, and number of trees were tuned to 0.3, 4, and 100, respectively.
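A minimal sketch of how such a detector can be configured with the xgboost Python package is shown below; the placeholder arrays are illustrative stand-ins, not the study's data:

```python
import numpy as np
from xgboost import XGBClassifier

# Detector configuration reported above: learning rate 0.3, maximum tree depth 4,
# and 100 boosted trees (all other settings left at their defaults).
detector = XGBClassifier(learning_rate=0.3, max_depth=4, n_estimators=100)

# Placeholder arrays standing in for the augmented training split and the real
# 30% test split; in the actual pipeline these come from Algorithm 1.
X_train_aug = np.random.rand(200, 67)
y_train_aug = np.random.randint(0, 2, 200)
X_test, y_test = np.random.rand(60, 67), np.random.randint(0, 2, 60)

detector.fit(X_train_aug, y_train_aug)
print(detector.score(X_test, y_test))
```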
Datasets
To our knowledge, the only public intrusion detection dataset we were able to find that includes XSS attacks is CICIDS2017, designed by the Canadian Institute for Cybersecurity (Sharafaldin, Habibi Lashkari & Ghorbani, 2018). The released CICIDS2017 dataset contains 80 features covering regular traffic and recent common attacks. To provide the attack features, we used the CICFlowMeter tool to extract features from the PCAP files and Selenium with the Damn Vulnerable Web Application to run automated XSS attacks. However, there are only 652 XSS attack traffic records and 140,008 regular traffic records, making it a highly unbalanced dataset. We added a second dataset proposed in our previous work (Mokbal et al., 2019), which includes 67 labeled features categorized into three groups: HTML, JavaScript, and URL. We extracted 1,000 XSS attack samples and 100,000 benign samples from it as the second dataset.

We applied data preprocessing to each dataset. In both datasets, all samples belong to one of two classes, XSS attack or benign, encoded as [1, 0], respectively. The two datasets were split randomly into training and test sets at a 70%/30% ratio, and data augmentation was performed only on the training dataset. Missing and infinite values in CICIDS2017 were replaced with the mean values of their features, whereas all-zero features and duplicate rows were omitted. The number of features with clean data is 78 in the CICIDS2017 dataset and 67 in the second dataset. Subsequently, the data were scaled to the range [0, 1] using min-max normalization for both datasets. The class-level distribution of the datasets is shown in Table 1.

Table 1: The class-level distribution of the datasets.

ID | Dataset | #Samples | #Attributes | Minority class | Majority class | #Minority samples | #Majority samples
1  | CICIDS2017 (Sharafaldin, Habibi Lashkari & Ghorbani, 2018) | 140,660 | 77 | Attack | Benign | 652 | 140,008
2  | MLPXSS (Mokbal et al., 2019) | 101,000 | 67 | Attack | Benign | 1,000 | 100,000

The datasets are available at https://doi.org/10.6084/m9.figshare.13046138.v4.
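A condensed sketch of this preprocessing pipeline is given below, using pandas and scikit-learn. The label column name, its encoding, and the file path are illustrative assumptions; the exact order of scaling and splitting in the original pipeline is not specified in the text:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, label_col: str = "Label"):
    """Mean-impute missing/infinite values, drop all-zero columns and duplicate
    rows, scale features to [0, 1], then split 70%/30%."""
    df = df.replace([np.inf, -np.inf], np.nan)
    df = df.fillna(df.mean(numeric_only=True))
    df = df.loc[:, (df != 0).any(axis=0)]                 # drop all-zero features
    df = df.drop_duplicates()
    X = df.drop(columns=[label_col]).values
    y = (df[label_col] == "Attack").astype(int).values    # assumed encoding: attack -> 1, benign -> 0
    X = MinMaxScaler().fit_transform(X)
    return train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Illustrative usage (hypothetical file name):
# X_train, X_test, y_train, y_test = preprocess(pd.read_csv("cicids2017_xss.csv"))
```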
Performance evaluation criteria
Although many performance metrics have been introduced, GANs do not have an objective measure of the generator model, as there is no consensus on which metric best captures the strengths and limitations of a model and should be used for a fair comparison between models (Borji, 2019). We used precision, detection rate (DR)/recall, and F1-score, which are proven and widely adopted measures for quantitatively estimating the quality of discriminative models, as suggested by Google Brain research (Lucic et al., 2018). Precision measures how similar the generated instances are to the real ones on average: whenever the generated instances resemble the real instances, precision is high. In GANs, recall (the detection rate) measures diversity: a high recall indicates that the generator can generate any instance found in the training dataset. For cybersecurity detection systems, the recall/detection rate denotes the system's ability to detect real attacks. The F-score reflects the harmonic mean of precision and recall. Further, the area under the curve (AUC), which demonstrates a detector's ability to distinguish between classes and summarizes the detector's performance, is also reported. The measurements are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (10)$$

$$DR = \frac{TP}{TP + FN} \quad (11)$$

$$F\text{-}score = 2\left(\frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}\right) \quad (12)$$

$$AUC = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \quad (13)$$
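For reference, these four measures can be computed directly from a confusion matrix as in the short sketch below (a generic illustration with made-up counts, not the authors' evaluation script):

```python
def evaluate(tp: int, fp: int, tn: int, fn: int):
    """Precision, detection rate (recall), F-score, and AUC as in Eqs. (10)-(13)."""
    precision = tp / (tp + fp)
    dr = tp / (tp + fn)                                  # detection rate / recall / sensitivity
    f_score = 2 * (dr * precision) / (dr + precision)
    auc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))        # balanced-accuracy form of Eq. (13)
    return {"precision": precision, "DR": dr, "F-score": f_score, "AUC": auc}

# Example with hypothetical counts:
print(evaluate(tp=593, fp=4, tn=41996, fn=7))
```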
RESULTS AND DISCUSSION
The C-WGAN-GP-based data augmentation approach was implemented in Python 3.7 using the TensorFlow framework on a Linux operating system. The proposed method was implemented alongside four other GAN-based generative methods and two traditional generative methods (SMOTE and ADASYN). All methods were validated using 10-fold cross-validation.

To demonstrate how the attack DR decreases as the gap between the normal and malicious classes increases, we injected different ratios of the majority class into the training data and trained the XGBoost detector model, using the AUC and DR criteria. The model was tested on the fixed test dataset (30% of the data) each time. During the attack detection test, the DR decreased from 96.59% with an injected ratio of 2% of the majority class to approximately 91.00% with an injected ratio of 100% of the majority class. These results are shown in Table 2.

Table 2: The collapse of the detection rate as the gap between the unbalanced classes increased.

Ratio | Train-AUC mean | Train-AUC std | Train-recall mean | Train-recall std | Test-AUC mean | Test-AUC std | Test-recall mean | Test-recall std
2%   | 0.998624 | 0.000345 | 0.976264 | 0.004189 | 0.995596 | 0.004673 | 0.965895 | 0.008869
5%   | 0.997947 | 0.001006 | 0.966985 | 0.003804 | 0.994683 | 0.001888 | 0.954185 | 0.007851
14%  | 0.991259 | 0.004218 | 0.946233 | 0.003984 | 0.985622 | 0.009439 | 0.937748 | 0.023061
40%  | 0.966395 | 0.002146 | 0.927482 | 0.003458 | 0.965025 | 0.010199 | 0.922232 | 0.022333
100% | 0.965471 | 0.001782 | 0.923503 | 0.005247 | 0.965422 | 0.007108 | 0.910094 | 0.011817

Using our generative approach, we injected the generated data into the real training dataset to create a new augmented training dataset. The augmented data were used to train the XSS attack detector, which is separate from the generative framework and serves to judge the data quality, and the detector was tested on the real test dataset. This experimental mechanism was applied to all of the methods, and the results were reported for each scenario. The average results of each trial on the CICIDS2017 dataset (Sharafaldin, Habibi Lashkari & Ghorbani, 2018) are shown in Table 3. Using the same mechanism, we repeated the experiments using the dataset extracted from our previous work (Mokbal et al., 2019), and the results are shown in Table 4.

Table 3: Detection results using augmented data generated through different methods on the CICIDS2017 dataset.

Criteria | None | ADASYN | SMOTE | GAN | CGAN | WGAN | WGAN-GP | C-WGAN-GP
DR (sensitivity) | 0.883333 | 0.98667 | 0.96000 | 0.92667 | 0.96333 | 0.97333 | 0.97833 | 0.987452
Specificity | 0.929967 | 0.99983 | 0.99980 | 0.99993 | 0.99993 | 0.99987 | 0.99996 | 0.99993
Precision | 0.90513 | 0.98339 | 0.97959 | 0.99286 | 0.99313 | 0.98649 | 0.98949 | 0.99333
F-score | 0.894099 | 0.98502 | 0.96970 | 0.95862 | 0.97800 | 0.97987 | 0.983878 | 0.990382
AUC | 0.917548 | 0.99325 | 0.97990 | 0.9633 | 0.98163 | 0.9866 | 0.989148 | 0.993691

Table 4: Detection results using augmented data generated through different methods on the second dataset.

Criteria | None | ADASYN | SMOTE | GAN | CGAN | WGAN | WGAN-GP | C-WGAN-GP
DR (sensitivity) | 0.873333 | 0.956667 | 0.943343 | 0.932983 | 0.933333 | 0.94667 | 0.951211 | 0.966667
Specificity | 0.979967 | 0.999833 | 0.999933 | 0.989867 | 0.999867 | 0.99973 | 0.999933 | 0.9999
Precision | 0.956198 | 0.982877 | 0.992982 | 0.985915 | 0.985915 | 0.97260 | 0.977864 | 0.989761
F-score | 0.912889 | 0.969595 | 0.967527 | 0.958719 | 0.958904 | 0.95945 | 0.964353 | 0.978078
AUC | 0.92665 | 0.97825 | 0.970138 | 0.961425 | 0.9666 | 0.9732 | 0.975572 | 0.983283

Notably, using the augmented data generated by the proposed method significantly improved XSS attack detection compared with state-of-the-art baseline methods. The results in Tables 3 and 4 show that the DR increased up to 98.95% on the CICIDS2017 dataset and up to 96.67% on the second dataset; that is, our generative model is able to generate any sample found in the XSS attack training dataset. The precision is also high, equaling 0.99333 on the first dataset and 0.989761 on the second dataset, which implies that the samples generated by the proposed model look similar to the real XSS attack samples on average. Concerning the F-score, the proposed generative model was superior to the other generative methods, achieving 0.990382 on the first dataset and 0.978078 on the second dataset. The AUC of the proposed generative model also outperformed the other methods on both datasets by a significant margin.

The results for ADASYN, WGAN-GP, WGAN, SMOTE, and CGAN showed improved DR performance, respectively, with varying proportions. The effects of the additional samples on the XSS attack DR, under the condition of acceptance by the discriminator, on the CICIDS2017 dataset are shown in Fig. 2 along with the standard deviation. The standard deviation of C-WGAN-GP is overall small compared with the other methods, and it decreases as the training steps increase, whereas the standard deviations of CGAN and WGAN are not smooth and show more variation. The standard deviation of WGAN-GP was smoother than those of GAN and WGAN but less smooth and more varied than that of C-WGAN-GP. This indicates the stability of C-WGAN-GP training to some extent. The results suggest that the C-WGAN-GP significantly outperformed CGAN, WGAN, and WGAN-GP.

Figure 2: Effects of the data augmentation on the XSS attack detection rate over the baseline for GAN, WGAN, WGAN-GP, and C-WGAN-GP. (A) CGAN detection rate, (B) WGAN detection rate, (C) WGAN-GP detection rate, and (D) C-WGAN-GP detection rate. The red horizontal dashed line indicates the baseline estimate of each generative model. (Full-size DOI: 10.7717/peerjcs.328/fig-2)

The superiority of the C-WGAN-GP over the rest of the GANs is due to the fact that the model combines the characteristics of two generative networks, CGAN and WGAN-GP. Inspired by CGAN, the C-WGAN-GP uses minority class labels that act as an extension of the latent space z to generate and discriminate instances well; consequently, the model can learn a multi-mode mapping from inputs to outputs by feeding it with different contextual auxiliary information. Inspired by WGAN-GP, the C-WGAN-GP is optimized using the Wasserstein distance with a gradient penalty, so the training process is more stable and less sensitive to the model design and hyperparameter configuration.
Further, the loss of the critic is related to the quality of the instances created by the generator. Precisely, the lower the critic's loss when evaluating the generated instances, the higher the expected quality of those instances. This criterion is crucial because, unlike other GANs that seek stability by finding a balance between two models, the WGAN seeks convergence and minimizes the generator loss. Furthermore, adding the generated samples of CGAN, WGAN, and WGAN-GP that satisfied the discriminator's (critic's) acceptance condition adds value to the augmented training dataset, which increases the detector's ability and efficiency.

The loss on generated data for C-WGAN-GP compared with that of the other four GAN methods is shown in Fig. 3. It is quite clear that the loss curve of C-WGAN-GP decreased regularly and continuously compared with all other generative methods. The loss curves of GAN and CGAN are unstable, and these models collapsed during the generating phase. The WGAN and WGAN-GP loss curves decreased regularly; however, they remain high compared with C-WGAN-GP. Note that GAN and CGAN use the JS divergence, whereas WGAN and C-WGAN-GP use the Wasserstein distance (EMD). Similarly, in the loss curves on real data, GAN and CGAN have difficulty learning the training data distribution. In contrast, the WGAN and WGAN-GP losses decreased regularly, although they remain high compared with C-WGAN-GP. The C-WGAN-GP appears to learn the training data distribution better than all other generative methods, as shown in Fig. 4.

Figure 3: The loss on generated data. (Full-size DOI: 10.7717/peerjcs.328/fig-3)

Figure 4: The loss on real data. (Full-size DOI: 10.7717/peerjcs.328/fig-4)

To estimate the proposed method's generalization ability, we investigated the Wasserstein critic, in which the distance between the losses on real and generated data is calculated. This estimate demonstrates how closely the data generated by the proposed model match the real data. The difference in distance between the real and generated data distributions, which the WGAN, WGAN-GP, and C-WGAN-GP generative models learn to minimize, is shown in Fig. 5. The distance between the generated and real data of C-WGAN-GP is close to zero; that is, the C-WGAN-GP generated samples whose distribution is identical to the real data distribution, and the training stability of the proposed generative model is adequate.

Figure 5: Difference between critic (EM distance estimate) loss on generated and real samples. (Full-size DOI: 10.7717/peerjcs.328/fig-5)

For further clarification, the XGBoost classification accuracy when trained on the data of the five different generative methods is shown in Fig. 6. The XGBoost accuracy curve for C-WGAN-GP data is higher than that of the other models, which indicates the quality of the data generated by the proposed model.

Figure 6: The detector accuracy over the various generated data. (Full-size DOI: 10.7717/peerjcs.328/fig-6)
Figure 7 shows a general visualization example of the quality of the data generated by C-WGAN-GP compared with the other generative methods; it also displays the collapse mode of GAN and CGAN between 2,500 and 4,000 training steps on the second dataset, as well as the beginning of gradient extinction for WGAN at 4,000 training steps.

Figure 7: (A–PP) General visualization of sample generation for C-WGAN-GP compared to other generative methods. (Full-size DOI: 10.7717/peerjcs.328/fig-7)

CONCLUSIONS
This study proposed a conditional critic neural network with a gradient penalty, called C-WGAN-GP, to improve XSS attack detection on unbalanced datasets. The C-WGAN-GP is trained to approximate the EM distance with the auxiliary of the minority class for mode control, to generate valid and reliable synthetic samples whose distribution is identical to real XSS attack scenarios. We trained a new boosting model using the augmented dataset to improve the XSS attack detection system and mitigate the unbalanced dataset problem. We conducted experiments comparing the proposed method with GAN, CGAN, WGAN, WGAN-GP, SMOTE, and ADASYN using two real-world XSS attack datasets. Experimental results show that the proposed method can train a generator model with improved training stability. The proposed method enhanced the detection of XSS attacks and prevented adversarial examples that have been widely used to target AI cyber defense systems. Furthermore, the C-WGAN-GP method can be extended to other forms of attacks and other fields, including the medical field, where datasets are highly unbalanced.
For future work, we will investigate network training stability for data generation across various designs and different network architectures, which remains a significant problem worthy of further research.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Fawaz Mahiuob Mohammed Mokbal conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Dan Wang conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, supervised the work, and approved the final draft.
• Xiaoxi Wang analyzed the data, performed the computation work, prepared figures and/or tables, and approved the final draft.
• Lihua Fu analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Data Deposition
The following information was supplied regarding data availability:
All raw data are available as Supplemental Files. Additional data are available at figshare: Mokbal, Fawaz (2020): Cross-Site Scripting Attack (XSS) dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.13046138.v4.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.328#supplemental-information.

REFERENCES
Andreeva O, Gordeychik S, Gritsai G, Kochetova O, Potseluevskaya E, Sidorov SI, Timorin AA. 2016. Industrial control systems vulnerabilities statistics. Kaspersky Lab, Report DOI 10.13140/RG.2.2.15858.66241.
Arjovsky M, Chintala S, Bottou L. 2017. Wasserstein generative adversarial networks. In: 34th international conference on machine learning, ICML 2017, vol. 1. 298–321.
Borji A. 2019. Pros and cons of GAN evaluation measures. Computer Vision and Image Understanding 179:41–65 DOI 10.1016/j.cviu.2018.10.009.
Buda M, Maki A, Mazurowski MA. 2018. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks 106:249–259 DOI 10.1016/j.neunet.2018.07.011.
Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C. 2012. DBSMOTE: density-based synthetic minority over-sampling technique. Applied Intelligence 36:664–684 DOI 10.1007/s10489-011-0287-y.
Chen T, Guestrin C. 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining - KDD '16. New York: ACM Press, 785–794 DOI 10.1145/2939672.2939785.
Deepa G, Thilagam PS. 2016. Securing web applications from injection and logic vulnerabilities: approaches and challenges. Information and Software Technology 74:160–180 DOI 10.1016/j.infsof.2016.02.005.
Douzas G, Bacao F. 2018. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications 91:464–471 DOI 10.1016/j.eswa.2017.09.030.
Elkan C. 2001. The foundations of cost-sensitive learning. In: IJCAI international joint conference on artificial intelligence. 973–978.
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 3:2672–2680.
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. 2017. Improved training of Wasserstein GANs. In: Advances in neural information processing systems 2017-December. 5768–5778.
Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. 2017. Learning from class-imbalanced data: review of methods and applications. Expert Systems with Applications 73:220–239 DOI 10.1016/j.eswa.2016.12.035.
Han H, Wang WY, Mao BH. 2005. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing. Berlin, Heidelberg: Springer, 878–887.
He H, Bai Y, Garcia EA, Li S. 2008. ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the international joint conference on neural networks. 1322–1328 DOI 10.1109/IJCNN.2008.4633969.
Kovács B, Tinya F, Németh C, Ódor P. 2002. SMOTE: synthetic minority over-sampling technique. Ecological Applications 30:321–357 DOI 10.1002/eap.2043.
Lekies S, Kotowicz K, Grob S, Nava EAV, Johns M. 2017. Code-reuse attacks for the web: breaking cross-site scripting mitigations via script gadgets. In: Proceedings of the ACM conference on computer and communications security. New York: ACM Press, 1709–1723 DOI 10.1145/3133956.3134091.
López V, Fernández A, García S, Palade V, Herrera F. 2013. An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Information Sciences 250:113–141 DOI 10.1016/j.ins.2013.07.007.
Lucic M, Kurach K, Michalski M, Bousquet O, Gelly S. 2018. Are GANs created equal? A large-scale study. In: Advances in Neural Information Processing Systems 2018-December. 700–709.
Mirza M, Osindero S. 2014. Conditional generative adversarial nets. ArXiv preprint. arXiv:1411.1784.
Mitropoulos D, Spinellis D. 2017. Fatal injection: a survey of modern code injection attack countermeasures. PeerJ Computer Science 3:e136 DOI 10.7717/peerj-cs.136.
Mokbal FMM, Dan W, Imran A, Jiuchuan L, Akhtar F, Xiaoxi W. 2019. MLPXSS: an integrated XSS-based attack detection scheme in web applications using multilayer perceptron technique. IEEE Access 7:100567–100580 DOI 10.1109/access.2019.2927417.
National Institute of Standards and Technology. 2020. National Vulnerability Database (NVD), Vulnerabilities. Available at https://nvd.nist.gov/vuln.
Nunan AE, Souto E, dos Santos EM, Feitosa E. 2012. Automatic classification of cross-site scripting in web pages using document-based and URL-based features. In: 2012 IEEE symposium on computers and communications (ISCC). Piscataway: IEEE, 000702–000707 DOI 10.1109/ISCC.2012.6249380.
Obimbo C, Ali K, Mohamed K. 2017. Using IDS to prevent XSS attacks. In: Int'l Conf. security and management. 233–239.
Pan Z, Yu W, Yi X, Khan A, Yuan F, Zheng Y. 2019. Recent progress on generative adversarial networks (GANs): a survey. IEEE Access 7:36322–36333 DOI 10.1109/ACCESS.2019.2905015.
Precise Security. 2020. Cross-Site Scripting (XSS) makes nearly 40% of all cyber attacks in 2019 - PreciseSecurity.com. Available at https://www.precisesecurity.com/articles/cross-site-scripting-xss-makes-nearly-40-of-all-cyber-attacks-in-2019/ (accessed on 29 February 2020).
Rathore S, Sharma PK, Park JH. 2017. XSSClassifier: an efficient XSS attack detection approach based on machine learning classifier on SNSs. Journal of Information Processing Systems 13:1014–1028 DOI 10.3745/JIPS.03.0079.
Sarmah U, Bhattacharyya DK, Kalita JK. 2018. A survey of detection methods for XSS attacks. Journal of Network and Computer Applications 118:113–143 DOI 10.1016/j.jnca.2018.06.004.
Sharafaldin I, Habibi Lashkari A, Ghorbani AA. 2018. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP. 108–116 DOI 10.5220/0006639801080116.
Villani C. 2009. Optimal transport, old and new. Berlin: Springer Berlin Heidelberg DOI 10.1007/978-3-540-71050-9.
Vluymans S. 2019. Learning from imbalanced data. Studies in Computational Intelligence 807:81–110 DOI 10.1007/978-3-030-04663-7_4.
Wang Y, Cai W, Wei P. 2016. A deep learning approach for detecting malicious JavaScript code. Security and Communication Networks 9:1520–1534 DOI 10.1002/sec.1441.
Wang R, Jia X, Li Q, Zhang S. 2014. Machine learning based cross-site scripting detection in online social network. In: Proceedings - 16th IEEE international conference on high performance computing and communications, HPCC 2014, 11th IEEE international conference on embedded software and systems, ICESS 2014 and 6th international symposium on cyberspace safety and security. Piscataway: IEEE, 823–826 DOI 10.1109/HPCC.2014.137.
Wang R, Zhu Y, Tan J, Zhou B. 2017. Detection of malicious web pages based on hybrid analysis. Journal of Information Security and Applications 35:68–74 DOI 10.1016/j.jisa.2017.05.008.
Zhou Y, Wang P. 2019. An ensemble learning approach for XSS attack detection with domain knowledge and threat intelligence. Computers and Security 82:261–269 DOI 10.1016/j.cose.2018.12.016.