key: cord-0059344-0tlm0oe6 authors: Bangui, Hind; Ge, Mouzhi; Buhnova, Barbora title: Improving Big Data Clustering for Jamming Detection in Smart Mobility date: 2020-08-01 journal: ICT Systems Security and Privacy Protection DOI: 10.1007/978-3-030-58201-2_6 sha: 5b954a65b6357e54ddaa440b168da33f92d7d77f doc_id: 59344 cord_uid: 0tlm0oe6 Smart mobility, with its urban transportation services ranging from real-time traffic control to cooperative vehicle infrastructure systems, is becoming increasingly critical in smart cities. These smart mobility services thus need to be very well protected against a variety of security threats, such as intrusion, jamming, and Sybil attacks. One of the frequently cited attacks in smart mobility is the jamming attack. In order to detect the jamming attacks, different anti-jamming applications have been developed to reduce the impact of malicious jamming attacks. One important step in anti-jamming detection is to cluster the vehicular data. However, it is usually very time-consuming to detect the jamming attacks that may affect the safety of roads and vehicle communication in real-time. Therefore, this paper proposes an efficient big data clustering model, coresets-based clustering, to support the real-time detection of jamming attacks. We validate the model efficiency and applicability in the context of a typical smart mobility system: Vehicular Ad-hoc Network, known as VANET. Nowadays, smart mobility has become a critical transportation infrastructure [9, 38] in smart cities as it provides a variety of mobility services, such as relieving the traffic congestion in cities and providing better access to public transport. In [1] , smart mobility is described as: "The use of ICT in modern transport technologies to improve urban traffic". Likewise, in [42] , it is defined as: "local and supra-local accessibility, availability of ICTs, modern, sustainable and safe transport systems". 
Thus, smart mobility can be considered a management strategy that produces decisions based on data collected [9, 31] from vehicle-to-anything communications, such as vehicle-to-vehicle, vehicle-to-infrastructure, vehicle-to-pedestrian, and infrastructure-to-pedestrian [43]. For instance, a road infrastructure monitoring system that uses e-bikes is proposed in [25] to support municipal transportation activities. Further, an image processing application based on deep learning has been integrated into the e-bikes to facilitate the detection of road anomalies such as litter and road damage. It can be seen that smart mobility is a fruitful domain that integrates different up-to-date techniques such as autonomous driving, the Internet of Things (IoT), and machine learning. In [21], a vehicular interface notation has been developed to help older customers of smart vehicles to control their driving experience better and improve their cognitive ability. Similarly, in [40], a machine learning system has been proposed to exploit data from autonomous vehicles and external IoT data sources to predict a pedestrian's next movement steps using real-time trajectory data. The suggested solution ensures safety in urban environments by enhancing autonomous driving in local public transport services. These recent works indicate that smart mobility is driven by data-intensive processes focused on managing people's mobility and personalizing transport solutions according to the specific needs of cities [8]. In turn, smart mobility advances the transport services and the quality of citizens' life in smart cities [16, 17, 29]. Enabling smart mobility in urban environments is, however, challenging because attackers attempt to access or tamper with valuable mobility data (e.g., personal user information) or to disrupt network communication.
Since smart mobility generates a massive amount of data, such as sensor data on the road or vehicle communication data, various security applications make use of big data analytics to secure smart mobility applications. There are different prevalent attacks in the smart mobility domain. One of the most frequently cited is the jamming attack. Jamming attacks can severely affect road safety and vehicle communications. In order to detect jamming attacks in smart mobility, different anti-jamming applications have been proposed, which, however, suffer from inefficient data clustering during jamming attack detection. In this paper, we therefore propose a solution to support the real-time detection of jamming attacks via efficient big data clustering of vehicular data. The solution is mainly based on the coresets technique, and the Vehicular Ad-hoc Network (VANET) is used as a practical scenario of smart mobility [5, 10], where we consider in particular the application of clustering algorithms in anti-jamming detection solutions designed for securing VANET communications.

The remainder of the paper is organized as follows. Section 2 is dedicated to understanding the vulnerability of smart mobility systems by using VANET as an example of smart mobility applications. Then in Sect. 3, we provide an overview of related work concerning anti-jamming applications based on clustering techniques. In Sect. 4, we present a solution that aims at increasing the efficiency of the clustering process while keeping the quality of analytics. In Sect. 5, we conduct an experiment to show the benefits of the proposed solution. Finally, Sect. 6 concludes the paper and outlines future research.

Smart mobility intends to control the behavior of smart devices in urban environments by collecting, sharing, and utilizing trace data. The Vehicular Ad-hoc Network (VANET) is a typical smart mobility system that can be used to share data within vehicle-to-anything communication.
For example, VANET applications such as smart cars are used to support the safety of traffic flows [20, 35, 39]. The vehicles exchange messages with neighboring vehicles (members of a VANET) and with RSUs (roadside units) to inform them about, e.g., their location and speed, and to obtain the traffic conditions of the road. However, a malicious attacker may remotely access a target vehicle and possibly tamper with its behavior, for example by misinforming the driver. Constant jamming is one type of jamming attack that is considered a severe threat to VANET security [22, 39]. In this attack, a jammer continuously sends repeated signals to interfere with the communication between vehicles in the affected network area, so that the target vehicles believe that the channel is still busy. Consequently, they cannot send or receive packets that may carry important information, such as weather and accident reports. In other words, when jamming occurs, the sender may send packets, but the receiver might not be able to receive all of them. The failure to receive or disseminate these packets can thus lead to the inefficiency of the VANET.

Smart mobility systems require the optimized use of attack detection applications to cope with security and privacy threats. Several attack detection systems have been proposed in the literature [19, 39]. Table 1 presents a list of typical attacks in smart mobility. However, developing security and privacy solutions is more challenging in smart mobility infrastructure, as data are subject to several malicious attacks that cause wrong outcomes (e.g., wrong traffic information). In particular, big data clustering is used to facilitate attack detection. Thus, we focus on examining how big data clustering algorithms in smart mobility [5, 10] are applied to deal with such attacks.
In particular, it is valuable to study clustering for detection applications that deal with jamming attacks, which cause damage such as the disruption of car-to-car communications. The concept of big data clustering is very important in smart mobility since it contributes to improving the sustainability, scalability, and reliability of smart mobility systems [5], for example by associating mobile nodes into groups and by ensuring the stability of channel access management, traffic safety, and QoS assurance. Many jamming detection approaches in VANET have exploited the advantages of clustering algorithms by collecting jamming measurements and then accurately grouping them into clusters [6, 32]. For example, in [15], a novel jamming detection framework was proposed to detect the presence of a jammer in hierarchical cluster-based wireless sensor networks. The proposed anti-jamming detection method also exploits the benefits of the unsupervised hierarchical algorithm for achieving energy efficiency by re-clustering, overcoming network issues (e.g., reducing the communication overhead), decreasing collisions, and improving throughput. Similarly, in [23], a jamming detection solution was developed by leveraging the K-means algorithm, which is one of the most commonly used partitional clustering algorithms in Big Data Analytics. This work reflects the benefits of using clustering algorithms in VANETs, where the advantages of K-means are used to differentiate intentional from unintentional jamming (Table 2). The collected jamming measurements are accurately grouped into the interference cluster, and then the specific characteristics of each attack are extracted. Thus, the unsupervised method aims at determining whether jamming occurs due to a malicious jammer or whether it is caused unintentionally.
Consequently, if jamming is correctly identified as interference, vehicles can preserve their communications either by changing their channel or by temporarily altering their route. Likewise, in [4], a multi-cluster localization (M-cluster) algorithm and an x-rayed jammed-area localization (X-ray) algorithm were developed based on fuzzy c-means and K-means to deal with the problem of localizing multiple jammers in WSNs, which could launch collaborative attacks. In [34], the advantages of K-means were used to predict the number of multiple jamming attackers and ensure the preset functions of VANET. In [33], an anti-jamming method based on fuzzy c-means was proposed to determine the location and number of jamming attackers. Accordingly, the cluster analysis process simplifies data manipulation by finding similar structures in the data and classifying each object according to its nature. As a result, vehicles can adequately avoid malicious jamming attacks, decrease their collisions, and preserve their communications [27, 28, 30].

Nevertheless, the existing anti-jamming solutions suffer from efficiency issues because smart mobility data keep growing and the clustering process is computationally time-consuming. Furthermore, vehicles in the smart mobility context produce big data at a rapid rate in the dynamic urban environment. Thus, the time and cost of the clustering process increase with the volume of the datasets, which is difficult to handle in real time. Therefore, the study of data prioritization is required, since it aims at serving real-time Big Data Analytics by selecting the most valuable data from the initial input data [7]. As a result, the anti-jamming applications can detect in real time viral attacks that cause smart mobility system failures.

In this section, we propose a model that aims at minimizing the response time of anti-jamming detection by accelerating the clustering process.
Figure 1 presents a general process of attack detection based on the application of data clustering, where a predefined list of features is extracted from vehicular data to detect the characteristics of jamming attacks. The features are selected according to the context of the proposed anti-jamming solution; for example, GPS information is used to recognize cases of intentional jamming [33]. After that, the clustering method can be used to analyze vehicular data and distinguish malicious nodes from benign ones in a timely manner. Coresets can be used to accelerate the clustering of big mobility data. In the context of jamming detection, the anti-jamming application is then able to deal with the specific characteristics of each jamming attack timely and effectively. The idea of using approximated data has been investigated in sensor networks [12, 14], where coresets are used to extract small data samples that approximately represent the original data, and then to solve compression issues of trajectory data in road networks [12-14], such as improving the run-time performance of location-based applications. Moreover, coresets can not only reduce the data scale while keeping the original data distribution [11, 24], but can also be used for improving the quality of clustering [18, 37]. For example, ProTraS [37] is a coreset construction algorithm that aims at generating a data sample to deal with big data clustering problems [2, 3]. The main idea of ProTraS is to select a representative point based on a probability of cost reduction. Given an $\epsilon > 0$, in each iteration the algorithm adds a new representative into the group of the sample with the highest probability of cost reduction. When the cost drops below a threshold, which depends on $\epsilon$, the algorithm stops. The algorithm finds the nearest group for points that are not yet assigned to any group of the current sample.
The point among them is determined to be the new representative if it is farthest in its group and has the highest probability. That means the representative selected by ProTraS is the farthest-first traversal item of a given group. As a result, this coreset construction algorithm enhances the quality of the clusters, which is required for ensuring the accuracy of the Big Data Analytics outcomes [2, 3]. In this work, we aim to investigate the advantages of coresets to optimize the quality of clustering used in anti-jamming detection. In particular, we use the coresets method to deal with the clustering formulation and complexity. We have referred to the coresets technique discussed in [41], which is an improved version of the ProTraS algorithm [37] that adds a post-processing task. Given a dataset $P = \{x_i\}$, for $i = 1, 2, \ldots, n$, and a given $\epsilon > 0$, the method first calls ProTraS to obtain $S = \{y_j\}$ and $P(y_j)$ for $j = 1, 2, \ldots, s$. The method next tries to find sample points that have low representativeness and removes them from the sample. Each remaining point is then replaced by the center of the set of patterns that the point represents. The details of the method are given in Algorithm 1.

Algorithm 1. Coresets-based algorithm for sampling [41]
Require: $P = \{x_i\}$, for $i = 1, 2, \ldots, n$, a …
…
4: if $|P(y_j)|$ is greater than a threshold then
5: $y_k^* = \arg\min_{y_k \in P(y_j)} \sum_{y_l \in P(y_j)} d(y_k, y_l)$
…

Line 4 determines which sample points will be selected into our sample S. This is performed using a threshold. $|P(y_j)|$ denotes the number of patterns in $P$ with $y_j \in S$ being their representative. A small value of $|P(y_j)|$ means that the representativeness of $y_j$ is low. Accordingly, it is removed from the sample. The value of the threshold should be chosen according to the distribution characteristics of the dataset. For each $y_j \in S$ that is not removed, line 5 computes the center of the group represented by $y_j$, to consider replacing it.
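The post-processing step described for lines 4 and 5 can be sketched as follows. This is only an illustrative sketch, not the implementation from [41]: the function names (`assign_groups`, `medoid`, `postprocess`) are hypothetical, the groups P(y_j) are approximated by assigning each point to its nearest representative, and the representatives are assumed to have been produced beforehand by a ProTraS-like procedure that is not reproduced here.

```python
import math


def assign_groups(points, reps):
    """Assumption: P(y_j) is approximated by nearest-representative assignment."""
    groups = {j: [] for j in range(len(reps))}
    for p in points:
        nearest = min(range(len(reps)), key=lambda j: math.dist(p, reps[j]))
        groups[nearest].append(p)
    return groups


def medoid(group):
    """Member of the group with the smallest total distance to all members."""
    return min(group, key=lambda yk: sum(math.dist(yk, yl) for yl in group))


def postprocess(points, reps, threshold):
    """Drop representatives whose group size is not above the threshold and
    replace each kept representative by the medoid of its group."""
    groups = assign_groups(points, reps)
    return [medoid(g) for g in groups.values() if len(g) > threshold]


# Toy data: one dense group and two isolated points.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (50, 50)]
reps = [(2, 0), (10, 10), (50, 50)]
print(postprocess(points, reps, threshold=1))  # [(1, 0)]
```

In the toy run, the two representatives that stand only for themselves are dropped, and the remaining one is moved from (2, 0) to the group member (1, 0), which has the smallest total distance to the rest of its group.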
The center here, denoted by $y_k^*$, is defined to be the point in $P(y_j)$ such that the total distance to all others in the group is minimized. The set S including such $y_k^*$ is the output sample of the algorithm.

In the experiment, we focus on examining how the integration of the coreset method [41] can facilitate the analysis process of anti-jamming applications. To do that, we study vehicular data clustering. Then, we present the details of the clustering quality evaluation. The goal of this experiment is to detect the presence of a constant jamming attack. The attack is detected by monitoring the signal power that is reported via the Received Signal Strength Indicator (RSSI), which is an expression of the SNIR (Signal-to-Noise-and-Interference Ratio). In the presence of the malicious attack, both the probability of successful message reception and the SNIR decrease. For this experiment, we have initially referred to a study in [36] that has explored the impact of different jamming attacks, including a constant jamming case, in VANETs. Then, we have selected its dataset, which contains traces of 802.11p packets with and without the presence of constant jamming signals. Table 3 presents the network configuration used for creating a series of constant jamming scenarios. The number of generated packets is 25,000. The vehicular network features used in this experiment are as follows: Node-Id-number, type, vehicle position, GPS-time, speed, time sender, time receiver, RSSI, SNIR, and vehicle heading. For storage and clusters, we used the permanent cloud environment offered by MetaCentrum. Our goal in this experiment is to evaluate the quality of the clustering obtained on the sample with K-means and its improved versions, namely K-means++ and Fuzzy c-means.
We selected k-means as our clustering algorithm, as k-means is a widely used and efficient unsupervised algorithm that uses an iterative method to divide a given dataset into k clusters. The cluster centers are first positioned as points, each sample is assigned to the nearest center, and the centers are adjusted; the process then repeats with the adjusted centers until the desired result is achieved. Thus, this algorithm is easy to implement, efficient in terms of its computational costs, and offers easily interpretable clustering results. On the other hand, K-means is sensitive to initialization and to the presence of outliers, because the mean is not a robust statistic. Therefore, k-means may yield poor outcomes and take more processing time. For that reason, we have evaluated the quality of the obtained clusters both with and without the application of the coresets method. We also used the improved versions of K-means (e.g., Fuzzy c-means) to evaluate how coresets influence the quality of the clusters according to a list of metrics. For instance, the Dunn index (DI) is used as an internal evaluation metric to determine how well each sample lies within its cluster. A higher DI indicates better clustering. Likewise, we used a second internal metric, the Davies-Bouldin index (DBI), to evaluate how well the clustering has been done by using the quantities and features inherent to the selected dataset. A lower DBI indicates better clustering.

Figure 2 shows the mapping of the SNIR time evolution. One can see that the sample size is reduced from 25,000 to 479 due to the application of the coreset. Consequently, the analysis time is also reduced. Thus, the combination of the coresets with clustering algorithms could help the anti-jamming applications to learn quickly from the approximated data that represent the original data source. As a result, they can detect the presence of constant jamming attacks rapidly.
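For readers unfamiliar with the two internal indices used above, the following minimal sketch shows one common way to compute them. It is an illustration under stated assumptions, not the evaluation code of the experiment: the function names are hypothetical, distances are Euclidean, the Dunn index uses the smallest pairwise distance between clusters and the largest cluster diameter (other variants exist), and the Davies-Bouldin index uses the mean distance to the centroid as the scatter of a cluster. At least one cluster is assumed to contain two distinct points.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter


def davies_bouldin_index(clusters):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    centers = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Two toy clusterings: compact, well-separated groups versus looser ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(6, 0), (6, 4)]]
print(dunn_index(tight), dunn_index(loose))                      # higher is better
print(davies_bouldin_index(tight), davies_bouldin_index(loose))  # lower is better
```

On the toy data, the compact clustering obtains the higher DI and the lower DBI, which matches the reading of the two indices given in the text.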
Meanwhile, the results of the internal cluster validity indices in Fig. 3 show that the application of k-means, k-means++, and Fuzzy c-means based on coresets provides promising results compared to their regular application. DBI (Fig. 3a) and DI (Fig. 3b) achieved better values with the coresets compared to the original application of the clustering algorithms. Further, DBI and DI reflect how well each sample lies within its cluster. Accordingly, the integration of the coresets with k-means and its improved versions increases the quality of the clusters. Besides, K-means shows better efficiency than k-means++ and Fuzzy c-means in terms of time. However, we noticed that the application of the coresets allows Fuzzy c-means to run quickly compared to its regular processing time. Thus, the proposed solution keeps and even significantly improves the quality of the clusters in terms of the DBI and DI measurements. From Fig. 4, one can see that the clustering process is on average 132 times faster across k-means, k-means++, and Fuzzy c-means. Furthermore, all the clustering times are within 1 s, which indicates that the solution can facilitate real-time jamming detection in VANET. In other words, the anti-jamming applications can detect the presence of the viral constant jamming attack and cope with it in real time, which is a good starting point not only for enhancing the security mechanisms adopted by anti-jamming applications but also for supporting other clustering-based attack detection systems that have to deal with viral attacks in real time. In addition, the experiment results could be a strong motivation for further use of approximated data in smart mobility systems, not only for avoiding (or minimizing) the negative impact of malicious attacks, such as damage to personal property (e.g., cars) and the sharing of wrong traffic information, but also for supporting the progress of smart mobility applications in urban environments.
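A comparison of this kind can be reproduced in spirit with a small timing harness such as the one below. It is a hedged sketch only: the data are synthetic one-dimensional "SNIR-like" values rather than the dataset of [36], a uniform random sample stands in for the coreset of [41], the plain Lloyd-style `kmeans` function is a hypothetical helper, and the measured times depend on the machine, so no particular speed-up is implied.

```python
import math
import random
import time


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd iterations: assign to the nearest center, then recompute means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def timed(fn, *args):
    """Return the function result together with the elapsed wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


rng = random.Random(42)
# Assumption: synthetic values around 25 (no jamming) and around 5 (jamming).
full = [(rng.gauss(25.0, 2.0),) for _ in range(5000)]
full += [(rng.gauss(5.0, 2.0),) for _ in range(5000)]
# Assumption: a uniform random sample is used here in place of a real coreset.
sample = rng.sample(full, 400)

(centers_full, _), t_full = timed(kmeans, full, 2)
(centers_sample, _), t_sample = timed(kmeans, sample, 2)

print("centers on full data:", sorted(c[0] for c in centers_full))
print("centers on sample:   ", sorted(c[0] for c in centers_sample))
print("time ratio full/sample:", t_full / t_sample)
```

The point of the sketch is that the centers found on the reduced data stay close to those found on the full data while far fewer distance computations are needed; a coreset is intended to give this kind of reduction with better guarantees than uniform sampling.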
In this paper, we have proposed a model based on the coresets technique to address real-time jamming attack detection in smart mobility. Our model demonstrates how to process big mobility data and use clustering techniques in anti-jamming applications. In order to validate the proposed model, we have conducted an experiment in the VANET setting, and our results have shown that the proposed solution can significantly increase the detection efficiency of anti-jamming applications while keeping the clustering quality. With the significant decrease in clustering time, the proposed solution enables the anti-jamming applications to perform real-time jamming detection in smart mobility. Furthermore, our model can also be easily integrated into different smart mobility systems and used to advance the efficiency of other big data applications in the Internet of Vehicles. As future work, we plan to conduct more experiments with other clustering algorithms, and to extend the coresets to detect and discover other attacks in smart mobility. Furthermore, we plan to deploy our solution in different real-world scenarios such as the Internet of Vehicles and benchmark the performance of the proposed solution.
1. Smart cities: definitions, dimensions, performance, and initiatives
2. Exploring big data clustering algorithms for Internet of Things applications
3. A research roadmap of big data clustering algorithms for future internet of things
4. M-cluster and x-ray: two methods for multijammer localization in wireless sensor networks
5. A comparative survey of VANET clustering techniques
6. Jamming attacks reliable prevention in a clustered wireless sensor network
7. Fog based intelligent transportation big data analytics in the internet of vehicles environment: motivations, architecture, challenges, and critical issues
8. A system dynamic approach for the smart mobility of people: implications in the age of big data
9. Information integrity for multisensors data fusion in smart mobility
10. Energy efficient optimal routing for communication in VANETs via clustering model
11. Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering
12. An effective coreset compression algorithm for large scale sensor networks
13. The single pixel GPS: learning big data signals from tiny coresets
14. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
15. A novel jammer detection framework for cluster-based wireless sensor networks
16. Big data for Internet of Things: a survey
17. Smart mobility creating smart space: 3D smart aquarium bus
18. On coresets for k-means and k-median clustering
19. VANet security challenges and solutions: a survey
20. Smart mobility and driver behavior correlated with vehicular networks under a social perception in smart cities
21. Users as programmers: developing a vehicular interface notation for older users of smart vehicles
22. Survey on security for WSN based VANET using ECC
23. Jamming attack detection in a pair of RF communicating vehicles using unsupervised machine learning
24. Tight clustering for large datasets with an application to gene expression data
25. Road infrastructure monitoring system using e-bikes and its extensions for smart community
26. RF jamming classification using relative speed estimation in vehicular wireless networks
27. A novel intrusion detection system for vehicular ad hoc networks (VANETs) based on differences of traffic flow and position
28. Anti-jamming communications using spectrum waterfall: a deep reinforcement learning approach
29. Smart cities and smart tourism: what future do they bring?
30. DJAVAN: detecting jamming attacks in vehicle ad hoc networks
31. Vehicular social networks: enabling smart mobility
32. A statistical approach to detect jamming attacks in wireless sensor networks
33. Localization of multiple jamming attackers in vehicular ad hoc network
34. Estimating the number of multiple jamming attackers in vehicular ad hoc network
35. Assessing the reliability of fog computing for smart mobility applications in VANETs
36. Experimental characterization and modeling of RF jamming attacks on VANETs
37. ProTraS: a probabilistic traversing sampling algorithm
38. Smart mobility
39. The future of mobility with connected and autonomous vehicles in smart cities
40. Learn from IoT: pedestrian detection and intention prediction for autonomous driving
41. Scaling big data applications in smart city with coresets
42. Smartmentality: the smart city as disciplinary strategy
43. Smart mobility: new roles for telcos in the emergence of electric and autonomous vehicles