key: cord-0532054-nedu5nb0
authors: Chowdhury, Sayeed Shafayet; Islam, Kazi Mejbaul; Noor, Rouhan
title: Unsupervised Abnormality Detection Using Heterogeneous Autonomous Systems
date: 2020-06-05
journal: nan
DOI: nan
sha: 9544475567ce4924cdd8875066e5ac38b9acad15
doc_id: 532054
cord_uid: nedu5nb0

Anomaly detection in a surveillance scenario is an emerging and challenging field of research. For autonomous vehicles like drones or cars, it is immensely important to distinguish between normal and abnormal states in real time to avoid or detect potential threats. But the nature and degree of abnormality may vary depending upon the actual environment and adversary. As a result, it is impractical to model all cases a priori and use supervised methods to classify them. Also, an autonomous vehicle provides various data types, such as images and other analog or digital sensor data. In this paper, a heterogeneous system is proposed which estimates the degree of abnormality of an environment from a drone feed, analyzing real-time image and IMU sensor data in an unsupervised manner. We demonstrate AngleNet (a novel CNN architecture) to estimate the angle between a normal image and another image under consideration, which provides us with a measure of anomaly. Moreover, the IMU data are used in clustering models to predict abnormality. Finally, the results from these two algorithms are ensembled to estimate the final abnormality. The proposed method performs satisfactorily on the IEEE SP Cup-2020 dataset with an accuracy of 99.92%. Additionally, we have tested this approach on an in-house dataset to validate its robustness.

The autonomous and intelligent vehicle is one of the promises of the fourth industrial revolution of machine intelligence, blockchain and the Internet of Things.
Detecting abnormalities of autonomous vehicles has become an active research field, as it is important for providing security and for improving autonomous decision-making by learning and detecting abnormalities of the surrounding environment from gathered sensory data [1] [2] [3] [4]. Determining normal/abnormal dynamics in a given scene from an external viewpoint is another emerging research field [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]. This paper proposes an approach for detecting abnormalities using intelligent and heterogeneous systems in an unsupervised manner, where we determine abnormalities of ground and aerial autonomous vehicles that interact with the surrounding environment. The motivation behind using ground and aerial autonomous vehicles simultaneously is to pave the way for research on more robust and precise self-aware autonomous vehicles that understand the surrounding environment and negotiate and interact with other autonomous vehicles better.

Nowadays, methods based on machine learning coupled with signal processing are used to develop novel models in areas like activity/motion recognition [18], security/prevention of incidents [17], and travel management in cities and urban planning/design [20] [21]. Campo et al. [22] proposed a method to detect abnormal motions in real vehicle situations based on trajectory data where they used Gaussian

BMVC 2020 Submission # 836

Kanapram et al. [23] proposed a novel method to detect abnormalities based on internal cross-correlation parameters of the vehicle, using a Dynamic Bayesian Network (DBN) to determine abnormal behavior. Iqbal et al. [24] proposed a method in which they selected an appropriate network size for detecting abnormalities in multisensory data coming from a semi-autonomous vehicle.
As previous works mostly rely on a high level of supervision to learn private layer (PL) self-awareness models [13, 25-27, 30, 31], Ravanbakhsh et al. [32] proposed a dynamic incremental self-awareness (SA) model which allows experiences to be learned in a hierarchical manner, starting from a simpler situation and moving to a more structured one. In that work, a cross-modal Generative Adversarial Network (GAN) is used for processing high-dimensional visual data. Baydoun et al. [33] also proposed a method for multi-sensor anomaly detection for moving cognitive agents, using both external and private first-person visual observations to characterize agents' motion in a given environment. A set of Generative Adversarial Networks (GANs) was trained in a semi-unsupervised way to produce an estimate of the external and internal parameters of moving agents.

In this paper, we introduce an ensembled method that uses both image and IMU sensor data to detect the anomaly of a drone in real time. In most of the abnormal cases, the drone is shaken or tilted at a significant angle. To estimate the tilt angle, we introduce AngleNet in this work. Anomalies in the IMU data are detected using a clustering-based algorithm. By ensembling the results, we calculate the degree of abnormality. This work is implemented practically on a drone in real time.

The rest of the paper is organized as follows. Section 1.1 explains the details of the problem description. Section 1.2 describes the proposed method, where AngleNet and the clustering methods are explained in detail. Section 1.3 presents the experimental results and comparison, and finally Section 1.4 concludes the work.

In SP Cup 2020 [37], Rosbag [49] files were provided which contained data from the IMU sensor and images of the respective time frames. The Rosbag files were provided in two different forms. Some files contained only normal time frames, while other files contained a mix of normal and abnormal time frames.
The task is to find the abnormal time frames using unsupervised methods, which means we had to use only normal data for training and other calculations, and use it to find the abnormal cases. In total, 12 Rosbag files were provided, of which 6 contained only normal timestamps and the other 6 contained both normal and abnormal timestamps. The total number of normal images is 416 and the number of mixed images is 238. Besides image data, IMU sensor data were provided. There are 6 types of data under the IMU topic name, among which we have used IMU/data and IMU/mag. A total of 987 normal timestamps was provided in two parts: 300 were given first in one Rosbag file and 687 were given later.

There are two separate parts in this detection procedure. First, we used unsupervised clustering algorithms to cluster the normal IMU sensor data. Then we used a deep learning model to model the normal images; anything other than normal is considered abnormal in this procedure. Figure 2 shows some normal image samples provided in this dataset. Several types of data are provided in the Rosbag files under the IMU topic name. We have used two of them, named 'data' and 'mag'. We tried to use as little data as possible to model the normal data, and found that these data are enough to model the normal state. Figure 3 shows the PCA of the normal and mixed IMU data.

This section describes the method we used to develop an unsupervised model to detect abnormalities using the image and IMU sensor data. First, we developed a model for classifying abnormal and normal images. In this challenge, we were asked to use normal images only, so we cannot use abnormal image samples. But it is clear that when the abnormal images were taken, the drone was rotated at a significant angle. To estimate the angle without depending on the abnormal data, we introduce AngleNet, which is used in an unsupervised manner to meet this criterion.
Besides, we have used the K-means clustering algorithm for modeling the normal IMU data. Figure 1 shows a flowchart that describes the process, including the ensembling.

In the abnormal state of a surveillance drone, it is mostly the tilt angle that differs from the normal state. In normal conditions, the drone is quite stable, as shown in the dataset, while for the unstable drone the image is tilted at a significant angle. So, we introduce AngleNet, a novel convolutional neural network architecture to detect a significant angle change from the normal state. In building this model, we have taken inspiration from the Siamese model architecture and related works. There are several works on Siamese models. Fischer et al. [43] introduced a method of extracting feature representations by training a CNN in a supervised manner and matching these features based on Euclidean distance. Zbontar and LeCun [48] trained a Siamese-architecture CNN for predicting similarity between image patches. Recent applications of related CNN architectures include semantic segmentation [42, 45, 46, 47], depth prediction [40], keypoint prediction [46], edge detection [44] and determining optical flow in a supervised manner [41]. We have used a similar idea to estimate the angle between two images. In this model, a normal image is provided first, then the upcoming frames are taken as input, and the output is the angle between them. If there is a significant difference between the images, such as an object mismatch, the output will be significantly high. Figure 4 shows the model structures.

Figure 5: Samples of normal images and augmented images from Stanford Car Dataset [8]

To use this model in an unsupervised manner in this case, we have used the Stanford car dataset [34].
The images were augmented so that they mimic the angle change, as demonstrated in Figure 5. The activation function of the final layer was ReLU, as its output is non-negative and it is linear for positive inputs. After training the model on this dataset, we used the dataset provided by the SP Cup committee to estimate the degree of abnormality of the images. The performance of the model on classifying normal images is demonstrated in Table 2.

For both IMU/data and IMU/mag, we fed the data directly into K-means clustering. First, we determined the optimum number of clusters using the elbow method [35], as shown in Figure 5. In this algorithm, k-means clustering is applied to the dataset for a range of values of k, in this case up to a maximum of k = 50. For each value of k, the sum of squared error (SSE) is calculated using (1). The SSE is plotted against k, and if the plot forms an arm, the elbow of that arm is the optimum value of k. In both cases, we found k = 10 to be optimum, as it produces less error for normal time frames, which is discussed in Section 1.3.

After selecting the optimum number of clusters, we used the clustering-based anomaly detection described in [24]. From the PCA plots in Figure 3, we can see that normal points stay close together with low variance. Assuming this, we can conclude that abnormal data points will lie at a considerable distance from the normal mean. In this scheme, we measure the distance Dp between a cluster center and the most distant point in that cluster. When a foreign data point falls in that cluster, we measure the n-dimensional distance between the cluster center and that data point, where n is the dimension of the data, and calculate the degree of abnormality using (2). Here i and j represent the coordinates of the cluster center and the foreign data point, respectively. n is the number of children in a cluster. σ is the degree of abnormality. If σ is greater than 1, we consider the test point abnormal; otherwise the point is considered normal.
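The clustering-based scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: equations (1) and (2) are not reproduced in this text, so the sketch assumes the usual forms, namely SSE as the sum of squared distances of every point to its nearest cluster center, and σ as the distance from a test point to its nearest cluster center divided by Dp for that cluster. The helper names (`nearest`, `kmeans`, `sse`, `cluster_radii`, `degree_of_abnormality`) and the toy 2-D data are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns the centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each non-empty cluster's center to the mean of its members.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers


def sse(points, centers):
    """Assumed form of (1): squared distance of each point to its nearest center."""
    return sum(math.dist(p, centers[nearest(p, centers)]) ** 2 for p in points)


def cluster_radii(points, centers):
    """Dp per cluster: distance from the center to its most distant member."""
    radii = [0.0] * len(centers)
    for p in points:
        c = nearest(p, centers)
        radii[c] = max(radii[c], math.dist(p, centers[c]))
    return radii


def degree_of_abnormality(point, centers, radii):
    """Assumed form of (2): distance to the nearest center divided by its Dp."""
    c = nearest(point, centers)
    d = math.dist(point, centers[c])
    if radii[c] == 0.0:  # degenerate cluster with no spread
        return 0.0 if d == 0.0 else math.inf
    return d / radii[c]


# Toy usage: fit on "normal" data only, then score an unseen point.
rng = random.Random(1)
normal = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
          + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(40)])
centers = kmeans(normal, k=2)
radii = cluster_radii(normal, centers)
sigma = degree_of_abnormality((100.0, 100.0), centers, radii)
print("abnormal" if sigma > 1.0 else "normal")
```

For the elbow method, `sse(points, kmeans(points, k))` would be evaluated for each candidate k and plotted. By construction, every training point scores σ ≤ 1 under this sketch, so only points lying outside the learned cluster radii are flagged.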
Throughout the process, we neither used any data from the mixed dataset, where normal and abnormal data are kept together, nor generated abnormal samples of this problem set to perform supervised training. So this process qualifies as unsupervised according to [38].

Table 1: Performance on IMU data.
Data      Metric          Value
IMU/data  Accuracy        99.96%
IMU/data  False-negative  0.034%
IMU/mag   Accuracy        99.97%
IMU/mag   False-negative  0.03%

Threshold Angle

In this section, the experiments on the dataset provided by the organizing committee are explained, and the results are compared with some other algorithms. For both IMU/data and IMU/mag, we have used a clustering-based anomaly detection system [39], also discussed in section 1.1. For modelling the normal timestamps in both cases, we have used 10 clusters, the optimum k from the elbow method. We used 600 samples for training and the rest for testing the performance. Table 1 shows the performance of the model. Normalization does not make any significant difference, so no sample was normalized during training or testing. IMU/mag data is 3-dimensional and the plot is shown in Figure 7.

Using AngleNet, we can estimate the angle between the test image and the normal image. In the abnormal images, the rotation angle is the main distinctive factor. To prepare the training dataset, we tilted the images from the Stanford Car Dataset [34] by angles of 5, 30, 50 and 70 degrees. The output is then divided by 90.0 to get the abnormality, which represents the degree of abnormality with respect to the rotation angle. For our experiment, we have used 0.33 (30 degrees) as the rotation limit for a normal image; any image rotated by more than 30 degrees is considered abnormal. But the threshold is fully tunable and user-defined.
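The conversion from a predicted angle to a degree of abnormality described above is simple arithmetic, sketched here under stated assumptions. The function names are hypothetical, the predicted angle is taken to be in degrees (as the division by 90.0 suggests), and the limit is expressed as 30/90 rather than the rounded 0.33 so that an image at exactly 30 degrees still counts as normal.

```python
def angle_to_abnormality(angle_deg, full_scale_deg=90.0):
    """Scale a predicted angle (degrees) to a degree of abnormality."""
    return angle_deg / full_scale_deg


def is_abnormal_image(angle_deg, limit_deg=30.0):
    """Flag an image whose predicted rotation exceeds the user-defined limit."""
    return angle_to_abnormality(angle_deg) > angle_to_abnormality(limit_deg)


# The tilt angles used for augmentation, scored against the 30-degree limit.
for tilt in (5, 30, 50, 70):
    print(tilt, round(angle_to_abnormality(tilt), 3), is_abnormal_image(tilt))
```

Changing `limit_deg` is how the tunable threshold mentioned above would be adjusted in this sketch.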
The performance of AngleNet on the normal images is shown in Table 3.

Table 3: Results of proposed methods on mixed data.

The process is designed so that the clustering-based anomaly detection and AngleNet can be used separately or as an ensemble. Since we calculate a degree of abnormality in each case, ensembling them produces a combined result. In the proposed ensembling formula, N is the combined degree of abnormality; wd, wm and wI are the weights of the three models (the two clustering-based models and the convolutional neural network-based model, AngleNet); and σd, σm and σI are the degrees of abnormality for IMU/data, IMU/mag and the image, respectively.

Many algorithms have been used for anomaly detection in IMU data or similar data types. Autoencoders [10] are used by some researchers, and in [13] the author proposed reconstruction-error-based anomaly detection. For this SP Cup 2020 dataset, the accuracy of autoencoder-based anomaly detection on normal data is 97.7%. All the algorithms are tested on mixed data, where both abnormal and normal timestamps are present. Table 3 shows the performance. For training AngleNet, only the dataset of [34] was used. The resulting weights were used to calculate the degree of abnormality, and the model shows good performance on the normal dataset, as shown in Table 3. Because the weights are transferred and no abnormal images are used during training, this process is unsupervised.

For training AngleNet on the Car dataset [34], we used Google Colab with an Nvidia Tesla K80 GPU with 12 GB of memory. For testing, however, it runs seamlessly on a computer with 2 GB of GPU memory. The system was tested on a machine with an Intel Core i5 processor, 8 GB of RAM and an Nvidia 940 MX. It takes 0.47 seconds on average to process a single frame.
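As an illustration of the ensembling step described earlier in this section, the sketch below combines the three degrees of abnormality into a single value N. The exact formula is not reproduced in this text, so a weighted average normalized by the sum of the weights is assumed here; the function name, argument names and default weights are hypothetical.

```python
def combined_abnormality(sigma_d, sigma_m, sigma_i, w_d=1.0, w_m=1.0, w_i=1.0):
    """Assumed ensemble: weighted average of the three degrees of abnormality.

    sigma_d, sigma_m: scores from the IMU/data and IMU/mag clustering models.
    sigma_i: score from the image model (predicted angle divided by 90).
    """
    total = w_d + w_m + w_i
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_d * sigma_d + w_m * sigma_m + w_i * sigma_i) / total


# Example: the image model is trusted twice as much as each IMU model.
print(combined_abnormality(0.2, 0.4, 0.9, w_i=2.0))
```

With equal weights this reduces to a plain mean of the three scores; other weightings let one modality dominate when it is known to be more reliable.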
To demonstrate real-time usage on an embedded device, the system is run on a Raspberry Pi, where the clustering algorithms run on the Pi and the CNN-based processing runs on a remote server. The system was tested on an in-house setup with a custom hexacopter running Ardupilot, using a Raspberry Pi 3 for real-time processing and for sending video frames to the server. In this setup, the accuracy of the algorithm was 96.7%.

In this paper, we have demonstrated an ensembled approach for vehicle anomaly detection. Our approach does not classify any sample as strictly normal or abnormal; rather, we use the degree of abnormality, where a lower value means the sample is closer to the normal situation. We have introduced a novel network, AngleNet, which is used to identify abnormal image samples. As the organizers asked for unsupervised classification between abnormal and normal, we could not use any data from abnormal samples for training. So we trained AngleNet on another dataset [8] and used the weights to determine the degree of abnormality. K-means clustering is very popular for anomaly detection, but modeling all normal data in a single cluster would not be a good idea in this case.
So we have used multiple clusters to

Learning Switching Models for Abnormality Detection for Autonomous Driving
Learning Multi-Modal Self-Awareness Models for Autonomous Vehicles from Human Driving
Self-aware Computing Systems An Engineering Approach
Multi-Perspective Approach to Anomaly Detection for Self-Aware Embodied Agents
Static force field representation of environments based on agents' nonlinear motions
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video
A Survey of Vision-Based Trajectory Learning and Analysis for Surveillance
Multisensor-fusion for 3D full-body human motion capture
Abnormal behavior detection and behavior matching for networked cameras
Multi-camera open space human activity discovery for anomaly detection
Toward Measurement of Situation Awareness in Autonomous Vehicles
Unsupervised Trajectory Modeling Based on Discrete Descriptors for Classifying Moving Objects in Video Sequences
Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection
Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes
Crowd behavior representation: an attribute-based approach
Temporal Poselets for Collective Activity Detection and Recognition
Analyzing pedestrian behavior in crowds for automatic detection of congestions
A Survey on Approaches of Motion Mode Recognition Using Sensors
Analyzing pedestrian behavior in crowds for automatic detection of congestions
A Review of Motion Planning Techniques for Automated Vehicles
Urban Computing
Learning Probabilistic Awareness Models for Detecting Abnormalities in Vehicle Motions
Self-awareness in Intelligent Vehicles: Experience Based Abnormality Detection
Clustering Optimization for Abnormality Detection in Semi-Autonomous Systems
Gaussian process regression flow for analysis of motion trajectories
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video
Dynamic representations for autonomous driving
Gaussian process regression flow for analysis of motion trajectories
Multi-Perspective Approach to Anomaly Detection for Self-Aware Embodied Agents
Toward Measurement of Situation Awareness in Autonomous Vehicles
A machine learning based intelligent vision system for autonomous object detection and recognition
Hierarchy of Gans for Learning Embodied Self-Awareness Model
Multi-Perspective Approach to Anomaly Detection for Self-Aware Embodied Agents
The Determination of Cluster Number at k-Mean Using Elbow Method and Purity Evaluation on Headline News
Autoencoders, Unsupervised Learning, and Deep Architectures
Unsupervised abnormality detection by using intelligent and heterogeneous autonomous systems
Anomaly Detection Using Autoencoders in High Performance Computing Systems
A survey of anomaly detection techniques in financial domain
Depth map prediction from a single image using a multi-scale deep network
FlowNet: Learning Optical Flow with Convolutional Networks
Learning Hierarchical Features for Scene Labeling
Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT
N^4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Hypercolumns for object segmentation and fine-grained localization
Fully convolutional networks for semantic segmentation
Computing the stereo matching cost with a convolutional neural network
Robotic Operating System