key: cord-0044426-ed6itb9u authors: Aljubairy, Abdulwahab; Zhang, Wei Emma; Sheng, Quan Z.; Alhazmi, Ahoud title: SIoTPredict: A Framework for Predicting Relationships in the Social Internet of Things date: 2020-05-09 journal: Advanced Information Systems Engineering DOI: 10.1007/978-3-030-49435-3_7 sha: 640ebed891062688f0b4f683d777b92377ccca63 doc_id: 44426 cord_uid: ed6itb9u The Social Internet of Things (SIoT) is a new paradigm that integrates social network concepts with the Internet of Things (IoT). It boosts the discovery, selection and composition of services and information provided by distributed objects. In SIoT, searching for services is based on the utilization of the social structure resulted from the formed relationships. However, current approaches lack modelling and effective analysis of SIoT. In this work, we address this problem and specifically focus on modelling the SIoT’s evolvement. As the growing number of IoT objects with heterogeneous attributes join the social network, there is an urgent need for identifying the mechanisms by which SIoT structures evolve. We model the SIoT over time and address the suitability of traditional analytical procedures to predict future relationships (links) in the dynamic and heterogeneous SIoT. Specifically, we propose a framework, namely SIoTPredict, which includes three stages: i) collection of raw movement data of IoT devices, ii) generating temporal sequence networks of the SIoT, and iii) predicting relationships among IoT devices which are likely to occur. We have conducted extensive experimental studies to evaluate the proposed framework using real SIoT datasets and the results show the better performance of our framework. Crawling the Internet of Things (IoT) to discover services and information in a trusted-oriented way remains a prolonged challenge [21] . Many solutions have been introduced to overcome the challenge. However, due to the increasing number of IoT objects in a tremendous rate, these solutions do not scale up. Integrating social networking features into the Internet of Things (IoT) paradigm has received an unprecedented amount of attention for the purpose of overcoming issues related to IoT. There have been many attempts to integrate IoT devices in social loops such as Smart-Its friend procedure [8] , Blog-jects [3] , Things that Twitter [9] , and Ericson Project 1 . A new paradigm has emerged from this, called Social Internet of Things (SIoT), and the key idea of this paradigm is to allow IoT objects to establish relationships with each other independently with respect to the heuristics set by the owners of these objects [1, 2, 15, 18] . The perspective of SIoT is to incorporate the social behaviour of intelligent IoT objects and allow them to have their own social networks autonomously. There are several benefits to the SIoT paradigm. First, SIoT can foster resource availability and enhance services discovery easily in a distributed manner using friends and friends of friends [1] , unlike traditional IoT where search engines are employed to find services in a centralized way. Second, the centralized manner of searching IoT objects raises scalability issue, and SIoT overcomes the issue because each IoT object can navigate the network structure of SIoT to reach other objects in a distributed way [2, 20, 21] . Third, based on the social structure established among IoT objects, things can inquire local neighbourhood for other objects to assess the reputation of these objects. Fourth, SIoT enables objects to start new acquaintance where they can exchange information and experience. Many research efforts have been devoted to realizing the SIoT paradigm. However, the majority of the research activities focused on identifying possible policies, methods and techniques for establishing relationships between smart devices autonomously and without any human intervention [1, 2] . In addition, several SIoT architectures have been proposed [2, 5, 17] . In spite of the intensive research attempts on SIoT, there are insufficient considerations to model and analyze the resulted SIoT networks. The nature of SIoT is dynamic because it can grow and change quickly over time where nodes (IoT objects) and edges (relationships) appear or disappear. Therefore, there is a growing interest in developing models that allow studying and understanding this evolving network, in particular, predicting the establishment of future links (relationships) [1, 15] . Predicting future relationships among IoT objects can be utilized for several applications such as service recommendation and service discovery. Thus, there is a need for identifying the mechanisms by which SIoT structures evolve. This is a fundamental research question that has not been addressed in SIoT yet, and it forms the motivation for this work. However, the size and complexity of the SIoT network create a number of technical challenges. Firstly, the nature of the resulted network structure is dynamic because smart devices can appear and disappear overtime and the existed relationships may vanish and new relationships may establish. Secondly, SIoT is naturally structured as a heterogeneous graph with different types of entities and various relationships [2, 18] . Finally, the size of SIoT network is mas-sive, and hence, it requires efficient and scalable methods. Therefore, this paper focuses on modelling the SIoT network and study, in particular, the problem of predicting future relationships among IoT objects. We study the possibility of relationship establishment among IoT objects when there is co-occurrence meeting in time and space. Our research question centers on how likely two IoT objects could create a relationship between each other when they have been approximately on the same geographical location at the same time on multiple occasions. In our work, we develop the SIoTPredict framework, which includes three stages: i) collecting the raw movement data of IoT devices, ii) generating temporal sequence networks of SIoT, and iii) predicting future relationships that may be established among things. The salient contributions of our study are summarized as follows: -Designing and implementing the SIoTPredict framework for studying the SIoT network. The SIoTPredict framework consists of three main stages for i) collecting raw movement data of IoT devices, ii) generating temporal sequence networks, and iii) predicting future relationships among things. To the best of our knowledge, our framework is the first on SIoT relationship prediction. -Generating temporal sequence networks of SIoT. We develop two novel algorithms in the second stage of our framework. The first algorithm identifies the stays of IoT objects and extracts the corresponding locations. The second algorithm, named Sweep Line Time Overlap, discovers when and where any two IoT objects have met. -Developing a Bayesian nonparametric prediction model. We adopt the Bayesian nonparametirc learning to build our prediction model. This model can adapt the new incoming observations due to the power representation and flexibility of Bayesian nonparametric learning. -Conducting comprehensive experiments to assess our framework. SIoTPredict has been evaluated by extensive experiments using real-world SIoT datasets [12] . The results demonstrate that our framework outperforms the existing methods. The rest of this paper is organized as follows. Section 2 discusses the related works. Section 3 presents heterogeneous graph modeling for Social IoT and introduces the SIoTPredict framework. The experimental results on real SIoT datasets are presented in Sect. 4, and finally Sect. 5 concludes the paper. SIoT is still in the infancy stage, and several efforts have been devoted to realizing the SIoT paradigm. Most of the current research activities focused on identifying possible policies, methods and techniques for establishing relationships between smart devices autonomously and without any human intervention [2] . Atzori et al. [2] proposed several relationships that can be established between IoT objects as shown in Table 1 . Some of these relationships are static such as POR and OOR, which can usually be defined in advance. Other relationships are dynamic and can be established when the conditions of the relationship are met. Roopa et al. [18] defined more relationships that may be established among IoT objects. Nevertheless, current SIoT research lacks effective modelling and analysis of SIoT networks. However, in the context of IoT, there are a few attempts to exploiting the relationships among smart devices and users for recommending things to users. Yao et al. [23] proposed a hyper-graph based on users' social networks. They used existing relationships among users and their things to infer relationships among IoT objects. They leveraged this resulted network for recommending things of interest to users. Mashal et al. [14] modelled the relationships among users, objects, and services as a tripartite graph with hyper-edges between them. Then they explored existing recommendation algorithms to recommend third-party services. Nevertheless, these works are mainly based on users' existing relationships and the things they own. Atzori et al. [1] emphasized to modelling and analyzing the resulted social graphs (uncorrelated to human social networks) among smart objects in order to introduce proper network analysis algorithms. Therefore, our work aims to model the SIoT network in order to allow studying relationships prediction (link prediction) among IoT objects that may form in the future. Link prediction is considered as one of the most essential problems that have received much attention in network analysis, and in particular, when anticipating the network structure at a future time. A large body of work has investigated link prediction with various aspects including similarity-based measures, algorithmic methods, and probabilistic and statistical methods [11, 13] . Recently, there is a growing interest in developing probabilistic network models using Bayesian nonparametric learning. Bayesian nonparametric is capable to cap- This relationship is established among objects when they come into contact, sporadically or continuously, because they or their owners come in touch with each other during daily routine Dynamic ture the network evolution in different time steps by finding latent structure in observed data. Latent class models such as Stochastic Blockmodels (SBs) [7] and Mixed Membership Stochastic Blockmodels (MMSB) [19] depend on the vertex-exchangeability perspective where nodes are the target unit to assign into clusters. However, these models suffer from generating dense networks while most of the real-world networks tend to be sparse. To overcome this limitation, edgeexchangeable models have been proposed to deal with sparse networks [4, 22] . In this perspective, edges are the main units to assign into clusters. In this section, we describe the dynamic heterogeneous SIoT graph modelling and then present the details of our SIoTPredict framework. A dynamic, heterogeneous SIoT graph is composed of nodes and edges where nodes represent IoT devices, and edges represent relationships that could be of multiple different types. The formal definition is as follows. An SIoT network can be considered as a temporal sequence of networks (as depicted in Fig. 1a . , e t n } contains N edges observed at time t and the set of vertices V t is the set of vertices that have at least participated in one edge up to t such that represents the feature matrix at time t where x i is the attribute vector of node v n . Throughout the paper, we consider the dynamic relationship establishment using SIoT data as a case study. Figure 1b shows an example of the SIoT heterogeneous network. For this application, we assume that a heterogeneous SIoT graph has been obtained at time t from the SIoT data. Given these data, we will predict the likelihood of a relationship (edge) creation between any two IoT devices (nodes). This section explains our SIoTPredict framework for predicting future relationships in SIoT. Figure 2 gives an overview of the SIoTPredict framework. The framework includes three stages namely: Stage 1: collection of the raw movement data of IoT devices, Stage 2: generating the temporal sequence networks of SIoT, and Stage 3: prediction future relationships of the SIoT. In the following, we will provide more details on these three stages. In the first stage of our framework, we collect the raw movement data of IoT devices. We distinguish two types of IoT devices: mobile and static. The coordinates of a static device (e.g., a light pole) are stationary and known whereas the coordinates of a mobile device (e.g., a bus) is dynamic and changing while the device is moving. We assume that mobile devices include GPS technology which provides the location coordinates of these devices along with the timestamp. We also assume mobile IoT devices send their location history records continuously (e.g., every 60 seconds). Each record contains some important fields: (device id, latitude, longitude, and timestamp). 1. Definition 1. A location history record is represented by a point on the earth (latitude, longitude) and a timestamp. This record tells where an IoT object is at a specific time (as illustrated in Fig. 3 ). IoT object (as illustrated in Fig. 3 ). Phase 1) Identifying "stays" from the raw movement data and extracting "locations". We are interested in knowing where objects meet. Therefore, we first need to identify the stays for all objects using their raw movement data. Then, we extract locations from these stays. This enables us to identify where and when IoT objects have stayed. A stay is a sequence of N location history records, which can be represented by (longitude, latitude, Start-Time, End-Time). Longitude and latitude represent the average of longitude and latitude values in the sequence. Start-Time indicates the smallest timestamp in the sequence, and End-Time represents the largest timestamp (see Fig. 3 ). 2. Definition 4. A location can be the latitude and longitude of one stay, or it can be the average of longitude and latitude of a group of stays. This group of stays are separated by less than or equal to a distance R (see Fig. 3 ). We develop Algorithm 1 for identifying stays, extracting locations and, then labelling identified stays by the extracted locations. The input of this algorithm is the raw movement data of IoT objects. The output is a list of identified stays labelled by extracted locations. The time complexity of this algorithm is quadratic since it is required to calculate the distances between each two observations in the raw movement data. The first step focuses on stay identification (line ). This step is to identify the stays for each IoT object from the given raw movement data. First, we define the time period of the stay (for example, when the value of stay period = 10, it means that we need to identify if an object stays at a place for 10 min). According to our assumption, each record in the raw movement data is sent every one minute, so 10 records represent 10 min. The algorithm calculates the distance among the raw movement data records according to Eq. 1, where, d is the distance between the two location history records, r is the radius of the sphere (earth), θ 1 , θ 2 are the latitude of the two location history records, λ 1 , λ 2 are longitude of the two location history records. Then, it groups them if their distances are less than or equal to a threshold R. From line (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) , the algorithm checks each group. If a group has a number of records larger than or equal to the value of the stay period, the algorithm takes the average of the latitude and longitude of the group. It also takes the smallest timestamp of the group to be the starting time of the stay, and the largest timestamp to be the ending time of the stay. (1) The second step targets location extraction (line 28-40). The algorithm extracts the list of locations out of the identified stays. Since stays are represented by latitude and longitude, the algorithm calculates the distance among these stays, and groups them using a threshold R. The algorithm takes the average of the latitude and longitude of these stays to represent one location. If there is a stay which has not been grouped with any other stays, this stay can represent a location. Finally the third step focuses on labelling stays with locations. In this step, we label the identified stays by one of the extracted locations. Time Overlap Algorithm. We develop Algorithm 2 to detect and report all overlapped periods occurred among the given set of stays produced by Algorithm 1. The purpose is to determine if any two IoT objects have met in a location at a particular time. This novel algorithm named as Sweep Line Time Overlap (SLTO). The SLTO algorithm is inspired by a sweep-line algorithm in geometry which finds intersections between a group of line segments. However, the SLTO algorithm identifies if there are overlapping periods among stays of the objects in a location. The idea of this algorithm is to run a virtual sweep-line parallel to y-axis and move from left to right in order to scan intervals on x-axis. When this sweep-line detects overlaps among stays (they look like line segments which represent the stay periods of objects), it starts calculating if there is an overlap and reporting these overlaps. There are two main steps. The first step focuses on storing all the intervals of the stays (line 2-3). Since our goal is to find the overlapped periods among the set of stays, the algorithm initializes two data structures: i) a priority queue Q to store all the intervals of the stays we got from Algorithm 1 in sorting order, and ii) a sweep-line status as S to scan the stays from left to right. The second step runs the sweep line S (line 4-10). We get the interval end points from Q one by one to allow the sweep line S to scan it. The sweep line detects the start of the stay in the space, and adds it to the S. Also, when it detects the end of the stay (in this case, the SLTO finished from scanning the stay), the algorithm checks the last element in S. If it is the start of this stay, then the algorithm removes the stay form S with no action. If the last element in S is not the start of the finished stay, then an overlap or more have been detected between the this stay and other active stays in the S. Algorithm 3 reports the overlaps discovered by SLTO. It calculates the amount of the detected overlaps, and the results are reported. The time complexity of SLTO is O(NlogN +L) since the stay time is only calculated when overlaps between objects exist (Fig. 4) . OverlapPeriod ← min(L1.end, L2.end) − max(L1.start, L2.start) 6 -Report the overlap (ids of objects and OverlapPeriod) Phase 3) Generating the temporal networks of SIoT. After obtaining the time overlapping periods of stays among IoT objects, we are able to know the count of meetings occurred between any two IoT objects. According to this, we check the rules of the targeted relationship such as how many times they have met and the length of the interval period. If the rules are met, then we build the temporal sequence of the SIoT composed from this relationship to be used in our prediction model stage. After generating the temporal sequence networks in the second stage, our next step is to model each one of them using the Bayesian non-parametric model [22] . This model allows combining structure elucidation with a predictive performance by clustering links (edges) rather than the nodes. Our aim here is to predict links (relationships) between IoT objects that are likely to occur in the subsequent snapshot of the network. Therefore, the SIoT network is modelled as an exchangeable sequence of observed links (relationships) and that allows adapting the growth of the network over time. We assume that the SIoT network clusters into groups, and for this, we model each community using a mixture of Dirichlet network distributions. The description of the model as follows: We model the relationships of the SIoT network using Dirichlet distribution G, where δ Θi is a delta function centered on Θ i , and π i is the corresponding probability of an edge to exist at Θ i , with i=1 ∞ π i = 1. The parameter γ controls the total number of nodes in the network. To model the size and number of clusters, we use a stick-breaking distribution GEM (α) with concentration parameter α that controls the number of the clusters. The model places a distribution over all clusters, and it places per-cluster distribution over the nodes. To generate an edge, first, a cluster will be picked according to D. Then, two nodes (devices) will be sampled according to G. The probability of predicting a link between any two objects is proportional to the product of the degree of these two devices. For the inference part that is based on the Bayes' rule (Eq. 3), we follow the same steps conducted in [22] to compute the distribution over the cluster assignment using the Chinese restaurant process and evaluate the predictive distribution over the n th link, given the previous n−1 links. We perform inference using an Markov chain Monte Carlo (MCMC) scheme [22] . We evaluated the effectiveness and efficiency of the SIoTPredict framework based on comprehensive experiments. In this section, we discuss the experimental design and report the results. We used the SIoT datasets 2 to evaluate the SIoTPredict framework. These datasets are based on real IoT objects available in the city of Santander and contain a description of IoT objects. Each object is represented by fields such as (device id, id user, device type, device brand, device model). The total number of IoT objects is 16,216. 14,600 objects are from private users and 1,616 are from public services. The dataset includes the raw movement data of devices that are owned by users and the smart city. There are two kinds of devices: static devices and mobile devices. Static devices are represented by fixed latitudes and longitudes. Mobile devices are represented by latitudes, longitudes, and timestamps. The latitude and longitude values of mobile devices are dynamic. In addition, the dataset includes an adjacency matrix for SIoT relationship produced with some defined parameters. In Table 2 , we only depict SOR and SOR2 relationships and their parameters to be used in our experiments. In this section, we explain the common metrics and the comparison methods. Performance Metrics. Our performance metrics used in the experiments include Accuracy, Precision, Recall, and F 1 score. Following the work of information diffusion in [6] , we define the Accuracy as the ratio of correctly predicted edges to the total edges in the true network, Precision as the fraction of edges in the predicted network that are also present in the true network, Recall as the fraction of edges of the true network that are also presented in the predicted network, and finally F 1 score as the weight average of precision and recall. [19] . Although the aforementioned models are not explicitly designed for link prediction, they can be modified for the prediction task using the above procedure of selecting the N highest probability edges [22] . In addition, these models suffer from the limitation of assuming a fixed number of vertices. Furthermore, we also compared our approach with common link prediction methods [10] : Resource Allocation, Adamic Adar index, Jaccard Coefficient, XGBoost, and Common Neighbor. (a) ROC curve using nodes in the training set. (b) ROC curve using nodes outside the training set. We modeled the SIoT network to predict future interactions among devices, and that enabled us to have a better understanding on the resulted network. Based on the existing Bayesian models, nodes are assigned to clusters, and these clusters control the way of how these nodes establish relationships. Figure 5 and Fig. 6 show the performance of SIoTPredict against other methods. We used a small network, and the reason of that is due to the nature of SB and MMSB, which do not scale very well on large networks [16] . We experimented the performance of these models and methods in two ways. In the first experiment, we used the same nodes (i.e., IoT objects) in the training set and the test set. That means there were no nodes in the test set outside the training set. We performed experiments in this way because SB, MMSB and other methods assume nodes in the test set are not outside the training set. In the second experiment, the nodes in the test set are outside the training set. For the overall performance, SB does not perform well against the MMSB and our model. The reason is that the assumption of SB states that nodes can only belong to one cluster whereas we see MMSB performs better than SB because it relaxes this assumption by allowing the nodes to belong to more than one clusters. However, SB, MMSB and other common methods do not perform well compared to our model on both settings as illustrated in Fig. 5 and Fig. 6 . In particular, these methods perform poorly in the second setting (i.e., the nodes in the test set are outside the training set) due to their limitation on dealing with new nodes. In contrast, our model delivers the similar performance. Social Internet of Things (SIoT) can foster and enhance resource availability, discovering services, assessing object reputations, composing services, exchanging information and experience. In addition, SIoT enables establishing new acquaintances, collaborating to achieve common goals, and exploiting other object capabilities. Therefore, instead of relying on centralized search engine, social structure resulted from the created relationships can be utilized in order to find the desired services. In this paper, we take the research line of SIoT to a new dimension by proposing the SIoTPredict framework that addresses the link prediction problem in the SIoT paradigm. This framework contains three stages: i) collecting raw data movement of IoT devices, ii) generating temporal sequence networks of SIoT, and iii) predicting the links that are likely form between IoT objects in the future. Ongoing work includes further assessment of the SIoTPredict framework, and enhancement of the relationship prediction by considering the features of IoT objects (e.g., services offered by the objects). From smart objects to social objects: the next evolutionary step of the internet of things The Social Internet of Things (SIoT) -when social networks meet the Internet of Things: concept, architecture and network characterization A manifesto for networked objects -cohabiting with pigeons, arphids and aibos in the Internet of Things Edge-exchangeable graphs and sparsity Lysis: a platform for IoT distributed applications over socially connected objects Inferring networks of diffusion and influence Stochastic blockmodels: first steps Smart-its friends: a technique for users to easily establish connections between smart artefacts Things that Twitter: social networks and the Internet of Things The link-prediction problem for social networks Link prediction in complex networks: a survey A dataset for performance analysis of the social internet of things A survey of link prediction in complex networks Analysis of recommendation algorithms for Internet of Things Friendship selection in the social internet of things: challenges and possible strategies Bayesian models of graphs, arrays and other exchangeable random structures The cluster between internet of things and social networks: review and research challenges Social Internet of Things (SIoT): foundations, thrust areas, systematic review and future directions Estimation and prediction for stochastic blockmodels for graphs with latent block structure Searching the web of things: state of the art, challenges, and solutions Internet of Things search engine Nonparametric network models for link prediction Things of interest recommendation by leveraging heterogeneous relations in the Internet of Things