key: cord-0043131-tykvzjk0 authors: Wang, Xianjing; Salim, Flora D.; Ren, Yongli; Koniusz, Piotr title: Relation Embedding for Personalised Translation-Based POI Recommendation date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47426-3_5 sha: a9f81ab60ad5849daa1b91c0b299e412fd5b2a62 doc_id: 43131 cord_uid: tykvzjk0 Point-of-Interest (POI) recommendation is one of the most important location-based services helping people discover interesting venues or services. However, the extreme user-POI matrix sparsity and the varying spatio-temporal context pose challenges for POI systems, which affects the quality of POI recommendations. To this end, we propose a translation-based relation embedding for POI recommendation. Our approach encodes the temporal and geographic information, as well as semantic contents effectively in a low-dimensional relation space by using Knowledge Graph Embedding techniques. To further alleviate the issue of user-POI matrix sparsity, a combined matrix factorization framework is built on a user-POI graph to enhance the inference of dynamic personal interests by exploiting the side-information. Experiments on two real-world datasets demonstrate the effectiveness of our proposed model. With the increase of mobile devices on the market and ubiquitous presence of wireless communication networks, people gain easy access to Point-of-Interest (POI) recommendation services. A great number of Location-based Social Networks (LBSNs) have consequently been established e.g., Foursquare, Gowalla, Facebook Places, and Brightkite. The LBSNs often provide POI services that recommend users new POI venues that meet specific user criteria. In this paper, we develop a high quality personalized POI recommendation system by leveraging user check-in data. There are three technical challenges listed as follows: Spatial Reasoning. A user's current geographical location limits their choice of check-in POIs [13] . Many approaches model relations between a user's current geographical location and their preferences with respect to the surrounding POIs. h + r = t u + r t = e 1 and e 1 + r l = v u + (r t • r l ) = v The above issues are often addressed by the use of side information in traditional recommendation systems. Such a side information may be retrieved from social networks [9] and may include user demographic information, item attributes, and context information [26] . As the auxiliary data is useful for the recommendation systems [19] , it is desirable to model and utilize heterogeneous and complex data types in recommendation systems. However, the traditional collaborative filtering techniques such as Matrix Factorization (MF) cannot deal with the above problems in a unified manner. Knowledge Graph Embedding (KGE) [1, 21] , also known as a translation-based embedding model, encodes the side-information to improve the performance of Recommender Systems (RS) [16, 27] . He et al . [6] and Zhang et al . [27] employ the KGE model to represent users, movies, and movie attributes. The graph edges represent connections between users and movies in a knowledge base [6, 27] . However, previous studies do not offer insights on the following challenges: (1) how to construct a user-POI graph that utilizes the user check-in data with side information, such as spatio-temporal data and semantic context information, to leverage data sparsity problem; (2) how to effectively integrate a translation-based embedding model with a traditional recommendation system to improve the quality of POI recommendation. Problem Definition. Given an LBSN user check-in dataset, we aim to recommend each user with personalized top-k POIs they may be interested in visiting. We build upon recent advances in graph embedding methods and propose Graph Embedding for POI Recommendation, a novel translation-based graph embedding approach specifically for POI recommendation, abbreviated as GERec. To overcome the challenge stemming from the spatio-temporal context, GERec encodes temporal and spatial information, as well as user dynamic check-in activities in a low-dimensional latent space. GERec addresses the issues of user check-in data sparsity by integrating user-POI graph embedding with a combined matrix factorization framework (Fig. 1) . I. To deal with the data sparsity, we propose a novel translation-based POI recommendation model to effectively form a user-POI graph capturing the side information, such as spatial, temporal, and semantic contents. II. We propose a spatio-temporal relation path embedding to model the temporal, spatial and semantic content information to improve the quality of POI recommendations. III. We show our model outperforms the state-of-the-art POI recommendation techniques on two real-world LBSN datasets, Foursquare [2] and Gowalla [3] . POI Recommendations. Making personalized POI recommendations is challenging due to the user dynamic check-in activities. Existing studies on MF-based POI recommendations either focus on aggregating spatially-wise personal preference or exploring temporal influences. Most aggregation-based POI recommendation approaches fail to capture jointly geographical and temporal influences with the semantic context while addressing the data sparsity in an unified framework. In [22, 25] , geographical locations are used to improve the performance of POI system which highlights that there is a strong correlation between user checkin activities and geographical distance. Geographical sparse additive generative model [22] for POI recommendations, Geo-SAGE, exploited co-occurrence patterns with contents of spatial items. A POI system [25] based on deep learning from heterogeneous features and hierarchically additive representation learning proposed spatially-aware model for personal preferences. Knowledge Graph Embedding. KGE is well known for its use in recommendation systems. Zhang et al . [27] proposed a collaborative KG-integrated movie recommender framework to learn the latent and visual representations. Palumbo et al . [16] proposed entity2rec to capture a user-item relatedness from KGEs with the goal of generating top-k item recommendations. Qian et al . [18] adopts KGE to model the side information in POI recommendation system. Their work only focuses on embedding the user and POI entities, and mapping the spatio-temporal patterns as a translation matrix. Although their work explored KGEs and RS, it did not integrate graph embedding with the traditional MF model. Compared with our proposed model that combines the spatial and temporal information with semantic contents in a semantic relation embedding space, these KG embedding-based recommender models have limited expressive abilities as they model the key parameters (e.g., spatial and temporal information) as a simple matrix. Finally, noteworthy is the family of Graph Convolutional Networks with models such as GCN [7] , GraphSAGE [5] , adversary GCN [20] , kernel-based CKN [14] as well as generic graph embedding approaches such as DeepWalk [17] and Node2Vec [4] which all have the capacity to model graph-related tasks. Difference with Existing Works. 1) To the best of our knowledge, this is the first work that investigates the joint modeling of temporal, geographical and semantic category information integrated with KG embedding POI recommendation system; 2) A novel embedding is proposed to bridge the gaps between embedding and traditional MF. Therefore, we propose a novel combined MF framework for dynamic user-POI preference modeling based on the learned embedding in a unified manner; 3) In contrast to the approach [24] based on the bipartite graph (homogeneous graph), our approach uses the translationbased graph (heterogeneous graph). Moreover, approach [24] does not apply MF while our model investigates MF for generating top-k proposals. A heterogeneous graph admits two or more node types which can be then embedded by a symmetric function e.g., one can use interchange user type and POI type input arguments. For the best recommendation performance, we develop an effective representation for the user and POI nodes. A user u and a POI v represent the head or tail of a triplet (head, relation, tail), denoted as (u, r, v), where u, e, v ∈ R k are the vector representations of u, r and v. Head-tail entity pairs usually exhibit diverse patterns in terms of relations [12] . Thus, a single relation vector cannot perform all translations between head and tail entities. For example, the relation path embedding has the diversity patterns, such as temporal, spatial and semantic contents patterns. The relation between user (head) and POI (tail) "user -sushi shop" exhibits many patterns: (i) Temporal pattern i.e., a user visits a POI in a certain time slot ; (ii) Geographical pattern i.e., a user visits a POI when she is in a particular area ; and (iii) Semantic content pattern i.e., a user visits a specific POI that is associated with a category . In our model, we embed spatio-temporal information as a relationship connecting users and POIs. Take the 2-step path as an example. In Table 1 , a user check-in activity (a user visits a POI) is associated with temporal and geographical patterns i.e., denotes a user visiting a sushi shop (POI) at 12pm (time slot) at food court (location). Instead of building triplets (u, r t , e 1 ) and (e 1 , r l , v) for learning the graph representation, we form a triplet (u, r t •r l , v), and optimize the objective u + (r t • r l ) = v. The composition operator • merges the temporal and spatial relations r t and r l into the spatio-temporal relation. Given a relation path r = (r 1 , . . . , r n ), we obtain the relation path embedding r by composing multiple relations via the operator •, i.e., r = r 1 • · · · • r n . For the composition operator, we use the multiplication operation. Thus, the relation path vector is defined as r = r 1 ×· · · ×r n . In our model, we embed temporal and geographical patterns, and semantic category contents into the relation path. −→ v illustrates that a user visits a POI at a certain time slot t in location l, which has semantic category information c associated with the user's current location. We define a spatio-temporal and semantic-based relation path representation r tlc = r t • r l • r c , which consists of a temporal relation path r t , a geographical relation path r l , and a semantic relation path r c . The relation r tlc is used as our default relation representation in our POI model. In what follows, we write r instead of r tlc for simplicity. TransR [12, 21] is among the most representative translational distance models for a heterogeneous graphs. We apply TransR [12] to our POI recommendation model. For each triplet, including (u, r tl , v) and (u, r tlc , v) in the graph, entities are embedded into vectors u, v ∈ R k and relation is embedding into r ∈ R d . For each relation r, we set a projection matrix from the entity space to the relation space, denoted as M r ∈ R k×d . TransR firstly maps entities u and v into the subspace of relation r by using matrix M r : and the TransR score function is defined as: The following margin-based ranking loss defined in [12] is used for training: where γ controls the margin between positive and negative samples, S and S are the set of positive and negative triplets, respectively. The existing graphs that we construct from user-POI check-in datasets contain mostly correct triplets. Thus, we corrupt the correct triplet (u, r, v) ∈ S to construct incorrect triplets (u , r, v ) by replacing either head or tail entities with other entities from the same group so that: We note that translation-based embedding provides a generic way for extracting a useful information from a graph. However, embedding cannot be applied directly to matrix factorization models. Thus, we propose a function g(·) that extracts the learnt entities. Given an entity u, v and a relation r, we obtain representation sets {e r u } and {e r v }, where r denotes the set of relation paths, where e r u and e r v represent a user u and a POI v with respect to the specific relation path r. Thus, the entity extraction is denoted as: where {φ u } and {φ v } are sets of final representations for user and POI embedding, respectively. The function g(·) prepares the embedded user and POI information to become the entries for the matrix factorization by sorting the learnt user and POI embedding sets based on the distances from Eq. (2) sorted according to the descending order. Embedded pairs that are further from each other than some θ are pruned. When a user connects with a POI by a relation (u + r ≈ v), the smaller the score value, the lower distance between POI and user is. Hence, (v + r ≈ u) vice versa. Then, the sorted user and POI embedding sets are filtered according to the user's current location. In many cases, POIs may be outside of the user's home location and it may be not reasonable to recommend such POIs. Thus, we set a reasonable radius w.r.t. the geographical location by applying a threshold θ d to filter the learnt POIs that are too far away from user's home location. Following [10, 23] , we assume a Gaussian distribution for user current location l, and we set the user's current check-in POI v l so that v l ∼ N (μ l , Σ l ). We integrate the matrix factorization into our model by combining two parts: 1) spatio-temporal MF and 2) User preference MF. The spatio-temporal MF calculates the probability that a user will visit a POI. The user preference MF evaluates user's preference w.r.t. a POI. The combined probability determines the total probability of a user u visiting a POI v. For each embedded user vector φ u and embedded POI vector φ v , we apply the matrix factorization to predict a probability that a user u would visit a POI v based on her current location l and a particular time slot t. Given a frequency matrix P ∈ R |{φ u }|×|{φ v }| , which represents the number of check-ins of the embedded users for the embedded POIs. MF is performed by finding two low-rank matrices: a user specific matrix E ∈ R K×|{φ u }| and a POI specific matrix O ∈ R K×|{φ v }| , where K is the dimension of the latent vector that captures the corresponding user-POI preference transition. The probability of an embedded user u based on a particular spatio-temporal relation r tl and embedded location v, is determined by: where E u and O v are vectors for the user u and the POI v from matrices E and O, respectively, while P uv is a scalar frequency for u and v. The goal of matrix factorization is to accurately approximate the probabilities for the user frequency data: where (u, v) ∈ Ω indicates the observed frequency of user u at POI v, || · || 2 F is the Frobenius norm, and α( E 2 F + O 2 F ) is a regularization term to prevent overfitting. User Preference MF. The second part of the combined MF model is to predict the user preference given a POI. Based on the user historical check-in frequency, given an observed frequency matrix F , MF factorizes users and POIs so that F ≈ U V . Then, scalar P uv captures users' preference at a POI determined by the following equation: The same objective as in Eq. (7) is applied to accurately approximate the probabilities for the user check-in frequencies. Combined Matrix Factorization. We propose a combined MF model that is simply a product of probabilities that 1) a user is spatio-temporally compatible with a POI and 2) the user has a preference given the POI. The first term is the probability of an embedded user visiting an embedded POI given some spatio-geographic pattern, where P uv is defined by Eq. (6) . The second term is the probability of the user's preference at a POI based on her historical records, where P uv is defined in Eq. (8) . The combined model is denoted as: Datasets. We adopt two popular large-scale LBSN datasets: Foursquare [2] and Gowalla [3] . The experimental results for our approach and the baselines are compared in the same testbed. We selected the Foursquare dataset from Sep 2010 to Jan 2011 which contains 1,434,668 users' check-in activities in the USA. The Foursquare geographical area is divided into a set of 5846 locations/regions according to administrative divisions. There are 114, 508 user entities and 62, 462 POI entities connected with 46, 768 spatio-temporal relations. For Gowalla, another graph is built from 107, 092 user entities and 1, 280, 969 POI entities connected with 1, 633 relations. We apply k-means [18, 26] to form 200 region clusters for Gowalla geographical area. Baselines. Two of the baseline models are translation-based models that are highly related work in RS [6, 18] . PMF [15] is a classic probabilistic matrix factorization model that explicitly factorizes the rating matrix into two low-rank matrices. GeoMF [9] is a weighted matrix factorization model for POI recommendations. Rank-GeoFM [8] is a ranking-based geographical factorization model in which the check-in frequency characterizes users' visiting preference, and the factorization is learnt by ranking POIs. GeoSoCa model [28] extends the kernel density estimation by applying an adaptive bandwidth learnt from the user check-in data. ST-LDA [26] is a latent class probabilistic generative Spatio-Temporal LDA (Latent Dirichlet Allocation) model, which learns the region-dependent personal interests according to the contents of the checked-in POIs at each region. TransRec is the translation-based recommendation approach proposed in [6] , which embeds items into a translation space and models users via a translation vector. Note that our proposed method is different from TransRec as we select both users and POIs as entities, and learn the embedding representation for a different type of knowledge as well as the spatio-temporal relationships. STA [18] is a spatio-temporal context-aware and translation-based POI recommendation model. However, this solution does not consider the semantic relation embedding of spatial, temporal and category content information, and thus is incapable of leveraging the user-POI graph structure. Evaluation Metrics. Following [8, 9, 28] , we deploy the following evaluation methodology. The user-POI graph is built from historical user check-in activities in the training set. The spatio-temporal relations in the user-POI graph are composed based on each user's current time slot and the area from the given query q = (u, l, t). We divide the time slot to different hour lengths (1, 2, 4, 8, 12, 24) . The user's current standing area before visiting v is selected for her location l. In the experiment, we first calculate the frequency for each user visiting groundtruth POIs. We use the 80% as the cut-off point so that check-ins before a particular date are used for training. The rest check-in data generated after this date is chosen for testing. We form a top-k recommendation list from the top k POI recommendations. We deploy measurement metrics such as Precision@k (P rec@k), Recall@k (Rec@k) and F1-score@k (F @k): P rec@k = 1 Following [11, 12] , for translational distance model TransR, we set the learning rate λ = 0.001, the margin γ = 1, the dimensions of entity embedding and relation embedding d = 100, the batch size B = 120. We traverse all the training triplets for 1000 rounds on both Foursquare and Gowalla datasets. Fig. 2 reports the performance of the POI recommendation models on Foursquare and Gowalla datasets, respectively. We present the performance for k = {1, 5, 10, 20}. Figure 2 presents the results of algorithms in terms of P rec@k, Rec@k and F 1@k on Foursquare and Gowalla datasets. The figure show that the proposed GERec model outperforms all baseline models significantly for all metrics at different k values. Specifically, when comparing with the traditional MF models, GERec outperforms the Rank-GeoMF, which is the MF baseline with the best performance, by 50% and 47% in F1-score@10 on Foursquare and Gowalla, respectively. When comparing with translation-based models, our proposed model also improves the POI recommendation performance significantly. GERec outperforms STA, by 20% and 25% in F1-score@10 on both datasets. This demonstrates the capability of our graph-based GERec model to generate high quality POI recommendations. Although GeoSoCa exploits social influences, geographical locations and user interests, the simple kernel density estimation results in the poor performance. This validates the effectiveness of our GERec solution, especially our proposed step which exploits and integrates the user-POI interactions and spatio-temporal patterns to tackle the sparsity in the user-POI check-in data. The learned embeddings are well integrated into the combined matrix factorization model. Thus, GERec achieves the best performance among all compared baseline models. The user-POI graph constructed from Gowalla dataset has fewer relation edges than the Foursquare graph, as Gowalla relation patterns r tl have fewer regions in the relation paths than Foursquare. In Fig. 3 , we conduct extensive experiments to evaluate the performance of the models under the data sparsity. Specifically, we create multiple datasets with various sparsity levels by reducing the amount of training data randomly by 10%, 20%, 30%, and 40% of the total amount of data (before the cut-off date), and at the same time keeping the test data the same. The results on both Foursquare and Gowalla are shown in the Fig. 3 There are two parameters in the proposed GERec model: the time slot h and the embedding dimension d. Below, we investigate the effect of these two parameters. Table 2 shows the impact of the length of time slot on Precision@k and Recall@k. The length of the time slot affects the quality of POI recommendations. When the length of time slot changes, the relation paths change and the entire graph needs to be computed again. We split day activities into different lengths. The parameter h denotes the length of each time slot in hours. The larger length of time slot the less the time influence on recommendation results. We report the top-k recommendation precision and recall for each time slot on the Foursquare and Gowalla datasets. From the experimental results we observe that the POI recommendation accuracy improves when the time slot length increases. The recommendation accuracy reaches a peak point for 8 h long time slot. Then, it starts decreasing as the time slot keeps increasing. The reason for the improved accuracy is that the larger time length, the denser the data. Hence, there are more user check-in records at each time slot for generating recommendations. However, the recommendation accuracy decreases as the length of the time slot reaches 8 h. This is because when the length of the time slot is large enough, it may reduce the influence of temporal pattern. Moreover, we study the impact of varying dimension d of the relation embedding by setting it to {70, 80, 90, 100, 120} ( Table 3 ). The best parameter is determined according to the mean rank in the test set. The accuracy rate increases gradually when the dimension increases. Specifically, the accuracy keeps increasing until the dimension reaches 100, then it remains stable. In this paper, we propose a novel translation-based POI recommendation model, which can effectively construct a user-POI graph and model the side information. To address time and geographical reasoning, we propose spatio-temporal relation path embedding to model the temporal, spatial and semantic contents to leverage the user-POI interaction and improve the quality of user embedding. To overcome the sparsity of the user-POI interaction data, we develop an embedding function which bridges gaps between the translation-based embedding model and traditional MF-based model. The user-POI graph is integrated with a combined MF model to improve the quality of POI recommendations. A comprehensive survey of graph embedding: problems, techniques, and applications Exploring millions of footprints in location sharing services Friendship and mobility: user movement in location-based social networks Node2Vec: scalable feature learning for networks Inductive representation learning on large graphs Translation-based recommendation Semi-supervised classification with graph convolutional networks Rank-geofm: a ranking based geographical factorization method for point of interest recommendation Geomf: joint geographical modeling and matrix factorization for point-of-interest recommendation Modeling human location data with mixtures of kernel densities Modeling relation paths for representation learning of knowledge bases Learning entity and relation embeddings for knowledge graph completion Personalized point-of-interest recommendation by mining users' preference transition Convolutional kernel networks Probabilistic matrix factorization Entity2rec: learning user-item relatedness from knowledge graphs for top-n item recommendation DeepWalk: online learning of social representations Spatiotemporal representation learning for translation-based poi recommendation Heterogeneous information network embedding for recommendation Fisher-bures adversary graph convolutional networks Knowledge graph embedding: a survey of approaches and applications Geo-sage: a geographical sparse additive generative model for spatial item recommendation Tpm: a temporal personalized model for spatial item recommendation Learning graph-based poi embedding for location-based recommendation Spatial-aware hierarchical collaborative deep learning for poi recommendation Adapting to user interest drift for poi recommendation Collaborative knowledge base embedding for recommender systems Geosoca: exploiting geographical, social and categorical correlations for point-of-interest recommendations We acknowledge the support of Australian Research Council Discovery DP190101485, Alexander von Humboldt Foundation, and CSIRO Data61 Scholarship program.