key: cord-0043102-1rijkc5s authors: Wang, Xiaodong; Liu, Zhen; Wang, Nana; Fan, Wentao title: Relational Metric Learning with Dual Graph Attention Networks for Social Recommendation date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47426-3_9 sha: 65e8836f1423b33d14c32c642d2fc0f68364fd9c doc_id: 43102 cord_uid: 1rijkc5s Existing social recommenders typically incorporate all social relations into user preference modeling, while social connections are not always built on common interests. In addition, they often learn a single vector for each user involved in two domains, which is insufficient to reveal user’s complex interests to both items and friends. To tackle the above issues, in this paper, we consider modeling the user-item interactions and social relations simultaneously and propose a novel metric learning-based model called RML-DGATs. Specifically, relations in two domains are modeled as two types of relation vectors, with which each user can be regarded as being translated to both multiple item-aware and social-aware representations. Then we model the relation vectors by neighborhood interactions with two carefully designed dual GATs to fully encode the neighborhood information. Finally, the two parts are jointly trained under a dual metric learning framework. Extensive experiments on two real-world datasets demonstrate that our model outperforms the best baseline by 1.91% to 4.74% on three metrics for top-N recommendation and the performance gains are more significant under the cold-start scenarios. Nowadays people are troubled by the information overload problem caused by the explosively growing online contents. Recommendation system, which aims at addressing this problem by providing personalized items to each user, has become an indispensable element of many web applications. Recently, top-N recommendation based on implicit feedback has attracted much research interest since implicit feedback is much more abundant and easier to collect in practice [10] . Among the various recommendation methods, collaborative filtering (CF) stands out to show good performance, with the assumption that similar users tend to share similar preferences [11, 17] . However, the performance of CF models are often hindered by the data sparsity and cold-start issues [3] . With the prevalence of the online social platforms, social recommendation has emerged as a promising solution to address the above two issues. It assumes that people tend to spread their interests to their social connections and focuses on combing social information with user-item interactions for better item recommendations [19] . In conventional social recommenders [8, 12] , social relations are typically modeled as an ensemble term or a regularization based on the matrix factorization (MF). There also exists hybrid models which collectively factorize the user-item interaction matrix and trust matrix [4] . Recently, researchers also designed more advanced neural models for social recommendation and showed state-of-the-art performance [1, 2, 18] . E.g., [1] models both aspect-level and friend-level differences for social recommendation by attention mechanism and memory network. Although the above models are effective, we argue that they have three common limitations: Firstly, they typically incorporate all social relations into user preference modeling, ignoring the fact that users may have totally opposite tastes on specific items. Secondly, they often learn a single vectorized representation for each user, which is insufficient to reveal user's complex interests to both items and friends. Finally, none of them have fully considered the neighborhood information in both domains. In this paper, we focus on implicit feedback and propose a dual metric learning framework to handle the above issues. As users involve in two heterogeneous graphs, we model the user-item interactions and social relations simultaneously instead of directly incorporating social information into user embeddings. Specifically, as shown in Fig. 1 , we propose to model the two kinds of relations as interaction vector (e.g., r 11 ) and social vector (e.g., s 12 ) respectively. Then in the metric space, each user can be regarded as being translated to both mulitple item-aware points by the interaction vectors or to multiple social-aware points by the social vectors, which results in a much finer-grained model. In addition, from the perspective of CF, each user-item pair can be seen as being related by their respective neighboring items and users. Analogously, two users in the social domain are also connected by their neighboring friends. Therefore, we model the two kinds of relation vectors by neighborhood interactions, in which neighborhood-based representations in the two domains are generated by two carefully designed dual GATs [16] . Then they are fed into two corresponding MLPs to model the complex interactions between two neighborhoods as relation vectors. Finally, the two relation modeling parts are incorporated into a joint model to mutually enhance each other. Our main contributions are summarized as followings: -We propose a novel social recommendation model RML-DGATs which considers utilizing social information by simultaneously modeling the user-item interactions and social relations under a dual metric learning framework. -Relations in two domains are modeled as relation vectors to translate each user to both multiple item-aware and social-aware representations. We model the two kinds of relation vectors by two carefully designed dual GATs, which could fully encode the neighborhood information in an explicit manner. -Under extensive experiments conducted on two real-world datasets, our model consistently outperforms state-of-the-art baselines on three metrics for top-N recommendation tasks, especially in the cold-start scenarios. Top-N recommendation with implicit feedback has attracted more and more research interest due to its good applicability in practice [10] . However, traditional MF approaches are incapable of handling such a scenario for the absence of negative feedback. To address this issue, [14] first proposed a pairwise ranking assumption tailored for implicit feedback which states that user tend to rank items she consumed over those unobserved, then it optimizes MF with BPR-OPT criterion. [5] further extends MF to a neural architecture to model the implicit feedback as complex nonlinear user-item interactions. Recently, [7] argues that inner product used in most previous methods violates the import triangle inequality property and adopts the Euclidean distance as the relation modeling metric. To alleviate the data sparsity and cold-start issues, many efforts has been devoted to incorporate social relations for better recommendation. These studies could be broadly categorized into two groups: MF-based and DNN-based. MF-based methods typically take social information as an ensemble term or a regularization. [22] extends BPR with the assumption that users tend to rank items consumed by their friends over those unobserved. [4] collectively factorizes the rating matrix and trust matrix to share the same user latent vectors. However, the above MF-based methods could only capture linear information encoded in social network, so several recent studies proposed to learn more complex features using deep neural networks [1, 2, 18] . Authors in [2] designed a NSCR model based on NCF [5] to model the social relations as a graph regularization. To learn non-linear features of each user from social relations, [18] presented a deep model which substitutes the user latent modelling part of PMF with DNN. A newly proposed method SAMN [1] leveraged attention-based memory network to model both aspect-level and friend-level differences for social recommendation. In this section, we introduce the proposed RML-DGATs model and the overall architecture is shown in Fig. 2 . We start with formulating the social recommendation problem concerned in this paper and then elaborate the three main components of the model: the ID Embedding part, the Relation Modeling part and the Joint Learning part. Let G = (U ∪ I, Y ) be the user-item interaction graph, where U is the user set with n users and I is the item set with m items. Y ∈ R n×m is the user-item interaction matrix with Y ui = 1 if user u has interacted with item i, otherwise Y ui = 0. As social connections can be unilateral (e.g., followings on Twitter) or bilateral (e.g., friendships on Facebook), we describe the social network as a directed graph S = (U, T ). T ∈ R n×n represents the social relations between users which satisfies T uv = 1 if user v is in user u's trust list. Note that we term u as the truster and v as the trustee in T uv . Then, the social recommendation problem can be described as: Social Recommendation. Given an user-item interaction graph G and a social network S, the social recommendation aims to learn a mapping function f to predict users' preferences to items as:R ui = f (G, S), whereR ui ∈ R n×m is the predicted preference scores. Let P ∈ R n×d and Q ∈ R m×d denote the low-dimensional embedding matrices of users and items, with d represents the dimension size of latent vectors. Given an user u, one of her interacted item i and an user v in her trust list, the ID Embedding part takes one-hot representation of triplet (u, i, v) as input and performs indexing operations from P and Q according to their one-hot IDs. Then it outputs the corresponding latent vectors p u , q i and p v for u, i and v, respectively. Note that p u could be regarded as the shared essential preference between the item and social domains. The Relation Modeling part aims to model the user-item interaction relations and user-user social relations simultaneously. Next we will detail the user-item interaction modeling and social modeling respectively. In the user-item interaction graph, the items user has interacted form its neighborhood and they can reflect her preferences to some extent. Analogously, item's neighbors are the users who interacted with it and its attributes could be revealed by these users. We propose a dual GATs which consists of an user-level GAT and an item-level GAT to aggregate item's or user's neighborhood information. User-Level GAT. Let N (i) be the set of users who have interacted with item i. The user-level GAT aggregates latent vectors of item i and users in N (i) attentively and then outputs the neighborhood-based item representation q N i . We define the aggregation function as: . α ij is the attention weight and we parameterize it with a two-layer neural network as: where σ is the nonlinear activation function and [, ] denotes the concatenation operation. In addition, are the model parameters of the userlevel GAT, in which W is the weight matrix for the GAT layer and W , h, b are the weights and bias for the attention network. Note that we also tried other forms of attention networks such as simple inner product or inner product with nonlinear activation, but we empirically found that Eq. (2) performs best. Item-Level GAT. Latent vectors of user u and its neighboring items in C(u), namely the set of items u has interacted, are aggregated to generate the neighborhood-based user representation t N u in item domain. Analogously, we define the aggregation function as: where Then a MLP which takes the two representations as input is designed to model the complex interactions between the two neighborhoods as the interaction vector r I . Formally, we empirically design a two-layer tower structure as: where W 1 , W 2 , b 1 and b 2 are the weights and biases of the MLP and we denote them as Social Modeling. The social relation between two users can be seemed as related by their respective neighborhoods including themselves. Given user u and user v in her trust list, we propose another dual GATs which consists of two friend-level GATs to aggregate user's neighboring friends into neighborhoodbased user representation. Friend-Level GAT. For user u, the latent vectors of u and users in her trust list denoted as v ∈ N (u) are aggregated into the neighborhood-based user representation s N u in social domain. We define the aggregation function as: For user v, her neighborhood-based representation p N v is generated by the same friend-level GAT and we only need to replace p u with p v . Then we also model the social vector r S by feeding s N u and p N v into a two-layer MLP as: Note that we use the same parameters for the two MLPs, as we empirically found that it works the best. The rationale behind maybe that users and items share similar neighborhood structures and unifying them to train the same parameters booms the training process and reduce the risk of overfitting. In addition, as the number of users' or items' neighbors vary greatly, traversing all of them is time consuming and also may introduce too much noise. Therefore, we set a threshold K and for those nodes with more than K neighbors, we randomly sample K of them to aggregate their features. Scoring Functions. Since we consider the user-item interactions and social relations individually, we define the following two scoring functions to evaluate the qualities of interaction vector and social vector learned by the Relation Modeling part: Given latent vectors of user u, item i and their relation vector r I , the interaction scoring function s 1 (u, i) penalizes any deviation of p u + r I from q i brought by the relation vector r I . Analogously, the social scoring function s 2 (u, v) calculates the distance between user's translated vector p u + r S and p v . The smaller the scoring values s 1 and s 2 are, the higher preference user u will show to item i and the more user u will trust user v. Loss Function. We adopt the widely-used pairwise hinge loss which was tailored for item-ranking task [7] . For each positive tuple (u, i) which satisfies Y ui = 1, we pair it with t sampled negative tuples (u, j) to form the the pairwise training set in item domain as where t is the negative sampling ratio. Then we define the loss function in the item domain as: τ I works as the margin to ensure that the distance score between a positive user-item pair after translation is smaller than that between a negative pair, and a larger setting value of τ I will push the two pairs far away from each other. Similarly, the loss function in the social domain is defined as: where D S = {(u, v, w)|T uv = 1 ∧ T uw = 0} is the pairwise training set in social domain. Note that D S is constructed in the same way as D I and the margin τ S functions similarly to τ I . Motivated by [13] , we design two kinds of regularizers for our RML-DGATs model. As we translate each user u to her interacted item i (or user v in her trust list) by the interaction vector r I (or social vector r S ), we define two distance regularizers to explicitly pull the translated user embedding closer to corresponding item or trustee embedding as: Furthermore, in order to explicitly encode the collaborative information in the user's (or item's) neighborhood, we design two neighborhood regularizers to guide users and items to be closer to their neighborhood representations as: Based on the pairwise loss functions and the two kinds of regularizers, the overall losses in two domains can be formulated as: where λ 1 and λ 2 are the regularization coefficients for the distance regularizers and neighborhood regularizers respectively. Finally, we integrate the two relation modeling parts into a joint learning framework and the overall objective function to be minimized is: Model Training. To optimize the above objective function, we implement the proposed model with Tensorflow and train it with mini-batch Adam [5] . At the end of each mini-batch, all the user and item latent factors are constrained within a unit sphere as v 2 2 ≤ 1, to prevent overfitting and ensure the robustness of the model [7] . In this section, we conduct experiments on two real-world datasets to evaluate the proposed RML-DGATs model. We first elaborate the experimental settings and then show the overall performance comparison, followed by detailed parameters analysis and cold-start testing. Finally, ablation studies are conducted to further evaluate different parts of the proposed model. Datasets. In our experiments, we used two publicly accessible datasets: Ciao 1 and Epinions 2 , to evaluate the performance of the proposed model. They both contain user-item ratings information which scale from 1 to 5 and social relations. As done in many previous works [5, 6] , we transformed the observed ratings to 1 indicating the interactions between corresponding user-item pairs, and all the unobserved ratings to 0. Then we filtered out users whose rating records or number of social relations are less than 2, and also removed items rated by less than 2 users. The statistics of the preprocessed datasets are summarized in Table 1 , where #Links denotes the number of social relations, #R density and #S density denote the density of interactions and social relations respectively. We compared our proposed RML-DGATs model with following eight baselines: -BPR [14] : A widely used pairwise ranking model tailored for implicit feedback recommendation and it optimizes MF with BPR optimization criterion. -FISM [9] : An item-based CF model which generates user's embedding by the items she consumed and learn the item similarity matrix by a low-rank assumption. -NeuMF [5] : A recently proposed DNN-based model which integrates MF and MLP into a unified model and achieves the state-of-the-art performance. -CML [7] : By learning a joint metric space which satisfies the triangle inequality, this model can capture users' finer-grained preferences and show strong performance. -LRML [15] : It also adopts the idea of translating users to items by translation vectors, while it relies on memory network to learn the relation vectors. -SBPR [22] : It assumes that users tend to prefer items consumed by her friends than those unobserved and extends the preferences order in BPR to get a finer-grained model. -CUNE-BPR [21] : It extracts latent semantic friends using random walks and graph embedding and then incorporates them into BPR for ranking task. -SAMN [1] : A newly proposed DNN-based social recommender which leverages attention-based memory network to model both aspect-level and friend-level differences. These baselines could be categorized into three groups: the first three belong to CF methods, the subsequent two are built on metric learning and the remainings are social-aware models. Evaluation Protocols. We adopt the widely used leave-one-out evaluation protocol to evaluate the above methods [9] . Specifically, for each user, we keep her latest two consumed items for validation and testing and the remaining items for training. In line with many previous studies [5, 15] , we randomly sample 999 negative items that the user didn't interact with along with the last interacted one for testing, as it's too time consuming to rank all unobserved items. In addition, we use three widely used ranking-based metrics HR, MRR and NDCG to measure the recommendation quality [20] . Then we repeated each experiment 5 times and report the average performance. We initialized all the baselines by the parameters declared in the original papers and carefully tuned them to achieve the optimal NDCG@10 on the validation set. Note that we also pretrained the NeuMF model with MF and MLP as the authors did in [5] . For our model, we empirically set the learning rate to 0.001, d = 128, the size of attention vectors to 32, λ 1 = λ 2 = 0.01, t = 4 on both datasets, and τ I = 0.25, τ S = 0.5. Moreover, we set K to 20 and 30, γ to 0.2 and 0.1 on Ciao and Epinions respectively. We also adopted ReLU as the nonlinear activation for all GATs and MLPs after the tuning process. Table 2 . Top-K recommendation performance of different methods on two datasets. The best performance is in boldface and the second is underlined. * denotes the significance p-value < 0.05 compared with the best baseline. The recommendation performances of different methods are shown in Table 2 . We can make the following observations from the table: -Our proposed model RML-DGATs consistently outperforms all the baselines in all cases, including the two state-of-the-art neural methods NeuMF and SAMN and also the competitive metric learning model LRML. The improvements could be attributed to the finer-grained modeling of each user and the two carefully designed dual GATs in neighborhood interactions. -Methods incorporating social information generally perform better than those utilizing only user-item interactions in most cases. For example, social recommender CUNE-BPR is superior to non-social models like BPR, FISM and CML. In addition, all with neural architectures, SAMN and our model significantly outperform NeuMF and LRML. -LRML consistently show superior performances to all other non-social methods as it also adopts the idea of translating users to multiple item-aware representations. However, we empirically found that it underperforms our model when both considering only user-item interactions. This further demonstrates the effectiveness of our model in modeling the relation vectors by neighborhood interactions. Parameters Analysis. We also conduct experiments to investigate the impacts of the number of sampled neighbors K, the joint learning weight γ and the margin value τ on the two datasets. The results on NDCG@10 are shown in Fig. 3 , from which we have the following findings: -Performances on the two datasets both boost quickly first and then decrease when K varies from 5 to 100. This is reasonable since when neighbors are not enough, the larger K is, the more beneficial information will be incorporated. However, once K exceeds a certain value, the performance will be harmed by the noise introduced by neighbors. -With the increase of γ, performance on the two datasets also improve first and then decrease. Furthermore, the optimal results on the two dataset are both achieved when γ is relatively small (0.1 and 0.2 respectively). It conforms to the reality since the interaction modeling part in item domain dominates the recommendation task. -The margin value τ shows similar impact as K and γ. With a relative small τ , users and items would be squeezed tightly in the Euclidean space and results in the geometrical restriction issue as CML [16] . However, if τ exceeds certain value, the similarities among users and items may not retain as they need to meet the distance requirements first. Cold-Start Testing. In order to validate whether our proposed RML-DGATs model could mitigate the cold-start issue, we conduct experiments on different sparsity levels of user groups. Firstly, similar to [21] , users are categorized into four groups according to the number of interacted items they have in the training set (i.e. <6, 6-20, 21-35, >35) and we term users in the first group as cold-start users. Then we take LRML as a baseline for its competitive performance as a nonsocial model. Results of the two methods on the two datasets w.r.t. NDCG@10 are shown in Fig. 4 (Results on HR@10 show similar trends). Compared with LRML, RML-DGATs achieves much better performance in all cases. Generally speaking, the sparser the training instances are, the more improvement will it obtain. For example, the relative improvements of our model on the four user groups are 6.32%, 4.15%, 3.76% and 2.43% respectively on Ciao. The rationale behind is that social information could work as the complement to user-item interactions and provide evidence to users' preferences. For users in the cold-start group which lack sufficient interactions, they will ask for more social relations to make up for the deficiency. In addition, Epinions achieves 3.71% of relative improvement in cold-start scenario, which is obviously smaller than on Ciao, this could be attributed to that interactions and social relations in Epinions are both much sparser than Ciao. The proposed RML-DGATs leverages two different relation vectors to model the complex relations between user-item in item domain and user-user in social domain. To investigate the effectiveness of the two dual GATs and the metric learning component, we compare our model with the following four variants: -RML-Avg: This variant replaces the two dual GATs with mean operations, namely averaging neighbors' latent factors to obtain the corresponding neighborhood-based representations. -RML-IGATs: The dual GATs in social domain are eliminated and mean aggregations are imposed. -RML-SGATs: This variant is similar to RML-IGATs except that we remove the dual GATs in item domain instead. -DGATs: It regards the inner product of two neighborhood-based representations in specific domain as the relation score and incorporates it into the log loss as NeuMF [15] . We report the results of all variants and the proposed model on two datasets w.r.t. HR@10 and NDCG@10 in Fig. 5 . As shown in this figure, RML-IGATs and RML-SGATs both show better performances than RML-Avg, which demonstrates the importance of discriminating the contributions of different neighbors when aggregating their features. In addition, RML-SGATs performs better than RML-IGATs on the two metrics. The rationale behind may be that social connections does not necessary guarantee similar preferences while user's consumed items typically conform to her tastes. Therefore, considering the different contributions of social neighbors will bring more benefits. Furthermore, although all with two dual GATs, DGATs performs poorly than both RML-IGATs and RML-SGATs. This is not surprising since DGATs ignores the constructions of two relation vectors which could translate each user to both multiple item-aware and social-aware representations. Finally, our RML-DGATs consistently and significantly outperforms all the variants, which further verifies the effectiveness of the two carefully designed dual GATs in aggregating neighborhood information and also the metric learning framework for finer-grained user and item modeling. In this paper, we propose a metric learning-based model named RML-DGATs for social recommendation. Specifically, user-item interaction and social relation in corresponding domains are modeled as interaction vector and social vector respectively. Then we model the two relation vectors by neighborhood interactions with two carefully designed dual GATs to fully encode the neighborhood information. Finally, the two domains are jointly trained to mutually enhance each other. Extensive experiments on two real-world datasets demonstrate the superiority of our model, especially in the cold-start scenarios. In the future, we are interested in exploring the dynamic diffusion of user's interests in social network. Social attentional memory network: modeling aspect-and friend-level differences in recommendation Deep modeling of social relations for recommendation A simple but effective method to incorporate trusted neighbors in recommender systems TrustSVD: collaborative filtering with both the explicit and implicit influence of user trust and of item ratings Neural collaborative filtering Fast matrix factorization for online recommendation with implicit feedback Collaborative metric learning A matrix factorization technique with trust propagation for recommendation in social networks FISM: factored item similarity models for top-N recommender systems Factorization meets the neighborhood: a multifaceted collaborative filtering model Matrix factorization techniques for recommender systems Learning to recommend with social trust ensemble Collaborative translational metric learning BPR: Bayesian personalized ranking from implicit feedback Latent relational metric learning via memory-based attention for collaborative ranking Graph attention networks Unifying user-based and item-based collaborative filtering approaches by similarity fusion Item silk road: recommending items from information domains to social users Social networking meets recommender systems: survey. IJSNM Adaptive implicit friends identification over heterogeneous network for social recommendation Leveraging social connections to improve personalized ranking for collaborative filtering