key: cord-0043205-et6u91gf authors: Wei, Dongjun; Liu, Yaxin; Zhu, Fuqing; Zang, Liangjun; Zhou, Wei; Lu, Yijun; Hu, Songlin title: AutoSUM: Automating Feature Extraction and Multi-user Preference Simulation for Entity Summarization date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47436-2_44 sha: 9157db1db52832f92a408dbb267004f9ac057996 doc_id: 43205 cord_uid: et6u91gf With the growth of knowledge graphs, entity descriptions are becoming extremely lengthy. Entity summarization task, aiming to generate diverse, comprehensive and representative summaries for entities, has received an increasing interest recently. In most previous methods, features are usually extracted by the hand-crafted templates. Then the feature selection and multi-user preference simulation take place, depending too much on human expertise. In this paper, a novel integration method called AutoSUM is proposed for automatic feature extraction and multi-user preference simulation to overcome the drawbacks of previous methods. There are two modules in AutoSUM: extractor and simulator. The extractor module operates automatic feature extraction based on a BiLSTM with a combined input representation including word embeddings and graph embeddings. Meanwhile, the simulator module automates multi-user preference simulation based on a well-designed two-phase attention mechanism (i.e., entity-phase attention and user-phase attention). Experimental results demonstrate that AutoSUM produces the state-of-the-art performance on two widely used datasets (i.e., DBpedia and LinkedMDB) in both F-measure and MAP. Semantic data enables users or machines to comprehend and manipulate the conveyed information quickly [10] . In major knowledge graphs, semantic data describes entities by Resource Description Framework (RDF) triples, referred as triples [4] . With the growth of knowledge graphs, entity descriptions are becoming extremely lengthy [23] . Since Google first released the knowledge graph, "get the best summary" for entities has been one of the main contributions in Google Search 1 [25] . Specifically, Google Search returns a top-k subset of triples which can best describe the entity from a query on the right-hand side of the result pages [15] . Motivated by the success of Google Search, entity summarization task has received an increasing interest recently [7, 25] , it aims to generate diverse, comprehensive and representative summaries for entities. In addition, entity summarization has been integrated into various applications such as document browsing, Question Answering (QA), etc. [15] . Most previous entity summarization methods are adopted from random surfer [4] , clustering [9, 10] and Latent Dirichlet Allocation (LDA) [19] models, depending too much on the hand-crafted templates for feature extraction as well as human expertise for feature selection. Meanwhile, entities are capable to represent diverse information (or multi-aspect information) in knowledge graphs [21] , resulting in different user preference (sometimes multi-user preference [27] ). Take entity Triathlon at the 2000 Summer Olympics Men's in DBpedia 2 for instance, different users may prefer to the medal, event or type of this entity, respectively. In order to generate more diverse summaries, the specific model needs to be selected for providing a more distinguishable multi-user preference simulation [9, 21] . 
However, due to the countless quantities and unpredictable types of entities in real large-scale knowledge graphs, extracting discriminative features or selecting suitable models based on human expertise can be arduous [15]. In this paper, a novel integration method called AutoSUM is proposed for automatic feature extraction and multi-user preference simulation to overcome the drawbacks of the above models. There are two modules in AutoSUM: an extractor and a simulator. The extractor module performs automatic feature extraction based on a BiLSTM with a combined input representation including word embeddings and graph embeddings. Meanwhile, the simulator module automates multi-user preference simulation based on a well-designed two-phase attention mechanism (i.e., entity-phase attention and user-phase attention). Experimental results demonstrate that AutoSUM produces state-of-the-art performance on two widely used datasets (i.e., DBpedia and LinkedMDB) in both F-measure and MAP.

Previous entity summarization methods mainly rely on human expertise. To find the most central triples, RELIN [4] and SUMMARUM [24] compute relatedness and informativeness based on features extracted from hand-crafted templates. Meanwhile, FACES [9] and ES-LDA [19] introduce a clustering algorithm and an LDA model for capturing multi-aspect information, respectively. In order to generate more diverse summaries, specific models need to be selected to provide a more distinguishable multi-user preference simulation [9, 19]. However, due to the countless quantities and unpredictable types of entities in real large-scale knowledge graphs, extracting discriminative features and selecting suitable models based on human expertise can be arduous. Recently, deep learning methods have relieved the dependency on human expertise in the Natural Language Processing (NLP) community [17]. To generate summaries without human expertise, an entity summarization method with a single-layer attention mechanism (ESA) [29] was proposed to calculate an attention score for each triple; the top-k triples with the highest attention scores are then selected as the final results. However, with its single-layer attention mechanism, ESA cannot extract features or capture multi-aspect information. Following the ESA work, our proposed AutoSUM automates feature extraction and multi-user preference simulation based on a novel extractor-simulator structure. In the extractor, a BiLSTM with a combined input representation including word embeddings and graph embeddings is utilized for feature extraction. Meanwhile, in the simulator, a two-phase attention mechanism is designed for multi-user preference simulation.

An RDF triple is composed of a subject, a predicate, and an object. In major knowledge graphs, an entity is then defined as a subject together with all its predicates and the corresponding objects of those predicates. When a user queries an entity in a knowledge graph, a set of triples $\{t_1, t_2, \cdots, t_n\}$ related to the entity is returned, referred to as an entity description document $d$, where $t_i$ is the $i$-th triple in $d$. Following Google Search [7, 15], given a positive integer $k$, the summary of an entity is a top-$k$ subset of $d$ which can best describe the entity. As shown in Fig. 1, AutoSUM has a novel extractor-simulator structure. The extractor extracts the features of the triples in $d$ as $h = \{h_1, h_2, \cdots, h_n\}$, where $h_i$ is the feature vector of $t_i$.
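To make the problem formulation concrete, the following is a minimal Python sketch (the function and variable names are hypothetical, not from the paper) of an entity description document and of selecting a top-k summary once attention scores are available; the next paragraphs describe how AutoSUM computes those scores.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def summarize(doc: List[Triple], scores: List[float], k: int) -> List[Triple]:
    """Return the top-k triples of an entity description document d,
    ranked by the attention scores produced for its triples."""
    ranked = sorted(zip(doc, scores), key=lambda pair: pair[1], reverse=True)
    return [triple for triple, _ in ranked[:k]]

# Illustrative fragment of the DBpedia entity mentioned above (scores are made up).
doc = [
    ("Triathlon_at_the_2000_Summer_Olympics_Men's", "goldMedalist", "Simon_Whitfield"),
    ("Triathlon_at_the_2000_Summer_Olympics_Men's", "event", "Triathlon"),
    ("Triathlon_at_the_2000_Summer_Olympics_Men's", "type", "OlympicEvent"),
]
print(summarize(doc, scores=[0.5, 0.3, 0.2], k=2))
```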
Given $h$, the simulator calculates the attention scores $a = \{a_1, a_2, \cdots, a_n\}$, where $a_i$ is the attention score of $t_i$. Then the top-$k$ triples with the highest attention scores are selected as the summary of the entity.

The extractor module in AutoSUM aims at extracting the features of triples automatically. In this section, we introduce the input representation and the automatic feature extraction in detail.

Input Representation: As discussed above, the triples related to an entity share the same subject with different predicates and corresponding objects of those predicates. In order to map predicates and objects into a continuous vector space for feature extraction, we apply a combined input representation method including word embeddings and graph embeddings. We then concatenate the embeddings of each predicate and its corresponding object as the representation of each triple.

Word Embedding: Learning word embeddings has been an effective way to enhance the performance of entity summarizers. In ES-LDA$_{ext}$ [20], Pouriyeh et al. stated that the key point of learning word embeddings is the definition of "words". Following Pouriyeh's work, we extract the predicates and objects of triples as our words. Take "http://dbpedia.org/ontology/goldMedalist" for instance: we extract "goldMedalist" as the word for the above predicate. Given the embeddings of words, we then initialize a word embedding (lookup) table for future training.

Graph Embedding: Obviously, simple word embeddings cannot represent triples with a graph structure. To fully encode the graph information, we utilize a graph embedding technique called TransE [3] to pretrain the whole knowledge graph in the dataset. Given the embeddings of triples, we then initialize a graph embedding table for future training.

Automatic Feature Extraction: In the Named Entity Recognition (NER) task, the bidirectional LSTM (BiLSTM) has been widely used for automatic feature extraction [14]. For instance, in order to automatically extract features from a small supervised training corpus, an LSTM-CRF model was proposed by Lample et al. [14], utilizing a BiLSTM for feature extraction and conditional random fields [13] for entity recognition. The BiLSTM extracts representative and contextual features of a word, aligned with the other words in the same sentence [8]. For summarizing entities, we likewise apply a BiLSTM to extract the features of a triple, aligned with the other triples related to the same entity. Specifically, since there is no inherent ordering among triples, we first map (serialize) the triples into a randomly ordered sequence. Then we feed the input representation of the triples in the sequence to the BiLSTM, and take the outputs as the extracted features of those triples.

The simulator in AutoSUM aims at simulating multi-user preference based on a well-designed two-phase attention mechanism (i.e., entity-phase attention and user-phase attention). Entity-phase attention captures multi-aspect information from an entity, and user-phase attention then simulates multi-user preference based on the captured information. In this section, we present the details of entity-phase attention and user-phase attention.

Entity-Phase Attention: The intuition of entity-phase attention is straightforward. Since the single-layer attention mechanism in ESA [29] cannot capture multi-aspect information, we design a multi-aspect attention mechanism with multiple (stacked) attention layers to overcome this drawback of ESA.
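Before continuing with the stacked attention layers, the extractor described above can be summarized in code. The following is a minimal PyTorch-style sketch, not the authors' implementation: the class and variable names, the dimensions, and the reuse of one index space for both the word and graph lookup tables are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class Extractor(nn.Module):
    """Sketch of the extractor: word + graph embeddings of each (predicate, object)
    pair are concatenated and fed to a BiLSTM (hypothetical dimensions)."""
    def __init__(self, vocab_size, word_dim=100, graph_dim=100, hidden_dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)    # trainable lookup table
        self.graph_emb = nn.Embedding(vocab_size, graph_dim)  # pretrained with TransE
        self.graph_emb.weight.requires_grad = False           # fixed after pretraining
        in_dim = 2 * (word_dim + graph_dim)                   # predicate part + object part
        self.bilstm = nn.LSTM(in_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, pred_ids, obj_ids):
        # pred_ids, obj_ids: (1, n) indices of the predicates/objects of one entity,
        # already serialized into a random order as described above
        e = torch.cat([self.word_emb(pred_ids), self.graph_emb(pred_ids),
                       self.word_emb(obj_ids), self.graph_emb(obj_ids)], dim=-1)
        out, (h_n, _) = self.bilstm(e)                  # out: (1, n, 2 * hidden_dim)
        context = torch.cat([h_n[0], h_n[1]], dim=-1)   # final forward/backward hidden states
        return out.squeeze(0), context                  # features of the n triples, context vector
```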
One seminal work using stacked attention layers is neural machine translation (NMT) [17], where the stacked attention layers (Transformer) [26] are utilized to capture multi-aspect information from a sentence. To our knowledge, we are the first to utilize stacked attention layers to capture multi-aspect information from an entity. Specifically, different attention layers capture information from an entity in different aspects. In each attention layer, a general attention function [17] is utilized to calculate the relevance between each triple and the information captured by that layer, termed attention scores. Here, instead of combining all attention layers to generate overall attention scores as the Transformer [26] does, we directly output the attention scores from each attention layer for multi-user preference simulation in user-phase attention. Notice that the number of attention layers is a hyper-parameter which can be tuned during training.

User-Phase Attention: When users browse triples, they allocate high preference values (more attention) to the triples which are most related to the information they are interested in [9]. Meanwhile, as described above, entity-phase attention consists of different attention layers capturing information in different aspects. In each attention layer, a general attention function is utilized to allocate higher attention scores to the triples which are more relevant to the information captured by that layer. To simulate the preference of users who are interested in the information captured by the current attention layer, user-phase attention assigns the user preference value of each triple the attention score it receives from that layer. The different distributions of attention scores in different attention layers then simulate the different preferences of different users (multi-user preference).

After simulating the multi-user preference, we have to allocate different attention scores to the different user preferences rather than treating them equally. The main reason is that some user preferences may represent the preference of most users for an entity, while others may represent the preference of only a few users for the same entity. Allocating proper attention scores to each user preference is critical to generating a more comprehensive entity summarization result. Therefore, we combine a BiLSTM with a general attention score function for this allocation. In NER, a BiLSTM can maintain the independence of words while capturing the intrinsic relationships among them [8]. Similarly, a BiLSTM is adopted in user-phase attention to preserve the independence of the different user preferences as well as to capture the intrinsic relationships between them. The outputs of the BiLSTM are then taken as the inputs to a general attention score function, in order to allocate an attention score to each user preference. At last, we integrate all the user preferences based on the allocated attention scores. In addition, as with the triples, there is no inherent ordering among user preferences, so we also map them into a randomly ordered sequence as the input to the BiLSTM.

In this section, we demonstrate the complete pipeline of AutoSUM. As described in Sect. 3.1, the input of AutoSUM is an entity description document $d = \{t_1, t_2, \cdots, t_n\}$. Here, $t_i$ is the $i$-th triple in $d$, which is composed of the same subject $s$, a predicate $p_i$ and an object $o_i$.
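Before walking through the pipeline step by step, the two-phase attention described above can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the dimensions, the padding of preference vectors to a fixed maximum number of triples, and the placement of the softmax normalizations are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Simulator(nn.Module):
    """Sketch of the simulator: m stacked 'general' attention layers (entity-phase)
    produce m score distributions over the triples, treated as m user preferences;
    a BiLSTM plus another general attention (user-phase) weights and combines them."""
    def __init__(self, feat_dim, ctx_dim, max_triples, m=6, hidden_dim=64):
        super().__init__()
        self.W = nn.ModuleList([nn.Linear(ctx_dim, feat_dim, bias=False) for _ in range(m)])
        self.pref_bilstm = nn.LSTM(max_triples, hidden_dim, batch_first=True, bidirectional=True)
        self.W_star = nn.Linear(2 * hidden_dim, 2 * hidden_dim, bias=False)

    def forward(self, h, c):
        # h: (n, feat_dim) triple features; c: (1, ctx_dim) context from the extractor
        # entity-phase: one preference distribution over the n triples per attention layer
        u = torch.stack([F.softmax(h @ W_j(c).squeeze(0), dim=0) for W_j in self.W])  # (m, n)
        # user-phase: encode the m preferences with a BiLSTM (padded to max_triples) ...
        u_pad = F.pad(u, (0, self.pref_bilstm.input_size - u.size(1)))  # assumes n <= max_triples
        out, (h_n, _) = self.pref_bilstm(u_pad.unsqueeze(0))            # out: (1, m, 2*hidden_dim)
        c_star = torch.cat([h_n[0], h_n[1]], dim=-1).squeeze(0)         # (2*hidden_dim,)
        # ... then weight each user preference with a general attention score
        alpha = F.softmax(out.squeeze(0) @ self.W_star(c_star), dim=0)  # (m,) preference weights
        return alpha @ u                                                 # (n,) final attention scores
```

A full forward pass would chain the extractor and this simulator for one entity and train the whole model with a cross-entropy loss against the gold attention scores, as described in the pipeline below.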
Given $d$, we first split $d$ into a predicate set $p = \{p_1, p_2, \cdots, p_n\}$ and an object set $o = \{o_1, o_2, \cdots, o_n\}$, respectively. Given $p$ and $o$, we combine word embeddings and graph embeddings to map each $p_i$ and $o_i$ into a continuous vector space and concatenate them as $e_i$, respectively. Given $e = \{e_1, e_2, \cdots, e_n\}$, we randomly map $e$ into a sequence $q = (q_1, q_2, \cdots, q_n)$. Then we apply a BiLSTM to extract the feature vector $h_i$ of each $q_i$, where $\overrightarrow{c}$ and $\overleftarrow{c}$ denote the final hidden states of the forward and backward LSTM networks. Given $h = \{h_1, h_2, \cdots, h_n\}$ and $c$, we utilize the multi-aspect attention mechanism to capture multi-aspect information. Specifically, for the $j$-th attention layer in the multi-aspect attention mechanism, we calculate the attention score $s_i^j$ of triple $t_i$ with a general attention score function, where $W_j$ is the parameter matrix of the general attention score function in the $j$-th attention layer, and $m$ is the number of attention layers in the multi-aspect attention mechanism. Given $s = \{s^1, s^2, \cdots, s^m\}$, we then simulate the preference of the $j$-th user $u_j$, who is interested in the information captured by the $j$-th attention layer, where $u_i^j$ is the preference value allocated to triple $t_i$ by $u_j$. Given $u = \{u_1, u_2, \cdots, u_m\}$, we randomly map $u$ into a sequence $q^* = (q_1^*, q_2^*, \cdots, q_m^*)$ and utilize a BiLSTM to encode each $u_j$ into $u_j^*$, where $\overrightarrow{c}^*$ and $\overleftarrow{c}^*$ are the final hidden states of the forward and backward LSTM networks. Then we calculate the attention score $a_j^*$ of each user preference, where $W^*$ is the parameter matrix of the general attention score function. Having obtained $a^*$, we integrate the different user preferences to generate the final attention score $a_i$ of each triple $t_i$ in $d$. Finally, we employ the cross-entropy loss and define the loss function for AutoSUM as $\mathcal{L}(a, \bar{a}) = \mathrm{CrossEntropy}(a, \bar{a})$. Here, $\bar{a} = \{\bar{a}_1, \bar{a}_2, \cdots, \bar{a}_n\}$ is the gold (real) attention score vector associated with the above entity in the ESBM dataset. Specifically, following the ESA work, we count the frequency with which the $i$-th triple $t_i$ is selected by users in the ESBM dataset, denoted as $c_i$, and the gold attention score $\bar{a}_i$ of $t_i$ is obtained by normalizing these frequencies. (A compact equation-form sketch of this pipeline is given below.)

Dataset: In this paper, we utilize ESBM dataset v1.1, which consists of 6.8k triples related to 125 entities from DBpedia [2] and 2.6k triples related to 50 entities from LinkedMDB [5]. Given an entity, ESBM asks 5 different users to select the top-5 and top-10 triples which best describe the entity. In addition, ESBM provides an evaluator for comparing different entity summarization methods. Both the datasets and the evaluator can be accessed from the ESBM website.

Our baselines consist of several existing state-of-the-art entity summarization methods, including RELIN [4], DIVERSUM [21], CD [30], FACES [9], LinkSUM [23], MPSUM [28] and ESA [29]. MPSUM is an open-source implementation of ES-LDA. To provide ablation studies, we also modify the original AutoSUM into 5 different versions, denoted as AutoSUM$_{1 \sim 5}$, which are further illustrated in Sect. 4.3.

Evaluation Methodology: Summarization tasks can be mainly divided into extractive and non-extractive tasks [1, 16], which target unstructured and structured data, respectively. Sydow et al. [22] stated that the entity summarization task can be treated as an extractive task of information retrieval (IR).
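As referenced above, the pipeline can be compactly summarized in equation form. The display equations are not reproduced in this text, so the following is a sketch under stated assumptions: the "general" attention score function of [17] (a bilinear form), softmax normalizations for the preference values and user-preference weights, and frequency normalization for the gold scores; the paper's exact formulations may differ.

```latex
% Sketch of the AutoSUM pipeline (assumed forms, not the paper's verbatim equations)
\begin{align}
  h_i = [\overrightarrow{h}_i ; \overleftarrow{h}_i], \quad
  c   = [\overrightarrow{c} ; \overleftarrow{c}]
    && \text{(extractor BiLSTM: features and context)} \\
  s_i^j = h_i^{\top} W_j\, c
    && \text{(entity-phase, layer } j = 1, \dots, m\text{)} \\
  u_i^j = \frac{\exp(s_i^j)}{\sum_{l=1}^{n} \exp(s_l^j)}
    && \text{(preference of simulated user } u_j\text{)} \\
  a_j^{*} = \frac{\exp\!\big({u_j^{*}}^{\top} W^{*} c^{*}\big)}
                 {\sum_{l=1}^{m} \exp\!\big({u_l^{*}}^{\top} W^{*} c^{*}\big)}
    && \text{(user-phase weight, } u_j^{*} \text{ from the second BiLSTM)} \\
  a_i = \sum_{j=1}^{m} a_j^{*}\, u_i^j, \quad
  \bar{a}_i = \frac{c_i}{\sum_{l=1}^{n} c_l}, \quad
  \mathcal{L} = \mathrm{CrossEntropy}(a, \bar{a})
    && \text{(integration, gold scores, loss)}
\end{align}
```

In the gold-score term, $c_i$ denotes the selection frequency from ESBM as in the text above, not the BiLSTM context vector $c$.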
IR returns the most relevant documents for a query, while entity summarization selects the top-$k$ triples related to an entity. Following previous work, we therefore utilize the F-measure and mean average precision (MAP) metrics for evaluation, which are two standard evaluation metrics in IR [12, 15]. The F-measure is the harmonic mean of recall and precision, and MAP is the mean of the average precision (a computational sketch is given below). Meanwhile, given the limited number of entities in ESBM, we conduct 5-fold cross-validation to reduce the risk of overfitting without reducing the number of learning instances [11]. Specifically, the entities in ESBM are divided into 5 folds randomly. The parameters of each model are tuned on 4 of the 5 folds, and the final fold in each case is utilized to evaluate the optimal parameters. Since ESA performs significantly better than all the other state-of-the-art methods in our baselines, we compare the statistical significance between ESA and the AutoSUM variants (i.e., the original AutoSUM and the modified AutoSUM$_{1 \sim 5}$, respectively) utilizing Student's paired t-test (p-value ≤ 0.05) [12].

Experimental Details: We tune the parameters on a validation set (i.e., a part of the training set). Specifically, to learn graph embeddings, we utilize TransE to pretrain the whole ESBM dataset, where the dimension of each triple embedding is set to 100. As for word embeddings, we initialize the lookup table randomly, where the dimension of each word embedding is set to 100. We then apply a BiLSTM with a single layer in each LSTM cell for feature extraction, and the number of layers in the multi-aspect attention mechanism is set to 6. In addition, the graph embedding of each triple is fixed after pretraining, while all other parameters in AutoSUM are initialized randomly and tuned without weight sharing. We train the AutoSUM model for 200 epochs, and report the results of the best epoch under early stopping.

As shown in Tables 1 and 2, AutoSUM is significantly better than the existing state-of-the-art methods in our baselines.

Comparison with Traditional Methods: Compared with traditional methods that depend on manual feature extraction and multi-user preference simulation, AutoSUM automates these processes effectively without any human expertise. The average improvement of AutoSUM over the best-performing traditional method is 38% and 36% in terms of F-measure and MAP, respectively.

Comparison with Deep Learning Methods: Compared with ESA, which calculates attention scores without feature extraction and multi-user preference simulation, AutoSUM achieves the state-of-the-art performance. The average improvement of AutoSUM over ESA is 26% and 23% in terms of F-measure and MAP, respectively. In addition, we track the attention scores of the entity Triathlon (Triathlon at the 2000 Summer Olympics Men's) in user-phase attention, as shown in Fig. 2. We can observe that the user-phase attention simulates 3 groups of user preference for the entity, and the entity-phase attention allocates higher attention scores to users who prefer the medal and the event than to those who prefer the property, which is in accordance with the preference of most users in the real world.

In this section, we provide ablation studies to demonstrate the effectiveness of the primary modules in AutoSUM.

AutoSUM$_1$: To evaluate the features extracted by AutoSUM, AutoSUM$_1$ removes the BiLSTM in the extractor and feeds the input representation of the triples into the simulator directly.
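As referenced above, here is a minimal sketch of how the two evaluation metrics can be computed against one user's gold top-k selection; the helper names are hypothetical, and the official ESBM evaluator may differ in details such as tie handling and averaging over users.

```python
def f_measure(selected, gold):
    """F-measure between a predicted top-k summary and a gold summary (sets of triple ids).
    When |selected| == |gold| == k, precision, recall and F coincide."""
    tp = len(set(selected) & set(gold))
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def average_precision(ranked, gold):
    """Average precision of a ranked triple list against a gold set (averaged over
    entities and users to obtain MAP)."""
    hits, total = 0, 0.0
    for rank, triple in enumerate(ranked, start=1):
        if triple in gold:
            hits += 1
            total += hits / rank
    return total / len(gold) if gold else 0.0
```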
Experimental results show that the original AutoSUM is significantly better than AutoSUM$_1$, proving that the BiLSTM extracts high-quality features for user-preference simulation.

AutoSUM$_2$ and AutoSUM$_4$: To explore whether the attention scores of the different user preferences are appropriate, AutoSUM$_2$ removes the BiLSTM in the simulator and allocates equal attention scores to each user preference. Meanwhile, we also attempt to replace this BiLSTM with an FCN, referred to as AutoSUM$_4$. As shown in Tables 1 and 2, the original AutoSUM gains a significant improvement over AutoSUM$_2$ and AutoSUM$_4$, indicating that the BiLSTM with a general attention function allocates appropriate attention scores to each user preference. In addition, we can observe that the performance of the FCN (AutoSUM$_4$) is even worse than allocating equal attention scores (AutoSUM$_2$) in our experiments.

AutoSUM$_3$: For comparison, AutoSUM$_3$ removes the BiLSTM in both the extractor and the simulator. Experimental results show that the performance of AutoSUM$_3$ is worse than that of AutoSUM$_1$ and AutoSUM$_2$, which remove the BiLSTM in the extractor and the simulator respectively, further proving the irreplaceable role of the BiLSTM in AutoSUM.

AutoSUM$_5$: To explore whether the multi-aspect attention mechanism captures the multi-aspect information of an entity, we replace the multi-aspect mechanism with a single-aspect mechanism, i.e., we set the number of attention layers to 1. As shown in Tables 1 and 2, the original AutoSUM outperforms AutoSUM$_5$ in both F-measure and MAP, indicating that the multi-aspect attention mechanism successfully captures the multi-aspect information. We also notice that AutoSUM$_5$, with a single-layer attention mechanism, still outperforms all the other methods in our baselines, including ESA.

In this paper, we propose a novel integration model called AutoSUM to automate feature extraction and multi-user preference simulation for entity summarization. The performance of our proposed AutoSUM is significantly better than that of other state-of-the-art methods in both F-measure and MAP. Meanwhile, sufficient ablation studies are provided to demonstrate the effectiveness of each module in AutoSUM. In the future, we expect to expand the ESBM dataset and introduce the notion of AutoSUM into other applications such as recommender systems [6, 18].
References

1. Data summarization: a survey
2. DBpedia - a crystallization point for the web of data
3. Translating embeddings for modeling multi-relational data
4. RELIN: relatedness and informativeness-based centrality for entity summarization
5. Managing linked data on the web: the LinkedMDB showcase
6. An attentive spatio-temporal neural model for successive point of interest recommendation
7. Official Google Blog: Introducing the knowledge graph: things, not strings
8. Generating sequences with recurrent neural networks
9. FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering
10. Gleaning types for literals in RDF triples with application to entity summarization
11. A deep relevance matching model for ad-hoc retrieval
12. Technical brief: agreement, the F-measure, and reliability in information retrieval
13. Conditional random fields: probabilistic models for segmenting and labeling sequence data
14. Neural architectures for named entity recognition
15. Entity summarization: state of the art and future challenges
16. Graph summarization methods and applications: a survey
17. Effective approaches to attention-based neural machine translation
18. A novel top-N recommendation approach based on conditional variational auto-encoder
19. ES-LDA: entity summarization using knowledge-based topic modeling
20. Combining word embedding and knowledge-based topic modeling for entity summarization
21. DIVERSUM: towards diversified summarisation of entities in knowledge graphs
22. The notion of diversity in graphical entity summarisation on semantic knowledge graphs
23. LinkSUM: using link analysis to summarize entity data
24. Browsing DBpedia entities with summaries
25. Fuse: entity-centric data fusion on linked data
26. Attention is all you need
27. User preference-aware review generation
28. MPSUM: entity summarization with predicate-based matching
29. ESA: entity summarization with attention
30. Generating characteristic and diverse entity summaries

Acknowledgements. This research is supported in part by the Beijing Municipal Science and Technology Project under Grant Z191100007119008.