title: Understood in Translation, Transformers for Domain Understanding
authors: Christofidellis, Dimitrios; Manica, Matteo; Georgopoulos, Leonidas; Vandierendonck, Hans
date: 2020-12-18

Knowledge acquisition is the essential first step of any Knowledge Graph (KG) application. This knowledge can be extracted from a given corpus (KG generation process) or specified from an existing KG (KG specification process). Focusing on domain-specific solutions, knowledge acquisition is a labor-intensive task, usually orchestrated and supervised by subject matter experts. Specifically, the domain of interest is usually defined manually, and the needed generation or extraction tools are then utilized to produce the KG. Herein, we propose a supervised machine learning method, based on Transformers, for the domain definition of a corpus. We argue why such an automated definition of the domain's structure is beneficial both in terms of construction time and quality of the generated graph. The proposed method is extensively validated on three public datasets (WebNLG, NYT and DocRED) by comparing it with two reference methods based on CNN and RNN models. The evaluation shows the effectiveness of our model on this task. Focusing on scientific document understanding, we present a new health-domain dataset based on publications extracted from PubMed and successfully apply our method to it. Lastly, we demonstrate how this work lays the foundation for fully automated and unsupervised KG generation.

Knowledge Graphs (KGs) are among the most popular data management paradigms and their application is widespread across different fields, e.g., recommendation systems, question-answering tools and knowledge discovery applications. This is because KGs simultaneously share several advantages of databases (information retrieval via structured queries), graphs (representing loosely or irregularly structured data) and knowledge bases (representing semantic relationships among the data). KG research can be divided into two main streams (Ji et al. 2020): knowledge representation learning, which investigates the embedding of KGs into vector representations (KG embeddings), and knowledge acquisition, which considers the KG generation process. The latter is a fundamental aspect, since a malformed graph cannot reliably serve any kind of downstream task. The knowledge acquisition process refers either to KG construction, where the KG is built from scratch using a specific corpus, or to KG specification, where a subgraph of interest is extracted from an existing KG. In both cases, the acquisition process can follow a bottom-up or top-down approach (Zhao, Han, and So 2018). In a bottom-up approach, all the entities and their connections are extracted as a first step of the process. Then, the underlying hierarchy and structure of the domain can be inferred from the entities and their connections. Conversely, a top-down approach starts with the definition of the domain's schema, which is then used to guide the extraction of the needed entities and connections. For general KG generation, a bottom-up approach is usually preferred, as we typically wish to include all entities and relations that we can extract from the given corpus.
In contrast, a top-down approach better suits domain-specific KG generation or KG specification, where entities and relations are strongly linked to the domain of interest. The structures of typical bottom-up and top-down pipelines, focusing on the case of KG generation, are presented in Figures 1a and 1b, respectively. Herein, we focus on domain-specific, i.e., top-down, acquisition for two main reasons. Firstly, the acquisition process can be faster and more accurate in this way: once the schema of the domain of interest is specified, we only need to select the proper tools (i.e., pretrained models) for the actual entity and relation extraction. Secondly, such an approach minimizes the presence of irrelevant data and restricts queries and graph operations to a carefully tailored KG. This generally improves the accuracy of KG applications (Lalithsena, Kapanipathi, and Sheth 2016). Furthermore, the graph's size is significantly reduced by excluding irrelevant content, so the execution time of queries can be reduced by more than one order of magnitude (Lalithsena, Kapanipathi, and Sheth 2016). The domain definition is usually performed by subject matter experts. Yet, knowledge acquisition by expert curation can be extremely slow, as the process is essentially manual. Moreover, human error may affect the data quality and lead to malformed KGs. In this work, we propose to overcome these issues by introducing an automated, machine learning-based approach to understand the domain of a collection of text snippets. Specifically, given sample input texts, we infer the schema of the domain to which they belong. This task can be incorporated into both the domain-specific KG generation and the KG specification process, where the domain definition is the essential first step. For KG generation, the input texts can be samples from the corpus of interest, while for KG specification, these text snippets can express possible questions that need to be answered from the specified KG. We introduce a seq2seq model relying on the Transformer architecture to infer the relation types characterizing the domain of interest. Such a model lets us define the domain's schema, including all the needed entity and relation types. The model can be trained using any available previous schema (e.g., the schema of a general KG like DBpedia) and respective text examples for each possible relation type. We show that our proposed model outperforms baseline approaches, can be successfully utilized for scientific documents, and has interesting potential extensions in the field of automated KG generation. To the best of our knowledge, our method is the first attempt to introduce a supervised, machine learning-based domain understanding tool that can be incorporated into domain-specific KG generation and specification pipelines.

Currently, the main research interest related to KG generation workflows is associated with attempts to improve the named entity recognition (NER) and relation extraction tasks or to provide end-to-end pipelines for general or domain-specific KG generation (Ji et al. 2020). The majority of such work focuses on the actual generation step and relies solely on manual definition of the domain (Luan et al. 2018; Manica et al. 2019; Wang et al. 2020).
Concerning the KG specification field, subgraph extraction is usually based on graph traversals or more sophisticated heuristic techniques, some of which require initial entities or entity types to be provided (Lalithsena, Kapanipathi, and Sheth 2016). Such approaches are effective, yet significant engineering effort is required to tune the heuristics for each different case. Moreover, the crucial task of properly selecting the initial entities or entity types is mostly performed manually. The relation extraction task is also related to our work. It aims at extracting triplets of the form (subject, relation, object) from text. Neural network-based methods, such as those of Nguyen and Grishman, Zhou et al. and Zhang et al., dominate the field. These methods are CNN (Zeng et al. 2014; Nguyen and Grishman 2015) or LSTM (Zhou et al. 2016; Zhang et al. 2017) models, which attempt to identify relations in a text given its content and information about the positions of entities in it. The positional information of the entities is typically extracted in a previous step of KG generation using NER methods (Nadeau and Sekine 2007). Lately, there has been high interest in methods that combine the NER and relation extraction tasks into a single model (Zheng et al. 2017; Zeng et al. 2018; Fu, Li, and Ma 2019). While our work is linked to relation extraction, it has two major differences. Firstly, we focus on the relation type and the entity types that compose a relation rather than on the actual triplet. Secondly, the training process differs and requires coarser annotations: we solely provide texts and the respective sequences of existing relation types. In contrast, in a typical relation extraction training process, information about the positions of the entities in the text is also needed. Here, we propose to improve knowledge acquisition by performing data-driven domain definition, an approach that is currently unexplored in KG research.

The domain understanding task attempts to uncover the structured knowledge underlying a dataset. In order to depict this structure, we can leverage the so-called domain metagraph. A domain's metagraph is a graph whose vertices are all the entity types and whose edges are all their connections/relations in the context of this domain. The generation of such a metagraph entails obtaining all the entity types and their relations. Assuming that each entity type present in the domain has at least one interaction with another entity type, the metagraph of the domain can be produced by inferring all the possible relation types, as every entity type is included in at least one of them. Thus, our approach aims to build an accurate model to detect a domain's relation types, and leverages this model to extract those relations from a given corpus. Aggregating all extracted relations yields the domain's metagraph.

Sequence-to-sequence (seq2seq) models (Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Jozefowicz et al. 2016) attempt to learn the mapping from an input X to its corresponding target Y, where both are represented as sequences. To achieve this, they follow an encoder-decoder approach. Encoders and decoders can be recurrent or convolutional neural networks (Gehring et al. 2017). In addition, an attention mechanism over the encoder states can be incorporated (Bahdanau, Cho, and Bengio 2014; Luong, Pham, and Manning 2015) to further boost the model's performance.
Lately, Transformer architectures (Vaswani et al. 2017; Devlin et al. 2018; Liu et al. 2019; Radford et al. 2018), a family of models whose components are made up entirely of attention layers, linear layers and layer normalization layers, have established themselves as the state of the art for sequence modeling, outperforming the previously typical recurrent components. Seq2seq models have been successfully utilized for various tasks such as neural machine translation and natural language generation (Pust et al. 2015). Recently, their scope has also been extended beyond language processing to fields such as chemical reaction prediction (Schwaller et al. 2019).

We consider the domain's relation type extraction task as a specific version of machine translation from the language of the corpus to the "relation" language that includes all the different relations between the entity types of the domain. A relation type R which connects entity type i to entity type j is represented as "i.R.j" in the "relation" language. In the case of undirected connections, "i.R.j" is the same as "j.R.i" and for simplicity we can discard one of them. Seq2seq models have been designed to address tasks where both the input and the output sequences are ordered. In our case, the target "relation" language does not have any defined ordering, since by definition the edges of a graph are unordered. In theory the order does not matter, yet in practice unordered sequences lead to slower convergence of the model and require more training data to achieve our goal (Vinyals, Bengio, and Kudlur 2015). To overcome this issue, we propose a specific ordering of the "relation" language influenced by the semantic context that the majority of the text snippets hold. According to Zeng et al. (2018), in the context of relation extraction, text snippets can be divided into three types: Normal, EntityPairOverlap and SingleEntityOverlap. A text snippet is categorized as Normal if none of its triplets have overlapping entities. If some of its triplets express a relation on the same pair of entities, it belongs to the EntityPairOverlap category, and if some of its triplets have one entity in common but no overlapping pairs, it belongs to the SingleEntityOverlap class. These three categories are also relevant in the metagraph case, even if we are working with entity types and relation types rather than the actual entities and their relations. Based on the given training set, we consider that the model is aware of a general domain anatomy, i.e., the sets of possible entity types and relation types are known, and we would like to identify which of them are depicted in a given corpus. In both the EntityPairOverlap and SingleEntityOverlap cases, there is one main entity type from which all the other entity types can be reached by a one-hop traversal in the general domain's metagraph. The class of Normal text snippets is a broader case in which one can identify heterogeneous connectivity patterns among the entity types represented. Yet a sentence typically describes facts that are expected to be connected somehow, so the entity types included in such texts are usually no more than one or two hops away from each other in the general metagraph. In light of the considerations above, we propose to sort the relations in breadth-first-search (BFS) order starting from a specific node (entity type) in the general metagraph.
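To make this ordering concrete, the following sketch (our own Python illustration, not the authors' released code; the dictionary layout, function name and toy metagraph are assumptions) orders the relation-type vocabulary by a breadth-first traversal of a small general metagraph, treating "i.R.j" and "j.R.i" as the same undirected edge.

from collections import deque

def bfs_relation_ordering(metagraph, start):
    """Order relation-type tokens "i.R.j" by a breadth-first traversal of the
    general metagraph, starting from the entity type `start`. `metagraph` maps
    each entity type to (relation, neighbour entity type) pairs."""
    visited, ordering, seen_edges = {start}, [], set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, neighbour in sorted(metagraph.get(node, [])):
            edge = (relation, frozenset((node, neighbour)))
            if edge not in seen_edges:  # "i.R.j" and "j.R.i" denote the same undirected edge
                seen_edges.add(edge)
                ordering.append(f"{node}.{relation}.{neighbour}")
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return ordering

# toy general metagraph over three entity types
toy = {
    "Disease": [("to", "Gene"), ("to", "Chemical")],
    "Gene": [("to", "Disease"), ("to", "Chemical")],
    "Chemical": [("to", "Disease"), ("to", "Gene")],
}
print(bfs_relation_ordering(toy, "Disease"))
# ['Disease.to.Chemical', 'Disease.to.Gene', 'Chemical.to.Gene']

Target sequences for training can then be sorted according to the position of each relation-type token in this ordering; starting the traversal from a different entity type yields the alternative orderings that the ensemble described below relies on.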
In this way, we confine the output to a much lower-dimensional space by adhering to a semantically meaningful order. Inspired by state-of-the-art approaches in the field of neural machine translation, our model architecture is a multi-layer bidirectional Transformer. We follow the lead of Vaswani et al. in implementing the architecture, with the only difference that we adopt a learned positional encoding instead of a static one (see Appendix for further details on the positional encoding). As the overall architecture of the encoder and the decoder is otherwise the same as in Vaswani et al., we omit an in-depth description of the Transformer model and refer readers to the original paper. To boost the model's performance, we also propose an ensemble approach exploiting different Transformers and aggregating their results to construct the domain's metagraph. Each of the Transformers differs in the selected ordering of the "relation" vocabulary: selecting a different starting entity type for the breadth-first search leads to a different ordering. We expect that multiple orderings can facilitate the prediction of connection patterns that cannot be easily detected using a single ordering. The sequence of steps for ensemble domain understanding is the following: firstly, train k Transformers using different orderings; secondly, given a set of text snippets, predict sequences of relations using all the Transformers; finally, use late fusion to aggregate the results and form the final predictions. It is worth mentioning that in the last step we discard the underlying ordering followed by each model and perform a relation-based aggregation: each relation is examined separately to decide whether to include it in the final metagraph. For the aggregation step, we use the standard Wisdom of Crowds (WOC) approach (Marbach et al. 2012), yet other consensus methods can also be leveraged for the task. The overall structure of our approach is summarized in Figure 2.

We evaluate our Transformer-based approach against three baselines on a selection of datasets representing different domains. As baselines, we use CNN- and RNN-based methods influenced by Nguyen and Grishman (2015) and Zhou et al. (2016), respectively. For the CNN-based method, we slightly modified the architecture to exclude the component that provides information about the positions of the entities in the text snippet, as such information is not available in our task. Additionally, we include a Transformer-based model without any ordering applied to the target sequences as an extra baseline. To our knowledge, there is no standard dataset available for the relation type extraction task in the literature. However, there is a plethora of published datasets for the standard relation extraction task that can be adapted to our case with limited effort. For our task, the leveraged datasets should pair each text with its respective set of relation types. We use WebNLG (Gardent et al. 2017), NYT (Riedel, Yao, and McCallum 2010) and DocRED (Yao et al. 2019), three of the most popular datasets for relation extraction. Both the NYT and DocRED datasets provide the needed information, such as entity types and relation types, for the triplets of each instance. Thus their transformation for our task only requires converting these triplets to the relation type format; for instance, the triplet (x, y, z) is transformed into type(x).type(y).type(z).
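To make the conversion concrete, the sketch below (function and variable names are our own assumptions, not the datasets' actual field names; the relation label is used directly as its type) maps the annotated triplets of one instance to the ordered target sequence of relation-type tokens.

def to_relation_types(triplets, entity_type, bfs_rank):
    """Convert the (subject, relation, object) triplets of one text instance into
    the ordered sequence of relation-type tokens used as the target "sentence".
    `entity_type` maps an entity mention to its type; `bfs_rank` assigns each
    relation-type token its position in the chosen BFS ordering."""
    tokens = {f"{entity_type[s]}.{r}.{entity_type[o]}" for s, r, o in triplets}
    return sorted(tokens, key=lambda t: bfs_rank.get(t, len(bfs_rank)))

# hypothetical NYT-style instance
types = {"Barack Obama": "PERSON", "Honolulu": "LOCATION", "USA": "LOCATION"}
triplets = [("Barack Obama", "place_of_birth", "Honolulu"),
            ("Honolulu", "contained_by", "USA")]
rank = {"PERSON.place_of_birth.LOCATION": 0, "LOCATION.contained_by.LOCATION": 1}
print(to_relation_types(triplets, types, rank))
# ['PERSON.place_of_birth.LOCATION', 'LOCATION.contained_by.LOCATION']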
On the other hand, WebNLG does not provide such information for the entity types, and thus manual curation is needed: all the possible entities are examined and replaced with the proper entity type. For the WebNLG dataset, we avoid including rare entity and relation types that occur fewer than 10 times in the dataset; we either omit them or replace them with similar or more general types that exist in it. To emphasize the application of such a model to scientific document understanding, we produce a new task-specific dataset called PubMed-DU, related to the general health domain. We download paper abstracts from PubMed focusing on work related to 4 specific health subdomains: Covid-19, mental health, breast cancer and coronary heart disease. We split the abstracts into sentences. The entities and their types for each text have been extracted using PubTator (Wei et al. 2019). The available entity types are Gene, Mutation, Chemical, Disease and Species. The respective relation types are of the form x.to.y, where x and y are two of the possible entity types. We assume that the relations are symmetric. For text annotation, the following rule was used: a text has the relation x.to.y if two entities with types x and y co-occur in the text and the syntactic path between them contains at least one keyword of this relation type. These keywords have been manually identified based on the provided instances and are words, mainly verbs, related to the relation. Table 1 depicts the statistics of all four utilized datasets.

For all datasets, we use the same model parameters. Specifically, we use the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0005. The gradient norm is clipped to 1.0 and dropout is set to 0.1. Both the encoder and the decoder consist of 2 layers with 10 attention heads each, and the position-wise feed-forward hidden dimension is 512. Lastly, we initialize the token embedding layers with GloVe pretrained word embeddings (Pennington, Socher, and Manning 2014), which have dimensionality m=300. Our code and the datasets are available at https://github.com/christofid/DomainUnderstanding.

The evaluation of the models is performed at both the instance and the graph level. At the instance level, we examine the ability of the model to predict the relation types that exist in a given text. To investigate this, we use F1-score and accuracy. The F1-score is the harmonic mean of the model's precision and recall. Accuracy is computed at the instance level and measures for how many of the testing texts the model manages to infer the whole set of relation types correctly. For the metagraph-level evaluation, we use our model to predict the metagraph of a domain and examine how close it is to the actual metagraph. For this comparison, we utilize the F1-score for both edges and nodes of the metagraph, as well as the similarity of the degree and eigenvector centrality distributions (Zaki and Meira 2014) of the two metagraphs. For the comparison of the centrality distributions, we construct the histogram of the centralities for each graph using 10 fixed-size bins and use the Jensen-Shannon Divergence (JSD) metric (Endres and Schindelin 2003) to examine the similarity of the two distributions (see Appendix for the definition of JSD).
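As an illustration of the metagraph-level comparison, the sketch below assumes the two metagraphs are available as networkx graphs and uses SciPy's base-2 Jensen-Shannon distance as a stand-in for the JSD metric; it is not the evaluation script from the repository.

import numpy as np
import networkx as nx
from scipy.spatial.distance import jensenshannon

def centrality_jsd(predicted, actual, centrality=nx.degree_centrality, bins=10):
    """Histogram a centrality measure of the two metagraphs into 10 fixed-size
    bins and return the Jensen-Shannon distance (base 2) between the histograms."""
    p, _ = np.histogram(list(centrality(predicted).values()), bins=bins, range=(0.0, 1.0))
    q, _ = np.histogram(list(centrality(actual).values()), bins=bins, range=(0.0, 1.0))
    return jensenshannon(p, q, base=2)  # the inputs are normalized internally

# toy usage: a predicted metagraph missing one edge of the actual one
actual = nx.Graph([("Disease", "Gene"), ("Disease", "Chemical"), ("Gene", "Chemical")])
predicted = nx.Graph([("Disease", "Gene"), ("Disease", "Chemical")])
print(centrality_jsd(predicted, actual))
print(centrality_jsd(predicted, actual, centrality=nx.eigenvector_centrality))

The edge and node F1-scores can be computed analogously from the edge and node sets of the two graphs.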
We have selected degree and eigenvector centralities because the former gives localized structural information, measuring the importance of a node based on its direct connections, while the latter gives broader structural information, measuring the importance of a node based on infinite walks. To study the performance of our model, we perform 10 independent runs, each with a different random split of the datasets into training, validation and testing sets. Table 2 depicts the median value and the standard error of the baselines and our method for the two metrics. Our method is better in terms of accuracy for all four datasets and in terms of F1-score for the WebNLG and DocRED datasets. For the NYT and PubMed-DU datasets, the F1-scores of the CNN and RNN models outperform our approach. We observed that the baseline models profit from the fact that, in these datasets, the majority of the instances depict only one relation and many of the relations appear in a limited number of instances. In general, there is a lack of sequences of relations, which hinders the Transformer's ability to learn the underlying distribution in these two cases (see Appendix). Lastly, the decreased performance of all the models on the DocRED dataset is due to the long-tail characteristic of this dataset, as 66% of the relations appear in no more than 50 instances (see Appendix).

The above comparisons focus only on the ability of the model to predict the relation types given a text snippet. Since our ultimate goal is to infer the domain's metagraph from a given corpus, we divide the testing sets of the datasets into small corpora and attempt to define their domain using our model. For the WebNLG, NYT and DocRED datasets, 10 artificial corpora and their respective domains have been created by randomly selecting 10 instances from each of the testing sets. We set two constraints on this selection to ensure that the produced metagraphs are meaningful: firstly, each subdomain should have a connected metagraph, and secondly, each existing relation type should appear at least two times in the provided instances. For the PubMed-DU dataset, we already know that it contains 4 subdomains, so we focus on inferring them. For each subdomain, we randomly select 100 instances from the testing set that belong to this subdomain and attempt to produce the domain based on them. We infer the relation types for each instance and then generate the domain's metagraph by including all the relation types that were found in the instances. Then, we compare how close the actual domain's metagraph and the predicted metagraph are. Table 3 presents the results of the evaluation of the predicted versus the actual domain's metagraph for 10 subdomains extracted from the testing sets of the WebNLG, NYT and DocRED datasets. All the presented values for these datasets are the mean over all 10 subdomains. For the PubMed-DU dataset, we include only the Covid-19 subdomain case. Results for the remaining subdomains of this dataset can be found in the Appendix. Our approach using the Transformer + BFS-based ordering outperforms or is close to the baselines in all cases in terms of edge and node F1-score. Furthermore, the degree and eigenvector centrality distributions of the metagraphs generated with our method are closer to the ground truth than those of the other methods in all cases. This indicates that the graphs produced with our method are both element-wise and structurally closer to the actual ones.
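To make the aggregation step concrete, the sketch below assembles a metagraph from the per-instance relation-type predictions of k differently ordered Transformers; the simple vote threshold stands in for the WOC consensus, and all names are illustrative assumptions rather than the authors' implementation.

from collections import Counter
import networkx as nx

def assemble_metagraph(predictions_per_model, min_votes=2):
    """`predictions_per_model[m]` is the list of per-instance predictions of model m,
    each prediction being an iterable of relation-type tokens "i.R.j". A relation
    type enters the metagraph if at least `min_votes` models predict it in at
    least one instance of the corpus."""
    votes = Counter()
    for model_predictions in predictions_per_model:
        # each relation type counts once per model, regardless of how many instances contain it
        votes.update({rel for instance in model_predictions for rel in instance})
    metagraph = nx.Graph()
    for rel, count in votes.items():
        if count >= min_votes:
            head, relation, tail = rel.split(".")
            metagraph.add_edge(head, tail, relation=relation)
    return metagraph

# toy usage: three differently ordered models, two instances each
preds = [
    [["Disease.to.Gene"], ["Disease.to.Chemical"]],
    [["Disease.to.Gene", "Gene.to.Chemical"], []],
    [["Disease.to.Gene"], ["Disease.to.Chemical"]],
]
print(assemble_metagraph(preds).edges(data=True))

For a single model, the same routine with min_votes=1 reduces to the plain union of predicted relation types used for the non-ensemble results above.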
More detailed comparisons of the different methods at both the instance and the metagraph level are included in the Appendix. The ensemble variant of our approach, based on the WOC consensus strategy, outperforms the simple Transformer + BFS ordering in all cases. Based on the evaluation at both the instance and the metagraph level, our ensemble variant appears to be the most reliable approach for the task of domain relation type extraction, as it achieves some of the best scores for every dataset and metric.

The proposed domain understanding method enables the inference of the domain of interest and its components. This enables partial automation and a speed-up of the KG generation process as, without manual intervention, we are able to identify the metagraph and, consequently, the needed models for entity and relation extraction in the context of the domain of interest. To achieve this, we adopt a Transformer-based approach that relies heavily on attention mechanisms. Recent efforts are focusing on the analysis of such attention mechanisms to explain and interpret the predictions and the quality of the models (Vig and Belinkov 2019; Hoover, Strobelt, and Gehrmann 2019). Interestingly, it has been shown that the analysis of attention patterns can elucidate complex relations between the entities fed as input to the Transformer, e.g., mapping atoms in chemical reactions with no supervision (Schwaller et al. 2020). Even though it is out of the scope of our current work, we observe that a similar analysis of the attention patterns in our model can identify not only the parts of text in which relations exist but also, directly, the entities of the respective triplets. To illustrate this and emphasize its application in the domain understanding field, we extract 24 text instances from the PubMed-DU dataset related to the COVID-19 domain. After generating the domain's metagraph, we analyze the attention weights to extract triplets and build a KG. We rely on the syntactic dependencies to propagate the attention weights across connected tokens and examine the noun chunks to extract the entities of interest based on their accumulated attention weight (see Appendix for further details). We select the attention head that achieves the best accuracy in order to generate the KG. Figure 3 depicts the generated metagraph and the KG. Using the aforementioned attention analysis, we manage to achieve 82% and 64% accuracy in entity extraction and relation extraction, respectively. These values may not compete with the respective state-of-the-art models, and the investigation is limited to only a few instances. Yet they indicate that a completely unsupervised generation based on attention analysis is possible and deserves further investigation.

Herein, we proposed a method to speed up the knowledge acquisition process of any domain-specific KG application by defining the domain of interest in an automated manner. This is achieved by using a Transformer-based approach to estimate the metagraph representing the schema of the domain. Such a schema can indicate the proper and needed tools for the actual entity and relation extraction. Thus, our method can be considered a stepping stone for any KG generation pipeline. The evaluation and comparison against state-of-the-art methods over different datasets indicate that our approach produces the metagraph accurately. In particular, in datasets where text instances contain multiple relation types, our model outperforms the baselines.
This is an important observation, as text describing multiple relations is the most common scenario. Based on that, and relying on the capability of Transformers to capture longer-range dependencies, future investigation of how our model performs on larger pieces of text, like full paragraphs, could be interesting and indicate a clearer advantage of our work.

Figure 3: KG extracted from 24 text snippets related to the COVID-19 domain using our model and the respective attention analysis. Green means that the respective node/edge exists in both the actual and the predicted graph, while pink means that the element exists in the actual but not in the predicted graph. Entities in bold indicate the path that has been extracted from the text "ace2 and tmprss2 variants and expression as candidates to sex and country differences in covid-19 severity in italy.".

The needed definition of a general domain for the training phase might be a limitation of this method. However, schemas and data from existing KGs can be utilized for training purposes. Unsupervised or semi-supervised extensions of this work can also be explored in the future to mitigate the issue. Our work paves the way towards automated knowledge acquisition, as our model minimizes the need for human intervention in the process. In the near future, the currently required manual curation can thus be avoided, leading to faster and more accurate knowledge acquisition. Interestingly, using the PubMed-DU dataset, we underline that our method can be utilized for scientific documents. The inference of their domain can assist in their general understanding and also lead to more robust knowledge acquisition from them. As a side effect, it is also important to note that such an attention-based model can be directly applied to triplet extraction from text without retraining and without supervision. Triplet extraction in an unsupervised way represents a breakthrough, especially if combined with the most recent advances in zero-shot learning for NER (Pasupat and Liang 2014; Guerini et al. 2018). Further analysis of our Transformer-based approach could give better insight into these capabilities.

In our model, we adopt a learned positional encoding instead of a static one. Specifically, the tokens are passed through a standard embedding layer as a first step in the encoder. The model has no recurrent layers and therefore has no notion of the order of the tokens within the sequence. To overcome this, we utilize a second embedding layer, called a positional embedding layer. This is a standard embedding layer whose input is not the token itself but the position of the token within the sequence, starting with the first token, the start-of-sequence token, at position 0. The positional embedding has a "vocabulary" size equal to the maximum length of the input sequence. The token embedding and the positional embedding are summed element-wise to obtain the final token embedding, which contains information about both the token and its position within the sequence. This final token embedding is then provided as input to the stack of attention layers of the encoder.

For a better understanding of the datasets, we analyzed the distribution of occurrences of all relation types. These distributions are depicted in Figure 4. A share of relation types with close to or fewer than 10 appearances is observed in all datasets. The lack of many examples can pose problems in the learning process for these specific relation types.
This is especially highlighted in the DocRED case, as we attributed the decreased performance of all the models on this dataset to its long-tail characteristic. In DocRED, almost 50% of the relations appear in no more than 10 instances and 66% of the relations appear in no more than 50 instances (Figure 4).

The Jensen-Shannon divergence metric between two probability vectors p and q is defined as:

JSD(p, q) = \frac{D(p \,\|\, m) + D(q \,\|\, m)}{2}

where m is the pointwise mean of p and q and D is the Kullback-Leibler divergence. The Kullback-Leibler divergence for two probability vectors p and q of length n is defined as:

D(p \,\|\, q) = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i}

The Jensen-Shannon metric is bounded by 1, given that we use the base-2 logarithm.

In this section, the procedure for automated triplet extraction based on the predicted relation types and the respective attention weights is described. For each instance, we generate an undirected graph that connects the tokens of the sentence based on their syntactic dependencies. Then, for each predicted relation type, we define the final attention weight of a token based on the attention weights of the token itself and of its neighbors in the syntactic dependency graph. Let a_r be the attention vector of a predefined attention head of the model, which contains all the attention weights related to the relation type r. The final attention weight w of token i for the relation r is defined from a_r(i) and the attention weights a_r(j) of the tokens j in neig_i, the set containing all the neighbors of i in the syntactic dependency graph. Then, for each noun chunk k of the text (n_k), we compute its total attention weight n_k^r for the relation type r by aggregating f(w_i^r) over the tokens i in nc_k, the set of tokens belonging to n_k, where f is a fixed function of the token weight. Finally, we extract as the entities connected via the relation type r the two noun chunks with the highest weight n_k^r. As this part of the work is a proof of concept rather than a full method, the attention head is selected as whichever gives the best outcome. Yet, in actual scenarios we recommend identifying the optimal head on a training set. For the creation of the syntactic dependency graph and the extraction of the noun chunks of the text, we use spaCy and its en_core_web_lg pretrained model. Table 4 includes all the texts that have been used in the proof of concept presented in the main paper and the respective predicted triplets for each of them.

Table 5 presents a detailed evaluation of our approach and the baseline models. We have included the 3 best BFS ordering variants (in terms of accuracy) and 3 consensus variants. To cover the range of available values of k, [1, number of entity types], we select one case with just a few Transformers, one with a value close to half of the total number of entity types and one close to the total number of entity types. For each k value, we utilize the top k orderings ranked by accuracy. In addition to the per-instance accuracy and the per-relation F1-score, the table also includes the per-relation precision and recall of each model. Our proposed method, especially its ensemble variant, produces the best outcome on all datasets apart from the NYT case, where the CNN and RNN models manage to be more precise. This is attributed to the characteristics of the NYT dataset, where there are many single-relation instances. Similarly, in Tables 6 and 7 we perform an in-depth metagraph-level evaluation of the models.
For all three datasets, our proposed method and its ensemble extension produce the best or one of the top-3 outcomes.

Table 7: Evaluation of metagraph reconstruction on the 4 predefined subdomains of the PubMed-DU dataset using CNN, RNN and Transformer-based models. *The architecture of the CNN and RNN models has been modified to exclude the component which provides information about the positions of the entities in the text snippet.

References
Neural machine translation by jointly learning to align and translate
Learning phrase representations using RNN encoder-decoder for statistical machine translation
BERT: Pre-training of deep bidirectional transformers for language understanding
A new metric for probability distributions
GraphRel: Modeling text as relational graphs for joint entity and relation extraction
Creating training corpora for NLG micro-planning
Convolutional sequence to sequence learning
Toward zero-shot entity recognition in task-oriented conversational agents
exBERT: A visual analysis tool to explore learned representations in Transformers models
Exploring the limits of language modeling
Harnessing relationships for domain-specific subgraph extraction: A recommendation use case
Deep learning
A survey on deep learning for named entity recognition
RoBERTa: A robustly optimized BERT pretraining approach
Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction
Effective approaches to attention-based neural machine translation
An information extraction and knowledge graph platform for accelerating biochemical discoveries
Wisdom of crowds for robust gene network inference
A survey of named entity recognition and classification
Relation extraction: Perspective from convolutional neural networks
Zero-shot entity extraction from web pages
GloVe: Global vectors for word representation
Parsing English into abstract meaning representation using syntax-based machine translation
Modeling relations and their mentions without labeled text
Molecular Transformer: A model for uncertainty-calibrated chemical reaction prediction
Unsupervised attention-guided atom-mapping
Sequence to sequence learning with neural networks
Attention is all you need
Analyzing the structure of attention in a Transformer language model
Order matters: Sequence to sequence for sets
COVID-19 literature knowledge graph construction and drug repurposing report generation
PubTator Central: Automated concept annotation for biomedical full text articles
DocRED: A large-scale document-level relation extraction dataset
Data mining and analysis: Fundamental concepts and algorithms
Relation classification via convolutional deep neural network
Extracting relational facts by an end-to-end neural model with copy mechanism
Position-aware attention and supervised data improve slot filling
Architecture of knowledge graph construction techniques
Joint extraction of entities and relations based on a novel tagging scheme
Attention-based bidirectional long short-term memory networks for relation classification

Table 4 (excerpt): Texts used in the proof of concept and their predicted triplets, including (thrombosis, Disease.to.Disease, COVID-19) and, for the text "... (ace-2) receptor for its attachment similar to sars-cov-1, which is followed by priming of spike protein by transmembrane protease serine 2 (tmprss2) which can be targeted by a proven inhibitor of tmprss2, camostat.", the triplet (spike, Gene.to.Gene, transmembrane protease serine 2); further rows include the text "temporal trends in decompensated heart failure and outcomes during covid-19: a multisite report from heart failure referral centres in london.".