An Empirical Study on Relation Extraction in the Biomedical Domain
Yongkang Li
2021-12-11

Relation extraction is a fundamental problem in natural language processing. Most existing models are designed for relation extraction in the general domain, and their performance on specific domains (e.g., biomedicine) remains unclear. To fill this gap, this paper carries out an empirical study on relation extraction in biomedical research articles. Specifically, we consider both sentence-level and document-level relation extraction, and run several state-of-the-art methods on benchmark datasets. Our results show that (1) current document-level relation extraction methods have strong generalization ability, and (2) existing methods require a large amount of labeled data for model fine-tuning in biomedicine. We hope these observations inspire the development of more effective models for biomedical relation extraction.

Relation extraction, which aims to extract the relations between entities, is a fundamental problem in natural language processing. Formally, there are two settings for relation extraction. Sentence-level relation extraction (Yao et al., 2019) focuses on individual sentences, and the goal is to predict the relation between two entity mentions within a sentence. In contrast, document-level relation extraction (Yao et al., 2019) considers longer texts (e.g., paragraphs or documents) to predict entity relations. Relation extraction benefits a variety of applications, including knowledge graph construction (Cowie and Lehnert, 1996) and question answering.

In the literature, a variety of methods have been proposed for relation extraction. Classical methods build models with Convolutional Neural Networks (CNN) (Kim, 2014), Recurrent Neural Networks (RNN) (Gupta et al., 2016), and Long Short-Term Memory (LSTM) networks (Zhang et al., 2017). Despite their good results, these methods depend heavily on the quality of labeled sentences; when labeled sentences are insufficient, as is often the case, they tend to perform poorly. Recently, with the great success of pre-trained language models such as BERT (Devlin et al., 2018; Liu et al., 2019), there has been growing interest in approaching relation extraction by fine-tuning pre-trained language models with a few labeled sentences (Han et al., 2018; Gao et al., 2019). Owing to the high capacity of pre-trained language models, these approaches achieve impressive results on both sentence-level and document-level relation extraction.

However, most existing studies develop relation extraction models for the general domain, where a large amount of data can easily be collected for pre-training language models and annotating entity relations is relatively easy, so many labeled sentences are available. For specific domains such as biomedicine, by contrast, the available text data is much more limited, which makes it more difficult to train a powerful language model. Moreover, annotating data in such domains often requires domain experts and is highly expensive, so labeled data is often insufficient. Therefore, whether existing relation extraction models can still perform well in these specific domains remains unclear and under-explored.
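To make the fine-tuning paradigm described above concrete, below is a minimal sketch of sentence-level relation classification with a pre-trained encoder. It is not the architecture of any particular system studied in this paper; the checkpoint name, label set, and [CLS] pooling are illustrative assumptions.

```python
# Minimal sketch: sentence-level relation extraction as sequence classification
# with a pre-trained encoder. The checkpoint name and label set are assumptions
# for illustration, not the exact setup of MTB, TEMP, or NLLIE.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODER = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint
RELATIONS = ["no-relation", "DDI-effect", "DDI-mechanism", "DDI-advise", "DDI-int"]

class RelationClassifier(nn.Module):
    def __init__(self, encoder_name: str, num_labels: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] pooling; entity pooling is also common
        return self.classifier(cls)

tokenizer = AutoTokenizer.from_pretrained(ENCODER)
model = RelationClassifier(ENCODER, num_labels=len(RELATIONS))

# One labeled example: a sentence whose two entity mentions are already known.
sentence = "Aspirin may increase the anticoagulant effect of warfarin."
batch = tokenizer(sentence, return_tensors="pt", truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([RELATIONS.index("DDI-effect")]))
loss.backward()  # an optimizer step would follow during fine-tuning
```

In practice, stronger baselines mark the entity mentions in the input and pool their representations rather than relying on [CLS] alone.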
In this paper, we fill this gap by applying these relation extraction models to the biomedical domain. Biomedical relation extraction, which enables researchers to quickly identify important knowledge in research papers, is attracting growing interest, especially during the COVID-19 pandemic. To give readers a comprehensive picture of biomedical relation extraction, we study both sentence-level and document-level relation extraction. For each task, we choose two benchmark datasets: DDI (Herrero-Zazo et al., 2013) and ChemProt (Taboureau et al., 2010) for sentence-level RE, and CDR (Li et al., 2016) and GDA (Wu et al., 2019) for document-level RE. We also select several state-of-the-art models for comparison. For sentence-level relation extraction, we choose MTB (Matching the Blanks) (Soares et al., 2019), TEMP (Typed Entity Marker) (Zhou and Chen, 2021a), and NLLIE, a framework for learning from noisy labels (Zhou and Chen, 2021b). For document-level relation extraction, we choose MTB, ATLOP (Adaptive Thresholding and Localized cOntext Pooling) (Zhou et al., 2021), and DocUNet (Document U-shaped Network) (Zhang et al., 2021). Our results show the following:
• Methods that have proven effective on TACRED/DocRED still require a large amount of training data for fine-tuning.
• The frameworks built on top of current sentence-level models contribute relatively little compared with their transformer-based encoders.
• Existing methods for document-level relation extraction are quite strong in terms of generalization ability.
We believe our results can inspire future studies on biomedical relation extraction.

Relation extraction is an important problem that aims to predict the relation between two entities. Early work on relation extraction focuses on the sentence level (Mintz et al., 2009), where the goal is to predict the relation of two entities within a sentence. However, in many cases the relation between two entities is described across multiple sentences, which sentence-level methods cannot handle. Motivated by this observation, recent studies have also focused on document-level relation extraction (Yao et al., 2019), aiming to predict entity relations in longer texts such as paragraphs and documents.

In terms of methodology, classical relation extraction methods apply classification algorithms (e.g., SVMs) to bag-of-words features (Zelenko et al., 2003), but they often perform poorly because they cannot model word order. Later, many studies applied deep neural networks to relation extraction, including convolutional neural networks (Kim, 2014), recurrent neural networks (Gupta et al., 2016), and long short-term memory networks (Zhang et al., 2017). Because these methods take word order into account, they bring significant improvements over traditional approaches. However, they often require a large amount of labeled data to train effective models, and on many datasets with limited labeled data their performance is far from satisfactory. The recent success of pre-trained language models (Devlin et al., 2018; Liu et al., 2019) opens a door to solving this problem. By pre-training neural language models (e.g., Transformers (Vaswani et al., 2017)) on huge amounts of unlabeled data, these models can effectively capture the semantics of text, so only a few labeled sentences are needed to fine-tune them for relation extraction (Han et al., 2018; Gao et al., 2019). Hence, these methods achieve impressive results on many relation extraction datasets. Despite this success, whether they can also achieve good results in specific domains is still unclear. In this paper, we address this issue by conducting systematic experiments on relation extraction in the biomedical domain. Based on the results, we further highlight a few insights and guidelines for biomedical relation extraction, which we hope will benefit future studies in this field.

In this paper, we consider two settings of relation extraction.
• Sentence-level Relation Extraction. Given two entities in a sentence, we aim to predict their relation, which is either a relation from the given vocabulary or a special "no relation" label. This setting mainly targets simple relations between entities that can be described within a single sentence.
• Document-level Relation Extraction. We consider entities mentioned in longer texts, such as paragraphs or documents. The goal is still to predict the relation of each pair of entities. This setting mainly focuses on more complicated relations that are not expressed within any single sentence.

In this study, we choose several methods that achieve state-of-the-art results on benchmark datasets (e.g., TACRED (Zhang et al., 2017) and DocRED (Yao et al., 2019)) for comparison. For sentence-level relation extraction, few existing works have tested their performance on biomedical data. We evaluate the following models under their original settings, using BioBERT (Lee et al., 2020) as the encoder:
• MTB (Matching the Blanks) (Soares et al., 2019) takes pairs of blank-containing relation statements as input and uses a training objective that encourages relation representations to be similar if they range over the same pairs of entities.
• TEMP (Typed Entity Marker (punct)) (Zhou and Chen, 2021a) is a variant of the typed entity marker technique that marks entity spans and entity types with punctuation rather than introducing new special tokens (a preprocessing sketch is given after this list). This leads to promising improvements over existing RE models on TACRED.
• NLLIE (Zhou and Chen, 2021b) consists of several neural models with identical structures but different parameter initializations. These models are jointly optimized with task-specific losses and are regularized to produce similar predictions via an agreement loss, which prevents overfitting to noisy labels.
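To illustrate the typed entity marker idea used by TEMP, the sketch below decorates an input sentence with punctuation-based markers before encoding. The specific marker characters ("@"/"*" for the subject, "#"/"^" for the object) follow our reading of Zhou and Chen (2021a), and the helper function itself is an illustrative assumption rather than the authors' released code.

```python
# Illustrative sketch of typed entity marker (punct) preprocessing: entity spans
# and types are wrapped with ordinary punctuation, so no new special tokens are
# added to the vocabulary. Marker characters and this helper are assumptions.
from typing import List, Tuple

def add_typed_markers(tokens: List[str],
                      subj: Tuple[int, int, str],
                      obj: Tuple[int, int, str]) -> List[str]:
    """subj/obj are (start, end, type) spans over `tokens`, end exclusive."""
    s_start, s_end, s_type = subj
    o_start, o_end, o_type = obj
    marked = []
    for i, tok in enumerate(tokens):
        if i == s_start:
            marked += ["@", "*", s_type.lower(), "*"]
        if i == o_start:
            marked += ["#", "^", o_type.lower(), "^"]
        marked.append(tok)
        if i == s_end - 1:
            marked.append("@")
        if i == o_end - 1:
            marked.append("#")
    return marked

tokens = "Aspirin may reduce the effect of probenecid .".split()
print(" ".join(add_typed_markers(tokens, subj=(0, 1, "DRUG"), obj=(6, 7, "DRUG"))))
# @ * drug * Aspirin @ may reduce the effect of # ^ drug ^ probenecid # .
```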
For document-level relation extraction, we select two transformer-based models. Existing work has explored their performance with SciBERT (Beltagy et al., 2019); to better evaluate them on biomedical data, we further evaluate ATLOP and DocUNet with BioBERT as the encoder. The models are listed below:
• ATLOP (Zhou et al., 2021) mainly consists of two techniques: adaptive thresholding and localized context pooling. Adaptive thresholding replaces the global threshold used for multi-label classification in prior work with a learnable, entity-pair-dependent threshold (a sketch of the corresponding loss is given after this list). Localized context pooling directly transfers attention from the pre-trained language model to locate the context relevant for deciding the relation.
• DocUNet (Zhang et al., 2021) formulates the problem as predicting an entity-level relation matrix that captures both local and global information, in analogy to semantic segmentation in computer vision. Specifically, it leverages an encoder module to capture the contextual information of entities and a segmentation module over the image-style feature map to capture global interdependencies among triples.
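The adaptive thresholding idea in ATLOP can be made concrete with a short sketch. The loss below is a simplified reimplementation based on the paper's description: each positive relation class should score above a learnable threshold class TH, and TH should score above every negative class. The function, tensor layout, and the choice of index 0 for TH are assumptions for illustration, not the authors' released code.

```python
# Simplified sketch of ATLOP-style adaptive thresholding. Column 0 of the logit
# matrix is reserved for a learnable threshold class TH; positive classes are
# pushed above TH and TH is pushed above negative classes. This is a
# reimplementation from the paper's description, not the authors' code.
import torch
import torch.nn.functional as F

def adaptive_threshold_loss(logits: torch.Tensor, labels: torch.Tensor, th_index: int = 0):
    """logits, labels: [num_pairs, num_classes]; labels is multi-hot (float)
    with column `th_index` always 0."""
    th_mask = torch.zeros_like(labels)
    th_mask[:, th_index] = 1.0

    # Part 1: every positive class should outrank TH.
    pos_mask = labels + th_mask                                  # positives and TH
    pos_logits = logits.masked_fill(pos_mask == 0, float("-inf"))
    loss_pos = -(F.log_softmax(pos_logits, dim=-1) * labels).sum(dim=-1)

    # Part 2: TH should outrank every negative class.
    neg_mask = 1.0 - labels                                      # negatives and TH
    neg_logits = logits.masked_fill(neg_mask == 0, float("-inf"))
    loss_neg = -(F.log_softmax(neg_logits, dim=-1) * th_mask).sum(dim=-1)

    return (loss_pos + loss_neg).mean()

logits = torch.randn(3, 5)                    # 3 entity pairs, TH + 4 relation classes
labels = torch.zeros(3, 5)
labels[0, 2], labels[1, 4] = 1.0, 1.0         # third pair expresses no relation
print(adaptive_threshold_loss(logits, labels))
```

At inference time, a relation is predicted for an entity pair only if its logit exceeds the TH logit; if no class does, the pair is labeled as having no relation.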
We evaluate these models on sentence-level and document-level RE datasets. The data statistics are listed in Table 1 (sentence-level) and Table 2 (document-level).
• ChemProt (Taboureau et al., 2010) consists of 1,820 PubMed abstracts annotated with chemical-protein relations. We evaluate on six classes: CPR:3, CPR:4, CPR:5, CPR:6, CPR:9, and no-relation.
• DDI (Herrero-Zazo et al., 2013) is a collection of 792 texts selected from the DrugBank database and 233 additional MEDLINE abstracts. We evaluate on five types: DDI-effect, DDI-int, DDI-mechanism, DDI-advise, and no-relation.
• CDR (Li et al., 2016) is a biomedical relation extraction dataset with 500 training samples, aimed at extracting relations between chemicals and diseases.
• GDA (Wu et al., 2019) consists of 23,353 training samples, aimed at predicting relations between genes and diseases.

Our models are implemented in PyTorch. We use cased BioBERT-base and RoBERTa-large as encoders on DDI and ChemProt, and cased BioBERT-base and SciBERT-base on CDR and GDA. We tune hyperparameters on the development set and evaluate with micro F1.

The results of sentence-level and document-level relation extraction are presented in Table 3 and Table 4, respectively. For sentence-level RE, the results of SciBERT and BioBERT are taken from prior work (Zhang et al., 2021). For document-level RE, the results of ATLOP-SciBERT and DocUNet-SciBERT are taken from Zhang et al. (2021). From the results, we see that the improvement of state-of-the-art models on ChemProt is very limited, whereas the improvement on DDI is relatively significant. The underlying reason might be that ChemProt is extracted from PubMed, on which BioBERT has already been pre-trained, which suggests that current methods contribute little beyond the encoder itself. On DDI, where the encoder still has a gap to the unseen corpus, current methods work well. In conclusion, the generalization ability of these sentence-level methods is not satisfactory.

Document-level RE is a more challenging task. However, the improvements on CDR and GDA are substantial, and state-of-the-art models turn out to have a relatively strong denoising ability. Compared with CDR, the methods achieve a larger improvement on GDA. This is expected, because GDA has far more training data than CDR. Both DocUNet and ATLOP were originally proposed and designed for the DocRED relation extraction task, which provides 101,873 distantly supervised documents.
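For reference, the micro F1 reported in Tables 3 and 4 can be sketched as precision, recall, and F1 over predicted relation facts, ignoring the no-relation class. The (head, tail, relation) triple convention, the helper function, and the toy labels below are illustrative; they are not the official scorers of ChemProt, DDI, CDR, or GDA.

```python
# Minimal sketch of micro precision/recall/F1 over extracted relation facts,
# ignoring the no-relation class. The triple convention and toy labels are
# illustrative, not the datasets' official evaluation scripts.
def micro_f1(predicted, gold, na_label="no-relation"):
    pred_set = {t for t in predicted if t[2] != na_label}
    gold_set = {t for t in gold if t[2] != na_label}
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("aspirin", "bleeding", "induces"), ("BRCA1", "breast cancer", "GDA")}
pred = {("aspirin", "bleeding", "induces"), ("BRCA1", "asthma", "GDA")}
print(micro_f1(pred, gold))  # (0.5, 0.5, 0.5)
```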
This paper conducts an empirical analysis of relation extraction in biomedicine. We consider both sentence-level and document-level relation extraction, and for each task we examine several state-of-the-art models. Our results show that (1) existing methods for document-level relation extraction are quite strong in terms of generalization ability, and (2) existing methods still require a large amount of labeled data to train effective models. We therefore believe that few-shot learning techniques, which reduce models' reliance on labeled data, are an important direction to explore for improving biomedical relation extraction.

References

Beltagy et al. (2019). SciBERT: A pretrained language model for scientific text.
Cowie and Lehnert (1996). Information extraction.
Devlin et al. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding.
Gao et al. (2019). FewRel 2.0: Towards more challenging few-shot relation classification.
Gupta et al. (2016). Table filling multi-task recurrent neural network for joint entity and relation extraction.
Han et al. (2018). FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation.
Herrero-Zazo et al. (2013). The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions.
Kim (2014). Convolutional neural networks for sentence classification.
Lee et al. (2020). BioBERT: A pre-trained biomedical language representation model for biomedical text mining.
Li et al. (2016). BioCreative V CDR task corpus: A resource for chemical disease relation extraction.
Lin et al. (2019). KagNet: Knowledge-aware graph networks for commonsense reasoning.
Liu et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach.
Mintz et al. (2009). Distant supervision for relation extraction without labeled data.
Soares et al. (2019). Matching the blanks: Distributional similarity for relation learning.
Taboureau et al. (2010). ChemProt: A disease chemical biology database.
Vaswani et al. (2017). Attention is all you need.
Wu et al. (2019). RENET: A deep learning approach for extracting gene-disease associations from literature.
Yao et al. (2019). DocRED: A large-scale document-level relation extraction dataset.
Zelenko et al. (2003). Kernel methods for relation extraction.
Zhang et al. (2021). Document-level relation extraction as semantic segmentation.
Zhang et al. (2017). Position-aware attention and supervised data improve slot filling.
Zhou and Chen (2021a). An improved baseline for sentence-level relation extraction.
Zhou and Chen (2021b). Learning from noisy labels for entity-centric information extraction.
Zhou et al. (2021). Document-level relation extraction with adaptive thresholding and localized context pooling.