key: cord-020843-cq4lbd0l
authors: Almeida, Tiago; Matos, Sérgio
title: Calling Attention to Passages for Biomedical Question Answering
date: 2020-03-24
journal: Advances in Information Retrieval
DOI: 10.1007/978-3-030-45442-5_9
sha:
doc_id: 20843
cord_uid: cq4lbd0l

Question answering can be described as retrieving relevant information for questions expressed in natural language, possibly also generating a natural language answer. This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. This adaptation halves the total number of parameters and makes the network more suited for identifying the relevant passages in each document. The overall retrieval system was evaluated on the BioASQ tasks 6 and 7, achieving similar retrieval performance when compared to more complex network architectures.

Question Answering (QA) is a subfield of Information Retrieval (IR) that specializes in producing or retrieving a single answer for a natural language question. QA has received growing interest since users often look for a precise answer to a question instead of having to inspect full documents [4]. Similarly, biomedical question answering has also gained importance given the amount of information scattered over large specialized repositories such as MEDLINE. Research on biomedical QA has been pushed forward by community efforts such as the BioASQ challenge [13], originating a range of different approaches and systems.

Recent studies on the application of deep learning methods to IR have shown very good results. These neural models are commonly subdivided into two categories based on their architecture.
Representation-based models, such as the Deep Structured Semantic Model (DSSM) [5] or the Convolutional Latent Semantic Model (CLSM) [12], learn semantic representations of texts and score each query-document pair based on the similarity of their representations. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score.

The impact of neural IR approaches is also noticeable in biomedical question answering, as shown by the results on the most recent BioASQ challenges [9]. The top performing team in the document and snippet retrieval sub-tasks in 2017 [1], for example, used a variation of the DRMM [8] to rank the documents recovered by the traditional BM25 [11]. For the 2018 task, the same team extended their system with the inclusion of models based on BERT [2] and with joint training for document and snippet retrieval.

The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM. This adaptation not only halves the total number of network parameters, therefore speeding up training, but is also better suited for identifying the relevant snippets in each document. The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures. The full network configuration is publicly available at https://github.com/bioinformatics-ua/BioASQ, together with code for replicating the results presented in this paper.
This section presents the overall retrieval pipeline and describes the neural network architecture proposed in this work for the document ranking step.

The retrieval system follows the pipeline presented in Fig. 1, encompassing three major modules: Fast Retrieval, Neural Ranking and Snippet Extraction. The fast retrieval step is focused on minimizing the number of documents passed on to the computationally more demanding neural ranking module, while maintaining the highest possible recall. As in previous studies [1, 7], we adopted Elasticsearch (ES) with the BM25 ranking function as the retrieval mechanism. The documents returned by the first module are ranked by the neural network, which also directly provides the following module with the information needed for extracting relevant snippets. These modules are detailed in Sects. 2.1 and 2.2.

The network follows a similar architecture to the original version of DeepRank [10], as illustrated in Fig. 2. In particular, we build upon the best reported configuration, which uses a CNN in the measurement network and the reciprocal function as the position indicator. The inputs to the network are the query, a set of document passages aggregated by each query term, and the absolute position of each passage. For the remaining explanation, let us first define a query as a sequence of terms $q = \{u_0, u_1, \ldots, u_Q\}$, where $u_i$ is the i-th term of the query; a set of document passages aggregated by each query term as $D(u_i) = \{p_0, p_1, \ldots, p_P\}$, where $p_j$ corresponds to the j-th passage with respect to the query term $u_i$; and a document passage as $p_j = \{v_0, v_1, \ldots, v_S\}$, where $v_k$ is the k-th term of the passage. We chose to aggregate the passages by their respective query term at the input level, since it simplifies the neural network flow and implementation.
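The input structure defined above, passages grouped by the query term they relate to together with their absolute positions, can be illustrated with a small sketch. It assumes, purely for illustration, that a passage is a fixed-size token window centred on an exact occurrence of a query term; the window size, the exact-match rule and the function name `passages_by_query_term` are hypothetical and are not specified in the text.

```python
from collections import defaultdict


def passages_by_query_term(query, document, half_window=2):
    """Group token windows by the query term they are centred on.

    Returns {query_term: [(absolute_position, passage_tokens), ...]}.
    Assumption: passages are windows of up to 2 * half_window + 1 tokens.
    """
    grouped = defaultdict(list)
    query_terms = set(query)
    for pos, token in enumerate(document):
        if token in query_terms:
            start = max(0, pos - half_window)
            end = min(len(document), pos + half_window + 1)
            grouped[token].append((pos, document[start:end]))
    return dict(grouped)


query = ["aspirin", "headache"]
document = "aspirin relieves headache and aspirin reduces fever".split()
grouped = passages_by_query_term(query, document, half_window=1)

for term, passages in grouped.items():
    print(term, passages)
```

Each entry of `grouped` plays the role of $D(u_i)$, and the stored position is the value that a position indicator would consume.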
The detection network receives as input the query and the set of document passages and creates a similarity tensor (interaction matrix) $S \in [-1, 1]^{Q \times S}$ for each passage, where each entry $S_{ij}$ corresponds to the cosine similarity between the embeddings of the i-th query term and j-th passage term, $S_{ij} = \frac{\vec{u}_i \cdot \vec{v}_j}{\lVert \vec{u}_i \rVert \, \lVert \vec{v}_j \rVert}$, with $\vec{u}_i$ and $\vec{v}_j$ denoting the respective embedding vectors.

The measurement network step is the same as that used in the original DeepRank model. It takes as inputs the previously computed tensors $S$ and the absolute position of each passage and applies a 2D convolution followed by a global max pooling operation, to capture the local relevance present in each tensor $S$, as defined in Eq. 1. At this point, the set of document passages for each query term is represented by their respective vectors $h$, i.e., $D(u_i) = \{h_{p_0}, h_{p_1}, \ldots, h_{p_P}\}$, where each vector $h_{p_j}$ encodes the local relevance captured by the $M$ convolution kernels of size $x \times y$, plus an additional feature corresponding to the position of the passage.

The next step uses a self-attention layer [6] to obtain an aggregation $c_{u_i}$ (of dimension $M \times 1$) over the passages $h_{p_j}$ for each query term $u_i$, as defined in Eq. 2. The weights $a_{p_j}$, which are computed by a feed-forward network and converted to a probability distribution using the softmax operation, represent the importance of each passage vector from the set $D(u_i)$. The addition of this self-attention layer, instead of the recurrent layer present in the original architecture, allows using the attention weights, which are directly correlated with the local relevance of each passage, to identify important passages within documents. Moreover, this layer has around $A \times M$ parameters, compared to up to three times more in the GRU layer (approximately $3 \times A \times (A + M)$), which in practice halves the overall number of network parameters.

Finally, the aggregation network combines the vectors $c_{u_i}$ according to weights that reflect the importance of each individual query term $u_i$.
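The detection and self-attention steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the convolutional measurement network is omitted, the attention scores use the form `v . tanh(W h)` (one common self-attention formulation, assumed here because the exact feed-forward network is not given in the text), and all numeric values are toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def interaction_matrix(query_emb, passage_emb):
    """Rows index query terms, columns index passage terms; entries lie in [-1, 1]."""
    return [[cosine(q, p) for p in passage_emb] for q in query_emb]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(passage_vectors, W, v):
    """Aggregate the passage vectors of one query term into a single vector.

    Assumed scoring form (not given in the text): score_j = v . tanh(W h_j).
    Returns the weighted sum of the passage vectors and the attention weights.
    """
    scores = []
    for h in passage_vectors:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(a * b for a, b in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(passage_vectors[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, passage_vectors))
              for k in range(dim)]
    return pooled, weights


# Toy 2-dimensional embeddings: two query terms, three passage terms.
S = interaction_matrix([[1.0, 0.0], [0.0, 1.0]],
                       [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])

# Two passage vectors of size M = 2, scored with A = 1 hidden unit.
pooled, weights = attention_pool([[2.0, 0.0], [0.0, 4.0]],
                                 W=[[0.5, -0.5]], v=[1.0])
```

In this sketch `W` holds $A \times M$ values, in line with the parameter count quoted above, and the returned `weights` are the per-passage attention values that can later be inspected to locate relevant passages.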
We chose to employ a weighting mechanism similar to the term gating layer in DRMM [3], which uses the query term embedding to compute its importance, as defined in Eq. 3. This option replaces the use of a trainable parameter for each vocabulary term, as in the original work, which is less suited for modelling a rich vocabulary as in the case of biomedical documents. The final aggregated vector $c$ is then fed to a dense layer for computing the final ranking score.

Optimization. We used the pairwise hinge loss as the objective function to be minimized by the AdaDelta optimizer. In this perspective, the training data is viewed as a set of triples, $(q, d^+, d^-)$, composed of a query $q$, a positive document $d^+$ and a negative document $d^-$. Additionally, inspired by [14] and as successfully demonstrated by [16], we adopted a similar negative sampling strategy, where a negative document can be drawn from the following sets:

- Partially irrelevant set: Irrelevant documents that share some matching signals with the query. More precisely, this corresponds to documents retrieved by the fast retrieval module but which do not appear in the training data as positive examples;
- Completely irrelevant set: Documents not in the positive training instances and not sharing any matching signal with the query.

Passage extraction is accomplished by looking at the attention weights of the neural ranking model. As described, the proposed neural ranking model includes two attention mechanisms. The first one computes a local passage attention with respect to each query term, $a_{p_i}$. The second is used to compute the importance of each query term, $a_{u_k}$. Therefore, a global attention weight for each passage can be obtained from the product of these two terms, $a_{g(k,i)} = a_{u_k} \times a_{p_i}$, as shown in Eq. 4.

This section presents the system evaluation results.
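Before turning to the results, the training objective and the passage-scoring rule described above can be illustrated with a short sketch. It is a hedged example rather than the authors' code: the margin of 1.0 is an assumption (the text does not state the margin), the function names are hypothetical, and the attention values are made-up numbers.

```python
def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Hinge loss for one (q, d+, d-) triple; margin=1.0 is an assumed value."""
    return max(0.0, margin - score_pos + score_neg)


def global_passage_weights(term_weights, passage_weights):
    """Multiply query-term importance by local passage attention.

    term_weights:    {term: a_u}
    passage_weights: {term: [a_p for each passage of that term]}
    Returns {(term, passage_index): a_u * a_p}.
    """
    return {
        (term, i): term_weights[term] * a_p
        for term, local in passage_weights.items()
        for i, a_p in enumerate(local)
    }


# Toy attention values (each distribution sums to 1).
term_weights = {"aspirin": 0.75, "headache": 0.25}
passage_weights = {"aspirin": [0.5, 0.5], "headache": [1.0]}

global_weights = global_passage_weights(term_weights, passage_weights)
best_passage = max(global_weights, key=global_weights.get)

# A correctly ordered pair beyond the margin incurs no loss.
loss_ok = pairwise_hinge_loss(score_pos=2.0, score_neg=0.5)
# A misordered pair is penalised.
loss_bad = pairwise_hinge_loss(score_pos=0.2, score_neg=0.5)
```

Because both attention distributions sum to one, the global weights also sum to one over all passages of a document, so they can be read as a confidence score per passage.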
We used the training data from the BioASQ 6b and 7b phase A challenges [13], containing 2251 and 2747 biomedical questions, respectively, with the corresponding relevant documents, taken from the MEDLINE repository. The objective for a system is to retrieve the ten most relevant documents for each query, with the performance evaluated in terms of MAP@10 on five test sets containing 100 queries each.

First, a study was conducted to investigate the performance of the proposed neural ranking model. After that, the full system was compared against the results of systems submitted to the BioASQ 6 and 7 editions for the document retrieval task. Finally, we investigated whether the attention given to each passage is indeed relevant. In the results, we compare two variants of DeepRank: BioDeepRank refers to the model with the modified aggregation network and weighting mechanism, and using word embeddings for the biomedical domain [15]; Attn-BioDeepRank refers to the final model that additionally replaces the recurrent layer by a self-attention layer.

Neural Ranking Models. We compared both neural ranking versions against BM25 in terms of MAP@10 and Recall@10, on a 5-fold cross validation over the BioASQ training data. Table 1 summarizes the results. Both models successfully improved the BM25 ranking order, achieving an increase of around 0.14 in MAP and 0.31 in recall. Results of Attn-BioDeepRank, although lower, suggest that this version is at least nearly as effective at ranking the documents as the model that uses the recursive layer.

Biomedical Document Retrieval. We report results on the BioASQ 6b and BioASQ 7b document ranking tasks (Table 2). Regarding BioASQ 6b, it should be noted that the retrieved documents were evaluated against the final gold-standard of the task, revised after reevaluating the documents submitted by the participating systems.
Since we expect that some of the retrieved documents would have been revised as true positives, the results presented can be considered a lower bound of the system's performance. For BioASQ 7b, the results shown are against the gold-standard before the reevaluation, since the final annotations were not available at the time of writing. In this dataset both systems achieved performance closer to the best result, including a top result on Batch 1.

Passage Evaluation. Finally, we analysed whether the information used by the model for ranking the documents, as given by the attention weights, corresponded to relevant passages in the gold-standard. For this, we calculated the precision of the passages, considering overlap with the gold-standard, and evaluated how it related to the confidence assigned by the model. Interestingly, although the model is not trained with this information, the attention weights seem to focus on these relevant passages, as indicated by the results in Fig. 3.

Fig. 3. Quality of retrieved passages as a function of the confidence attributed by the model.

This paper describes a new neural ranking model based on the DeepRank architecture. Evaluated on a biomedical question answering task, the proposed model achieved similar performance to a range of other strong systems. We intend to further explore the proposed approach by considering semantic matching signals in the fast retrieval module, and by introducing joint learning for document and passage retrieval. The network implementation and code for reproducing these results are available at https://github.com/bioinformatics-ua/BioASQ.
[1] AUEB at BioASQ 6: Document and Snippet Retrieval
[2] BERT: pre-training of deep bidirectional transformers for language understanding
[3] A deep relevance matching model for ad-hoc retrieval
[4] Natural language question answering: the view from here
[5] Learning deep structured semantic models for web search using clickthrough data
[6] A structured self-attentive sentence embedding
[7] MindLab neural network approach at BioASQ 6b
[8] Deep Relevance Ranking Using Enhanced Document-Query Interactions
[9] Results of the sixth edition of the BioASQ challenge
[10] DeepRank
[11] The probabilistic relevance framework: BM25 and beyond
[12] A latent semantic model with convolutional-pooling structure for information retrieval
[13] An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition
[14] Learning fine-grained image similarity with deep ranking
[15] BioWordVec, improving biomedical word embeddings with subword information and MeSH
[16] A hierarchical attention retrieval model for healthcare question answering

Acknowledgments. This work was partially supported by the European Regional Development Fund (ERDF) through the COMPETE 2020 operational programme, and by National Funds through FCT - Foundation for Science and Technology, projects PTDC/EEI-ESS/6815/2014 and UID/CEC/00127/2019.