key: cord-0988506-wrd1p68p
authors: Yang, Yunrong; Cao, Zhidong; Zhao, Pengfei; Zeng, Dajun Daniel; Zhang, Qingpeng; Luo, Yin
title: Constructing Public Health Evidence Knowledge Graph for Decision-Making Support from COVID-19 Literature of Modelling Study
date: 2021-08-13
journal: Journal of Safety Science and Resilience
DOI: 10.1016/j.jnlssr.2021.08.002
sha: c3a5c40867845be9846c818269dbdf272863e282
doc_id: 988506
cord_uid: wrd1p68p
The need to mitigate the COVID-19 epidemic prompts policymakers to make public health-related decisions under the guidance of science. The tremendous volume of unstructured COVID-19 publications makes it challenging for policymakers to obtain relevant evidence. Knowledge graphs (KGs) can formalize unstructured knowledge into structured form and have recently been used to support decision-making. Here, we introduce a novel framework that extracts a COVID-19 public health evidence knowledge graph (CPHE-KG) from papers reporting modelling studies. We screen out a corpus of 3096 COVID-19 modelling study papers through a literature assessment process. We define a novel annotation schema to construct a COVID-19 modelling study-related information extraction dataset (CPHIE). We also propose a novel multi-task document-level information extraction model, SS-DYGIE++, based on this dataset. Applying the model to the new corpus, we construct the CPHE-KG, containing 60,967 entities and 51,140 relations. Finally, we apply our KG to support evidence querying and evidence mapping visualization. Our SS-DYGIE++(SpanBERT) model achieves F1 scores of 0.77 and 0.55 in document-level entity recognition and coreference resolution, respectively. It also shows high performance in the relation identification task.
With evidence querying, our KG can present the dynamic transmission of the COVID-19 pandemic in different countries and regions. The evidence mapping of our KG can show the impacts of various non-pharmacological interventions on the COVID-19 pandemic. Analysis demonstrates the quality of our KG and shows that it has the potential to support COVID-19 policy making in public health.
Keywords: Knowledge graphs
The emergence of the COVID-19 pandemic is the worst crisis of this century and poses an unprecedented threat to global public health and the economy. Given the lack of adequate vaccination, implementing policy interventions under the guidance of scientific evidence is essential to mitigate the pandemic [1]. Scientists across disciplines have devoted themselves to investigating and understanding the disease, leading to a rapid increase in COVID-19-related (COR) literature. There is a tremendous need to mine and integrate the evidence contained in these papers to inform high-quality public health decision-making. Knowledge graphs (KGs) represent structured knowledge and have recently been used to support decision-making, mostly clinical decision-making [2][3][4]. Constructing disease or health knowledge graphs from text has been an active field, where previous studies rely heavily on annotations from domain experts. For example, Sheng et al. [5] propose DEKGB, a framework for constructing health knowledge graphs with doctors involved. Huang et al. [6] build a disease-centric knowledge graph for depression from 10,190 trials, employing psychiatric experts to provide clinical use cases. Finlayson et al. [7] extract medical terms based on a dictionary compiled by clinical experts to build a graph of medicine from millions of clinical narratives. Moreover, some health graphs pay more attention to personal health than to public health. Shen et al. [8] utilize a clinical Bayesian network to construct a medical ontology with over 17 aspects, including the patient's condition, personal history, and family medical history, directly from over 10,000 de-identified patient records. Zhao et al. [9] construct a medical knowledge graph manually from 992 electronic medical records (EMRs) and treat the graph as a Markov random field for inference. Rotmensch et al. [10] extract medical concepts from 273,174 de-identified patient records with three probabilistic models to automatically construct knowledge graphs. To defend against the COVID-19 pandemic, researchers are developing COVID-19 knowledge graphs to accelerate information collection and synthesis from COR scientific papers. Previous studies such as CKG [11] aim to present relationships between COR scientific articles and concepts. Reese et al. create KG-COVID-19, which covers mechanistic information associated with the novel coronavirus, such as drug-target interactions, biological processes, and genes [12]. Other KGs explore relationships in COVID-19 pathophysiology [13] as well as applications in drug repurposing [14, 15]. The work most similar to ours is the KG released by Wang et al., which extracts fine-grained biomedical knowledge from scientific literature [16]. Hope et al. extract sentence-level mechanism relations from COVID-19 papers with an LM-driven model for knowledge base construction [17].
However, these KGs mostly focus on integrating knowledge about the coronavirus itself; few present evidence, such as epidemic transmission, that can provide critical information for determining the type and intensity of disease interventions. Compared with the studies discussed above, our work aims to construct a KG that comprises evidence for public health decision-making. For COVID-19, it is unethical to conduct empirical trials. Hence, observational studies play an important role in providing evidence of epidemic trends. In this paper, we extract plentiful evidence and its interrelationships from COR modelling study papers to construct the COVID-19 public health evidence knowledge graph (CPHE-KG) for decision-making support. In general, KGs use a series of subject-predicate-object triples to represent different kinds of entities and relations in a domain. We generate the CPHE-KG based on structured evidence triples extracted from COR modelling study papers retrieved from the CORD-19 dataset [18]. Figure 1 illustrates the types of evidence in a modelling study and their relationships in the proposed KG. The evidence types include epidemic metrics (e.g., final infection numbers, or Rt), metric measures (e.g., greater, or 3782558), interventions (e.g., social distancing, or mandatory masking), modelling methods (e.g., the SIR mathematical model), time nodes (e.g., the 92nd day after the first infection), regions (e.g., Bangladesh), etc. The relationships include within-sentence relations capturing context information and cross-sentence coreference. Recently, information extraction (IE) has been used to extract mentions of evidence entities and relations from scientific articles [19][20][21][22]. In the biomedical and health domain, these extraction systems [23][24][25][26] mostly aim to extract evidence from papers on randomized controlled trials (RCTs), and the entity types are designed as populations, interventions/comparators, and outcomes, i.e., PICO frames. However, PICO frames are not well suited to extracting evidence from modelling study papers, whose structures differ from those of RCTs. To resolve these issues, we summarize a schema from COR evidence-based public health papers and use it to annotate a dataset of evidence from COVID-19 modelling study papers, supporting knowledge discovery for epidemic governance. We propose a novel document-level information extraction model, SS-DYGIE++, to extract evidence triples and coreference clusters. These triples and clusters form the foundation of the KG of a single document. By utilizing knowledge fusion approaches, we merge the extracted document-level KGs into a comprehensive knowledge graph. The final CPHE-KG contains 60,967 entities and 51,140 relations. To further demonstrate the quality of the CPHE-KG, we provide a decision-making support service based on it from two aspects: evidence querying and evidence mapping visualization. With evidence querying, the system returns the evidence sub-graph relevant to the query terms; evidence mapping visualization presents the evaluated effectiveness of interventions in the COVID-19 pandemic. The contributions of this paper are threefold. 1) We propose a novel annotation schema which covers critical policy-making-related evidence and relations in the abstracts of COVID-19 modelling study literature.
Moreover, to the best of our knowledge, our CPHIE dataset guided by this schema is the first multi-task IE dataset built on modelling study papers in the public health domain. 2) We introduce a novel document-level IE model, SS-DYGIE++, to extract evidence from COVID-19 modelling study papers. The results demonstrate that our model enables automatic KG construction. We release a COVID-19 public health evidence knowledge graph (CPHE-KG), an early exploration of KGs applied to public health decision-making. 3) Our KG provides two main functions, evidence querying and evidence mapping visualization, which prove effective in supporting decision-making relevant to COVID-19 public health. The remainder of this paper is organized as follows. In Section 2, we describe the pipeline for constructing a knowledge graph of evidence from COR modelling study papers. In Section 3, we present the extracted evidence and the CPHE-KG. In Section 4, case studies testing the CPHE-KG are given. In Section 5, we conclude our framework and discuss future work. Finally, the annotation guideline of the IE dataset is provided in the Appendix. In this section, we present a systematic procedure to develop the CPHE-KG. As shown in Figure 2, the procedure involves four main steps: a) data preparation: the CORD-19 dataset serves as the data source for the COVID-19 public health IE dataset and KG construction; b) COVID-19 public health IE dataset construction: colored text spans represent entities, solid arrows reflect the relations between entities, and dotted arrows indicate coreference relationships; c) document-level knowledge extraction: the extracted knowledge and coreference clusters of each abstract are integrated into a knowledge graph, yielding as many knowledge graphs as there are abstracts; d) knowledge fusion: after performing knowledge fusion on the knowledge graphs obtained in step c), we acquire the final knowledge graph (CPHE-KG). In this paper, the public health KG is built from the COR modelling study papers in the CORD-19 dataset. The CORD-19 dataset, released on Kaggle, includes more than 500,000 publications on COVID-19 and COR research from PubMed, WHO, bioRxiv, and medRxiv. We use the query (COVID-19 OR coronavirus OR "sar cov") AND (model OR forecast OR assess OR predict OR dynamic OR transmission) to retrieve modelling study papers. For COVID-19, empirical studies such as RCTs are unethical and not feasible. Modelling studies, especially studies of dynamic transmission models, can evaluate the effects of infectious disease interventions [27]. Therefore, we consider modelling studies to be the evidence available in the scientific literature so far. We remove papers irrelevant to the transmission of COVID-19 and keep papers that contain enough of the information we need. As there is no available 'Risk of bias' checklist for modelling studies, we follow assessment guidelines from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the Society for Medical Decision Making (SMDM) to grade the transmission models. Our grading criteria are similar to those in the COVID-19 rapid systematic review [28] and are as follows:
- Is the model a dynamic transmission model?
- Does the paper conduct uncertainty analyses?
- Does the paper provide measured values of epidemic metrics related to the interventions?
The final corpus contains 3096 papers, which form the foundation of the KG.
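To make the retrieval step concrete, the snippet below is a minimal sketch of how the boolean query above could be applied to the CORD-19 metadata file. The column names follow the publicly released metadata.csv; the regular expressions are an illustrative rendering of the query, not the authors' exact search implementation, and the count printed here precedes the manual screening and quality grading described next.

```python
# Minimal sketch: filter CORD-19 metadata with the boolean query above.
# Assumptions: metadata.csv with "cord_uid", "title", "abstract" columns;
# the regex patterns approximate the (COVID-19 OR ...) AND (model OR ...) query.
import pandas as pd

covid_terms = r"covid-19|coronavirus|sars?[- ]?cov"
model_terms = r"model|forecast|assess|predict|dynamic|transmission"

meta = pd.read_csv("metadata.csv", usecols=["cord_uid", "title", "abstract"])
text = (meta["title"].fillna("") + " " + meta["abstract"].fillna("")).str.lower()

# Keep papers matching both term groups in title or abstract.
mask = text.str.contains(covid_terms, regex=True) & text.str.contains(model_terms, regex=True)
candidates = meta[mask]
print(len(candidates), "candidate modelling-study papers before manual screening")
```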
The inclusion procedure is shown in Figure 3. First, 5960 papers are retrieved from the CORD-19 dataset with the query mentioned above. We then exclude, by hand search, papers that are irrelevant to public health, repetitive, or too short or too long. Further, we assess the quality of the remaining papers with the grading criteria. Finally, the corpus comprises 877 papers from PMC, 1576 papers from WHO, 570 papers from medRxiv, and 73 papers from bioRxiv. We construct a dataset (called CPHIE: COVID-19 Public Health Information Extraction) of 597 richly annotated abstracts of COVID-19 modelling study papers taken from the CORD-19 dataset. Our dataset annotates public health-related entities, relations, and coreference clusters. The statistics of the CPHIE dataset are shown in Table 1. Referring to systematic reviews and meta-analyses related to evidence-based public health for COVID-19, we summarize the most relevant entity and relation types as the foundation of our dataset. We define 10 entity types (Task, Region, Intervention, Method, Indicator, Measure, Time, Variable, Generic, OtherTerm) and a set of relation types, including cross-sentence coreference (detailed in Appendix A). To annotate a high-quality dataset efficiently, we adopt a semi-automatic labeling scheme. We first utilize the natural language processing library spaCy (https://spacy.io/) to automatically label the general entity types, Region and Time, and to tag Arabic numbers. The noisy tagging results are subsequently checked manually. Domain-related entities and relations are first labeled using rules and templates and then manually annotated with the BRAT visual text annotation platform [29] according to the Annotation Guidelines provided in Appendix A. Since there are few nested entities in the abstracts, nested entities are not annotated in our dataset. A public health domain annotator annotated all abstracts in the dataset, and another public health domain annotator checked and revised the annotations. Figure 4 shows an annotation example from the CPHIE dataset. For the COVID-19 public health document-level information extraction task, assume there are two corpora: a) a small, richly labeled seed document set D_s, and b) a large unlabeled in-domain document set D_u, where each document is an ordered sequence of within-sentence tokens. Our task includes three subtasks: entity recognition, relation extraction, and coreference resolution. Entity recognition means predicting the best entity type label for each span. Relation extraction means predicting the best relation type label for a pair of spans. Coreference resolution means clustering spans that refer to the same entity. We propose a novel multi-task IE model, SS-DYGIE++, which improves on the DYGIE++ model [21]. DYGIE++ is the state-of-the-art multi-task model on many document-level information extraction datasets. Since the scale of the annotated corpus is relatively small, in this study we take full advantage of the unlabeled raw in-domain corpus with a semi-supervised learning method. Our SS-DYGIE++ model includes two improvements: language model (LM) post-training and semi-supervised learning. The architecture of our approach is depicted in Figure 5. To adapt a pretrained language model to the COVID-19 public health domain, we perform language model post-training on the abstracts of a large collection of COVID-19 public health papers. The pretrained language model adopted in this paper is SpanBERT [30].
SpanBERT is pretrained with the span boundary objective (SBO), which tries to predict the entire masked span using only the representations of the tokens at the span's boundary. It outperforms BERT [31] on the coreference resolution task. We choose it as the base language model, considering the importance of coreferent entities for knowledge graph construction. We follow the SpanBERT pretraining scheme to post-train it on 140K+ raw COVID-19 public health papers and then fine-tune it on the document-level information extraction task. Semi-supervised learning. We further improve the performance of DYGIE++ with bootstrapping, a classic semi-supervised learning algorithm [32] that has proven effective in information extraction tasks [33, 34]. After training DYGIE++ on the CPHIE dataset D_s, we save the trained model and apply it to the unlabeled corpus D_u, obtaining an augmented dataset D_u* with noisy predicted labels. D_u* is then combined with D_s to update the DYGIE++ model iteratively. The overall process is shown in Table 2. Unlike in training, we set thresholds to filter out tags with low probabilities output by the classification layer. We randomly divide D_u into K subsets and iteratively train the DYGIE++ model with D_u*, until the model achieves an acceptable level of accuracy or the maximum number of iterations is reached. In this paper, knowledge fusion includes two subtasks, entity normalization and entity alignment. The same entity may be expressed in various ways in different papers, so entity normalization is needed to map the original entities onto standard ones, and entity alignment is required to merge entities that have the same meaning. For generic entities and entities relevant to the public health field, we construct a dictionary from Medical Subject Headings (MeSH) terms and match the training set against this dictionary. Time entities, however, are difficult to handle with a dictionary, and learning-based methods usually require more training instances than are available. Therefore, we opt for a rule-based method, following Ning et al. [35], which is simple and efficient enough for time normalization. To merge the set of small knowledge graphs, we perform entity alignment by means of clustering. This subtask is needed because some entities are not in the dictionary. We apply TF-IDF to transform entities into vector representations and measure the cosine similarity between entity pairs; an entity pair is considered to refer to the same entity when the similarity is higher than 0.5. We compare our SS-DYGIE++(SpanBERT) model with the model variants below. These variants have the same architecture as the SS-DYGIE++(SpanBERT) model; the difference lies in the pretrained language model. SciBERT [36] is pretrained on a corpus of which 82% is biomedical literature and 18% computer science literature. Biomed-RoBERTa [37] is pretrained with the masked language modeling (MLM) objective and adapted to 2.68 million scientific papers from the Semantic Scholar corpus via continued pretraining. For entity recognition, we experiment with two evaluation metrics adapted to our dataset. First, we evaluate entity recognition with a rigid evaluation metric, under which a prediction is considered correct only when both the span and the span's label match a gold entity.
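As an aside on the knowledge fusion step described above, the snippet below is a minimal sketch of the TF-IDF-based entity alignment. Only the 0.5 cosine-similarity threshold comes from the text; the character n-gram vectorizer settings and the example entity strings are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch: align entity mentions via TF-IDF vectors + cosine similarity.
# Assumptions: character n-gram features; the example strings are invented;
# only the 0.5 threshold is taken from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entities = ["basic reproductive number (R0)", "basic reproductive number",
            "social distancing", "lockdown"]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectors = vectorizer.fit_transform(entities)
similarity = cosine_similarity(vectors)

# Merge an entity pair into one node when its similarity exceeds 0.5.
aligned_pairs = [(entities[i], entities[j])
                 for i in range(len(entities))
                 for j in range(i + 1, len(entities))
                 if similarity[i, j] > 0.5]
print(aligned_pairs)
```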
Empirically, analyzing the entity predictions on the test set, we observe that most of the long span predictions that do not match the gold entities exactly nevertheless capture the entity meaning correctly. For instance, the Indicator entity "basic reproductive number (R0)" may be recognized as "basic reproductive number" or "basic reproductive number (R0". This is unsurprising given the complexity of the spans in our dataset, and in this scenario the rigid evaluation metric is excessively stringent. Motivated by this observation, we further consider relaxing the equivalence criteria: given a correctly predicted entity label, if the gold entity span contains the predicted span or vice versa, the prediction is counted as correct. This rough metric has the advantage of explicitly illustrating the improvement of our IE approach. For relation extraction, a relation is correct if both the span pair and the relation label match a gold relation triple. For coreference resolution, a coreference cluster is correct if the predicted cluster matches the gold coreference cluster. We use raw texts of COVID-19 public health papers to post-train SpanBERT, SciBERT, and Biomed-RoBERTa for 10 epochs each under the Hugging Face Transformers framework. The maximum sequence length is set to 128 tokens, which covers the length of most sentences in the raw texts. The batch size is set to 16 when post-training on a single TITAN GPU. For document-level IE, we use the DYGIE++ library and search for the optimal hyper-parameters with Optuna [38]. Table 3 reports the performance of the DYGIE++ model fine-tuned with SpanBERT and of the different model variants. Our model outperforms the baselines in entity identification and coreference resolution when evaluated with the rough metric. We apply our model to 3,096 abstracts of COVID-19 public health papers to construct a KG of 60,967 entities and 51,140 relations. The statistics of entities and relations can be found in Table 4. To illustrate the anticipated decision-making support of our COVID-19 public health knowledge graph, we focus on the two main functions of the CPHE-KG: evidence querying and evidence mapping visualization. With respect to evidence querying, the CPHE-KG allows users to query conditions of interest and filter the relevant components of the knowledge graph. Once a set of search terms is given, relevant entities and relations are retrieved from the CPHE-KG, and the Neo4j interface displays the filtered graph. Each node and link in the graph can be clicked to show its complete content. For example, when the user specifies the Region entity "Italy" in Figure 6, the system returns a graph of the COVID-19 epidemic development in Italy. Furthermore, users can browse the COVID-19 epidemic along the temporal dimension, as illustrated in Figure 7. Additionally, users can provide interventions of interest to further narrow the search; the association of "lockdown" and "quarantine" with the COVID-19 epidemic is shown in Figure 8 and Figure 9, respectively. Evidence mapping [39] is a straightforward and important evidence synthesis method in evidence-based public health, usually employed to evaluate and integrate the evidence related to intervention measures, i.e., the evidence mapping of interventions. In this section, we return to the question in Section 1: which is the most effective non-pharmacological intervention against the COVID-19 epidemic?
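Before turning to the construction of the evidence mapping, the snippet below sketches how the evidence querying described above might be issued programmatically through the official neo4j Python driver. The node label, relationship pattern, and property name (Region, name) are assumptions about the CPHE-KG schema for illustration, not the exact data model behind Figures 6-9.

```python
# Hypothetical sketch: query the CPHE-KG in Neo4j for evidence about a region.
# Assumptions: a Region node label with a "name" property; the connection
# details (URI, credentials) are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (r:Region {name: $region})-[rel]-(e)
RETURN r, rel, e
LIMIT 100
"""

with driver.session() as session:
    for record in session.run(query, region="Italy"):
        print(record["e"], record["rel"].type)

driver.close()
```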
We follow a process in line with international practices [39] to construct the evidence mapping. To begin, we select and count triples. We then sort the triples by Intervention and keep the top eight non-pharmacological interventions. The evidence mapping shown in Figure 10 provides a comprehensive presentation of the available evidence on the effectiveness of the top eight non-pharmacological interventions with respect to their outcomes on ten indicators measuring the COVID-19 epidemic. We observe that the impact of non-pharmacological interventions is largely reflected in two indicators, "transmission rate" and "confirmed cases". We also observe that "social distancing", "lockdown", and "quarantine" are the three most effective non-pharmacological interventions according to the evidence mapping. The conclusion of our evidence mapping is similar to the results of the systematic review [1], which demonstrates the feasibility of our knowledge graph. In this paper, we develop a procedure that transforms a mass of COR modelling study papers into an organized, structured, and actionable knowledge graph (CPHE-KG). In particular, we proposed a document-level information extraction model, SS-DYGIE++, to identify entities, relations, and coreference clusters related to COVID-19 public health evidence. This module is central to knowledge graph construction. To enable multi-task information extraction, we also created a high-quality, semi-automatically annotated dataset (CPHIE) and provided a series of results for the SS-DYGIE++ model and its variants. In our SS-DYGIE++ model, we introduced a semi-supervised learning approach, bootstrapping, which dispenses with the need for a large amount of labeled text by utilizing unlabeled in-domain texts. We also introduced language model post-training for token representation, which adapts language models to the COR public health domain and removes the need to pretrain a public health language model from scratch. However, our approach is limited in the length of documents it can process, as our language model is based on BERT, which restricts inputs to no more than 512 tokens. We also release two applications based on our CPHE-KG. The evidence querying and evidence mapping applications demonstrate how the CPHE-KG contributes to COR public health decision-making. In future work, we hope to extract evidence from long documents, such as the full text of scientific literature, rather than focusing on abstracts. One possible solution is replacing our BERT-based language model with a language model that can process documents without a length limitation. We also aim to expand the proposed KG with more evidence from modelling study papers on various epidemic diseases, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). In addition, we will target expanding the proposed KG with multimedia evidence, e.g., figures and tables in papers, to improve cross-media evidence grounding and inference. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The COVID-19 Public Health Information Extraction dataset (CPHIE) annotation guidelines are intended primarily to provide a simple description of the various entities and their relations in modelling study papers relating to the dynamic transmission of COVID-19.
These guidelines make the dataset easy to use by anyone who may want to apply it to various natural language processing tasks such as entity recognition or relation extraction. In the following subsections we summarize the guidelines that were used in annotating the 597 abstracts of COVID-19 public health papers, covering the entities and relations chosen to be labelled in this dataset.
Method: Methods, models, systems, tools, components of a system, or frameworks.
Generic: General words that are usually used to connect coreferent entities.
OtherTerm: Public health-related terms that do not belong to the entity types listed above.
The relation types are defined as follows:
- Associates an entity of any type with a Method entity that indicates a solution.
- Marks the Measure entity that an Indicator entity equals.
- Links an Intervention or a Variable entity to a Task or a Method entity, marking the relationship between the entities.
- Associates an entity with a Time entity representing when the entity takes place.
- Associates a Region entity with another entity, indicating that the Region entity is the site where the entity is performed.
- Links two coordinate entities, which are usually connected by words such as "and"/"or" and belong to the same entity type.
- Marks that one entity includes another entity.
- Compares two entities that usually belong to the same entity type.
- Marks characteristics associated with a Method, Task, Variable, or Indicator entity.
- Associates an Indicator entity with a Task entity or a Method entity. The former indicates that the Indicator entity is associated with the COVID-19 epidemic; the latter indicates that the Indicator entity is the evaluation metric of the Method entity.
Coreference: A link that associates two entities that refer to the same entity.
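To illustrate how an abstract annotated under these guidelines might be serialized for the multi-task model, the record below follows a DyGIE++-style JSONL layout (sentences, ner, relations, clusters). The sentence, span offsets, and relation labels ("Evaluate", "LocatedIn") are invented for illustration and do not reproduce the released CPHIE files.

```python
# Hypothetical example of one annotated abstract in a DyGIE++-style record.
# All spans, labels, and the sentence itself are invented for illustration.
import json

record = {
    "doc_key": "example_0001",
    "dataset": "cphie",
    "sentences": [
        ["Lockdown", "reduced", "the", "transmission", "rate", "in", "Italy", "."]
    ],
    # [start_token, end_token, entity_type] with document-level token offsets
    "ner": [[[0, 0, "Intervention"], [3, 4, "Indicator"], [6, 6, "Region"]]],
    # [head_start, head_end, tail_start, tail_end, relation_type]
    "relations": [[[0, 0, 3, 4, "Evaluate"], [3, 4, 6, 6, "LocatedIn"]]],
    # Cross-sentence coreference clusters (empty in this one-sentence example)
    "clusters": []
}

print(json.dumps(record, indent=2))
```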
[1] Inferring the effectiveness of government interventions against COVID-19.
[2] Real-world data medical knowledge graph: construction and applications.
[3] Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph.
[4] Medical Knowledge Graph to Enhance Fraud, Waste, and Abuse Detection on Claim Data: Model Development and Performance Evaluation.
[5] DEKGB: An Extensible Framework for Health Knowledge Graph.
[6] Constructing Disease-Centric Knowledge Graphs: A Case Study for Depression (Short Version).
[7] Building the graph of medicine from millions of clinical narratives.
[8] CBN: Constructing a clinical Bayesian network based on data from the electronic medical record.
[9] EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning.
[10] Learning a Health Knowledge Graph from Electronic Medical Records.
[11] COVID-19 Knowledge Graph: Accelerating Information Retrieval and Discovery for Scientific Literature.
[12] KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.
[13] COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology.
[14] Drug Repurposing for COVID-19 via Knowledge Graph Completion.
[15] Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2.
[16] COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation.
[17] Extracting a Knowledge Base of Mechanisms from COVID-19 Papers. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[18] CORD-19: The COVID-19 Open Research Dataset.
[19] Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction.
[20] A general framework for information extraction using dynamic span graphs.
[21] Entity, Relation, and Event Extraction with Contextualized Span Representations. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.
[22] The Role of "Condition": A Novel Scientific Knowledge Graph Representation and Construction Model.
[23] EBM+: Advancing Evidence-Based Medicine via two-level automatic identification of Populations, Interventions, Outcomes in medical literature.
[24] Pretraining to recognize PICO elements from randomized controlled trial literature.
[25] A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature.
[26] Effective Crowd-Annotation of Participants, Interventions, and Outcomes in the Text of Clinical Trial Reports.
[27] Dynamic Transmission Modeling: A Report of the ISPOR-SMDM Modeling Good Research Practices Task Force-5.
[28] Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review.
[29] BRAT: a web-based tool for NLP-assisted text annotation.
[30] SpanBERT: Improving pre-training by representing and predicting spans.
[31] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[32] Bootstrapping.
[33] Snowball: Extracting relations from large plain-text collections.
[34] Extracting patterns and relations from the World Wide Web.
[35] CogCompTime: A tool for understanding time in natural language.
[36] SciBERT: A Pretrained Language Model for Scientific Text.
[37] Don't Stop Pretraining: Adapt Language Models to Domains and Tasks.
[38] Optuna: A Next-generation Hyperparameter Optimization Framework.
[39] The global evidence mapping initiative: scoping research in broad topic areas.
This work was supported in part by the National Natural Science Foundation of China (Grants No. 72025404 and No. 71621002), the Beijing Natural Science Foundation (L192012), and the Beijing Nova Program (Z201100006820085).