key: cord-0656952-tkpbfroi
authors: Wang, Qingyun; Li, Manling; Wang, Xuan; Parulian, Nikolaus; Han, Guangxing; Ma, Jiawei; Tu, Jingxuan; Lin, Ying; Zhang, Haoran; Liu, Weili; Chauhan, Aabhas; Guan, Yingjun; Li, Bangzheng; Li, Ruisong; Song, Xiangchen; Fung, Yi R.; Ji, Heng; Han, Jiawei; Chang, Shih-Fu; Pustejovsky, James; Rah, Jasmine; Liem, David; Elsayed, Ahmed; Palmer, Martha; Voss, Clare; Schneider, Cynthia; Onyshkevych, Boyan
title: COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
date: 2020-07-01
journal: nan
DOI: nan
sha: 328d027a999efc9d9e315367f5e01096ef4e7255
doc_id: 656952
cord_uid: tkpbfroi

To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities and their visual chemical structures, relations, and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence.

Practical progress in combating COVID-19 depends heavily on effective search, discovery, assessment and extension of scientific research results. However, clinicians and scientists face two unique barriers to digesting these research papers. The first challenge is quantity. Such a bottleneck in knowledge access is exacerbated during a pandemic, when increased investment in relevant research leads to even faster growth of literature than usual.
For example, as of April 28, 2020, there were 19,443 papers related to coronavirus at PubMed; as of June 13, 2020, there were 140K+ related papers, nearly 2.7K new papers per day (see Figure 1).

Figure 1: The Growing Number of COVID-19 Papers at PubMed

[...] of vaccines and drugs for COVID-19. More intelligent knowledge discovery technologies need to be developed to enable researchers to access and digest relevant knowledge from the literature more quickly and accurately.

The second challenge is quality. Many research results about coronavirus from different research labs and sources are redundant, complementary, or even conflicting with each other, while some false information has been promoted both in formal publication venues and on social media platforms such as Twitter. As a result, some public policy responses to the virus, and public perception of it, have been based on misleading, and at times erroneous, claims. The relative isolation of these knowledge resources makes it hard, if not impossible, for researchers to connect the dots that exist in separate resources to gain new insights.

Let us consider drug repurposing as a case study. Besides the long process of clinical trials and biomedical experiments, another major cause of the lengthy discovery phase is the complexity of the problem involved and the difficulty of drug discovery in general. The current clinical trials for drug repurposing rely mainly on reported symptoms, considering drugs that can treat diseases with similar symptoms. However, there are too many drug candidates and too much misinformation published in multiple sources. Clinicians and scientists thus urgently need assistance in obtaining a reliable ranked list of drugs with detailed evidence, and also in gaining new insights into the underlying molecular and cellular mechanisms of COVID-19 and the pre-existing conditions that may affect the mortality and severity of this disease.
To tackle these two challenges, we propose a new framework, COVID-KG, to accelerate scientific discovery and build a bridge between the research scientists making use of our framework and the clinicians who will ultimately conduct the tests, as illustrated in Figure 2. COVID-KG starts by reading existing papers to build multimedia knowledge graphs (KGs), in which nodes are entities/concepts and edges represent relations and events involving these entities, as extracted from both text and images. Given the KGs enriched with path ranking and evidence mining, COVID-KG answers natural language questions effectively. With drug repurposing as a case study, we focus on 11 typical questions that human experts pose and integrate our techniques to generate a comprehensive report for each candidate drug.

Figure 3: Constructed KG Connecting Losartan (candidate drug in COVID-19) and cathepsin L pseudogene 2 (gene related to coronavirus), where red nodes represent chemicals, grey nodes represent genes, and edges represent gene-chemical relations.

Our coarse-grained Information Extraction (IE) system consists of three components: (1) coarse-grained entity extraction (Wang et al., 2019a) and entity linking (Zheng et al., 2015) [...]: we extract 13 event types and the roles of entities involved in these events as defined in Nédellec et al. (2013), including Gene expression, Transcription, Localization, Protein catabolism, Binding, Protein modification, Phosphorylation, Ubiquitination, Acetylation, Deacetylation, Regulation, Positive regulation, and Negative regulation. Figure 3 shows an example of the constructed KG from multiple papers. Experiments on 186 documents with 12,916 sentences manually annotated by domain experts show that our method achieves 83.6% F-score on node extraction and 78.1% F-score on link extraction.

Angiotensin-converting enzyme 2 GENE_OR_GENOME (ACE2 GENE_OR_GENOME) as a SARS-CoV-2 CORONAVIRUS receptor: molecular mechanisms and potential therapeutic target. SARS-CoV-2 CORONAVIRUS has been sequenced [3]. A phylogenetic EVOLUTION analysis [3, 4] found a bat WILDLIFE origin for the SARS-CoV-2 CORONAVIRUS. There is a diversity of possible intermediate hosts for SARS-CoV-2 CORONAVIRUS, including pangolins WILDLIFE, but not mice EUKARYOTE and rats EUKARYOTE [5]. There are many similarities of SARS-CoV-2 CORONAVIRUS with the original SARS-CoV CORONAVIRUS. Using computer modeling, Xu et al. [6] found that the spike proteins GENE_OR_GENOME of SARS-CoV-2 CORONAVIRUS and SARS-CoV CORONAVIRUS have almost identical 3-D structures in the receptor binding domain that maintains Van der Waals forces PHYSICAL_SCIENCE. SARS-CoV spike proteins GENE_OR_GENOME has a strong binding affinity to human ACE2 GENE_OR_GENOME, based on biochemical interaction studies and crystal structure analysis [7]. SARS-CoV-2 CORONAVIRUS and SARS-CoV spike proteins GENE_OR_GENOME share identity in amino acid sequences and ……

However, questions from experts often involve fine-grained knowledge elements, such as "Which amino acids in glycoprotein are most related to Glycan (CHEMICAL)?". To answer these questions, we apply our fine-grained entity extraction system CORD-NER (Wang et al., 2020c) to extract 75 types of entities to enrich the KG, including many COVID-19 specific new entity types (e.g., coronaviruses, viral proteins, evolution, materials, substrates and immune responses). CORD-NER relies on distantly- and weakly-supervised methods (Wang et al., 2019b), with no need for expensive human annotation. Its entity annotation quality surpasses that of SciSpacy, a fully supervised BioNER tool (up to 93.95% F-score, over 10% higher F1 score on a sample set of documents). See Figure 4 for results on part of a COVID-19 paper (Zhang et al., 2020).
Figures in biomedical papers may contain different types of visual information, for example, displaying molecular structures, microscopic images, dosage response curves, relational diagrams, and other uniquely visual content. We have developed a visual IE subsystem to extract the visual information from figures to enrich the KG. We start by designing a pipeline and automatic tools shown in Figure 5 to extract figures from papers in the CORD-19 dataset and segment figures into nearly half a million isolated subfigures. In the end, we perform cross-modal entity grounding, i.e., associating visual objects identified in these subfigures with entities mentioned in their captions or referring text.

To start, since most figures are embedded as part of PDF files, we run Deepfigures (Siegel et al., 2018) to automatically detect and extract figures from each PDF document. Then each figure is associated with text in its caption or referring context (main body text referring to the figure). In this way, a figure can be attached, at a coarse level, to a KG entity if that entity is mentioned in the associated text.

To further delineate semantic and visual information contained within each subfigure, we have developed a pipeline to segment individual subfigures and then align each subfigure with its corresponding subcaption. We run Figure-separator (Tsutsui and Crandall, 2017) to detect and separate all non-overlapping image regions. On occasion, subfigures within a figure may also be marked with alphabetical letters (e.g., A, B, C). We use deep neural networks (Zhou et al., 2017) to detect text within figures and then apply OCR tools (Smith, 2007) to automatically recognize text content within each figure. To identify subfigure marker text and text (Ekins and Coffee, 2015) labels for analyzing figure content, we rely on the distance between text labels and subfigures to locate subfigure text markers.
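The distance-based association between OCR text labels and subfigures can be illustrated with a minimal sketch. The text does not specify the distance measure, so this example assumes the Euclidean distance from a marker's center to the nearest point of each subfigure's bounding box. The function names, the `(x0, y0, x1, y1)` pixel-box layout, and the toy coordinates are all hypothetical.

```python
import math


def box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    x, y = point
    x0, y0, x1, y1 = box
    dx = max(x0 - x, 0.0, x - x1)
    dy = max(y0 - y, 0.0, y - y1)
    return math.hypot(dx, dy)


def assign_markers(markers, subfigures):
    """Map each marker label to the index of its nearest subfigure box.

    markers:    dict of label -> (x0, y0, x1, y1) box of the OCR'd marker text
    subfigures: list of (x0, y0, x1, y1) boxes of detected image regions
    """
    assignment = {}
    for label, (x0, y0, x1, y1) in markers.items():
        center = ((x0 + x1) / 2, (y0 + y1) / 2)
        nearest = min(
            range(len(subfigures)),
            key=lambda i: box_distance(center, subfigures[i]),
        )
        assignment[label] = nearest
    return assignment


# Toy layout (assumed values): two side-by-side panels with corner markers.
toy_subfigures = [(0, 0, 100, 100), (120, 0, 220, 100)]
toy_markers = {"A": (2, 2, 12, 12), "B": (122, 2, 132, 12)}
assignment = assign_markers(toy_markers, toy_subfigures)
print(assignment)
```

A point-to-box distance is used here rather than a center-to-center distance because markers usually sit in a corner of their panel, far from its center.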
Location information of such text markers can also be used to merge multiple image regions into a single subfigure. In the end, each subfigure is segmented and associated with its corresponding subcaption and referring context.

The segmented subfigures and associated text labels provide rich information that can expand the KG constructed from text captions. For example, as shown in Figure 6, we apply a classifier to detect subfigures containing molecular structures. Then, by linking the specific drug names extracted from within-figure text to the corresponding drug entities in the coarse KG constructed from the caption text, we can construct an expanded cross-modal KG that links images of specific molecular structures to their drug entities in the KG.

To enhance the exploration and discovery of the information mined from the COVID-19 literature through the algorithms discussed in previous sections, we create semantic visualizations over large complex networks of biomedical relations using the techniques proposed by Tu et al. (2020). Semantic visualization allows users to interactively visualize user-defined subsets of these relations through semantically typed tag clouds and heat maps. This allows researchers to get a global view of selected relation subtypes drawn from hundreds or thousands of papers at a single glance. This in turn allows for the ready identification of novel relations that would typically be missed by directed keyword searches or simple unigram word cloud or heatmap displays.

We first build a data index from the knowledge elements in the constructed KGs, and then create a Kibana dashboard out of the generated data indices. Each Kibana dashboard has a collection of visualizations that are designed to interact with each other. Dashboards are implemented as web applications. The navigation of a dashboard is mainly through clicking and searching.
By clicking the protein keyword EIF2AK2 in the tag cloud named "Enzyme proteins participating Modification relations", a constraint on the type of proteins in modifications is added. All the other visualizations then update accordingly. One unique feature of the semantic visualization is the creation of dense tag clouds and dense heatmaps, through a process of parameter reduction over relations, allowing for the visualization of relation sets as tag clouds and multiple chained relations as heatmaps. Figure 7 illustrates such a dense heatmap that contains relations between proteins and implicated diseases (e.g., "those proteins that are down-regulators of TNF which are implicated in obesity"), along with their type information.

In contrast to most current question-answering (QA) methods, which target single documents, we have developed a QA component based on a combination of KG matching and distributional semantic matching across documents. We build KG indexing and searching functions to facilitate effective and efficient search when users pose their questions. We also support extended semantic matching from the constructed KGs and related texts by accepting multi-hop queries.

A common category of queries is about the connections between two entities. Given two entities in a query, we generate a subgraph covering salient paths between them to show how they are connected through other entities. Figure 3 is an example subgraph summarizing the connections between Losartan and cathepsin L pseudogene 2. The paths are generated by traversing the constructed KG, and are ranked by the number of papers covering the knowledge elements in each path in the KG. Each edge is assigned a salience score by aggregating the scores of paths passing through it. In addition to knowledge elements, we also present related sentences and source information as evidence.
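The path ranking and edge salience described above can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: a path's score is taken to be the number of distinct papers supporting any of its edges, and edge salience is the sum of the scores of the paths that use the edge. The function names, the hop limit, the intermediate node names, and the paper IDs are hypothetical toy values.

```python
from collections import defaultdict


def find_paths(adjacency, source, target, max_hops):
    """Enumerate simple paths from source to target with at most max_hops edges."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                stack.append((neighbor, path + [neighbor]))
    return paths


def rank_paths(edge_papers, source, target, max_hops=3):
    """Rank paths by supporting-paper count and aggregate edge salience.

    edge_papers: dict mapping an undirected edge (u, v) to a set of paper IDs.
    Returns (ranked, salience): ranked is a list of (score, path) sorted by
    descending score; salience maps frozenset({u, v}) to a summed path score.
    """
    adjacency = defaultdict(set)
    for u, v in edge_papers:
        adjacency[u].add(v)
        adjacency[v].add(u)

    def papers_for(u, v):
        return edge_papers.get((u, v)) or edge_papers.get((v, u)) or set()

    ranked = []
    for path in find_paths(adjacency, source, target, max_hops):
        papers = set()
        for u, v in zip(path, path[1:]):
            papers |= papers_for(u, v)
        ranked.append((len(papers), path))
    ranked.sort(key=lambda item: (-item[0], item[1]))

    salience = defaultdict(int)
    for score, path in ranked:
        for u, v in zip(path, path[1:]):
            salience[frozenset((u, v))] += score
    return ranked, dict(salience)


# Toy KG (assumed data): two routes between the endpoints named in Figure 3.
toy_edge_papers = {
    ("Losartan", "intermediate_A"): {"paper1", "paper2"},
    ("intermediate_A", "cathepsin L pseudogene 2"): {"paper2", "paper3"},
    ("Losartan", "intermediate_B"): {"paper4"},
    ("intermediate_B", "cathepsin L pseudogene 2"): {"paper4"},
}
ranked, salience = rank_paths(
    toy_edge_papers, "Losartan", "cathepsin L pseudogene 2"
)
for score, path in ranked:
    print(score, " -> ".join(path))
```

In this toy graph the route through `intermediate_A` is supported by three distinct papers and the route through `intermediate_B` by one, so the former ranks first and its edges receive the higher salience.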
We use BioBERT (Lee et al., 2020), a pre-trained language model, to represent each sentence along with its left and right neighboring sentences as local contexts. Applying the same architecture to each sentence and to the user query, we aggregate the sequence embedding layer (the last hidden layer in the BERT architecture) with average pooling (Reimers and Gurevych, 2019). We use the similarity between the embedding representations of each sentence and each query to identify and extract the most relevant sentences as evidence.

Another common category of queries includes entity types, rather than entity instances, and requires extracting evidence sentences based on type or pattern matching. We have developed EVIDENCEMINER (Wang et al., 2020a,b), a web-based system that accepts a user's query as a natural language statement or an inquiry about a relationship at the meta-symbol level (e.g., CHEMICAL, PROTEIN) and then automatically retrieves textual evidence from a background COVID-19 corpus.

Report Generation
A human-written report about drug repurposing usually answers the following typical questions. The answers to questions #5 and #11 are extracted from the meta-data sections of research papers in scientific literature, including the author affiliation and acknowledgement sections. The answers to the other questions are all extracted from the constructed knowledge graphs using the knowledge-driven question-answering method described above.

In our case studies, DARPA biologists inquired about three drugs, Benazepril, Losartan, and Amodiaquine, and their links to COVID-19 related chemicals/genes, as shown in Figure 8. Our KG results for many other drugs are visualized at our website.

We download new COVID-19 papers from three Application Programming Interfaces (APIs): NCBI PMC API, NCBI Pubtator API and CORD-19 archive.
We provide incremental updates including new papers, removed papers and updated papers, and their metadata information at our website. As of June 14, 2020, we had collected 140K papers. We selected 25,534 peer-reviewed papers and constructed a KG that includes 7,230 Diseases, 9,123 Chemicals and 50,864 Genes, with 1,725,518 Chemical-Gene links, 5,556,670 Chemical-Disease links, and 77,844,574 Gene-Disease links. The KG has received more than 1,000 downloads. Our final generated reports are shared publicly. For each question, our framework provides answers along with detailed evidence, knowledge subgraphs, and image segmentation and analysis results. Table 1 shows some example answers.

Several clinicians and medical school students in our team have manually reviewed the drug repurposing reports for three drugs, and also the KGs connecting 41 drugs and COVID-19 related chemicals/genes. After checking the evidence sentences and reading the original articles, they reported that most of our output is informative and valid. For instance, after the coronavirus enters the cell in the lungs, it can cause a severe disease called Acute Respiratory Distress Syndrome. This condition causes the release of inflammatory molecules in the body named cytokines, such as Interleukin-2, Interleukin-6, Tumor Necrosis Factor, and Interleukin-10. We see all of these connections in our results, such as the examples shown in Figure 3 and Figure 9. After further checks on these results, the scientists also indicated that many results were worth further investigation. For example, in Figure 3 we can see that Losartan is connected to tumor protein p53, which is related to lung cancer.
Extensive prior research work has focused on extracting biomedical entities (Zheng et al., 2014; Habibi et al., 2017; Crichton et al., 2017; Wang et al., 2018; Beltagy et al., 2019; Alsentzer et al., 2019; Wang et al., 2020c), relations (Uzuner et al., 2011; Krallinger et al., 2011; Manandhar and Yuret, 2013; Bui et al., 2014; Peng et al., 2016; Wei et al., 2015; Peng et al., 2017; Luo et al., 2017; Peng et al., 2019, 2020), and events (Ananiadou et al., 2010; Van Landeghem et al., 2013; Nédellec et al., 2013; Deléger et al., 2016; ShafieiBavani et al., 2020) from biomedical literature, with the most recent work focused on COVID-19 literature (Hope et al., 2020; Ilievski et al., 2020; Wolinski, 2020; Ahamed and Samad, 2020). Most of the recent biomedical QA work (Yang et al., 2015, 2016; Chandu et al., 2017; Kraus et al., 2017) is driven by the BioASQ initiative (Tsatsaronis et al., 2015), and many live QA systems, including COVIDASK 11 and AUEB 12, and search engines (Kricka et al., 2020; Esteva et al., 2020; Hope et al., 2020; Taub Tabib et al., 2020) have been developed. Our work is an application and extension of our recently developed multimedia knowledge extraction system for the news domain (Li et al., 2020a,b). As in the news domain, the knowledge elements extracted from text and images in literature are complementary. Our framework advances the state of the art by extending the knowledge elements to more fine-grained types, incorporating image analysis and cross-media knowledge grounding, and integrating KG matching into QA.

We have developed a novel framework, COVID-KG, that automatically transforms a massive scientific literature corpus into organized, structured, and actionable KGs, and uses them to answer questions in drug repurposing reporting.
With COVID-KG, researchers and clinicians are able to obtain informative answers from scientific literature, and thus focus on more important hypothesis testing and prioritize the analysis efforts for candidate exploration directions. In our ongoing work we have created a new ontology that includes 77 entity subtypes and 58 event subtypes, and we are building a neural IE system following this new ontology. In the future we plan to extend COVID-KG to automate the creation of new hypotheses by predicting new links. We will also create a multimedia common semantic space (Li et al., 2020a,b) for literature and apply it to improve cross-media knowledge grounding and inference.

11 https://covidask.korea.ac.kr/
12 http://cslab241.cs.aueb.gr:5000/

Required Workflow for Using Our System
Human review required. Our knowledge discovery tool provides investigative leads for pre-clinical research, not final results for clinical use. Currently, biomedical researchers scour the literature to identify candidate drugs, then follow a standard research methodology to investigate their actual utility (involving literature reviews, computer simulations of drug mechanisms and effectiveness, in-vitro studies, cellular in-vivo studies, etc., before moving to clinical studies). Our tool COVID-KG (and all knowledge discovery tools for biomedical applications) is not meant to be used for direct clinical applications on any human subjects. Rather, our tool aims to highlight unseen relations and patterns in large amounts of scientific textual data that would be too time-consuming to find by manual human effort. Accordingly, the tool would be useful for stakeholders (e.g., biomedical scientists) to identify specific drug candidates and molecular targets that are relevant to their biomedical and clinical research aims.
Use of our knowledge discovery tool allows the researcher to rapidly narrow down the set of candidate drugs to investigate, but then proceed with the usual sequence of steps before kicking off expensive and time-consuming clinical tests. Failure to follow this sequence of events, and use of the system without the required human review, could lead to misguided experimental design, wasting time and resources.

Check evidence and source before using our system results. In addition, our tool provides source and rich evidence sentences for each node and link in the KG. To curtail potential harms caused by extraction errors, users of the knowledge graphs should double-check the source information and verify the accuracy of the discovered leads before launching expensive experimental studies. We spell out here the positive values, as well as the limitations and possible solutions to address these issues for future improvement. Moreover, any planned investigations involving human subjects should first be approved by the stakeholder's IRB (Institutional Review Board), which will oversee the safety of the proposed studies and the role of COVID-KG before any experimental studies are conducted. COVID-KG is a tool to enhance biomedical and clinical research; it is not a tool for direct clinical application with human subjects.

System errors. Our system can effectively convert a large number of scientific papers into knowledge graphs, and can scale as literature volume increases. However, none of our extraction components is perfect; they produce about 6%-22% false alarms and misses, as reported in Section 2. But as we described in the workflow, all of the connections and answers will be validated by domain experts by checking their corresponding sources before they are included in the drug repurposing report. COVID-KG is developed for pre-clinical research, to narrow down drugs of interest for biomedical scientists.
Therefore, no human subjects or specific populations are directly subjected to COVID-KG unless approved by the stakeholder's IRB, which oversees the safety and ethical aspects of the clinical studies in accordance with the Belmont report (https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html). Accordingly, COVID-KG will not impose direct harm on vulnerable human cohorts or populations, unless misused by the stakeholders without IRB approval. With regard to potential harm in preclinical studies, users of COVID-KG are advised to verify the accuracy of the discovered leads in the source information before conducting expensive experimental studies.

Bias in training data. Proper use of the technology requires that input documents are legally and ethically obtained. Regulation and standards (e.g., GDPR 13) provide a legal framework for ensuring that such data is properly used and that any individual whose data is used has the right to request its removal. In the absence of such regulation, society relies on those who apply technology to ensure that data is used in an ethical way. The input data to our system is peer-reviewed, publicly available scientific articles. An additional potential harm could come from the output of the system being used in ways that magnify the system errors or bias in its training data. The various components in our system rely on weak distant supervision based on large-scale external knowledge bases and ontologies that cover a wide range of topics in the biomedical domain. Nevertheless, our system output is intended for human interpretation. We do not endorse incorporating the system's output into an automatic decision-making system without human validation; this fails to meet our recommendations and could yield harmful results.
In the cited technical reports for each component in our framework, we have reported detailed error rates for each type of knowledge element from system evaluations and provide detailed qualitative analysis and explanations.

13 The General Data Protection Regulation of the European Union https://gdpr.eu/what-is-gdpr/.

Bias in development data. We also note that the performance of our system components as reported is based on specific benchmark datasets, which could be affected by such data biases. Thus questions concerning generalizability and fairness should be carefully considered. Within the research community, addressing data bias requires a combination of new data sources, research that mitigates the impact of bias, and, as done in Mitchell et al. (2019), auditing data and models. Sections 2 and ?? cite data sources used for training to support future auditing. A general approach to properly using our system should incorporate ethics considerations as first-order principles in every step of the system design, maintain a high degree of transparency and interpretability of data, algorithms, models, and functionality throughout the system, make software available as open source for public verification and auditing, and explore countermeasures to protect vulnerable groups. In our ongoing and future work, we keep increasing the annotated dataset size, adding more rounds of user correction and validation, and iteratively incorporating feedback from domain experts who have used the tool, to create new benchmarks for retraining models and conducting more systematic evaluations. We recommend caution in using our system output until a more complete expert evaluation has occurred.

Bias in source. Furthermore, our system output may include some biases from the sources, by way of biases in the peer reviewing process.
In our previous work (Yu et al., 2014; Ma et al., 2015; Zhang et al., 2019), we have aggregated source profiles, knowledge graphs and evidence for fact-checking across sources. We plan to extend our framework to include fact-checking to enable practitioners and researchers to access up-to-the-minute information.

Bias in test queries. Finally, the queries (i.e., the lists of candidate drugs and proteins/genes) are provided by the users, who might have bias in their selection. Addressing the user's own biases falls outside the scope of our project, but as we have stated in the previous subsection, we direct users to carefully examine source information (author, publication date, etc.) and detailed evidence (contextual sentences and documents) associated with the extracted connections.

This research is based upon work supported in part by U.S. DARPA KAIROS Program No. FA8750-19-2-1004, U.S. DARPA AIDA Program # FA8750-18-2-0014, U.S. DTRA HDTRA1-16-1-0002/Project #1553695, eTASC - Empirical Evidence for a Theoretical Approach to Semantic Components, U.S. NSF No. 1741634, the Office of the Director of National Intelligence (ODNI), and Intelligence Advanced Research Projects Activity (IARPA) via contract FA8650-17-C-9116. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

Information mining for covid-19 research from a large volume of scientific literature
Publicly available clinical BERT embeddings
Event extraction for systems biology by text mining the literature
SciBERT: A pretrained language model for scientific text
A novel feature-based approach to extract drug-drug interactions from biomedical text
Tackling biomedical text summarization: OAQA at BioASQ 5B
A neural network multi-task learning approach to biomedical named entity recognition
The Comparative Toxicogenomics Database: update 2017
Overview of the bacteria biotope task at BioNLP shared task 2016
Fda approved drugs as potential ebola treatments
Co-search: Covid-19 information retrieval with semantic search, question answering, and abstractive summarization
Deep learning with word embeddings improves biomedical named entity recognition
Scisight: Combining faceted navigation and research group detection for covid-19 exploratory scientific search
Kgtk: A toolkit for large knowledge graph manipulation and analysis
Structure of the host cell recognition and penetration machinery of a staphylococcus aureus bacteriophage
The protein-protein interaction tasks of biocreative iii: classification/ranking of articles and linking bio-ontology concepts to full text
Olelo: a web application for intuitive exploration of biomedical literature
Artificial intelligence-powered search tools and resources in the fight against covid-19
Biobert: a pre-trained biomedical language representation model for biomedical text mining
Biomedical event extraction based on knowledge-driven tree-LSTM
Syntax-aware multi-task graph convolutional networks for biomedical relation extraction
GAIA: A fine-grained multimedia knowledge extraction system
Cross-media structured common space for multimedia event extraction
Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations
Faitcrowd: Fine grained truth discovery for crowdsourced data aggregation
Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Association for Computational Linguistics
Model cards for model reporting
Association for Computational Linguistics
Cross-sentence n-ary relation extraction with graph lstms
An empirical study of multi-task learning on BERT for biomedical text mining
Improving chemical disease relation extraction with rich features and weakly labeled data
Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets
Sentence-BERT: Sentence embeddings using Siamese BERT-networks
Global locality in biomedical relation and event extraction
Learning named entity tagger using domain-specific dictionary
Extracting scientific figures with distantly supervised neural networks
An overview of the tesseract ocr engine
Interactive extractive search over biomedical corpora
Online. Association for Computational Linguistics
An overview of the bioasq large-scale biomedical semantic indexing and question answering competition
A data driven approach for compound figure separation using convolutional neural networks
Exploration and discovery of the covid-19 literature through semantic visualization. ArXiv, abs
i2b2/va challenge on concepts, assertions, and relations in clinical text
Large-scale event extraction from literature with multi-level gene normalization
PaperRobot: Incremental draft generation of scientific ideas
Evidenceminer: Textual evidence discovery for life sciences
Automatic textual evidence mining in covid-19 literature
Comprehensive named entity recognition on cord-19 with distant or weak supervision
Distantly supervised biomedical named entity recognition with dictionary expansion
Cross-type biomedical named entity recognition with deep multi-task learning
PubTator central: automated concept annotation for biomedical full text articles
Overview of the biocreative v chemical disease relation (cdr) task
Visualization of diseases at risk in the covid-19 literature
Learning to answer biomedical factoid & list questions: Oaqa at bioasq 3b. CLEF (Working Notes
Learning to answer biomedical questions: OAQA at BioASQ 4B
The wisdom of minority: Unsupervised slot filling validation based on multidimensional truth-finding
Angiotensin-converting enzyme 2 (ace2) as a sars-cov-2 receptor: molecular mechanisms and potential therapeutic target
Expertise-aware truth analysis and task allocation in mobile crowdsourcing
Entity linking for biomedical literature
Entity linking for biomedical literature
Modeling truth existence in truth discovery
East: an efficient and accurate scene text detector