key: cord-0883343-32azlh3g
authors: Kim, Jin-Dong; Cohen, Kevin Bretonnel; Rinaldi, Fabio; Lu, Zhiyong; Park, Hyun-Seok
title: Editor’s introduction to the special section on the 7th Biomedical Linked Annotation Hackathon (BLAH7)
date: 2021-09-27
journal: Genomics Inform
DOI: 10.5808/gi.19.3.e1
sha: 1df47b9c656c6d4287bb26ed2fc018824e5a6191
doc_id: 883343
cord_uid: 32azlh3g


This special section reports the achievements of the 7th Biomedical Linked Annotation Hackathon (BLAH7). BLAH is an annual hackathon that brings together the biomedical text mining community to promote interoperability among text mining resources. This year, the 7th edition was held in January 2021. Because of the pandemic, it was organized as an online event with the special theme "coronavirus disease 2019 (COVID-19)". The goal was to develop text mining resources that help address the pandemic. During the hackathon, 47 participants from 11 countries worked on voluntarily organized projects, and the results are reported in this special collection.

This section includes seven application notes and one opinion article. The first application note, by Hernandez et al. [1], presents a Twitter dataset of more than 120 million "potentially clinically-relevant" tweets, automatically annotated for clinically important named entities such as drugs and symptoms. The dataset is publicly released to facilitate research on mining social media data for biomedical and clinical applications. Lithgow-Serrano et al. [2] present a named entity annotation of the LitCovid [3] dataset produced with OntoGene's Biomedical Entity Recogniser (OGER) [4] and show that it is effective for document classification. Ouyang et al. [5] present the AGAC annotation [6], layered on top of the PubTator [7] and OGER annotations, and show that it is potentially useful for mining regulatory or causal relationships between biomedical entities.

The next three papers address multilingual text mining. Barros et al. [8] present a multilingual parallel corpus of PubMed articles for the English-Portuguese and English-Spanish language pairs. The corpus was annotated for biomedical entities and the relationships between them, and was then used to build a multilingual recommendation dataset for recommending biomedical entities to the authors of the articles. Yamaguchi et al. [9] and Soares et al. [10] were written by the same group of authors, who developed two versions of a Japanese translation of MeSH terms: one by merging existing resources with manual curation, and the other by an automatic translation method. The results are reported in the two separate application notes.

Larmande et al. [11] report an update to OryzaGP [12], a corpus of PubMed articles on rice species that are automatically annotated for genes and proteins. Finally, in the opinion article, Dohi et al. [13] present their views on visualizing phenotype diversity, drawing on their case study of Alexander disease.

In the spirit of sharing, most of the resulting datasets, including corpora, annotations, and dictionaries, are released through open repositories such as GitHub and PubAnnotation/PubDictionaries [14]. We hope this special collection will give readers of Genomics & Informatics an overview of recent biomedical text mining activities aimed at supporting the response to the COVID-19 pandemic.

[1] A biomedically oriented automatically annotated Twitter COVID-19 dataset

[2] Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

[3] LitCovid: an open database of COVID-19 literature

[4] Entity recognition in the biomedical domain using a hybrid approach

[5] LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

[6] An active gene annotation corpus and its application on anti-epilepsy drug discovery

[7] PubTator: a web-based text mining tool for assisting biocuration

[8] COVID-19 recommender system based on an annotated multilingual corpus

[9] Constructing Japanese MeSH term dictionaries related to the COVID-19 literature

[10] Creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

[11] OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

[12] OryzaGP: rice gene and protein dataset for named-entity recognition

[13] Visualizing the phenotype diversity: a case study of Alexander disease

[14] Open Agile text mining for bioinformatics: the PubAnnotation ecosystem