Collaborative biocuration—text-mining development task for document prioritization for curation

doi:10.1093/DATABASE/BAS037

Open Access•Journal Article•10.1093/DATABASE/BAS037

Thomas C. Wiegers, +2 more

- 01 Jan 2012

- Database

- Vol. 2012, Iss: 2012

39

TL;DR: This article presents a detailed description of the Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation and a summary of its results.

Abstract: The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) across articles; the effectiveness of 'information retrieval' (IR) was also measured based on 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
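The abstract reports two kinds of scores: recall for gene, disease and chemical NER, and mean average precision (MAP) for the ranked document lists. The sketch below shows the standard textbook definitions of both measures. It is not the organizers' scoring code; the function names, the set-based entity matching and the toy identifiers in the usage note are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall(predicted: Set[str], gold: Set[str]) -> float:
    """Fraction of gold-standard entities that the system also returned."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over every rank k that holds a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average the per-query average precision over all (ranking, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

As a usage example with made-up identifiers, `mean_average_precision([(["d1", "d2", "d3", "d4"], {"d1", "d3"}), (["d2", "d1"], {"d1"})])` averages the two per-query values (5/6 and 1/2) and yields 2/3.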

Citations

•Journal Article•10.1093/NAR/GKW838

The Comparative Toxicogenomics Database: update 2017

Allan Peter Davis, +8 more

- 04 Jan 2017

- Nucleic Acids Research

TL;DR: This update describes the new exposure module (that harmonizes exposure science information with core toxicogenomic data) and introduces a novel dataset of GO-disease inferences (that identify common molecular underpinnings for seemingly unrelated pathologies).

721

•Journal Article•10.1093/NAR/GKT441

PubTator: a web-based text mining tool for assisting biocuration

Chih-Hsuan Wei, +2 more

- 01 Jul 2013

- Nucleic Acids Research

TL;DR: This article describes PubTator, a web-based system for assisting biocuration that features a PubMed-like interface and is equipped with multiple challenge-winning text mining algorithms to ensure the quality of its automatic results.

640

•Journal Article•10.1093/BIOINFORMATICS/BTT474

DNorm: disease name normalization with pairwise learning to rank.

Robert Leaman, +2 more

- 15 Nov 2013

- Bioinformatics

TL;DR: This article introduces DNorm, the first machine learning approach for disease name normalization. It uses the NCBI disease corpus and the MEDIC vocabulary (which combines MeSH® and OMIM) and provides a high-performing, mathematically principled framework for learning similarities between mentions and concept names directly from training data.

544

•Journal Article•10.1093/NAR/GKU935

The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.

Allan Peter Davis, +7 more

- 28 Jan 2015

- Nucleic Acids Research

TL;DR: The prototype database originally described in its first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases.

414

•Journal Article•10.1093/DATABASE/BAW032

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task

Chih-Hsuan Wei, +7 more

- 01 Jan 2016

- Database

TL;DR: This task was found to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.

321

References

•Journal Article•10.1093/NAR/GKL993

Entrez Gene: gene-centered information at NCBI

Donna Maglott, +3 more

- 17 Dec 2004

- Nucleic Acids Research

TL;DR: Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

2.4K

•Proceedings Article

Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program

Alan R. Aronson

- 01 Jan 2001

TL;DR: MetaMap is a program developed at the National Library of Medicine (NLM) to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text.

2.1K

•Journal Article•10.1002/ASI.20583

TREC: Experiment and evaluation in information retrieval

José Luis Vicedo, +1 more

- 01 Apr 2007

- Journal of the Association for Informati...

TL;DR: A review of the book "TREC: Experiment and Evaluation in Information Retrieval" that recommends it and gives reasons why it is worth reading.

1.1K

•Journal Article•10.1093/NAR/GKW838

The Comparative Toxicogenomics Database: update 2017

Allan Peter Davis, +8 more

- 04 Jan 2017

- Nucleic Acids Research

TL;DR: This update describes the new exposure module (that harmonizes exposure science information with core toxicogenomic data) and introduces a novel dataset of GO-disease inferences (that identify common molecular underpinnings for seemingly unrelated pathologies).

721

•Proceedings Article•10.1142/9789812776136_0062

BANNER: an executable survey of advances in biomedical named entity recognition.

Robert Leaman, +1 more

- 01 Dec 2007

TL;DR: BANNER is an open-source, executable survey of advances in biomedical named entity recognition that is intended to serve as a benchmark for the field and is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps.

580