Collaborative biocuration—text-mining development task for document prioritization for curation
TL;DR: A detailed description of the Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is presented, along with a summary of the results.
Abstract: The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II); and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) performance across articles; the effectiveness of 'information retrieval' (IR) was also measured by 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49%, 65% and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated for functionality and ease of use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
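The abstract scores triage systems in two ways: aggregate NER recall across articles and MAP over ranked article lists. The sketch below shows how such scores are commonly computed; it is not the official BioCreative or CTD evaluation script. It assumes that entities are compared as sets of normalized identifiers per article, that recall is micro-averaged (pooled over all articles), and that average precision divides by the total number of relevant articles for a query. The query keys, article IDs and entity IDs are hypothetical. A MAP reported as 80% corresponds to a value of 0.80 here.

```python
from typing import Dict, Sequence, Set


def average_precision(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list of article IDs.

    Precision is taken at each rank where a relevant article appears, and the
    sum is divided by the total number of relevant articles (assumed convention).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           gold: Dict[str, Set[str]]) -> float:
    """Mean of per-query average precision over every query in the gold standard.

    A query with no submitted ranking scores 0 for that query.
    """
    if not gold:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in gold.items())
    return total / len(gold)


def aggregate_recall(predicted: Dict[str, Set[str]],
                     gold: Dict[str, Set[str]]) -> float:
    """Micro-averaged recall: gold entity IDs found, pooled over all articles."""
    found = sum(len(gold[a] & predicted.get(a, set())) for a in gold)
    total = sum(len(ids) for ids in gold.values())
    return found / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical triage queries, each with a system-ranked list of article IDs.
    runs = {"q1": ["pmid3", "pmid1", "pmid7", "pmid2"], "q2": ["pmid5", "pmid9"]}
    relevant = {"q1": {"pmid1", "pmid2"}, "q2": {"pmid5"}}
    print(mean_average_precision(runs, relevant))  # 0.75: AP(q1)=0.5, AP(q2)=1.0

    # Hypothetical normalized entity IDs per article (gold vs. system output).
    gold_entities = {"a1": {"D1", "D2"}, "a2": {"D3"}}
    predicted_entities = {"a1": {"D1", "D9"}, "a2": {"D3"}}
    print(round(aggregate_recall(predicted_entities, gold_entities), 3))  # 0.667
```

Micro-averaging pools all articles, so articles with many gold entities weigh more; a macro-average (the mean of per-article recall) is the usual alternative, and which convention the challenge used is not stated in the abstract.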
Citations
The Comparative Toxicogenomics Database: update 2017
Allan Peter Davis, Cynthia J. Grondin, Robin J. Johnson, Daniela Sciaky, Benjamin L. King, Roy McMorran, Jolene Wiegers, Thomas C. Wiegers, Carolyn J. Mattingly
TL;DR: This update describes the new exposure module (that harmonizes exposure science information with core toxicogenomic data) and introduces a novel dataset of GO-disease inferences (that identify common molecular underpinnings for seemingly unrelated pathologies).
PubTator: a web-based text mining tool for assisting biocuration
TL;DR: PubTator is a web-based system for assisting biocuration that features a PubMed-like interface and is equipped with multiple challenge-winning text-mining algorithms to ensure the quality of its automatic results.
DNorm: disease name normalization with pairwise learning to rank.
TL;DR: This article introduces DNorm, the first machine learning approach to disease name normalization. Built using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH® and OMIM, it is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data.
The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.
Allan Peter Davis, Cynthia J. Grondin, Kelley Lennon-Hopkins, Cynthia A. Saraceni-Richards, Daniela Sciaky, Benjamin L. King, Thomas C. Wiegers, Carolyn J. Mattingly
TL;DR: The prototype database originally described in its first report has transformed into a sophisticated resource used actively today to help scientists develop and test hypotheses about the etiologies of environmentally influenced diseases.
Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
Chih-Hsuan Wei, Yifan Peng, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Jiao Li, Thomas C. Wiegers, Zhiyong Lu
TL;DR: This task was found to be successful in engaging the text-mining research community, producing a large annotated corpus and improving the results of automatic disease recognition and CDR extraction.