Assisted curation: does text mining really help?
Beatrice Alex, Claire Grover, Barry Haddow, Mijail Kabadjov, Ewan Klein, Michael Matthews, Stuart Roebuck, Richard Tobin, Xinglong Wang +8 more
- 01 Dec 2007
- pp 556-567
TL;DR: The paper reports three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs).
Abstract: Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.
Citations
ChemSpot: a hybrid system for chemical named entity recognition
TL;DR: ChemSpot, a named entity recognition (NER) tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and International Union of Pure and Applied Chemistry entities is presented.
286
ExaCT: automatic extraction of clinical trial characteristics from journal publications
TL;DR: An automatic information extraction system that assists users with locating and extracting key trial characteristics from full-text journal articles reporting on randomized controlled trials (RCTs) and can be extended to handle other characteristics and document types.
Layout-aware text extraction from full-text PDF of scientific articles
TL;DR: This paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections and shows that it can identify text blocks and classify them into rhetorical categories with Precision.
Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?
TL;DR: The paper concludes by outlining how text mining techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale up high-quality manual curation.
Semi-automatic semantic annotation of PubMed queries
TL;DR: This study shows that most annotators find automatic pre-annotations helpful, and suggests using an automatic tool to assist large-scale manual annotation projects, both to reduce annotation time and to improve annotation consistency while maintaining high quality in the final annotations.
104
References
A network of protein-protein interactions in yeast
TL;DR: This approach correctly predicts a functional category for 72% of the 1,393 characterized proteins with at least one partner of known function, and has been applied to predict functions for 364 previously uncharacterized proteins.
1.4K
PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine
Ian Donaldson, Joel Martin, Berry de Bruijn, Cheryl Wolting, Vicki Lay, Brigitte Tuekam, Shudong Zhang, Berivan Baskin, Gary D. Bader, Katerina Michalickova, Tony Pawson, Christopher W. V. Hogue +12 more
TL;DR: This work presents an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.
•Posted Content
Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup
TL;DR: A Challenge Evaluation task that was created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup, where 18 participating groups provided systems that flagged articles for curation, based on whether the article contained experimental evidence for gene expression products.
176
Facts from text--is text mining ready to deliver?
TL;DR: The mining of information from scientific literature using computational tools has tremendous potential for knowledge discovery, but how close are the authors to realizing this potential?
•Proceedings Article
Rule-Based Chunking and Reusability
Claire Grover, Richard Tobin +1 more
- 01 May 2006
TL;DR: A rule-based approach to chunking implemented using the LT-XML2 and LT-TTT2 tools is discussed and it is shown that this approach is easy to adapt to different chunking styles and that the mark-up of further linguistic information can be added to the rules at little extra cost.