Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database.
TL;DR: ToxiCat (Toxicogenomic Categorizer) is an automatic text categorization pipeline developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD).
Abstract: We report on the original integration of an automatic text categorization pipeline, ToxiCat (Toxicogenomic Categorizer), which we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be described as a binary classification task in which a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as web applications and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
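The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the ToxiCat implementation: it assumes each article has already been reduced to three numeric features (one per module), it trains a small linear SVM by plain subgradient descent on the hinge loss rather than with an SVM library, and every name, feature value, and document identifier below is hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns the weight vector w and the (unregularized) bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1.0
            # L2 regularization shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def rank_articles(candidates, w, b):
    """Score each article with w.x + b and sort by decreasing score."""
    scored = [(doc_id, dot(w, feats) + b) for doc_id, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical features per article, all scaled to [0, 1]:
# [retrieval score, gene-mention signal, chemical/disease-mention signal]
train_X = [
    [0.9, 0.8, 0.9], [0.8, 0.6, 0.7], [0.7, 0.9, 0.8],  # curatable articles
    [0.2, 0.1, 0.0], [0.3, 0.0, 0.2], [0.1, 0.2, 0.1],  # non-curatable articles
]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_X, train_y)

# Hypothetical unseen articles to prioritize for curation.
candidates = {
    "irrelevant_like": [0.15, 0.1, 0.1],
    "relevant_like": [0.85, 0.7, 0.8],
    "borderline": [0.5, 0.4, 0.5],
}
ranked = rank_articles(candidates, w, b)
for doc_id, score in ranked:
    label = "curate" if score > 0 else "skip"
    print(f"{doc_id}\t{score:.3f}\t{label}")
```

The same signed score serves both purposes mentioned in the abstract: its sign gives the binary decision, and its magnitude orders the articles so that curators can start from the top of the list.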
Citations
PubTator: a web-based text mining tool for assisting biocuration
TL;DR: PubTator is a web-based system for assisting biocuration that features a PubMed-like interface and is equipped with multiple challenge-winning text mining algorithms to ensure the quality of its automatic results.
Information retrieval and text mining technologies for chemistry
Martin Krallinger, Obdulia Rabal, Anália Lourenço, Julen Oyarzabal, Alfonso Valencia
TL;DR: This Review provides a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting information demands of chemical information contained in scientific literature, patents, technical reports, or the web.
Text mining to support gene ontology curation and vice versa
TL;DR: It is argued that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions, and that a new type of QA system is emerging, so-called Deep QA, which uses machine learning methods trained with curated contents.
A document classifier for medicinal chemistry publications trained on the ChEMBL corpus
George Papadatos, Gerard J. P. van Westen, Samuel Croset, Rita Santos, Simone Trubian, John P. Overington
TL;DR: Large-scale machine learning document classification was shown to be very robust and flexible for this particular application, as illustrated in four distinct text-mining-based use cases.
Document triage for identifying protein-protein interactions affected by mutations: a neural network ensemble approach.
TL;DR: Several neural network models are used for document triage; their ensemble performs better than any individual model and is incorporated into the approach to address the limited size of the training set.
References
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Combining SVMs with Various Feature Selection Strategies
Yi-Wei Chen, Chih-Jen Lin
- 01 Jan 2006
TL;DR: This article investigates the performance of combining support vector machines (SVM) with various feature selection strategies. Some are filter-type approaches (general feature selection methods independent of SVM), and some are wrapper-type methods (modifications of SVM that can be used to select features).
Overview of BioCreAtIvE: critical assessment of information extraction for biology
TL;DR: The first BioCreAtIvE assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision / recall or better, which potentially makes them suitable for real applications in biology.
Automatic assignment of biomedical categories: toward a generic approach
TL;DR: A lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units, which shows the effectiveness of phrase indexing for both GO and MeSH categorization.
MeSH Up
Dolf Trieschnigg, Piotr Pęzik, Vivian Lee, Franciska de Jong, Wessel Kraaij, Dietrich Rebholz-Schuhmann
TL;DR: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR and the automatic MeSH annotation system proposed is highly scalable and generates improvements in IR comparable with those observed for manual annotations.