Optimizing graph-based patterns to extract biomedical events from the literature
TL;DR: The authors adapted their event extraction system to address both the GENIA (GE) task, which focuses on 13 molecular biology-related event types, and the Cancer Genetics (CG) task, which targets a challenging group of 40 cancer biology-related event types with varying arguments concerning 18 kinds of biological entities.
Abstract: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task focusing on 13 molecular biology related event types and the Cancer Genetics (CG) task targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th). We explored three ways to further extend our event extraction system in our previously published work: (1) We allowed non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions improved the extraction precision from 62% to 65%. However, the overall F-score stayed equivalent to the previous performance due to a 1% drop in recall.
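The closing claim, that a three-point precision gain and a one-point recall loss leave the F-score essentially unchanged, can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it takes the 48.93% GE F-score and the rounded 62% precision as describing the same baseline run, reads "a 1% drop in recall" as one percentage point, and uses the balanced F-score. The function names are hypothetical and the recall value is derived, not reported in the abstract.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from(f: float, precision: float) -> float:
    """Solve F = 2PR / (P + R) for R, given F and P."""
    return f * precision / (2 * precision - f)


# Reported in the abstract: F-score 48.93%, precision about 62% (rounded).
# Assumption: both figures describe the same baseline GE run.
baseline_recall = recall_from(0.4893, 0.62)

# Assumption: "a 1% drop in recall" means one percentage point.
extended_f = f_score(0.65, baseline_recall - 0.01)

print(f"implied baseline recall: {baseline_recall:.3f}")
print(f"F-score after extensions: {extended_f:.3f}")
```

Under these assumptions the implied baseline recall is roughly 40%, and the F-score after the extensions comes out at roughly 49%, which is consistent with the abstract's statement that overall performance stayed equivalent.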
Citations
Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013
Sampo Pyysalo, Tomoko Ohta, Rafal Rak, Andrew Rowley, Hong-Woo Chun, Sung-Jae Jung, Sung-Pil Choi, Jun'ichi Tsujii, Sophia Ananiadou
TL;DR: The results indicate that existing event extraction technology can generalize to meet the novel challenges represented by the CG and PC task settings, suggesting that extraction methods are capable of supporting the construction of knowledge bases on the molecular mechanisms of cancer and the curation of biomolecular pathway models.
Metabolic Capability and Phylogenetic Diversity of Mono Lake during a Bloom of the Eukaryotic Phototroph Picocystis sp. Strain ML.
Blake W. Stamps, Heather S. Nunn, Victoria A. Petryshyn, Ronald S. Oremland, Laurence G. Miller, Michael R. Rosen, Kohen W. Bauer, Katharine J. Thompson, Elise M. Tookmanian, A. R. Waldeck, Sean J. Loyd, Hope A. Johnson, Bradley S. Stevenson, William M. Berelson, Frank A. Corsetti, John R. Spear
TL;DR: A comprehensive molecular analysis along a depth transect near the center of the lake from the surface to a depth of 25 m in June 2016 suggested a depletion of anaerobic sulfate-reducing microorganisms throughout the lake's water column, suggesting a type of seed bank that could restore the microbial community as a bloom subsides.
Multiscale Laplacian graph kernel combined with lexico-syntactic patterns for biomedical event extraction from literature
TL;DR: This paper presents a hybrid approach that integrates an ensemble-learning framework by combining a Multiscale Laplacian Graph kernel and a feature-based linear kernel, using a pattern-matching engine to identify biomedical events with arguments.
Biomedical knowledge base construction from text and its applications in knowledge-based systems
Patrick Ernst
- 01 Jan 2017
TL;DR: A largely automated and scalable pattern-based knowledge extraction method covering a spectrum of different text genres and distilling a wide variety of facts from different biomedical areas is devised, and the fact-pattern duality paradigm of previous methods is generalized.