Declarative information extraction using datalog with embedded extraction predicates

Open Access • Proceedings Article

- 23 Sep 2007

- pp 1033-1044

223

TL;DR: This paper argues that developing information extraction programs using Datalog with embedded procedural extraction predicates is a good way to proceed, and shows how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework.

Citations

Journal Article•10.1561/1900000003

Information Extraction

Sunita Sarawagi

- 01 Mar 2008

TL;DR: A taxonomy of the field is created along various dimensions derived from the nature of the extraction task, the techniques used for extraction, the variety of input resources exploited, and the type of output produced to survey techniques for optimizing the various steps in an information extraction pipeline.

680

•Proceedings Article

Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

Laura Chiticariu, +2 more

- 01 Oct 2013

TL;DR: A case is made for the importance of rule-based IE to industry practitioners, and a research agenda is laid out for advancing the state of the art in rule-based IE systems, which has the potential to bridge the gap between academic research and industry practice.

296

•Proceedings Article

SystemT: An Algebraic Approach to Declarative Information Extraction

Laura Chiticariu, +5 more

- 11 Jul 2010

TL;DR: SystemT is a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. It uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules.

191

Proceedings Article•10.1145/1807085.1807097

From information to knowledge: harvesting entities and relationships from web sources

Gerhard Weikum, +1 more

- 06 Jun 2010

TL;DR: This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting, to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall.

177

•Posted Content

Incremental Knowledge Base Construction Using DeepDive

Jaeho Shin, +5 more

- 03 Feb 2015

- arXiv: Databases

TL;DR: This work describes DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and presents techniques to make the KBC process more efficient, and proposes two methods for incremental inference, based, respectively, on sampling and variational techniques.

175

References

Journal Article•10.1145/360825.360855

Efficient string matching: an aid to bibliographic search

Alfred V. Aho, +1 more

- 01 Jun 1975

- Communications of The ACM

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

3.4K

Journal Article•10.1017/S1351324904003523

UIMA: an architectural approach to unstructured information processing in the corporate research environment

David A. Ferrucci, +1 more

- 01 Sep 2004

- Natural Language Engineering

TL;DR: A general introduction to UIMA is given, focusing on the design points of its analysis engine architecture, and how UIMA is helping to accelerate research and technology transfer is discussed.

1K

Journal Article•10.1145/565117.565137

A brief survey of web data extraction tools

Alberto H. F. Laender, +3 more

- 01 Jun 2002

TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.

854

Proceedings Article•10.1109/ICDE.1993.344061

The Volcano optimizer generator: extensibility and efficient search

Goetz Graefe, +1 more

- 19 Apr 1993

TL;DR: The Volcano project, which provides efficient, extensible tools for query and request processing, particularly for object-oriented and scientific database systems, is reviewed, and it is shown that the search engine of the Volcano optimizer generator is more extensible and powerful.

504

•Book

Foundations of Databases: The Logical Level

Serge Abiteboul, +2 more

- 01 Jan 1995

TL;DR: Foundations of Databases presents in-depth coverage of database theory, surveys several emerging topics, and offers a unifying and contemporary perspective on the field.

445

Related Papers (5)

Information Extraction

Open information extraction from the web

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

UIMA: an architectural approach to unstructured information processing in the corporate research environment

Web-scale information extraction in knowitall: (preliminary results)