Topic

Automatic Content Extraction

About: Automatic Content Extraction is a research topic. Over its lifetime, 83 publications have appeared within this topic, receiving 3,635 citations.

Papers

Proceedings Article

The Automatic Content Extraction (ACE) Program Tasks, Data, and Evaluation

George R. Doddington, Alexis Mitchell, Mark A. Przybocki, Lance Ramshaw, Stephanie Strassel, Ralph Weischedel

1 May 2004

TL;DR: The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate.

Abstract: The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate. Data sources include audio and image data in addition to pure text, and Arabic and Chinese in addition to English. The effort involves defining the research tasks in detail, collecting and annotating data needed for training, development, and evaluation, and supporting the research with evaluation tools and research workshops. This program began with a pilot study in 1999. The next evaluation is scheduled for September 2004.

Introduction and Background

Today’s global web of electronic information, including most notably the www, provides a resource of unbounded information-bearing potential. But to fully exploit this potential requires the ability to extract content from human language automatically. That is the objective of the ACE program – to develop the capability to extract meaning from multimedia sources. These sources include text, audio and image data. The ACE program is a “technocentric” research effort, meaning that the emphasis is on developing core enabling technologies rather than solving the application needs that motivate the research. The program began in 1999 with a study intended to identify those key content extraction tasks to serve as the research targets for the remainder of the program. These tasks were identified in general as the extraction of the entities, relations and events being discussed in the language. In general objective, the ACE program is motivated by and addresses the same issues as the MUC program that preceded it (NIST 1999).

The ACE program, however, attempts to take the task “off the page” in the sense that the research objectives are defined in terms of the target objects (i.e., the entities, the relations, and the events) rather than in terms of the words in the text. For example, the so-called “named entity” task, as defined in MUC, is to identify those words (on the page) that are names of entities. In ACE, on the other hand, the corresponding task is to identify the entity so named. This is a different task, one that is more abstract and that involves inference more explicitly in producing an answer. In a real sense, the task is to detect things that “aren’t there”. Reference resolution thus becomes an integral and critical part of solving the problem.

During the period 2000-2001, the ACE effort was devoted solely to entity detection and tracking. During the period 2002-2003, relations were explored and added. Now, starting in 2004, events are being explored and added as the third of the three original tasks.

Footnote 1: While the ACE program is directed toward extraction of information from audio and image sources in addition to pure text, the research effort is restricted to information extraction from text. The actual transduction of audio and image data into text is not part of the ACE research effort, although the processing of ASR and OCR output from such transducers is.

1,371 citations

Proceedings Article • DOI: 10.3115/1219044.1219066

Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations

Nanda Kambhatla¹

IBM¹

21 Jul 2004

TL;DR: This work employs Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text to obtain competitive results in the Automatic Content Extraction (ACE) evaluation.

Abstract: Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.
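The idea of combining heterogeneous features in a Maximum Entropy model can be sketched as follows. This is a minimal illustration, not the system described in the paper: the feature templates, the relation labels (EMPLOYMENT, LOCATED, NONE), the toy sentences, and the helper names `extract_features`, `predict_proba`, and `train` are all assumptions made for the example. Parse-based syntactic features are omitted because they would need a parser, so a token-distance feature stands in for them.

```python
import math
from collections import defaultdict

def extract_features(tokens, m1, m2):
    """Hypothetical templates; m1 and m2 are (start, end, entity_type) spans."""
    feats = {}
    # Lexical: each word between the two mentions.
    for w in tokens[m1[1]:m2[0]]:
        feats["between=" + w.lower()] = 1.0
    # Semantic: the pair of entity types.
    feats["types=" + m1[2] + "-" + m2[2]] = 1.0
    # Positional stand-in for syntactic features: capped token distance.
    feats["dist=" + str(min(m2[0] - m1[1], 3))] = 1.0
    return feats

def predict_proba(weights, labels, feats):
    """Maximum Entropy (softmax) distribution over relation labels."""
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items()) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(examples, labels, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient: observed indicator minus expected probability.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] += lr * grad * v
    return weights

LABELS = ["EMPLOYMENT", "LOCATED", "NONE"]  # illustrative, not the ACE inventory
raw = [
    ("Smith works for IBM", (0, 1, "PER"), (3, 4, "ORG"), "EMPLOYMENT"),
    ("Jones works for Acme", (0, 1, "PER"), (3, 4, "ORG"), "EMPLOYMENT"),
    ("Paris is in France", (0, 1, "GPE"), (3, 4, "GPE"), "LOCATED"),
    ("Lee met Kim", (0, 1, "PER"), (2, 3, "PER"), "NONE"),
]
examples = [(extract_features(s.split(), a, b), y) for s, a, b, y in raw]
weights = train(examples, LABELS)

test_feats = extract_features("Chen works for Google".split(), (0, 1, "PER"), (3, 4, "ORG"))
probs = predict_proba(weights, LABELS, test_feats)
print(max(probs, key=probs.get))
```

Because every feature is just a named indicator, adding a new information source only means adding another template to `extract_features`; the model itself does not change.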

715 citations

Proceedings Article • DOI: 10.3115/V1/P14-1038

Incremental Joint Extraction of Entity Mentions and Relations

Qi Li¹, Heng Ji¹

Rensselaer Polytechnic Institute¹

1 Jun 2014

TL;DR: This paper presents an incremental joint framework that extracts entity mentions and relations simultaneously using a structured perceptron with efficient beam search. The joint model significantly outperforms a strong pipelined baseline, which itself performs better than the best previously reported end-to-end system.

Abstract: We present an incremental joint framework to simultaneously extract entity mentions and relations using a structured perceptron with efficient beam search. A segment-based decoder based on the idea of a semi-Markov chain is adopted in the new framework, as opposed to traditional token-based tagging. In addition, by virtue of the inexact search, we developed a number of new and effective global features as soft constraints to capture the interdependency among entity mentions and relations. Experiments on Automatic Content Extraction (ACE) corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
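A structured perceptron trained with beam-search decoding can be sketched in a few lines. The sketch below is deliberately simplified and is not the paper's model: it tags entity mentions only (no relations), it uses the token-based tagging that the abstract contrasts with its segment-based decoder, and it applies a plain full-sequence update. The feature templates, the tag set, the toy data, and the function names are all assumptions for illustration.

```python
from collections import defaultdict

def local_features(tokens, i, prev_tag, tag):
    """Hypothetical feature templates for tagging token i."""
    w = tokens[i]
    return [
        "word=%s|tag=%s" % (w.lower(), tag),
        "cap=%s|tag=%s" % (w[0].isupper(), tag),
        "prev=%s|tag=%s" % (prev_tag, tag),
    ]

def beam_decode(tokens, tags, weights, beam_size=4):
    """Inexact left-to-right search keeping the beam_size best partial sequences."""
    beam = [(0.0, [])]
    for i in range(len(tokens)):
        candidates = []
        for score, seq in beam:
            prev = seq[-1] if seq else "<s>"
            for t in tags:
                s = score + sum(weights[f] for f in local_features(tokens, i, prev, t))
                candidates.append((s, seq + [t]))
        # Stable sort: ties keep generation order, so decoding is deterministic.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][1]

def sequence_features(tokens, seq):
    """Feature counts for a complete tag sequence."""
    counts = defaultdict(int)
    prev = "<s>"
    for i, t in enumerate(seq):
        for f in local_features(tokens, i, prev, t):
            counts[f] += 1
        prev = t
    return counts

def train(data, tags, epochs=5, beam_size=4):
    """Perceptron update: reward gold features, penalize predicted ones."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            pred = beam_decode(tokens, tags, weights, beam_size)
            if pred != gold:
                for f, c in sequence_features(tokens, gold).items():
                    weights[f] += c
                for f, c in sequence_features(tokens, pred).items():
                    weights[f] -= c
    return weights

TAGS = ["PER", "ORG", "O"]  # illustrative tag set
data = [
    (["Smith", "works", "for", "IBM"], ["PER", "O", "O", "ORG"]),
    (["Jones", "joined", "IBM"], ["PER", "O", "ORG"]),
]
weights = train(data, TAGS)
for tokens, gold in data:
    print(tokens, beam_decode(tokens, TAGS, weights))
```

Because the beam never enumerates all sequences, features are not restricted to those that factor over adjacent tags; this is the property the abstract relies on when it mentions global features made possible by inexact search.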

515 citations

Journal Article • DOI: 10.1016/J.NEUCOM.2016.12.075

Joint entity and relation extraction based on a hybrid neural network

Suncong Zheng¹, Yuexing Hao¹, Dongyuan Lu², Hongyun Bao¹, Jiaming Xu¹, Hongwei Hao¹, Bo Xu¹

Chinese Academy of Sciences¹, Beijing Institute of Foreign Trade²

27 Sep 2017, Neurocomputing

TL;DR: A hybrid neural network model is proposed that extracts entities and their relationships without any handcrafted features and achieves state-of-the-art results on the entity and relation extraction task.

276 citations

Proceedings Article

Linguistic Resources and Evaluation Techniques for Evaluation of Cross-Document Automatic Content Extraction

Stephanie Strassel¹, Mark A. Przybocki², Kay Peterson², Zhiyi Song¹, Kazuaki Maeda

University of Pennsylvania¹, National Institute of Standards and Technology²

27 Aug 2008

TL;DR: This paper presents the 2008 ACE XDoc evaluation task and associated infrastructure, and describes the linguistic resources created by LDC to support the evaluation, focusing on new approaches required for data selection, data processing, annotation task definitions and annotation software.

Abstract: The NIST Automatic Content Extraction (ACE) Evaluation expands its focus in 2008 to encompass the challenge of cross-document and cross-language global integration and reconciliation of information. While past ACE evaluations have been limited to local (within-document) detection and disambiguation of entities, relations and events, the current evaluation adds global (cross-document and cross-language) entity disambiguation tasks for Arabic and English. This paper presents the 2008 ACE XDoc evaluation task and associated infrastructure. We describe the linguistic resources created by LDC to support the evaluation, focusing on new approaches required for data selection, data processing, annotation task definitions and annotation software, and we conclude with a discussion of the metrics developed by NIST to support the evaluation.

61 citations

Performance Metrics

Papers

474

Citations

No. of papers in the topic in previous years
Year	Papers
2021	2
2020	6
2019	2
2018	1
2017	4
2016	6
