About: Automatic Content Extraction is a research topic. Over the lifetime, 83 publications have been published within this topic receiving 3635 citations.
TL;DR: The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate.
Abstract: The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate. Data sources include audio and image data in addition to pure text, and Arabic and Chinese in addition to English. The effort involves defining the research tasks in detail, collecting and annotating data needed for training, development, and evaluation, and supporting the research with evaluation tools and research workshops. This program began with a pilot study in 1999. The next evaluation is scheduled for September 2004. Introduction and Background Today’s global web of electronic information, including most notably the www, provides a resource of unbounded information-bearing potential. But to fully exploit this potential requires the ability to extract content from human language automatically. That is the objective of the ACE program – to develop the capability to extract meaning from multimedia sources. These sources include text, audio and image data. The ACE program is a “technocentric” research effort, meaning that the emphasis is on developing core enabling technologies rather than solving the application needs that motivate the research. The program began in 1999 with a study intended to identify those key content extraction tasks to serve as the research targets for the remainder of the program. These tasks were identified in general as the extraction of the entities, relations and events being discussed in the language. In general objective, the ACE program is motivated by and addresses the same issues as the MUC program that preceded it (NIST 1999). The ACE program, however, attempts to take the task “off the page” in the sense that the research objectives are defined in terms of the target objects (i.e., the entities, the relations, and the events) rather than in terms of the words in the text. For example, the so-called “named entity” task, as defined in MUC, is to identify those words (on the page) that are names of entities. In ACE, on the other hand, the corresponding task is to identify the entity so named. This is a different task, one that is more abstract and that involves inference more explicitly in producing an answer. In a real sense, the task is to detect things that “aren’t there”. Reference resolution thus becomes an integral and critical part of solving the problem. During the period 2000-2001, the ACE effort was devoted solely to entity detection and tracking. During the period 2002-2003, relations were explored and added. 1 While the ACE program is directed toward extraction of information from audio and image sources in addition to pure text, the research effort is restricted to information extraction from text. The actual transduction of audio and image data into text is not part of the ACE research effort, although the processing of ASR and OCR output from such transducers is. Now, starting in 2004, events are being explored and added as the third of the three original tasks.
TL;DR: This work employs Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text to obtain competitive results in the Automatic Content Extraction (ACE) evaluation.
Abstract: Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.
TL;DR: An incremental joint framework to simultaneously extract entity mentions and relations using structured perceptron with efficient beam-search is presented, which significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
Abstract: We present an incremental joint framework to simultaneously extract entity mentions and relations using structured perceptron with efficient beam-search. A segment-based decoder based on the idea of semi-Markov chain is adopted to the new framework as opposed to traditional token-based tagging. In addition, by virtue of the inexact search, we developed a number of new and effective global features as soft constraints to capture the interdependency among entity mentions and relations. Experiments on Automatic Content Extraction (ACE) 1 corpora demonstrate that our joint model significantly outperforms a strong pipelined baseline, which attains better performance than the best-reported end-to-end system.
TL;DR: A hybrid neural network model is proposed to extract entities and their relationships without any handcrafted features to achieve the state-of-the-art results on entity and relation extraction task.
TL;DR: This paper presents the 2008 ACE XDoc evaluation task and associated infrastructure, and describes the linguistic resources created by LDC to support the evaluation, focusing on new approaches required for data selection, data processing, annotation task definitions and annotation software.
Abstract: The NIST Automatic Content Extraction (ACE) Evaluation expands its focus in 2008 to encompass the challenge of cross-document and cross-language global integration and reconciliation of information. While past ACE evaluations have been limited to local (within-document) detection and disambiguation of entities, relations and events, the current evaluation adds global (cross-document and cross-language) entity disambiguation tasks for Arabic and English. This paper presents the 2008 ACE XDoc evaluation task and associated infrastructure. We describe the linguistic resources created by LDC to support the evaluation, focusing on new approaches required for data selection, data processing, annotation task definitions and annotation software, and we conclude with a discussion of the metrics developed by NIST to support the evaluation.