TL;DR: This paper provides an overview of the relationship between ACE and ERE and compares them to the more restricted standard of the TAC-KBP slot-filling task and the more expansive standard of FrameNet.
Abstract: The resurgence of effort within computational semantics has led to increased interest in various types of relation extraction and semantic parsing. While various manually annotated resources exist for enabling this work, these materials have been developed with different standards and goals in mind. In an effort to develop better general understanding across these resources, we provide a summary overview of the standards underlying ACE, ERE, TAC-KBP Slot-filling, and FrameNet. 1 Overview ACE and ERE are comprehensive annotation standards that aim to consistently annotate Entities, Events, and Relations within a variety of documents. The ACE (Automatic Content Extraction) standard was developed by NIST in 1999 and has evolved over time to support different evaluation cycles, the last evaluation having occurred in 2008. The ERE (Entities, Relations, Events) standard was created under the DARPA DEFT program as a lighter-weight version of ACE with the goal of making annotation easier and more consistent across annotators. ERE attempts to achieve this goal by consolidating some of the annotation type distinctions that were found to be the most problematic in ACE, as well as removing some more complex annotation features. This paper provides an overview of the relationship between these two standards and compares them to the more restricted standard of the TAC-KBP slot-filling task and the more expansive standard of FrameNet. Sections 3 and 4 examine Relations and Events in the ACE/ERE standards, Section 5 looks at TAC-KBP slot-filling, and Section 6 compares FrameNet to the other standards.
TL;DR: This paper describes a digital ink annotation process and system for digital documents that maintains each annotation's position within a document, so that the original intent and meaning of the annotation are preserved even if the document is edited, resized, displayed on a different device, or otherwise modified.
Abstract: A digital ink annotation process and system for processing digital documents and digital ink annotations therein are described. The process and system maintain an annotation's position within a document such that the original intent and meaning of the annotation are preserved. This is true even if the document is edited, resized, displayed on a different device, or otherwise modified. The digital ink annotation process includes automatic and manual grouping of digital ink strokes within a document to define digital ink annotations, classifying the annotations according to annotation type, and anchoring the annotations to appropriate regions or positions in a document. The process further includes reflowing the annotations in a new document layout such that the annotations conform and adapt to the new layout while preserving the original intents and meanings of the annotations. A digital ink annotation system includes a classification module, an anchoring module, a reflow module, and a clean-up module to implement the digital ink annotation process.
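The anchoring and reflow steps described in this abstract can be illustrated with a deliberately simplified Python sketch. It assumes a plain-text document, anchors an annotation to a range of word indices rather than to page coordinates, and uses a greedy word wrap as the layout. The names `layout` and `annotation_position` are hypothetical and are not taken from the described system, which works on ink strokes and richer document regions.

```python
def layout(words, width):
    """Greedy word wrap: return the (line, column) of each word for a given width."""
    positions = []
    line, col = 0, 0
    for word in words:
        # Start a new line when the word would overflow the current one.
        if col > 0 and col + len(word) > width:
            line, col = line + 1, 0
        positions.append((line, col))
        col += len(word) + 1  # one space after each word
    return positions


def annotation_position(anchor, words, width):
    """Resolve an anchor (first_word_index, last_word_index) in the current layout.

    Because the anchor refers to words, not coordinates, the annotation
    follows its words whenever the layout changes (an assumption of this sketch).
    """
    positions = layout(words, width)
    first, last = anchor
    return positions[first], positions[last]


words = "the quick brown fox jumps over the lazy dog".split()
anchor = (2, 3)  # annotation attached to "brown fox"

wide = annotation_position(anchor, words, width=20)
narrow = annotation_position(anchor, words, width=10)
```

With a width of 20 the anchored words sit on the first line; with a width of 10 they move to the second line, and the annotation moves with them because it is resolved from the anchor each time the layout is recomputed.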
TL;DR: A recursive, semi-automatic annotation method for video that uses a state-of-the-art video object segmentation method to propose initial annotations for all frames in a video from only a few manual object segmentations.
Abstract: Deep learning requires large amounts of annotated data. Manual annotation of objects in video is, regardless of annotation type, a tedious and time-consuming process. In particular, for rarely used image modalities, human annotation is hard to justify. In such cases, semi-automatic annotation provides an acceptable option. In this work, a recursive, semi-automatic annotation method for video is presented. The proposed method utilizes a state-of-the-art video object segmentation method to propose initial annotations for all frames in a video based on only a few manual object segmentations. In the case of a multi-modal dataset, the multi-modality is exploited to refine the proposed annotations even further. The final tentative annotations are presented to the user for manual correction. The method is evaluated on a subset of the RGBT-234 visual-thermal dataset, reducing the workload for a human annotator by approximately 78% compared to full manual annotation. Utilizing the proposed pipeline, sequences are annotated for the VOT-RGBT 2019 challenge.
TL;DR: An annotation type system for a data-driven NLP core system that covers formal document structure and document meta information, as well as the linguistic levels of morphology, syntax and semantics is introduced.
Abstract: We introduce an annotation type system for a data-driven NLP core system. The specifications cover formal document structure and document meta information, as well as the linguistic levels of morphology, syntax and semantics. The type system is embedded in the framework of the Unstructured Information Management Architecture (UIMA).
TL;DR: A method for evaluating the performance of a text analysis engine: similar annotation contexts of reference annotations in pre-annotated reference documents are clustered, reference content heterogeneity scores are computed from the number of reference annotation clusters for each annotation type, and an integral reference content rate for the set of annotation types is then computed and output to a user.
Abstract: A method for evaluating the performance of a text analysis engine is provided. A plurality of pre-annotated reference documents and a set of annotation types associated with the pre-annotated reference documents are received. Annotation contexts of reference annotations in the plurality of pre-annotated reference documents are analyzed using the set of annotation types. Similar annotation contexts are identified between the reference annotations and the set of annotation types. Responsive to identifying the similar annotation contexts, the similar annotation contexts are clustered, thereby forming a plurality of reference annotation clusters. A set of reference content heterogeneity scores is computed based on the number of reference annotation clusters for each annotation type in the set of annotation types. An integral reference content rate for the set of annotation types is then computed and output to a user.
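The cluster-then-score pipeline in this abstract can be sketched in Python under several loudly stated assumptions. The abstract does not define the similarity measure, the heterogeneity score, or the integral rate, so this sketch substitutes a Jaccard similarity over context tokens with greedy clustering, a score equal to clusters divided by annotations, and an unweighted mean as the aggregate. All function names and formulas here are hypothetical illustrations, not the method's actual definitions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_contexts(contexts, threshold=0.5):
    """Greedy single-pass clustering: a context joins the first cluster whose
    first member is at least `threshold` similar, else it starts a new cluster."""
    clusters = []
    for ctx in contexts:
        tokens = set(ctx.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= threshold:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def heterogeneity_scores(annotations, threshold=0.5):
    """annotations: iterable of (annotation_type, context_string) pairs.

    Returns {type: number_of_clusters / number_of_annotations}.
    This ratio is an assumed stand-in for the heterogeneity score.
    """
    by_type = defaultdict(list)
    for ann_type, context in annotations:
        by_type[ann_type].append(context)
    return {
        t: len(cluster_contexts(ctxs, threshold)) / len(ctxs)
        for t, ctxs in by_type.items()
    }


def integral_rate(scores):
    """Assumed aggregate: unweighted mean of the per-type scores."""
    return sum(scores.values()) / len(scores) if scores else 0.0


reference = [
    ("Person", "met with John Smith yesterday"),
    ("Person", "met with Jane Smith yesterday"),
    ("Person", "the author of the report"),
    ("Date", "on 5 May 2020"),
]
scores = heterogeneity_scores(reference)
rate = integral_rate(scores)
```

In this toy input the two "met with ... yesterday" contexts fall into one cluster and the third Person context forms its own, so Person yields two clusters from three annotations, while Date yields one cluster from one annotation.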