Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models

doi:10.48550/arxiv.2311.07314

Journal Article•10.48550/arxiv.2311.07314

Junpeng Li, +2 more

- 13 Nov 2023

- arXiv.org

- Vol. abs/2311.07314

5

TL;DR: This work proposes a method that integrates a large language model (LLM) and a natural language inference (NLI) module to generate relation triples, thereby augmenting document-level relation datasets. It demonstrates the effectiveness of the approach by introducing an enhanced dataset, DocGNRE, which excels in re-annotating numerous long-tail relation types.

Abstract: Document-level Relation Extraction (DocRE), which aims to extract relations from a long context, is a critical challenge in achieving fine-grained structural comprehension and generating interpretable document representations. Inspired by recent advances in the in-context learning capabilities that emerge in large language models (LLMs), such as ChatGPT, we aim to design an automated annotation method for DocRE with minimal human effort. Unfortunately, vanilla in-context learning is infeasible for document-level relation extraction because of the large number of predefined fine-grained relation types and the uncontrolled generations of LLMs. To tackle this issue, we propose a method integrating an LLM and a natural language inference (NLI) module to generate relation triples, thereby augmenting document-level relation datasets. We demonstrate the effectiveness of our approach by introducing an enhanced dataset known as DocGNRE, which excels in re-annotating numerous long-tail relation types. We are confident that our method holds the potential for broader applications in domain-specific relation type definitions and offers tangible benefits in advancing generalized language semantic comprehension.
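
The abstract names two components (an LLM that generates relation triples and an NLI module) but does not spell out how they interact. The sketch below shows one plausible reading, offered as an illustration rather than the authors' implementation: free-form candidate triples, as an LLM might produce them, are mapped onto predefined relation types by scoring a hypothesis sentence for each type and keeping the best match above a threshold. The relation names, the templates, the threshold, and `toy_entailment_score` (a token-overlap stand-in for a real NLI model) are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Hypothetical predefined relation types, each with a hypothetical
# hypothesis template. Neither is taken from the paper.
TEMPLATES: Dict[str, str] = {
    "country": "{head} is located in the country {tail}",
    "author": "{head} was written by {tail}",
}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: share of hypothesis tokens found in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return hits / len(hypothesis_tokens)


def map_to_predefined(
    candidate: Triple,
    templates: Dict[str, str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.6,
) -> Optional[Triple]:
    """Map a free-form triple to the best-scoring predefined relation, or drop it."""
    head, free_relation, tail = candidate
    premise = f"{head} {free_relation} {tail}"
    best_relation: Optional[str] = None
    best_score = 0.0
    for relation, template in templates.items():
        hypothesis = template.format(head=head, tail=tail)
        score = score_fn(premise, hypothesis)
        if score > best_score:
            best_relation, best_score = relation, score
    if best_relation is None or best_score < threshold:
        return None  # no predefined type is supported strongly enough
    return (head, best_relation, tail)


def augment(
    candidates: List[Triple],
    templates: Dict[str, str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.6,
) -> List[Triple]:
    """Keep only the candidates that map onto a predefined relation type."""
    kept: List[Triple] = []
    for candidate in candidates:
        mapped = map_to_predefined(candidate, templates, score_fn, threshold)
        if mapped is not None:
            kept.append(mapped)
    return kept


# Candidates written as an LLM might phrase them (made up for this example).
llm_candidates: List[Triple] = [
    ("Hamlet", "was written by", "Shakespeare"),
    ("Paris", "is famous for", "croissants"),
]
new_triples = augment(llm_candidates, TEMPLATES, toy_entailment_score)
print(new_triples)
```

In a real pipeline, `score_fn` would be replaced by the entailment probability from a trained NLI model, and the candidates would come from prompting an LLM on each document. The filtering step addresses the two obstacles the abstract mentions: the LLM does not have to choose among many fine-grained labels directly, and generations that match no predefined type are discarded.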

Citations

Preprint•10.48550/arxiv.2406.08223

Research Trends for the Interplay between Large Language Models and Knowledge Graphs

Hanieh Khorashadizadeh, +9 more

- 12 Jun 2024

TL;DR: The interplay between Large Language Models and Knowledge Graphs is a key area of research for advancing AI capabilities in understanding, reasoning, and language processing. The research explores areas such as KG Question Answering, ontology generation, and KG validation. It also examines the roles of LLMs in generating descriptive texts and natural language queries for KGs.

3

Journal Article•10.1016/j.ipm.2024.103904

Automatically learning linguistic structures for entity relation extraction

Weizhe Yang, +4 more

- 26 Sep 2024

- Information Processing and Management

1

Journal Article•10.1016/j.ipm.2024.103909

An adaptive confidence-based data revision framework for Document-level Relation Extraction

Chao Jiang, +4 more

- 26 Sep 2024

- Information Processing and Management

Journal Article•10.1007/s44443-025-00055-w

Large and Small models for collaborative cross-lingual data augmentation in entity relationship extraction for low-resource languages

Longjie Bao, +1 more

- 01 Jun 2025

- Journal of King Saud University - Comput...

Journal Article•10.1007/s10489-024-05798-z

Hierarchical symmetric cross entropy for distant supervised relation extraction

Yun Liu, +6 more

- 03 Sep 2024

- Applied Intelligence
