Open Access • Posted Content
HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification
TL;DR: It is shown that the performance of an existing state-of-the-art semantic-matching model degrades significantly on this dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results.
Abstract: We introduce HoVer (HOppy VERification), a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and classify whether the claim is Supported or Not-Supported by the facts. In HoVer, the claims require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes. Moreover, most of the 3/4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations such as coreference. We show that the performance of an existing state-of-the-art semantic-matching model degrades significantly on our dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results. We hope that the introduction of this challenging dataset and the accompanying evaluation task will encourage research in many-hop fact retrieval and information verification. We make the HoVer dataset publicly available at this https URL
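The evaluation described in the abstract, where accuracy is tracked as the number of reasoning hops grows, can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the `Claim` class, its field names (`text`, `num_hops`, `label`), the label strings, and the `accuracy_by_hops` helper are all hypothetical and are not taken from the HoVer release.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    # Hypothetical record layout; the real dataset's schema may differ.
    text: str
    num_hops: int  # number of Wikipedia articles needed as evidence (2 to 4)
    label: str     # "SUPPORTED" or "NOT_SUPPORTED" (assumed label strings)


def accuracy_by_hops(claims, predictions):
    """Return {hop count: accuracy} so degradation across hops is visible."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for claim, predicted in zip(claims, predictions):
        total[claim.num_hops] += 1
        if predicted == claim.label:
            correct[claim.num_hops] += 1
    return {hops: correct[hops] / total[hops] for hops in sorted(total)}
```

Reporting one accuracy figure per hop count, rather than a single aggregate, is what makes a drop on 3-hop and 4-hop claims visible.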
Citations
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
TL;DR: Introduces MultiHop-RAG, a novel dataset consisting of a knowledge base, a large collection of multi-hop queries, their ground-truth answers, and the associated supporting evidence. The authors hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.
29
How Robust are Fact Checking Systems on Colloquial Claims
Byeongchang Kim, Hyunwoo Kim, Seokhee Hong, Gunhee Kim
- 01 Jun 2021
TL;DR: Existing fact-checking systems that perform well on claims in formal style are found to degrade significantly on colloquial claims with the same semantics, and document retrieval is shown to be the weakest spot in the system, vulnerable even to filler words such as “yeah” and “you know”.
Fact Checking with Insufficient Evidence
TL;DR: This work is the first to study what information fact-checking (FC) models consider sufficient for fact checking. It introduces a novel task, advances it with three main contributions, and finds that models are least successful in detecting missing evidence when adverbial modifiers are omitted.
Sustainable Development of Information Dissemination: A Review of Current Fake News Detection Research and Practice
TL;DR: The survey includes fake news datasets, research methods for fake news detection, general technical models and multimodal related technical methods and proposes an explainable human-machine-theory triangle communication system, aiming at establishing a people-centered, sustainable human–machine interaction information dissemination system.
21
Mitigating Covertly Unsafe Text within Natural Language Systems
Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Upton Patton, Bruce Bimber, Kathleen R. McKeown, William Yang Wang
- 17 Oct 2022
TL;DR: This work distinguishes types of text that can lead to physical harm and establishes one particularly underexplored category, covertly unsafe text, which it further breaks down with respect to the system’s information. It also discusses solutions to mitigate the generation of text in each of these subcategories.
11
References
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: Introduces BERT, a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
81.7K
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- 11 Oct 2018
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
24.6K
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams, Nikita Nangia, Samuel R. Bowman
- 01 Jun 2018
TL;DR: Introduces the Multi-Genre Natural Language Inference corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding, and shows that it represents a substantially more difficult task than the Stanford NLI corpus.
5.4K
A large annotated corpus for learning natural language inference
Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning
- 21 Aug 2015
TL;DR: The Stanford Natural Language Inference (SNLI) corpus is a large-scale collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning.
•Posted Content
Reading Wikipedia to Answer Open-Domain Questions
TL;DR: The proposed approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs.