Open Access, Posted Content
HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification
Abstract: We introduce HoVer (HOppy VERification), a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and classify whether the claim is Supported or Not-Supported by the facts. In HoVer, the claims require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes. Moreover, most of the 3/4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations such as coreference. We show that the performance of an existing state-of-the-art semantic-matching model degrades significantly on our dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results. We hope that the introduction of this challenging dataset and the accompanying evaluation task will encourage research in many-hop fact retrieval and information verification. We make the HoVer dataset publicly available at this https URL
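The task setup described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual schema or the authors' evaluation code: the `Claim` class, its field names, the label strings, and the `accuracy_by_hops` helper are all hypothetical, and the hop count is assumed here to be the number of distinct articles that supply evidence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical record for one claim; not the official HoVer format."""
    text: str
    # (article title, sentence index) pairs that serve as evidence.
    supporting_facts: tuple
    # Assumed label strings: "SUPPORTED" or "NOT_SUPPORTED".
    label: str

    @property
    def num_hops(self) -> int:
        # Assumption: one hop per distinct article contributing evidence.
        return len({title for title, _ in self.supporting_facts})


def accuracy_by_hops(claims, predictions):
    """Group verification accuracy by hop count."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for claim, predicted in zip(claims, predictions):
        total[claim.num_hops] += 1
        correct[claim.num_hops] += int(predicted == claim.label)
    return {hops: correct[hops] / total[hops] for hops in sorted(total)}


if __name__ == "__main__":
    # Invented toy data, only to show the shape of the inputs.
    toy_claims = [
        Claim("claim 1", (("Article A", 0), ("Article B", 2)), "SUPPORTED"),
        Claim("claim 2", (("Article A", 0), ("Article B", 1), ("Article C", 3)),
              "NOT_SUPPORTED"),
    ]
    toy_predictions = ["SUPPORTED", "SUPPORTED"]
    print(accuracy_by_hops(toy_claims, toy_predictions))
```

Reporting accuracy per hop count, rather than one aggregate figure, is what makes a degradation trend like the one the abstract describes visible.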