Right, No Matter Why: AI Fact-checking and AI Authority in Health-related Inquiry Settings

doi:10.48550/arxiv.2310.14358

Journal Article

Elena Sergeeva, +4 more

- 22 Oct 2023

- arXiv.org

- Vol. abs/2310.14358

TL;DR: An exploratory evaluation of users' AI-advice accepting behavior when evaluating the truthfulness of a health-related statement in different advice quality settings finds that even feedback that is confined to just stating that "the AI thinks that the statement is false/true" results in more than half of people moving their statement veracity assessment towards the AI suggestion.

Abstract: Previous research on expert advice-taking shows that humans exhibit two contradictory behaviors: on the one hand, people tend to overvalue their own opinions while undervaluing expert opinion, and on the other, people often defer to other people's advice even when the advice itself is rather obviously wrong. In our study, we conduct an exploratory evaluation of users' AI-advice accepting behavior when evaluating the truthfulness of a health-related statement in different "advice quality" settings. We find that even feedback that is confined to just stating that "the AI thinks that the statement is false/true" results in more than half of people moving their statement veracity assessment towards the AI suggestion. The different types of advice given influence the acceptance rates, but the sheer effect of getting a suggestion is often bigger than the suggestion-type effect.
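The headline measure in the abstract, the share of people who move their veracity assessment towards the AI suggestion, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `Response` record, the 1-5 rating scale (5 meaning "definitely true"), and the sample values are hypothetical and are not taken from the paper's data or analysis code.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One participant's ratings for one statement (hypothetical schema)."""
    before: int         # rating before seeing advice; assumed 1-5, 5 = definitely true
    after: int          # rating after seeing advice, same assumed scale
    ai_says_true: bool  # whether the AI labelled the statement as true


def moved_toward_ai(r: Response) -> bool:
    """True if the rating shifted in the direction of the AI's label."""
    shift = r.after - r.before
    return shift > 0 if r.ai_says_true else shift < 0


def share_moved_toward_ai(responses: list[Response]) -> float:
    """Fraction of responses whose rating moved toward the AI suggestion."""
    if not responses:
        raise ValueError("no responses given")
    return sum(moved_toward_ai(r) for r in responses) / len(responses)


# Made-up illustrative data, not results from the study.
sample = [
    Response(before=2, after=4, ai_says_true=True),   # moved toward AI
    Response(before=3, after=3, ai_says_true=True),   # unchanged
    Response(before=4, after=2, ai_says_true=False),  # moved toward AI
    Response(before=1, after=2, ai_says_true=False),  # moved away from AI
]
share = share_moved_toward_ai(sample)
print(f"Share moving toward the AI suggestion: {share:.2f}")
```

Counting only the direction of the shift, rather than its size, mirrors the abstract's phrasing; a study of magnitude would need a different statistic.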


Figures

Fig. 4. Distribution of the statement veracity ratings before (blue) and after (orange) seeing the AI advice in different advice type conditions. The blue and orange lines represent the median ratings given by the users before and after seeing the system's suggestion.

Fig. 1. A fact-checking app set-up: Top right: The User is asked to rate a health-related statement as correct or incorrect. Top left, bottom left, bottom right: The System provides a statement assessment (one of the three types, depending on a random group assignment) to the user and the user is asked to rate the statement’s veracity again.

Fig. 5. Distribution of the statement veracity ratings before (blue) and after (orange) seeing the AI advice in different advice type conditions. The blue and orange lines represent the median ratings given by the users before and after seeing the system's suggestion.

Table 4. Self-reported Trust and the delta of the magnitude opinion change correlation metric. The values indicating at least a weak correlation (0.25 absolute value or higher) are highlighted in green. The better the explanation is, the worse the professed Trust in the system correlates with the actual opinion change on the topic.

Table 5. The Averaged Statement Veracity assessment as compared to the Ground Truth assessment for “False Feedback” questions before (B) and after (A) reading AI-provided feedback. “Y” indicates the majority being correct about the veracity of the statement, “N” indicates the majority being incorrect. Plausible incorrect feedback results in a shift to incorrect in all 3 questions where the majority opinion was right before the AI intervention.

Table 8. Descriptive Statistics for Categorical Data: Participant Sample
