Automatic Detection of Generated Text is Easiest when Humans are Fooled.

Open Access•Posted Content

- 02 Nov 2019

195

TL;DR: The authors benchmark and analyze three sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and find that improvements in decoding methods have primarily optimized for fooling humans.
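As a rough illustration of how the three decoding strategies named above differ, here is a minimal sketch over a toy next-token distribution. It is not the paper's code: the function names (`truncate_top_k`, `truncate_nucleus`, `untruncated`, `sample`) and the list-of-probabilities representation are assumptions made for this example, and a real language model would supply the probabilities.

```python
import random


def truncate_top_k(probs, k):
    """Top-k: keep the k most likely tokens and renormalize their mass."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def truncate_nucleus(probs, p):
    """Nucleus (top-p): keep the smallest set of most likely tokens
    whose cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in kept}


def untruncated(probs):
    """Untruncated random sampling: every token keeps its original mass."""
    return dict(enumerate(probs))


def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution over a 4-token vocabulary.
    toy_probs = [0.5, 0.3, 0.15, 0.05]
    rng = random.Random(0)
    for name, dist in [
        ("top-k (k=2)", truncate_top_k(toy_probs, 2)),
        ("nucleus (p=0.9)", truncate_nucleus(toy_probs, 0.9)),
        ("untruncated", untruncated(toy_probs)),
    ]:
        print(name, sorted(dist), sample(dist, rng))
```

In this sketch, both truncation schemes remove low-probability tokens from the candidate set before sampling, while untruncated sampling can still pick any token in the vocabulary.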

Citations

Journal Article•10.48550/arxiv.2310.10669

Unbiased Watermark for Large Language Models

Zhengmian Hu, +5 more

- 22 Sep 2023

- arXiv.org

TL;DR: This research demonstrates that, with appropriate implementation, watermarks can be integrated without affecting the output probability distribution. The authors call this an unbiased watermark and suggest that it can serve as an effective means of tracking and attributing model outputs without sacrificing output quality.

Journal Article•10.18653/v1/2025.findings-emnlp.609

How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study

Matthieu Dubois, +2 more

- 01 Jan 2025

TL;DR: This study examines how sampling-based decoding affects the detectability of machine-written texts, revealing that minor adjustments to decoding parameters can severely impair detector accuracy, exposing blind spots in current detection methods.

Journal Article•10.48550/arxiv.2409.16914

Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness

S.L. Ma, +1 more

- 25 Sep 2024

TL;DR: This paper proposes TOCSIN, a zero-shot detector that leverages token cohesiveness to identify LLM-generated text, outperforming state-of-the-art detectors on various datasets and source models, with a simple and efficient calculation method.

Journal Article•10.48550/arXiv.2305.17359

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Xianjun Yang, +4 more

- 27 May 2023

- arXiv.org

TL;DR: The authors propose a training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT), which truncates a text and then feeds only the preceding portion to the LLM to regenerate the remaining part.

Journal Article•10.48550/arXiv.2305.07969

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

Yutian Chen, +4 more

- 13 May 2023

- arXiv.org

TL;DR: The authors present an approach for distinguishing ChatGPT-generated from human-written text using language models, reporting an accuracy of over 97% on the test dataset.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

•Proceedings Article•10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the resulting model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

24.6K

•Proceedings Article•10.18653/V1/P16-1162

Neural Machine Translation of Rare Words with Subword Units

Rico Sennrich, +2 more

- 12 Aug 2016

TL;DR: This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.

9.3K

•Posted Content

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Yonghui Wu, +30 more

- 26 Sep 2016

- arXiv: Computation and Language

TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.

7.9K

•Journal Article•10.1126/SCIENCE.AAP9559

The spread of true and false news online

Soroush Vosoughi, +2 more

- 09 Mar 2018

- Science

TL;DR: A large-scale analysis of tweets reveals that false rumors spread further and faster than the truth, and false news was more novel than true news, which suggests that people were more likely to share novel information.

7.2K