Multi-step Jailbreaking Privacy Attacks on ChatGPT

doi:10.18653/v1/2023.findings-emnlp.272

Journal Article•10.18653/v1/2023.findings-emnlp.272

Haoran Li, +6 more

- 01 Jan 2023

66

TL;DR: Multi-step jailbreaking privacy attacks on ChatGPT reveal potential privacy threats from application-integrated LLMs.

Figures

Table 3: Email address recovery results on 50 pairs of collected faculty information from worldwide universities. 5 prompts are evaluated on ChatGPT.

Table 1: Email address recovery results on sampled emails from the Enron Email Dataset.

Table 7: The ablation study on email content recovery. All results are measured in %. For each email, we combine the email addresses of its sender and receiver with a subset of {date, msg_id, subject} as queried identifiers.

Table 4: The New Bing’s DP results of partially identified extraction.

Table 5: The New Bing’s FE results on email addresses.

Citations

Journal Article•10.48550/arxiv.2312.02003

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Yifan Yao, +5 more

- 04 Dec 2023

- arXiv.org

TL;DR: This work investigates how LLMs positively impact security and privacy, potential risks and threats associated with their use, and inherent vulnerabilities within LLMs, and identifies areas that require further research efforts.

200

Journal Article•10.1016/j.hcc.2024.100211

A survey on Large Language Model (LLM) security and privacy: The Good, The Bad, and The Ugly

Yifan Yao, +5 more

- 01 Mar 2024

- High-Confidence Computing

TL;DR: A survey on Large Language Model (LLM) security and privacy explores the intersection of LLMs with security and privacy, investigating their positive and negative impacts and vulnerabilities.

154

Journal Article•10.1109/ojcs.2023.3300321

A Survey on ChatGPT: AI–Generated Contents, Challenges, and Solutions

Yuntao Wang, +4 more

- 01 Jan 2023

- IEEE open journal of the Computer Societ...

TL;DR: AIGC is revolutionizing content creation and knowledge representation, but faces challenges in security, privacy, ethics, and legalities. The survey explores AIGC technologies, security and privacy threats, solutions, and future challenges.

102

Journal Article•10.1038/s42256-023-00765-8

Defending ChatGPT against jailbreak attack via self-reminders

Yueqi Xie, +7 more

- 01 Dec 2023

- Nature Machine Intelligence

TL;DR: This work systematically documents the threats posed by jailbreak attacks, introduces and analyses a dataset for evaluating defensive interventions and proposes the psychologically inspired self-reminder technique that can efficiently and effectively mitigate against jailbreaks without further training.

100

Journal Article•10.1109/jbhi.2023.3316750

Large AI Models in Health Informatics: Applications, Challenges, and the Future

Jianing Qiu, +13 more

- 01 Dec 2023

- IEEE Journal of Biomedical and Health In...

TL;DR: Large AI models are revolutionizing health informatics by enabling advancements in various sectors, including bioinformatics, medical diagnosis, medical imaging, and public health. Their potential for transformative impact is vast, yet challenges and ethical considerations must be addressed to harness their full potential.

74

References

•Proceedings Article

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin

- 25 Jul 2004

TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, along with their evaluations.

14.8K

•Posted Content•10.48550/arxiv.2203.02155

Training language models to follow instructions with human feedback

04 Mar 2022

TL;DR: The authors used reinforcement learning from human feedback to fine-tune language models so that they align with user intent on a wide range of tasks, and showed that the resulting models improved in truthfulness and produced less toxic output, with minimal performance regressions on public NLP datasets.

2.5K

•Posted Content•10.48550/arxiv.2201.11903

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

27 Jan 2022

TL;DR: The authors explore how generating a chain of thought (a series of intermediate reasoning steps) significantly improves the ability of large language models to perform complex reasoning, and demonstrate that such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain-of-thought demonstrations are provided as exemplars in prompting.

1.9K

•Journal Article•10.1145/3560815

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

16 Jan 2023

- ACM Computing Surveys

TL;DR: The authors survey and organize research works in a new paradigm in natural language processing, which they dub "prompt-based learning", and describe a unified set of mathematical notations that can cover a wide variety of existing work.

1.7K

•Proceedings Article•10.18653/V1/2021.ACL-LONG.353

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Xiang Lisa Li, +1 more

- 01 Aug 2021

TL;DR: The authors propose prefix-tuning, a lightweight alternative to finetuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which they call the prefix.

1.4K
