Journal Article, DOI: 10.48550/arXiv.2302.06716
Machine Learning Model Attribution Challenge
2
TL;DR: The Machine Learning Model Attribution Challenge (MLMAC) was the first attempt to identify the publicly available base models that underlie a set of anonymous, fine-tuned large language models using only the textual output of those models.
Abstract: We present the findings of the Machine Learning Model Attribution Challenge. Fine-tuned machine learning models may derive from other trained models without obvious attribution characteristics. In this challenge, participants identify the publicly available base models that underlie a set of anonymous, fine-tuned large language models (LLMs) using only the textual output of the models. Contestants aim to correctly attribute the most fine-tuned models, with ties broken in favor of contestants whose solutions use fewer calls to the fine-tuned models' API. The most successful approaches were manual: participants observed similarities between model outputs and developed attribution heuristics based on public documentation of the base models. Several teams also submitted automated, statistical solutions.
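The ranking rule described in the abstract (most correct attributions first, ties broken by fewer API calls) can be sketched as below. This is a minimal illustration only: the `Submission` structure, the field names, and the team names and numbers are hypothetical and do not come from the paper or the challenge's actual scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One contestant's result (hypothetical structure, not from the paper)."""
    team: str
    correct_attributions: int  # fine-tuned models matched to the right base model
    api_calls: int             # queries made to the fine-tuned models' API


def rank_submissions(submissions):
    """Order by most correct attributions, then by fewest API calls."""
    return sorted(
        submissions,
        key=lambda s: (-s.correct_attributions, s.api_calls),
    )


# Made-up example data, purely for illustration.
submissions = [
    Submission("team_a", correct_attributions=7, api_calls=1200),
    Submission("team_b", correct_attributions=7, api_calls=300),
    Submission("team_c", correct_attributions=5, api_calls=50),
]
ranking = [s.team for s in rank_submissions(submissions)]
```

In this sketch, `team_a` and `team_b` tie on correct attributions, so the one with fewer API calls ranks higher. A low query count alone does not help `team_c`, since the query count only matters as a tie-breaker.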
Citations
What can we learn from Data Leakage and Unlearning for Law?
19 Jul 2023
TL;DR: The authors showed that fine-tuned models not only leak their training data but also leak the pre-training data (and PII) memorized during the training phase, which can pose significant privacy and legal concerns for companies that use large language models to offer services.
Detection and Attribution of Models Trained on Generated Data
Ahmed Salem, Zheng Li, Shanqing Guo, Michael Backes, Yang Zhang +4 more
- 14 Apr 2024
TL;DR: This work takes the first step in the forensic analysis of models trained on GAN-generated data: it detects whether a model was trained on GAN-generated or real data, and attributes models trained on GAN-generated data to their respective source GANs.
References
• Posted Content
Automatic Detection of Generated Text is Easiest when Humans are Fooled.
TL;DR: The authors benchmarked and analyzed three sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and found that they are primarily optimized for fooling humans.
195
Generating Steganographic Text with LSTMs
Tina Fang, Martin Jaggi, Katerina Argyraki +2 more
- 30 May 2017
TL;DR: In this paper, a steganographic system based on a Long Short-Term Memory (LSTM) neural network was proposed to enable two users to exchange encrypted messages without an adversary detecting that such an exchange is taking place.
Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations
TL;DR: The authors assess how language models might change influence operations in the future and what steps can be taken to mitigate this threat. They provide a framework for the stages of the language model-to-influence operations pipeline that mitigations could target (model construction, model access, content dissemination, and belief formation).
146
Artificial Interrogation for Attributing Language Models
TL;DR: The Machine Learning Model Attribution Challenge (MLMAC) was organized by MITRE, Microsoft, Schmidt-Futures, Robust-Intelligence, Lincoln-Network, and the Huggingface community.
1