Qi Pang, Shengyuan Hu, Wenting Zheng, Virginia Smith
TL;DR通过对现有 LLM 水印系统的攻击研究,提出了一套实用准则,用于生成和检测 LLM 水印,旨在解决水印系统在保留质量、鲁棒性和公共检测 API 等方面所面临的各种攻击问题。
Abstract
Advances in generative models have made it possible for AI-generated text,
code, and images to mirror human-generated content in many applications.
Watermarking, a technique that aims to embed information in the output of a
model to verify its source, is useful for mitigating misuse of