BriefGPT.xyz
May, 2025
Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal
TL;DR
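This study addresses the vulnerability of existing text watermarking algorithms and proposes a new self-information rewrite attack, exposing a design flaw related to high-entropy tokens. The results show that the attack achieves a success rate of nearly 100% against seven existing watermarking methods, underscoring the urgent need to improve the security of current watermarking techniques.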
Abstract
Text watermarking aims to subtly embed statistical signals into text by controlling the sampling process of a Large Language Model (LLM), enabling watermark detectors to verify that the output was generated by the specified model. The