BriefGPT.xyz
Mar, 2024
Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi...
TL;DR
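To reduce the time cost of GCG (Greedy Coordinate Gradient) and speed up progress in LLM safety research, this paper introduces a new algorithm called "Probe sampling". It dynamically estimates how closely the predictions of a smaller draft model agree with those of the target model, achieving up to a 5.6x speedup with equal or better attack success rate (ASR) on AdvBench.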
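The TL;DR describes probe sampling only at a high level. The sketch below illustrates one GCG candidate-selection step built on that idea. It rests on assumptions not stated in the summary above. First, agreement is measured as the Spearman rank correlation α between draft and target losses on a small random probe set. Second, the number of candidates re-scored by the target model shrinks as (1 − α)·B/R, where B is the batch size and R a reduction factor. All names (`probe_sampling_step`, `draft_loss`, `target_loss`, `probe_size`, `reduction`) are hypothetical. The loss functions are toy callables standing in for forward passes of the draft and target LLMs.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant ranking: no usable agreement signal
    return cov / (var_a * var_b) ** 0.5


def probe_sampling_step(
    candidates: Sequence[object],
    draft_loss: Callable[[object], float],   # hypothetical: cheap small-model loss
    target_loss: Callable[[object], float],  # hypothetical: expensive target-model loss
    probe_size: int,
    reduction: float,
    rng: random.Random,
) -> Tuple[object, float, int]:
    """One selection step; returns (best candidate, alpha, number of target evals)."""
    # 1. Score every candidate with the cheap draft model.
    draft = [draft_loss(c) for c in candidates]
    # 2. Score a small random probe set with the target model.
    probe = rng.sample(range(len(candidates)), probe_size)
    target_scores: Dict[int, float] = {i: target_loss(candidates[i]) for i in probe}
    # 3. Agreement alpha between draft and target rankings on the probe set,
    #    clipped to [0, 1] (assumption: negative agreement counts as none).
    alpha = spearman([draft[i] for i in probe], [target_scores[i] for i in probe])
    alpha = min(1.0, max(0.0, alpha))
    # 4. Assumed filter size (1 - alpha) * B / R: the better the draft model
    #    agrees, the fewer top-draft candidates the target model re-scores.
    filter_size = max(1, int((1 - alpha) * len(candidates) / reduction))
    filtered = sorted(range(len(candidates)), key=lambda i: draft[i])[:filter_size]
    for i in filtered:
        if i not in target_scores:
            target_scores[i] = target_loss(candidates[i])
    # 5. Keep the lowest target loss among the probe and filtered sets.
    best = min(target_scores, key=lambda i: target_scores[i])
    return candidates[best], alpha, len(target_scores)
```

With identical toy draft and target losses, α is about 1, so only the single top draft candidate is re-scored beyond the probe set. With an inverted draft loss, α clips to 0 and the step re-scores B/R candidates. The clipping and the floor of one filtered candidate are illustrative choices, not details taken from the summary.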
Abstract
Safety of large language models (LLMs) has become a central issue given their rapid progress and wide applications. Greedy coordinate gradient