BriefGPT.xyz
Jun, 2024
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang
TL;DR
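Backdoor attacks pose a significant threat to large language models (LLMs). This paper proposes Chain-of-Scrutiny (CoS), a defense that has the model generate detailed reasoning steps for an input and then scrutinizes that reasoning process to check that it is consistent with the final answer. Inconsistency reveals a backdoor attack. Experiments validate the effectiveness of CoS.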
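As a rough illustration of the consistency check described in the TL;DR, the sketch below asks a model to reason step by step toward its own final answer and flags the originally returned answer as suspicious when the two disagree. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt template, the `llm` callable (any function from prompt string to completion string), the `Final answer:` parsing convention, and the helper names `chain_of_scrutiny`, `extract_final_answer`, and `toy_llm` are all hypothetical, and a normalized string comparison stands in for the paper's scrutiny of the reasoning process.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template (assumption); the paper's actual prompts are not reproduced.
COS_PROMPT = (
    "Question: {question}\n"
    "Reason step by step, then give your conclusion on a last line "
    "of the form 'Final answer: <answer>'."
)


def _normalize(text: str) -> str:
    """Lower-case and strip punctuation/extra whitespace so trivial differences don't count."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def extract_final_answer(reasoning: str) -> Optional[str]:
    """Return the text after the last 'Final answer:' marker, or None if absent."""
    matches = re.findall(r"final answer:\s*(.+)", reasoning, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else None


def chain_of_scrutiny(question: str, answer: str, llm: Callable[[str], str]) -> dict:
    """Flag `answer` as suspicious if the model's own step-by-step reasoning does not support it.

    Assumption: a simple normalized string match stands in for the paper's scrutiny step.
    """
    reasoning = llm(COS_PROMPT.format(question=question))
    derived = extract_final_answer(reasoning)
    consistent = derived is not None and _normalize(derived) == _normalize(answer)
    return {"reasoning": reasoning, "derived_answer": derived, "suspicious": not consistent}


# Usage with a hypothetical stub model standing in for a real LLM.
def toy_llm(prompt: str) -> str:
    return "The review praises the acting and story.\nFinal answer: Positive"


result = chain_of_scrutiny("Sentiment of: 'A moving, well-acted film.'", "negative", toy_llm)
print(result["suspicious"])  # True: the reasoning concludes "Positive", but the returned answer was "negative"
```

A reasoning trace with no parseable final answer is also treated as suspicious here; that conservative choice is an assumption of the sketch, not something the summary specifies.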
Abstract
Backdoor attacks present significant threats to large language models (LLMs), particularly with the rise of third-party services that offer API integration and prompt engineering. Untrustworthy third parties can