BriefGPT.xyz
Dec, 2023
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
Tony T. Wang, Miles Wang, Kaivu Hariharan, Nir Shavit
TL;DR
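LLMs face competing pressures between helpfulness and harmlessness. Studying Llama-2-chat models on the forbidden fact task reveals how they resolve such conflicts, and the authors find that around 35 distinct components are enough to reliably implement the full suppression behavior.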
Abstract
LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the