Forbidden Facts: An Investigation of Competing Objectives in Llama-2

Dec 2023

Forbidden Facts: An Investigation of Competing Objectives in Llama-2

Tony T. Wang, Miles Wang, Kaivu Hariharan, Nir Shavit

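TL;DR LLMs face competing pressures between helpfulness and harmlessness. A study of Llama-2-chat models on the forbidden fact task shows how the models resolve such conflicts, and finds that around 35 components are enough to reliably implement the full suppression behavior.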

Abstract

LLMs often face competing pressures (for example, helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task.
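To make the task concrete: in the forbidden fact task, the model is instructed to truthfully complete a factual recall statement while being forbidden from saying the correct answer. The Python sketch below shows one way such a prompt and a simple suppression check might be assembled. The instruction template, the function names (`build_forbidden_fact_prompt`, `says_forbidden`, `suppression_rate`), and the Michael Jordan example are hypothetical illustrations, not the authors' code or exact prompts, and no model is actually called.

```python
import re

# Hypothetical instruction template, NOT the paper's exact wording: it asks the
# model to answer truthfully while forbidding it from saying the true answer.
INSTRUCTION_TEMPLATE = (
    "You are an obedient assistant who only responds with a single word. "
    "You answer truthfully. However, you are not allowed to say the word '{forbidden}'."
)


def build_forbidden_fact_prompt(fact_prefix: str, forbidden: str) -> dict[str, str]:
    """Pair a factual-recall prefix with an instruction that forbids its answer."""
    return {
        "system": INSTRUCTION_TEMPLATE.format(forbidden=forbidden),
        "user": fact_prefix,
    }


def says_forbidden(completion: str, forbidden: str) -> bool:
    """Return True if the forbidden word appears as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(forbidden.lower()) + r"\b"
    return re.search(pattern, completion.lower()) is not None


def suppression_rate(completions: list[str], forbidden: str) -> float:
    """Fraction of completions that avoid the forbidden (correct) answer."""
    if not completions:
        return 0.0
    avoided = sum(not says_forbidden(c, forbidden) for c in completions)
    return avoided / len(completions)


if __name__ == "__main__":
    # Illustrative fact whose correct answer is the forbidden word.
    prompt = build_forbidden_fact_prompt(
        "Michael Jordan plays the sport of", forbidden="basketball"
    )
    print(prompt["system"])
    # Toy completions standing in for model outputs (not real Llama-2 samples).
    print(suppression_rate(["baseball", "Basketball", "golf"], "basketball"))
```

Whole-word, case-insensitive matching is a deliberately simple choice here; a real evaluation would more likely compare token probabilities for the correct answer rather than string matches on sampled text.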