The open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. This includes both base models, which are pre-trained on extensive datasets without alignment, and aligned models, deliberately designed to align with ethical standards and human values. Contrary to the prevalent assumption that the inherent instruction-following limitations of base LLMs serve as a safeguard against misuse, our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions. To systematically assess these risks, we introduce a novel set of risk evaluation metrics. Empirical results reveal that the outputs from base LLMs can exhibit risk levels on par with those of models fine-tuned for malicious purposes. This vulnerability, requiring neither specialized knowledge nor training, can be manipulated by almost anyone, highlighting the substantial risk and the critical need for immediate attention to the base LLMs' security protocols.

大型语言模型的开源加速应用开发、创新和科学进步，但对于基础语言模型的固有指令限制是否可以防止滥用的普遍假设存在关键的疏忽。我们的研究通过精心设计的演示表明，基础语言模型能够有效地解释和执行恶意指令，此漏洞无需特殊知识或训练即可被操纵，强调了对基础语言模型安全协议的紧急关注的重大风险。

透过上下文学习揭示基础大型语言模型的滥用潜力