Exploring Scaling Trends in the Robustness of Large Language Models

July 2024

Exploring Scaling Trends in LLM Robustness

Nikolaus Howe, Michał Zajac, Ian McKenzie, Oskar Hollinsworth, Tom Tseng...

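TL;DR: This paper studies how the robustness of large language models changes with scale, addressing a gap in existing research on the relationship between robustness and model size. It proposes using adversarial training to improve robustness and finds that larger models respond substantially better to such training, whereas in the absence of explicit defenses scale brings almost no benefit. The finding matters for understanding and improving the safety of language models.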

Abstract

Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities.