Modern language models (LMs) need to follow human instructions while remaining faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., following open-ended instructions) and faithfulness (i.e., grounding responses in the given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction-following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergistic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more: training ReSet on high-quality yet substantially smaller data (three times less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
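The abstract names rejection sampling as the core of ReSet but does not spell out the procedure. As a rough illustration of the general idea only, the sketch below samples several candidate responses per prompt, scores them, and keeps the best one when it clears a quality bar, producing a smaller filtered set for continued tuning. The names `rejection_sample`, `toy_generate`, and `toy_score`, the word-overlap scorer, and the threshold are all assumptions made for this example; they are not taken from the paper.

```python
import re
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    num_candidates: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep, for each prompt, the best-scoring candidate if it clears the threshold.

    `generate` and `score` are hypothetical stand-ins for a model sampler and a
    quality judge (e.g. one combining instruction following and faithfulness).
    """
    kept = []
    for prompt in prompts:
        candidates = generate(prompt, num_candidates)
        if not candidates:
            continue
        # Score every candidate once, then pick the highest-scoring one.
        scored = [(score(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            kept.append((prompt, best))  # accepted for continued tuning
        # Otherwise the prompt is rejected and contributes no training pair.
    return kept


def toy_generate(prompt: str, n: int) -> List[str]:
    # Hypothetical sampler: returns fixed strings instead of model outputs.
    pool = [
        "Paris is the capital of France.",
        "I do not know.",
        "Lyon is the capital.",
    ]
    return pool[:n]


def toy_score(prompt: str, response: str) -> float:
    # Crude grounding proxy (an assumption): fraction of response words
    # that also appear in the prompt.
    prompt_words = set(re.findall(r"\w+", prompt.lower()))
    words = re.findall(r"\w+", response.lower())
    return sum(w in prompt_words for w in words) / len(words) if words else 0.0


prompts = [
    "Context: Paris is the capital of France. Question: What is the capital of France?",
    "Context: none. Question: hi?",
]
filtered = rejection_sample(prompts, toy_generate, toy_score)
print(filtered)
```

In this toy run only the first prompt yields an accepted pair, which mirrors the "less is more" observation in spirit: the filtered set is smaller than the raw set of samples but contains only responses that passed the quality check.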

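This work addresses the trade-off in modern language models between following human instructions and remaining faithful. We propose a novel method, Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms conventional multi-task learning and achieves better results even with less data. The findings help deepen the understanding of objective discrepancies in the alignment training of language models.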

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models