Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% in general performance and up to 10% in safety. We also find that language-based merging is highly effective: merging monolingually fine-tuned models yields a 4% increase in general performance and a 7% reduction in harm across all languages relative to the data-mixture approach, using the same available data. Overall, our comprehensive study of merging approaches provides a useful framework for building strong and safe multilingual models.

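This work addresses the challenge of using large language models safely in multilingual settings, in particular their bias toward Western-centric datasets. By combining safety and general-purpose tasks, the study finds that objective-based model merging is more effective than mixing data, improving general performance by up to 8% and safety by up to 10%. Merging models across languages also proved highly effective, providing a useful framework for building strong and safe multilingual models.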

Mix Data or Merge Models? Optimizing for Multi-Task Learning