BriefGPT.xyz
Oct, 2024
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal...
TL;DR
This work addresses the challenge of scarce task-specific data for target non-English tasks by proposing a new method that composes language and math capabilities to enable cross-lingual transfer. The merged models produced with the layer-swapping technique outperform conventional approaches by 10% on math benchmarks, demonstrating that reasoning ability can be successfully transferred across languages.
Abstract
Model merging, such as model souping, is the practice of combining different models with the same architecture together without further training. In this work, we present a model merging methodology that addresses…
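The two merging schemes the summary contrasts, uniform weight averaging ("model souping") and swapping whole layers between fine-tuned experts, can be sketched as follows. This is a minimal illustration, not the paper's implementation: models are represented as plain dicts of parameter lists, and the parameter names (`layers.0.w`, …) and swap granularity are illustrative assumptions.

```python
# Minimal sketch of model souping (uniform weight averaging) versus
# layer swapping between two fine-tuned models of the same architecture.
# Models are plain dicts mapping parameter names to lists of floats;
# the names below are illustrative, not the paper's actual keys.

def soup(models):
    """Uniformly average the parameters of same-architecture models."""
    keys = models[0].keys()
    return {
        k: [sum(vals) / len(models) for vals in zip(*(m[k] for m in models))]
        for k in keys
    }

def layer_swap(base, expert, swap_prefixes):
    """Take parameters whose names start with any prefix in swap_prefixes
    from the expert model; keep all other parameters from the base model."""
    return {
        k: (expert[k] if any(k.startswith(p) for p in swap_prefixes) else v)
        for k, v in base.items()
    }

# Two toy "expert" models fine-tuned from the same base (e.g. a math
# expert and a target-language expert).
math_expert = {"layers.0.w": [1.0, 2.0], "layers.1.w": [3.0, 4.0]}
lang_expert = {"layers.0.w": [3.0, 6.0], "layers.1.w": [5.0, 8.0]}

souped = soup([math_expert, lang_expert])            # averages every layer
swapped = layer_swap(math_expert, lang_expert,
                     swap_prefixes=("layers.0.",))   # swap only layer 0
```

Souping blends every parameter of both experts, while layer swapping keeps each retained layer intact from exactly one expert, which is the property the layer-swapping method exploits.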