Bridging the significant gap between large language models' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimal use of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects on reasoning with executable code and reasoning with common sense. We also explore how to apply this approach efficiently to extremely large language models using proxy-tuning. Experimental results on the multilingual reasoning benchmarks mGSM, mSVAMP and xCSQA demonstrate that the question alignment approach can be used to boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, our method brings an average accuracy improvement of 12.2% on mGSM, even with the 70B model. To understand the mechanism of its success, we analyze representation space, chain-of-thought and translation data scales, which reveals how question translation training strengthens language alignment within LLMs and shapes their working patterns.

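In this paper, we study how the question alignment approach can be used to improve the non-English performance of large language models. We explore its effects on reasoning with executable code and reasoning with common sense, and we apply it efficiently to extremely large language models through proxy-tuning. Results on multilingual reasoning benchmarks show that the question alignment approach boosts multilingual performance across different reasoning scenarios, model families, and sizes. When applied to the LLaMA2 models, our method improves average accuracy on mGSM by 12.2%, even with the 70B model. By analyzing representation space, chain-of-thought and translation data scales, we also reveal how question translation training strengthens language alignment within LLMs and shapes their working patterns.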

The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights