Large Language Models (LLMs) demonstrate strong capability across multiple tasks, including machine translation. Our study evaluates Llama2's machine translation capabilities and explores how translation quality depends on the languages in its training data. Our experiments show that the 7B Llama2 model achieves a BLEU score above 10 for every language it has seen, but not always for languages it has not seen. For those unseen languages, the largest gains come from increasing model scale rather than from using chat versions or increasing the shot count. Furthermore, our linguistic distance analysis reveals that syntactic similarity is not always the primary linguistic factor in determining translation quality. Interestingly, we discovered that under specific circumstances, some languages, despite having significantly less training data than English, exhibit strong correlations comparable to English. Our findings offer new perspectives on the current landscape of LLMs, raising the possibility that LLMs centered around languages other than English may offer a more effective foundation for a multilingual model.

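Building on large language models (LLMs), this study evaluates Llama2's machine translation capability and examines its dependence on the languages in its training data. Experiments show that the 7B Llama2 model scores above 10 BLEU for all languages it has seen, but not necessarily for languages it has not seen. Our linguistic distance analysis shows that syntactic similarity is not always the primary linguistic factor determining translation quality. Interestingly, we find that under specific conditions, some languages, despite having markedly less training data than English, show strong correlations comparable to English. These results offer a new perspective on the current development of LLMs and raise the possibility of building multilingual models centered on languages other than English.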

Important Linguistic Features and Languages in LLM Translation