Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset (denoted as seed data). We augment the ground-truth solutions of our seed data and train a back-translation model to translate the augmented solutions back into new questions. Subsequently, we generate code-integrated solutions for the new questions. To ensure the correctness of the code-integrated solutions, we employ rationale-based strategy for solution verification. Various pretrained models, ranging from 7B to 70B, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique, resulting in a family of models known as MathGenieLM. These models consistently outperform previous open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. In particular, MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.

MathGenie是一种从小规模的问题解决数据集（称为种子数据）生成多样且可靠的数学问题的新方法，通过增加种子数据的真实解决方案，并训练一个回译模型将增加的解决方案翻译回新问题，从而产生与代码集成的问题解决方案，进而提供理性基础验证策略，该方法通过对新收集的数据训练从7B到70B范围的预训练模型，形成了MathGenieLM系列模型，这些模型在五个代表性数学推理数据集上始终优于以前的开放源语言模型，达到了最新的性能水平，尤其是MathGenieLM-InternLM2在GSM8K上达到了87.7％的准确率，在MATH上达到了55.7％的准确率，获得了开放源语言模型的最佳综合得分。

MathGenie: 利用问题逆向翻译生成合成数据以提升LLMs的数学推理能力