Recent advancements in large language models (LLMs) have demonstrated
remarkable abilities in handling a variety of natural language processing (NLP)
downstream tasks, even on mathematical tasks requiring multi-step reasoning. In
this report, we introduce the KwaiYiiMath which enhances the mathematical
reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT)
and Reinforced Learning from Human Feedback (RLHF), including on both English
and Chinese mathematical tasks. Meanwhile, we also constructed a small-scale
Chinese primary school mathematics test set (named KMath), consisting of 188
examples to evaluate the correctness of the problem-solving process generated
by the models. Empirical studies demonstrate that KwaiYiiMath can achieve
state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with
the similar size models, respectively.

KwaiYiiMath enhances mathematical reasoning abilities by applying Supervised Fine-Tuning and Reinforced Learning from Human Feedback on English and Chinese mathematical tasks, achieving state-of-the-art performance on GSM8k, CMath, and a small-scale Chinese primary school mathematics test set named KMath.