Training large language models (LLMs) with open-domain instruction-following data has brought colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using an LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high-complexity part, we demonstrate that outputs from our WizardLM model are preferred to outputs from OpenAI ChatGPT. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing large language models. Our code and generated data are publicly available at https://github.com/nlpxucan/WizardLM

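This paper presents a method for creating instruction data with a large language model (LLM) instead of humans. Starting from an initial set of instructions, our proposed Evol-Instruct rewrites them step by step into more complex instructions; all of the generated instruction data is then mixed to fine-tune LLaMA, yielding the model we call WizardLM. Human evaluation shows that instructions produced by Evol-Instruct are superior to human-created ones, and on the high-complexity portion in particular, WizardLM's outputs are judged better than those of OpenAI ChatGPT. Although WizardLM still lags behind ChatGPT in some respects, our study suggests that fine-tuning with AI-generated instructions is a promising direction for improving large language models.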
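
The evolve-then-mix loop described above can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not specify the rewriting prompts, the number of rounds, or any filtering step, so `evolve_instructions`, `rewrite`, `rounds`, and `toy_rewrite` are all hypothetical names, and the LLM call is replaced by a trivial stand-in function.

```python
from typing import Callable, List


def evolve_instructions(
    seeds: List[str],
    rewrite: Callable[[str], str],
    rounds: int,
) -> List[str]:
    """Return the seeds plus every rewritten generation, mixed into one pool."""
    pool = list(seeds)      # generation 0: the initial instructions
    current = list(seeds)
    for _ in range(rounds):
        # Each round rewrites the previous generation into a harder variant.
        current = [rewrite(instruction) for instruction in current]
        pool.extend(current)  # keep every generation for fine-tuning
    return pool


def toy_rewrite(instruction: str) -> str:
    # Hypothetical stand-in for an LLM call (assumption): it only appends
    # an extra requirement to mimic "making the instruction more complex".
    return instruction + " Explain your reasoning."


mixed = evolve_instructions(["Add 1 and 1."], toy_rewrite, rounds=2)
```

Here `mixed` holds the seed instruction together with both rewritten generations, which mirrors the idea of fine-tuning on data of varying complexity rather than only on the final, hardest round.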

WizardLM: Empowering Large Language Models to Follow Complex Instructions