Recent advancements in Large Language Models (LLMs) have expanded the
horizons of natural language understanding and generation. Notably, instruction
tuning can refine how well LLM outputs are controlled and aligned with their
inputs. However, as highlighted in several studies, low-quality data in the
training set are usually detrimental to instruction tuning, resulting in
inconsistent or even misleading LLM outputs. We propose a novel method, termed
"reflection-tuning," which addresses the problem by leveraging the
self-improvement and judging capabilities of LLMs. This approach uses an oracle
LLM to recycle
the original training data by introspecting and enhancing the quality of
instructions and responses in the data. Extensive experiments on widely used
evaluation benchmarks show that LLMs trained with our recycled data outperform
those trained with existing datasets.

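By drawing on the reflection and judging capabilities of LLMs, this work proposes a new method called "reflection-tuning," which uses an oracle LLM to introspect on and improve the quality of the instructions and responses in the data in order to optimize large language models (LLMs). Experiments on widely used evaluation benchmarks show that LLMs trained on our reflection-tuned data outperform models trained on existing datasets across a range of evaluations.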
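
The recycling step described above can be illustrated with a minimal sketch. It assumes the oracle LLM is available as a plain text-in, text-out callable. The names `Sample`, `recycle_sample`, `recycle_dataset`, and both prompt templates are hypothetical and introduced only for illustration; the abstract does not specify the actual prompts or reflection criteria.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: the oracle LLM is wrapped as a function from prompt text to reply text.
Oracle = Callable[[str], str]

# Hypothetical prompt templates; the abstract does not give the real ones.
INSTRUCTION_PROMPT = (
    "Reflect on the quality of the instruction below, then rewrite it so it is "
    "clearer and more informative. Return only the rewritten instruction.\n"
    "Instruction: {instruction}\n"
    "Original response: {response}"
)
RESPONSE_PROMPT = (
    "Reflect on the quality of the response below, then write an improved "
    "response to the instruction. Return only the improved response.\n"
    "Instruction: {instruction}\n"
    "Response: {response}"
)


@dataclass(frozen=True)
class Sample:
    """One instruction-tuning pair."""
    instruction: str
    response: str


def recycle_sample(sample: Sample, oracle: Oracle) -> Sample:
    """Ask the oracle to improve the instruction first, then the response."""
    improved_instruction = oracle(
        INSTRUCTION_PROMPT.format(
            instruction=sample.instruction, response=sample.response
        )
    ).strip()
    # Fall back to the original text if the oracle returns nothing usable.
    improved_instruction = improved_instruction or sample.instruction

    improved_response = oracle(
        RESPONSE_PROMPT.format(
            instruction=improved_instruction, response=sample.response
        )
    ).strip()
    improved_response = improved_response or sample.response

    return Sample(improved_instruction, improved_response)


def recycle_dataset(data: List[Sample], oracle: Oracle) -> List[Sample]:
    """Recycle every pair; the result has the same length and order as the input."""
    return [recycle_sample(sample, oracle) for sample in data]
```

In this sketch the instruction is improved before the response, so that the new response is conditioned on the rewritten instruction. That ordering is a design choice of the example, not something the abstract states. The recycled pairs would then replace the original pairs as training data for instruction tuning.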