Large language models (LLMs) have achieved success in acting as agents, which
interact with environments through tools like search engines. However, LLMs are
not optimized specifically for tool use during training or alignment, limiting
their effectiveness as agents. To resolve this problem, previous work has
collected interaction trajectories between GPT-4 and environments, and
fine-tuned smaller models with them. The standard approach has been to simply
discard trajectories that do not finish the task successfully. This wastes a
significant amount of data and resources, and it can also limit the possible
optimization paths during fine-tuning. In this paper, we contend that large
language models can
learn from failures through appropriate data cleaning and fine-tuning
strategies. We conduct experiments on mathematical reasoning, multi-hop
question answering, and strategic question answering tasks. Experimental
results demonstrate that compared to solely using positive examples,
incorporating negative examples enhances model performance by a large margin.

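Large language models are limited in how well they are optimized for tool use
when interacting with environments. However, through appropriate data cleaning
and fine-tuning strategies, they can learn from failures and significantly
improve their performance.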