Large language models (LLMs) can perform complex reasoning in few- and
zero-shot settings by generating intermediate chain-of-thought (CoT) reasoning
steps. Further, each reasoning step can rely on external tools to support
computation beyond the core LLM capabilities (e.g., search or code execution).
Prior work on CoT prompting and tool use typically requires hand-crafting
task-specific demonstrations and carefully scripted interleaving of model
generations with tool use. We introduce Automatic Reasoning and Tool-use (ART),
a framework that uses frozen LLMs to automatically generate intermediate
reasoning steps as a program. Given a new task to solve, ART selects
demonstrations of multi-step reasoning and tool use from a task library. At
test time, ART seamlessly pauses generation whenever external tools are called,
and integrates their output before resuming generation. ART achieves a
substantial improvement over few-shot prompting and automatic CoT on unseen
tasks in the BigBench and MMLU benchmarks, and matches the performance of
hand-crafted CoT prompts on a majority of these tasks. ART is also extensible,
and makes it easy for humans to improve performance by correcting errors in
task-specific programs or incorporating new tools, which we demonstrate by
drastically improving performance on select tasks with minimal human
intervention.
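
The loop described above (select demonstrations from a task library, pause
when a tool is called, splice in the tool output, resume) can be sketched
roughly as follows. This is a minimal illustration, not the authors'
implementation: the word-overlap similarity, the "[tool] argument" call syntax,
the "#OUT:" marker, and the names select_demonstrations, run_with_tools, and
scripted_llm are all assumptions made for the example, and the "LLM" is a
scripted stand-in rather than a real model.

```python
from typing import Callable, Dict, List


def select_demonstrations(task: str, library: Dict[str, str], k: int = 2) -> List[str]:
    """Pick the k library programs whose task description shares the most words
    with the new task (an assumed stand-in for the real similarity measure)."""
    task_words = set(task.lower().split())
    ranked = sorted(
        library.items(),
        key=lambda item: len(task_words & set(item[0].lower().split())),
        reverse=True,
    )
    return [program for _, program in ranked[:k]]


def run_with_tools(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 10,
) -> str:
    """Generate one program line at a time; when a line calls a known tool,
    run the tool and append its output before asking for the next line."""
    transcript = prompt
    for _ in range(max_steps):
        line = generate(transcript)
        if not line:  # the (stand-in) model signals that it is done
            break
        transcript += line + "\n"
        # Hypothetical call syntax: "[tool_name] argument"
        if line.startswith("["):
            name, _, arg = line[1:].partition("] ")
            if name in tools:
                transcript += "#OUT: " + tools[name](arg) + "\n"
    return transcript


def scripted_llm(lines: List[str]) -> Callable[[str], str]:
    """Fake frozen LLM that replays fixed lines and ignores the transcript."""
    remaining = iter(lines)
    return lambda transcript: next(remaining, "")


if __name__ == "__main__":
    library = {
        "arithmetic word problems": "Q: 1+1?\n[calc] 1+1\n#OUT: 2\nAnswer: 2\n",
        "date understanding": "Q: What day follows Monday?\nAnswer: Tuesday\n",
    }
    demos = select_demonstrations("solve arithmetic problems", library, k=1)
    tools = {"calc": lambda expr: str(sum(int(x) for x in expr.split("+")))}
    llm = scripted_llm(["[calc] 2+3", "Answer: 5"])
    print(run_with_tools(llm, tools, demos[0] + "Q: 2+3?\n"))
```

In a real system the scripted stand-in would be replaced by a call to a frozen
LLM that stops at a tool-call boundary; the control flow around it would stay
the same under these assumptions.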

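ART is a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program and seamlessly interleaves generation with external tool use. On unseen tasks in the BigBench and MMLU benchmarks, ART achieves a substantial improvement over few-shot prompting and automatic CoT. On select tasks, humans can further improve ART's performance by correcting errors in task-specific programs or incorporating new tools.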

ART: Automatic multi-step reasoning and tool-use for large language models

How to best explore in domains with sparse, delayed, and deceptive rewards is
an important open problem for reinforcement learning (RL). This paper considers
one such domain, the recently proposed multi-agent benchmark of Pommerman. This
domain is very challenging for RL: past work has shown that model-free RL
algorithms fail to achieve significant learning without artificially reducing
the environment's complexity. In this paper, we illuminate the reasons behind
this failure by providing a thorough analysis of the hardness of random exploration
in Pommerman. While model-free random exploration is typically futile, we
develop a model-based automatic reasoning module that can be used for safer
exploration by pruning actions that will surely lead the agent to death. We
empirically demonstrate that this module can significantly improve learning.
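
The idea of pruning actions that surely lead to death can be illustrated with
a one-step lookahead on a simplified grid. This is only a sketch under stated
assumptions, not the paper's module: the board model (square grid, solid walls
that stop blasts, bombs given as position -> (timer, strength)), the one-step
horizon, and the names blast_cells and safe_actions are all hypothetical, and
the real Pommerman rules differ in their details.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]  # (row, column)
MOVES: Dict[str, Pos] = {
    "stop": (0, 0), "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def blast_cells(bomb: Pos, strength: int, walls: Set[Pos], size: int) -> Set[Pos]:
    """Cells hit when the bomb explodes: its own cell plus up to `strength`
    cells in each direction, stopping at the board edge or at a wall."""
    cells = {bomb}
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        for step in range(1, strength + 1):
            cell = (bomb[0] + dr * step, bomb[1] + dc * step)
            out_of_bounds = not (0 <= cell[0] < size and 0 <= cell[1] < size)
            if out_of_bounds or cell in walls:
                break
            cells.add(cell)
    return cells


def safe_actions(
    agent: Pos, bombs: Dict[Pos, Tuple[int, int]], walls: Set[Pos], size: int
) -> List[str]:
    """Keep only actions whose target cell is reachable and is not inside the
    blast of a bomb that explodes on the next step (timer <= 1)."""
    deadly: Set[Pos] = set()
    for pos, (timer, strength) in bombs.items():
        if timer <= 1:
            deadly |= blast_cells(pos, strength, walls, size)
    safe = []
    for name, (dr, dc) in MOVES.items():
        target = (agent[0] + dr, agent[1] + dc)
        in_bounds = 0 <= target[0] < size and 0 <= target[1] < size
        # An agent may stay on a bomb it is standing on, but cannot walk into one.
        blocked = target in walls or (target in bombs and target != agent)
        if in_bounds and not blocked and target not in deadly:
            safe.append(name)
    return safe


if __name__ == "__main__":
    # A bomb two cells to the right of the agent is about to explode.
    print(safe_actions((2, 2), {(2, 4): (1, 2)}, set(), 5))
```

An RL agent would then sample its exploratory action only from the returned
list. A deeper lookahead would catch more cases, at a higher computational
cost.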

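This work studies how best to explore in domains with sparse, delayed, and deceptive rewards. By analyzing the hardness of random exploration in Pommerman, it proposes a model-based automatic reasoning module that enables safer exploration by pruning actions that would surely lead the agent to death, and shows empirically that this module significantly improves learning.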