Large language models (LLMs) are powerful dialogue agents, but specializing
them towards fulfilling a specific function can be challenging. Instructing
tuning, i.e. tuning models on instruction and sample responses generated by
humans (Ouyang et al., 2022), has proven as an effective method to do so, yet
requires a number of data samples that a) might not be available or b) costly
to generate. Furthermore, this cost increases when the goal is to make the LLM
follow a specific workflow within a dialogue instead of single instructions.
Inspired by the self-play technique in reinforcement learning and the use of
LLMs to simulate human agents, we propose a more effective method for data
collection through LLMs engaging in a conversation in various roles. This
approach generates a training data via "self-talk" of LLMs that can be refined
and utilized for supervised fine-tuning. We introduce an automated way to
measure the (partial) success of a dialogue. This metric is used to filter the
generated conversational data that is fed back in LLM for training. Based on
our automated and human evaluations of conversation quality, we demonstrate
that such self-talk data improves results. In addition, we examine the various
characteristics that showcase the quality of generated dialogues and how they
can be connected to their potential utility as training data.

通过使用大型语言模型进行自我对话的方法可以改进对话质量并生成用于训练的自我对话数据集。

通过自我对话增强基于 LLM 的任务导向对话系统

Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk

Transformer based language models (LMs) demonstrate increasing performance
with scale across a wide variety of tasks. Scale alone however cannot enable
models to solve tasks that require access to ephemeral, changing, or private
data that was unavailable at training time. Many useful tasks may also benefit
from LMs being able to access APIs that read or modify state. In this work, we
present Tool Augmented Language Models (TALM), combining a text-only approach
to augment language models with non-differentiable tools, and an iterative
"self-play" technique to bootstrap performance starting from few tool
demonstrations. TALM exhibits strong performance on both a knowledge-heavy QA
task and a reasoning oriented math task with simple tools. At a given model
scale, TALM significantly outperforms non-augmented LMs. We further demonstrate
that TALM successfully performs out-of-distribution inferences on both QA and
math tasks, where non-augmented LMs fail. Our results suggest that Tool
Augmented Language Models are a promising direction to enrich LMs'
capabilities, with less dependence on scale.

本文介绍了一种基于迭代 “自我对弈” 技术的文本增强语言模型方法，使用不可微分的工具扩充语言模型功能，成功在知识丰富型问答和简单工具所需的数学任务中具有很强的表现力，优于非增强型语言模型，在 QA 和数学任务的超越分布推理方面更是取得了成功，证明了工具增强型语言模型是一种非常有前景的方法，可以使语言模型在不依赖于模型（尺度）的基础上具备更多的能力。