Many settings of interest involving humans and machines -- from virtual
personal assistants to autonomous vehicles -- can naturally be modelled as
principals (humans) delegating to agents (machines), which then interact with
each other on their principals' behalf. We refer to these multi-principal,
multi-agent scenarios as delegation games. In such games, there are two
important failure modes: problems of control (where an agent fails to act in
line with their principal's preferences) and problems of cooperation (where the
agents fail to work well together). In this paper we formalise and analyse
these problems, further breaking them down into issues of alignment (do the
players have similar preferences?) and capabilities (how competent are the
players at satisfying those preferences?). We show -- theoretically and
empirically -- how these measures determine the principals' welfare, how they
can be estimated using limited observations, and thus how they might be used to
help us design more aligned and cooperative AI systems.

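In this paper, we formally analyse the problems of control, cooperation, alignment, and capabilities that arise when agents act together, along with their effect on the principals' welfare, and we show how these measures can be estimated from limited observations to help design more aligned and cooperative AI systems.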

Cooperation and Control in Delegation Games

By formally defining the training processes of large language models (LLMs),
which usually encompass pre-training, supervised fine-tuning, and
reinforcement learning with human feedback, within a single and unified machine
learning paradigm, we can glean pivotal insights for advancing LLM
technologies. This position paper delineates the parallels between the training
methods of LLMs and the strategies employed for the development of agents in
two-player games, as studied in game theory, reinforcement learning, and
multi-agent systems. We propose a re-conceptualization of LLM learning
processes in terms of agent learning in language-based games. This framework
unveils innovative perspectives on the successes and challenges in LLM
development, offering a fresh understanding of how to address alignment issues,
among other strategic considerations. Furthermore, our two-player game approach
sheds light on novel data preparation and machine learning techniques for
training LLMs.
