In the real world, some of the most complex settings for learned agents
involve interaction with humans, who often exhibit suboptimal, unpredictable
behavior due to sophisticated biases. Agents that interact with people in such
settings end up influencing the actions that these people take. Our goal in
this work is to enable agents to leverage that influence to improve the human's
performance in collaborative tasks, as the task unfolds. Unlike prior work, we
do not assume online training with people (which tends to be too expensive and
unsafe), nor access to a high-fidelity simulator of the environment. Our idea
is that by taking a variety of previously observed human-human interaction data
and labeling it with the task reward, offline reinforcement learning (RL) can
learn to combine components of behavior, and uncover actions that lead to more
desirable human actions. First, we show that offline RL can learn strategies to
influence and improve human behavior, despite those strategies not appearing in
the dataset, by utilizing components of diverse, suboptimal interactions. In
addition, we demonstrate that offline RL can learn influence that adapts with
humans, thus achieving long-term coordination with them even when their
behavior changes. We evaluate our proposed method with real people in the
Overcooked collaborative benchmark domain, and demonstrate successful
improvement in human performance.

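In summary, this paper proposes an offline reinforcement learning method that
uses diverse human-human interaction data to learn strategies that positively
influence human behavior, without requiring online training or a high-fidelity
simulator, thereby improving human performance in collaborative tasks. The
method successfully improved human performance in the Overcooked collaborative
benchmark domain.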