We consider the problem of synthetically generating data that can closely
resemble human decisions made in the context of an interactive human-AI system
like a computer game. We propose a novel algorithm that generates synthetic,
human-like decision-making data starting from a very small set of
decision-making data collected from humans. Our proposed algorithm integrates
the concept of reward shaping with an imitation learning algorithm to generate
the synthetic data. We have validated our synthetic data generation technique
by using the synthetically generated data as a surrogate for human interaction
data to solve three sequential decision-making tasks of increasing complexity
within a small computer game-like setup. Different empirical and statistical
analyses of our results show that the synthetically generated data can
substitute for the human data and perform the game-playing tasks almost
indistinguishably, with very low divergence, from a human performing the same
tasks.
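
As a rough illustration of how reward shaping can be combined with a handful of human demonstrations, the sketch below uses standard potential-based shaping, where the potential of a state is the fraction of demonstration steps that visit it. This is an assumption made for illustration only, not the algorithm proposed in the paper; the names `demo_potential`, `shaped_reward`, and the toy states are hypothetical.

```python
from collections import Counter


def demo_potential(demos):
    """Build a potential phi(s): the fraction of demonstration steps visiting s.

    `demos` is a list of trajectories, each a list of (state, action) pairs.
    This choice of potential is an assumption, not taken from the paper.
    """
    counts = Counter(state for traj in demos for state, _action in traj)
    total = sum(counts.values())
    # States never seen in the demonstrations get a potential of 0.0.
    return lambda s: counts[s] / total if total else 0.0


def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)


# Hypothetical human demonstrations over toy states "a", "b", "c".
demos = [[("a", 0), ("b", 1)], [("a", 0), ("c", 1)]]
phi = demo_potential(demos)

# Moving from an unseen state into a frequently demonstrated one earns a bonus.
bonus = shaped_reward(0.0, "unseen", "a", phi, gamma=1.0)
```

With this potential, transitions toward states that humans visited often receive extra reward, which nudges a learner toward human-like trajectories even when only a few demonstrations are available.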

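By combining reward shaping with an imitation learning algorithm, this work proposes a new algorithm for generating human-like decision data in human-AI systems, and shows that the synthetic data can be used to solve decision-making tasks of increasing difficulty in a computer game, with performance almost indistinguishable from that of humans.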

Synthesizing Human-like Data with Reward-Shaped Imitation Learning to Solve Sequential Decision-Making Problems

Synthetically Generating Human-like Data for Sequential Decision Making Tasks via Reward-Shaped Imitation Learning

With humans interacting with AI-based systems at an increasing rate, it is
necessary to ensure that these systems act in a manner that reflects an
understanding of the human. When humans and AI agents operate in the same
environment, we note the significance of an agent's ability to comprehend and
respond to the actions or capabilities of a human, as well as the possibility
of delegating decisions either to humans or to agents, depending on who is
deemed more suitable at a certain point in time. Such capabilities will ensure
improved responsiveness and utility of the entire human-AI system. To that
end, we investigate the use of
cognitively inspired models of behavior to predict the behavior of both human
and AI agents. The predicted behavior, and associated performance with respect
to a certain goal, is used to delegate control between humans and AI agents
through the use of an intermediary entity. As we demonstrate, this makes it
possible to overcome potential shortcomings of either humans or agents in the
pursuit of a goal.
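
The delegation idea described above can be sketched very simply, assuming the behavior models yield one predicted performance score per candidate controller. The function name `delegate`, the score values, and the "highest score wins" rule are hypothetical choices for illustration; the abstract does not specify how the intermediary decides.

```python
def delegate(predicted_performance):
    """Return the controller with the highest predicted performance.

    `predicted_performance` maps a controller name to a score predicted by
    some behavior model (the model itself is outside this sketch).
    """
    return max(predicted_performance, key=predicted_performance.get)


# Hypothetical predictions for the current point in time.
predictions = {"human": 0.62, "agent": 0.71}
controller = delegate(predictions)
```

An intermediary would call such a rule repeatedly as the predictions change, handing control to whichever party is currently expected to do better with respect to the goal.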

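This work studies how AI systems and humans operating in the same environment should understand and respond to each other's behavior, using cognitive models to predict the behavior of both parties and an intermediary that delegates control in order to achieve the goal.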