This paper proposes a novel deep reinforcement learning algorithm to perform
automatic analysis and detection of gameplay issues in complex 3D navigation
environments. The Curiosity-Conditioned Proximal Trajectories (CCPT) method
combines curiosity and imitation learning to train agents to methodically
explore in the proximity of known trajectories derived from expert
demonstrations. We show how CCPT can explore complex environments, discover
gameplay issues and design oversights in the process, and recognize these
issues and highlight them directly for game designers. We further demonstrate
the effectiveness of the algorithm in a novel 3D navigation environment that
reflects the complexity of modern AAA video games. Our results show higher
coverage and bug discovery than baseline methods, so CCPT can provide a
valuable tool for game designers to identify issues in game design
automatically.
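The abstract does not specify CCPT's reward, so the following is only a toy sketch of the general idea of curiosity restricted to the neighbourhood of expert trajectories. It assumes, hypothetically, a count-based novelty bonus over a 3D grid as a stand-in for a learned curiosity model, plus a linear penalty once the agent strays more than `radius` from the nearest demonstration point. The class name, parameters, and weights are illustrative and are not taken from the paper.

```python
import math
from collections import defaultdict


class ProximalCuriosityReward:
    """Toy intrinsic reward: novelty bonus near demonstrations, penalty far away.

    Hypothetical illustration only; this is NOT the reward used by CCPT.
    """

    def __init__(self, demo_points, cell_size=1.0, radius=2.0, beta=1.0, lam=0.5):
        self.demo_points = [tuple(p) for p in demo_points]  # expert (x, y, z) positions
        self.cell_size = cell_size  # grid resolution used for visit counting
        self.radius = radius        # tolerated distance from the nearest demo point
        self.beta = beta            # weight of the novelty bonus
        self.lam = lam              # weight of the distance penalty
        self.counts = defaultdict(int)  # visits per grid cell

    def _cell(self, pos):
        # Discretize a continuous position into an integer grid cell.
        return tuple(math.floor(c / self.cell_size) for c in pos)

    def distance_to_demos(self, pos):
        # Euclidean distance to the closest demonstration point.
        return min(math.dist(pos, p) for p in self.demo_points)

    def __call__(self, pos):
        cell = self._cell(pos)
        self.counts[cell] += 1
        # Count-based novelty (assumed stand-in for a learned curiosity signal).
        novelty = 1.0 / math.sqrt(self.counts[cell])
        # Penalize only the part of the distance that exceeds the allowed radius.
        excess = max(0.0, self.distance_to_demos(pos) - self.radius)
        return self.beta * novelty - self.lam * excess


demos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reward = ProximalCuriosityReward(demos)
print(reward((0.5, 0.5, 0.0)))   # new cell close to the demos: full bonus
print(reward((0.5, 0.5, 0.0)))   # revisited cell: smaller bonus
print(reward((10.0, 0.0, 0.0)))  # far from the demos: bonus outweighed by penalty
```

The design choice being illustrated is that novelty alone would push the agent anywhere in the map, while the proximity term keeps exploration concentrated around routes a designer actually expects players to take.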

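This paper proposes a novel deep reinforcement learning algorithm that combines curiosity and imitation learning to train agents to automatically analyze and detect gameplay issues in complex 3D navigation environments, and to report the resulting game design issues and oversights directly to game designers. Experiments in a new 3D navigation environment that reflects the complexity of modern AAA video games show that CCPT achieves higher coverage and bug discovery rates than baseline methods, giving game designers a valuable tool for automatically identifying issues in game design.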

CCPT: Automatic Gameplay Testing and Validation with Curiosity-Conditioned Proximal Trajectories

Deep reinforcement learning algorithms that estimate state and state-action
value functions have been shown to be effective in a variety of challenging
domains, including learning control strategies from raw image pixels. However,
such algorithms typically assume a fully observed state and must compensate
for partial observations by using finite-length observation histories or
recurrent networks. In this work,
we propose a new deep reinforcement learning algorithm based on counterfactual
regret minimization that iteratively updates an approximation to an
advantage-like function and is robust to partially observed state. We
demonstrate that this new algorithm can substantially outperform strong
baseline methods on several partially observed reinforcement learning tasks:
learning first-person 3D navigation in Doom and Minecraft, and acting in the
presence of partially observed objects in Doom and Pong.
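Counterfactual regret minimization is built on regret matching: accumulate each action's advantage over the value of the current policy, then play actions in proportion to their positive cumulative regret. The tabular sketch below shows only that textbook rule at a single state with known action values. It is not the algorithm proposed in the abstract, which learns an advantage-like function with deep networks, and the function names are illustrative.

```python
def regret_matching_policy(cum_regret):
    """Map cumulative regrets to a policy: pi(a) proportional to max(R(a), 0)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform policy.
    n = len(cum_regret)
    return [1.0 / n] * n


def regret_matching_step(cum_regret, action_values):
    """One regret-matching iteration at a single (information) state."""
    policy = regret_matching_policy(cum_regret)
    # Value of the current policy: V = sum_a pi(a) * Q(a).
    value = sum(p * q for p, q in zip(policy, action_values))
    # Instantaneous regret of each action is its advantage Q(a) - V.
    advantages = [q - value for q in action_values]
    new_regret = [r + a for r, a in zip(cum_regret, advantages)]
    return new_regret, policy


regret = [0.0, 0.0, 0.0]
q_values = [1.0, 0.0, 0.5]  # assumed fixed action values for the example
for _ in range(3):
    regret, policy = regret_matching_step(regret, q_values)
    print(policy)
```

Starting from zero regret, the first step plays uniformly; after that, only action 0 has positive cumulative regret, so the policy puts (almost) all its probability on it. Clipping at zero is what lets actions that have looked bad so far drop out of the policy without requiring a separate learning rate.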

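This study proposes a new deep reinforcement learning algorithm based on counterfactual regret minimization that handles partially observed states effectively. It significantly outperforms existing baseline algorithms on reinforcement learning tasks including learning first-person 3D navigation in Doom and Minecraft and acting in the presence of partially observed objects in Doom and Pong.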