Advances in deep reinforcement learning have allowed autonomous agents to
perform well on Atari games, often outperforming humans, using only raw pixels
to make their decisions. However, most of these games take place in 2D
environments that are fully observable to the agent. In this paper, we present
the first architecture to tackle 3D environments in first-person shooter games,
which involve partially observable states. Typically, deep reinforcement
learning methods only utilize visual input for training. We present a method to
augment these models to exploit game feature information, such as the presence
of enemies or items, during the training phase. Our model is trained to
simultaneously learn these features along with minimizing a Q-learning
objective, which is shown to dramatically improve the training speed and
performance of our agent. Our architecture is also modularized to allow
different models to be independently trained for different phases of the game.
We show that the proposed architecture substantially outperforms built-in AI
agents of the game as well as humans in deathmatch scenarios.
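
The joint objective described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the squared one-step TD error, the binary cross-entropy on feature predictions, and the `feature_weight` coefficient are all assumptions chosen for illustration.

```python
import math


def td_loss(q_values, action, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step Q-learning (TD) error for a single transition."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_values[action]) ** 2


def feature_loss(pred_probs, labels, eps=1e-7):
    """Mean binary cross-entropy over game-feature predictions.

    pred_probs[i] is the predicted probability that feature i (for example,
    "an enemy is visible") is present; labels[i] is 0 or 1.
    """
    total = 0.0
    for p, y in zip(pred_probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def joint_loss(q_values, action, reward, next_q_values, pred_probs, labels,
               feature_weight=1.0, gamma=0.99, done=False):
    """Hypothetical combined objective: TD loss plus weighted feature loss."""
    rl = td_loss(q_values, action, reward, next_q_values, gamma, done)
    sl = feature_loss(pred_probs, labels)
    return rl + feature_weight * sl


if __name__ == "__main__":
    loss = joint_loss(
        q_values=[1.0, 2.0], action=1, reward=1.0, next_q_values=[0.0, 1.0],
        pred_probs=[0.5, 0.5], labels=[1, 0], gamma=0.5,
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the feature labels are needed only to compute the training loss, which matches the abstract's statement that game features are exploited during the training phase.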

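This work proposes a deep reinforcement learning neural network model that incorporates game feature information and significantly improves training efficiency and performance when handling the partially observable states of 3D FPS games.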

Playing FPS Games with Deep Reinforcement Learning

Successful applications of reinforcement learning in real-world problems
often require dealing with partially observable states. It is in general very
challenging to construct and infer hidden states as they often depend on the
agent's entire interaction history and may require substantial domain
knowledge. In this work, we investigate a deep-learning approach to learning
the representation of states in partially observable tasks, with minimal prior
knowledge of the domain. In particular, we propose a new family of hybrid
models that combine the strengths of both supervised learning (SL) and
reinforcement learning (RL), trained in a joint fashion: the SL component can
be a recurrent neural network (RNN) or its long short-term memory (LSTM)
version, which is equipped with the desired property of being able to capture
long-term dependency on history, thus providing an effective way of learning
the representation of hidden states. The RL component is a deep Q-network (DQN)
that learns to optimize the control for maximizing long-term rewards. Extensive
experiments in a direct mailing campaign problem demonstrate the effectiveness
and advantages of the proposed approach, which outperforms a set of
previous state-of-the-art methods.
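
The hybrid structure described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' model: a simple Elman-style RNN stands in for the RNN/LSTM, the SL head is assumed to predict the next observation, the Q-head is a single linear layer rather than a deep network, and every name (`rnn_step`, `hybrid_loss`, `W_pred`, `W_q`, `sl_weight`) is hypothetical.

```python
import math


def linear(W, v):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(h_prev, obs, W_h, W_x):
    """One Elman-RNN step: h_t = tanh(W_h h_{t-1} + W_x o_t)."""
    pre = [a + b for a, b in zip(linear(W_h, h_prev), linear(W_x, obs))]
    return [math.tanh(z) for z in pre]


def encode_history(observations, W_h, W_x):
    """Summarize the interaction history into a hidden-state vector."""
    h = [0.0] * len(W_h)
    for obs in observations:
        h = rnn_step(h, obs, W_h, W_x)
    return h


def hybrid_loss(history, next_obs, action, reward, next_q, params,
                gamma=0.9, sl_weight=1.0):
    """Joint SL + RL loss computed on one shared hidden state.

    SL term (assumed): mean squared error of next-observation prediction.
    RL term: squared one-step Q-learning error.
    """
    h = encode_history(history, params["W_h"], params["W_x"])
    pred = linear(params["W_pred"], h)   # supervised head
    q = linear(params["W_q"], h)         # Q-value head
    sl = sum((p - y) ** 2 for p, y in zip(pred, next_obs)) / len(next_obs)
    target = reward + gamma * max(next_q)
    rl = (target - q[action]) ** 2
    return rl + sl_weight * sl


if __name__ == "__main__":
    params = {
        "W_h": [[0.1, 0.0], [0.0, 0.1]],
        "W_x": [[0.5], [-0.5]],
        "W_pred": [[1.0, 0.0]],
        "W_q": [[1.0, 0.0], [0.0, 1.0]],
    }
    loss = hybrid_loss(
        history=[[1.0], [0.0], [1.0]], next_obs=[0.0], action=0,
        reward=1.0, next_q=[0.2, 0.4], params=params,
    )
    print(f"hybrid loss: {loss:.4f}")
```

Both heads read the same hidden state, so gradients from either term would shape the learned representation; that sharing is the point of training the two components jointly.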

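This paper studies a deep learning approach that combines reinforcement learning and supervised learning, using a long short-term memory network to learn representations of hidden states, and shows strong performance on partially observable tasks.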