Exploration has been one of the greatest challenges in reinforcement learning
(RL), which is a large obstacle in the application of RL to robotics. Even with
state-of-the-art RL algorithms, building a well-learned agent often requires
too many trials, mainly due to the difficulty of matching its actions with
rewards in the distant future. A remedy for this is to train an agent with
real-time feedback from a human observer who immediately gives rewards for some
actions. This study tackles a series of challenges for introducing such a
human-in-the-loop RL scheme. The first contribution of this work is our
experiments with a precisely modeled human observer: binary, delay,
stochasticity, unsustainability, and natural reaction. We also propose an RL
method called DQN-TAMER, which efficiently uses both human feedback and distant
rewards. We find that DQN-TAMER agents outperform their baselines in Maze and
Taxi simulated environments. Furthermore, we demonstrate a real-world
human-in-the-loop RL application where a camera automatically recognizes a
user's facial expressions as feedback to the agent while the agent explores a
maze.

本研究使用即时反馈，通过引入人与环境的互动，提高了强化学习在机器人学中的应用性，并提出了一种 DQN-TAMER 算法，在模拟和现实环境中都有优越表现。