On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

Jul, 2019

Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

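TL;DR: This work studies how best to explore in domains with sparse, delayed, and deceptive rewards. By analyzing what makes Pommerman difficult, it proposes a model-based automatic reasoning module that can be used for safer exploration, and experiments show that the module significantly improves learning.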

Abstract

How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently-proposed multi-agent benchmark of →