Making Deep Q-learning methods robust to time discretization

Jan, 2019

Corentin Tallec, Léonard Blier, Yann Ollivier

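TL;DR: This work proves that Q-learning does not exist in continuous time, identifies sensitivity to time discretization as a critical factor limiting the robustness of deep reinforcement learning, and proposes a model-free reinforcement learning algorithm that works robustly across different time discretizations.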
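As a rough illustration of why small time steps can be a problem for Q-learning, the sketch below uses a toy setup that is an assumption of this example and is not taken from the paper: there is a single state, each action `a` yields a constant reward rate, rewards are accumulated as `rate * dt` per step, and the per-step discount is `gamma ** dt`. The function name `q_values` and all parameter values are hypothetical. Under these assumptions the gap between the Q-values of two actions shrinks in proportion to `dt`, while the value itself stays roughly constant, so the information that ranks actions vanishes as the time step goes to zero.

```python
import math


def q_values(dt, gamma=0.9, reward_rates=(0.0, 1.0)):
    """Q-values of a one-state toy problem for a time step dt (illustrative assumption).

    The greedy policy always picks the action with the highest reward rate.
    """
    step_discount = gamma ** dt              # discount applied over one step of length dt
    best_rate = max(reward_rates)
    # Value of following the greedy policy forever: geometric sum of best_rate * dt.
    value = best_rate * dt / (1.0 - step_discount)
    # Take action a for one step of length dt, then follow the greedy policy.
    return [rate * dt + step_discount * value for rate in reward_rates]


if __name__ == "__main__":
    for dt in (1.0, 0.1, 0.01, 0.001):
        q = q_values(dt)
        print(f"dt={dt:<6} Q={q}  action gap={q[1] - q[0]:.6f}")
```

In this toy problem the action gap equals `dt` times the difference in reward rates, so halving the time step halves the signal that distinguishes a good action from a bad one.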

Abstract

Despite remarkable successes, deep reinforcement learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real world problems. I