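TL;DR: This work shows that Q-learning does not exist in continuous time, identifies sensitivity to time discretization as a critical factor in the robustness of deep reinforcement learning, and proposes a model-free reinforcement learning algorithm that works robustly across different time discretizations.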
Abstract
Despite remarkable successes, deep reinforcement learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real-world problems. I