TL;DR: Introduces the SABER tool and a human world records baseline, uses SABER to evaluate Rainbow, the current state of the art, and proposes the Rainbow-IQN algorithm, which adds Implicit Quantile Networks to Rainbow to improve performance.
Abstract
Consistent and reproducible evaluation of deep reinforcement learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the m