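TL;DR: Introduces a criticality-based step-number algorithm that uses a criticality function either provided by a human or learned directly from the environment. Experiments show that it outperforms popular learning algorithms such as Deep Q-Learning and Monte Carlo in several domains, including Atari Pong, Road-Tree, and a shooter game.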
Abstract
In the context of reinforcement learning we introduce the concept of criticality of a state, which indicates the extent to which the choice of action in that particular state influences the expected return. That