Bad-Policy Density: A Measure of Reinforcement Learning Hardness

Oct, 2021

Bad-Policy Density: A Measure of Reinforcement Learning Hardness

David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman...

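TL;DR: This paper proposes a measure of reinforcement-learning hardness called bad-policy density: the fraction of policies in a fixed policy space whose value, in a given environment, falls below a given threshold. The paper proves that this measure has many of the properties a measure of learning hardness should have. Computing the measure exactly is NP-hard, but polynomial-time approximation algorithms exist. Finally, the paper outlines potential research directions and uses for the measure.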
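To make the definition concrete, here is a minimal sketch that computes bad-policy density by brute force on a toy problem. Everything in it is an illustrative assumption rather than the paper's own setup: the two-state deterministic MDP, the choice of deterministic stationary policies as the fixed policy space, the discounted start-state value as the quantity compared against the threshold, and the helper names `policy_value`, `bad_policy_density`, and `sampled_bad_policy_density`. The sampling function is a plain Monte Carlo estimate included only to show why approximation is cheap; it is not claimed to be the paper's approximation algorithm.

```python
import random
from itertools import product

# Hypothetical toy MDP (not from the paper): TRANSITIONS[s][a] is the next
# state and REWARDS[s][a] the reward for taking action a in state s.
TRANSITIONS = [[0, 1], [0, 1]]
REWARDS = [[0.0, 0.0], [0.0, 1.0]]


def policy_value(policy, transitions, rewards, start=0, gamma=0.9, sweeps=500):
    """Discounted start-state value of a deterministic policy,
    computed with synchronous iterative policy evaluation."""
    n_states = len(transitions)
    values = [0.0] * n_states
    for _ in range(sweeps):
        values = [
            rewards[s][policy[s]] + gamma * values[transitions[s][policy[s]]]
            for s in range(n_states)
        ]
    return values[start]


def bad_policy_density(transitions, rewards, threshold, start=0, gamma=0.9):
    """Exact fraction of deterministic policies whose start-state value is
    strictly below the threshold. Enumerates all |A|^|S| policies, so this
    is only feasible for tiny problems."""
    n_states, n_actions = len(transitions), len(transitions[0])
    policies = list(product(range(n_actions), repeat=n_states))
    bad = sum(
        policy_value(p, transitions, rewards, start, gamma) < threshold
        for p in policies
    )
    return bad / len(policies)


def sampled_bad_policy_density(transitions, rewards, threshold,
                               n_samples=2000, seed=0, start=0, gamma=0.9):
    """Monte Carlo estimate: sample policies uniformly at random and count
    how many fall below the threshold (an assumed, generic estimator)."""
    rng = random.Random(seed)
    n_states, n_actions = len(transitions), len(transitions[0])
    bad = 0
    for _ in range(n_samples):
        policy = tuple(rng.randrange(n_actions) for _ in range(n_states))
        bad += policy_value(policy, transitions, rewards, start, gamma) < threshold
    return bad / n_samples


if __name__ == "__main__":
    # Only the policy that always picks action 1 reaches the rewarding loop,
    # so 3 of the 4 policies are below a threshold of 5.
    print(bad_policy_density(TRANSITIONS, REWARDS, threshold=5.0))  # 0.75
```

In this toy problem a high density means that a randomly chosen policy is likely to be bad, which matches the intuition that such an environment is harder to learn in. The exhaustive version grows exponentially in the number of states, while the sampled version needs only a fixed number of policy evaluations.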

Abstract

Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the →