A Theory of Once-per-Episode Feedback in Reinforcement Learning

May, 2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan

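TL;DR: We study a theory of reinforcement learning in which the learner receives binary feedback only once, at the end of each episode. We provide a statistically and computationally efficient algorithm that learns in this more challenging setting. The algorithm operates on trajectory labels generated by an unknown parametric model and achieves sublinear regret.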
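To make the feedback protocol concrete, here is a minimal sketch of how a single binary label per episode might be generated. The summary above says only that labels come from an unknown parametric model; the logistic link, the additive trajectory features, and every name below (`one_hot_features`, `trajectory_features`, `label_probability`, `sample_episode_label`, `theta`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

NUM_ACTIONS = 2  # hypothetical toy problem: 2 states x 2 actions


def one_hot_features(state, action):
    """Hypothetical feature map: one-hot encoding of a (state, action) pair."""
    vec = [0.0] * (2 * NUM_ACTIONS)
    vec[state * NUM_ACTIONS + action] = 1.0
    return vec


def trajectory_features(trajectory, feature_map):
    """Sum the per-step feature vectors over one whole episode (assumed additive)."""
    total = [0.0] * len(feature_map(*trajectory[0]))
    for state, action in trajectory:
        for i, value in enumerate(feature_map(state, action)):
            total[i] += value
    return total


def label_probability(features, theta):
    """Assumed logistic model: P(label = 1) = sigmoid(<theta, features>)."""
    score = sum(f * w for f, w in zip(features, theta))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)  # numerically stable branch for negative scores
    return e / (1.0 + e)


def sample_episode_label(trajectory, feature_map, theta, rng):
    """Return the only feedback the learner sees: one bit at the end of the episode."""
    p = label_probability(trajectory_features(trajectory, feature_map), theta)
    return 1 if rng.random() < p else 0


# Example episode: the learner observes the state-action pairs, then a single bit.
trajectory = [(0, 1), (1, 0), (0, 1)]
theta = [0.0, 0.5, -1.0, 0.0]  # unknown to the learner in the actual setting
label = sample_episode_label(trajectory, one_hot_features, theta, random.Random(0))
```

In this sketch the learner never sees per-step rewards. It would have to estimate `theta` from pairs of trajectories and labels collected across many episodes.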

Abstract

We introduce a theory of reinforcement learning (RL) in which the learner receives feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of r