May, 2021
On the Theory of Reinforcement Learning with Once-per-Episode Feedback
Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan
TL;DR
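We study a theory of reinforcement learning in which the learner receives a single binary feedback signal only at the end of each episode. We provide a statistically and computationally efficient algorithm that can learn in this more challenging setting. The algorithm operates on trajectory labels generated by an unknown parametric model and achieves sublinear regret.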
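To make the feedback model concrete, here is a minimal sketch of how a single binary label per episode could be generated. The summary only says the labels come from an unknown parametric model; the logistic form, the one-hot feature map, and the names `trajectory_features`, `success_probability`, and `episode_label` are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def trajectory_features(trajectory, dim):
    """Hypothetical feature map: count how often each state index is visited."""
    phi = [0.0] * dim
    for state in trajectory:
        phi[state] += 1.0
    return phi


def success_probability(w, phi):
    """Assumed logistic model: P(label = 1) = sigmoid(<w, phi>)."""
    z = sum(wi * xi for wi, xi in zip(w, phi))
    return 1.0 / (1.0 + math.exp(-z))


def episode_label(w, trajectory, rng):
    """Draw the single binary label the learner sees at the end of an episode."""
    phi = trajectory_features(trajectory, len(w))
    return 1 if rng.random() < success_probability(w, phi) else 0


# Example: the weight vector w is unknown to the learner, which only observes
# the label for the whole trajectory, never any per-step reward.
rng = random.Random(0)
w_true = [1.5, -2.0, 0.5]
label = episode_label(w_true, [0, 2, 2, 0], rng)
```

The point of the sketch is that the learner never observes `w_true` or any intermediate signal, only one bit per trajectory, which is what makes the setting harder than standard per-step reward feedback.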
Abstract
We introduce a theory of reinforcement learning (RL) in which the learner receives feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of r…