BriefGPT.xyz
Mar, 2024
The Fallacy of Minimizing Local Regret in the Sequential Task Setting
Ziping Xu, Kelly W. Zhang, Susan A. Murphy
TL;DR
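In reinforcement learning where tasks change over a sequence, lower cumulative regret can be achieved by over-exploring within each task, especially when the changes between tasks are large.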
Abstract
In the realm of reinforcement learning (RL), online RL is often conceptualized as an optimization problem, where an algorithm interacts with an unknown environment to minimize cumulative
→