BriefGPT.xyz
Jan, 2025
Bootstrapped Reward Shaping
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V. Kulkarni
TL;DR
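This work addresses a problem in reinforcement learning: in sparse-reward domains, an agent needs too many environment steps before it observes any reward information. The authors propose a "bootstrapped" reward shaping method (BSRS), in which the agent's current estimate of the state-value function serves as the potential function. This makes the reward signal denser while leaving the optimal policy unchanged. Experiments show that the method speeds up training on Atari games.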
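The idea in the TL;DR can be sketched with the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s), taking the potential Φ to be the agent's current value estimate. The following is a minimal tabular sketch, assuming a TD(0) learner, the convention Φ(terminal) = 0, and a hypothetical two-state chain; the function names are illustrative, and this is not the authors' implementation (the paper targets deep RL on Atari).

```python
import numpy as np


def shaping_term(phi_s, phi_next, gamma, done=False):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).

    Phi(s') is treated as 0 when s' is terminal (a common convention).
    """
    return gamma * (0.0 if done else phi_next) - phi_s


def discounted_shaping_return(phis, gamma):
    """Sum of gamma^t * F(s_t, s_{t+1}) along a trajectory, for a FIXED potential.

    phis[t] is Phi(s_t); no terminal handling, all states treated as non-terminal.
    """
    return sum(gamma**t * shaping_term(phis[t], phis[t + 1], gamma)
               for t in range(len(phis) - 1))


def td0_update_bsrs(v, s, r, s_next, done, gamma=0.99, alpha=0.1):
    """One tabular TD(0) step on a bootstrapped-shaped reward (sketch).

    The potential is the agent's current value table itself (Phi = v),
    read before this update is applied. Modifies v in place.
    """
    r_shaped = r + shaping_term(v[s], v[s_next], gamma, done)
    target = r_shaped + (0.0 if done else gamma * v[s_next])
    v[s] += alpha * (target - v[s])
    return r_shaped


if __name__ == "__main__":
    # Hypothetical chain: state 0 -> state 1 -> terminal state 2,
    # with reward 1 only on the final transition (sparse reward).
    gamma = 0.9
    v = np.zeros(3)  # v[2] is the terminal state and is never updated
    for _ in range(200):
        td0_update_bsrs(v, 0, 0.0, 1, False, gamma)
        td0_update_bsrs(v, 1, 1.0, 2, True, gamma)
    print(v)
```

For a fixed potential, the discounted shaping terms along a trajectory telescope to γ^T Φ(s_T) − Φ(s_0), which is the core of the standard argument that shaped action values equal the original ones minus Φ(s), so the greedy policy is unchanged. In this simplified sketch, where Φ is the very table being learned, repeated updates on the two-state chain settle at half the unshaped values (0.45 and 0.5 instead of 0.9 and 1.0 for γ = 0.9); this is a property of the sketch, not a statement about the paper's algorithm.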
Abstract
In Reinforcement Learning, especially in sparse-reward domains, many environment steps are required to observe reward information. In order to increase the frequency of such observations, "potential-based reward shaping …