May 2024
Which Experiences Are Influential for RL Agents? Efficiently Estimating the Influence of Experiences
Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka
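TL;DR: This paper introduces Policy Iteration with Turn-over Dropout (PIToD), a method for efficiently estimating the influence of individual experiences on a reinforcement learning (RL) agent. The authors apply PIToD to underperforming RL agents: they identify the experiences with a negative influence and remove that influence, which significantly improves the agents' performance.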