BriefGPT.xyz
Jul, 2022
Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards
Yixiang Wang, Yujing Hu, Feng Wu, Yingfeng Chen
TL;DR
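This paper proposes a motivation-based reward design method that automatically generates goal-consistent intrinsic rewards so as to maximize the expected cumulative extrinsic reward. The method outperforms existing approaches in handling delayed rewards, exploration, and credit assignment.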
Abstract
Reward design is a critical part of the application of reinforcement learning, the performance of which strongly depends on how well the reward signal frames the goal of the designer and how well the signal asses