BriefGPT.xyz
Apr, 2018
On Learning Intrinsic Rewards for Policy Gradient Methods
Zeyu Zheng, Junhyuk Oh, Satinder Singh
TL;DR
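This paper examines why optimizing the reward function matters for reinforcement learning performance in sequential decision making tasks. It proposes an algorithm for learning intrinsic rewards for policy-gradient-based learning agents, and compares the performance of RL agents that use this method against agents that use only extrinsic rewards.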
Abstract
In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the