BriefGPT.xyz
Jun, 2019
Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study
Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White
TL;DR
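This paper explores and compares intrinsic reward mechanisms by evaluating 14 of them in a bandit-like parallel-learning testbed, highlighting the interaction between the reward and the prediction learners and the importance of introspective prediction learners. The results show that intrinsic rewards based on the amount of learning can generate useful behaviour, provided each learner is introspective.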
Abstract
Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to im…