BriefGPT.xyz
May, 2016
Avoiding Wireheading with Value Reinforcement Learning
Tom Everitt, Marcus Hutter
TL;DR
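This paper studies how to specify good goals for artificial intelligence agents. It proposes an alternative to standard reinforcement learning, called value reinforcement learning, which uses the reward signal to learn a utility function and thereby addresses the wireheading problem faced in machine learning.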
Abstract
How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentiv…