BriefGPT.xyz
Aug, 2019
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Tom Everitt, Marcus Hutter
TL;DR
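This paper discusses how reinforcement learning agents can pursue their lifetime objective through routes such as tampering with the reward signal, proposes design principles for preventing reward tampering, and derives its results using causal influence diagrams.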
Abstract
Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far …