BriefGPT.xyz
Dec, 2019
Learning Human Objectives by Evaluating Hypothetical Behavior
Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike
TL;DR
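Learns a model of the user's reward function by maximizing tractable proxies for the value of information, in order to align agent behavior with the user's objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown unsafe states.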
Abstract
We seek to align agent behavior with a user's objectives in a reinforcement learning setting with unknown dynamics, an unknown reward function, and unknown …