BriefGPT.xyz
Apr, 2021
Learning What To Do by Simulating the Past
David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
TL;DR
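This work aims to learn agent policies from human feedback. By combining a learned feature encoder with learned inverse models, it lets the agent simulate human behavior backward in time and infer the motivation behind that behavior.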
Abstract
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are
→