BriefGPT.xyz
Mar, 2023
Preference Transformer: Modeling Human Preferences using Transformers for RL
Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel...
TL;DR
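This work studies preference-based reinforcement learning from human feedback. It uses a transformer to build a preference model that accounts for temporal dependencies, trains agents successfully on control tasks, and shows that the model automatically captures the temporal dependencies in human decision-making.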
Abstract
Preference-based reinforcement learning (RL) provides a framework to train agents using human preferences between two behaviors. However, preference-based RL has been challenging to scale since it requires a larg…