BriefGPT.xyz
Nov, 2022
Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation
Zhizhou Ren, Anji Liu, Yitao Liang, Jian Peng, Jianzhu Ma
TL;DR
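Building on the meta-reinforcement learning framework, this work studies how a policy can be adapted to a new task within a few trials in human-in-the-loop settings, using preference-based feedback instead of numerical rewards. It uses information-theoretic techniques to design the sequence of queries so as to maximize the information obtained from the human expert, and experiments show that it significantly outperforms conventional algorithms.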
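The TL;DR says the queries posed to the human are chosen with information-theoretic techniques. As a rough illustration of that general idea only (not the authors' algorithm), the sketch below keeps a Bayesian posterior over a small, hypothetical set of candidate tasks, models each "is behavior a better than behavior b?" answer with an assumed Bradley-Terry likelihood, and greedily asks the pairwise comparison whose answer has the highest expected information gain about which task is being solved. All names (`TASKS`, `TRUE_TASK`, `best_query`, `update`) and the reward vectors are made up for illustration.

```python
import math
from itertools import combinations


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) variable, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def pref_prob(rewards, a, b):
    """Assumed Bradley-Terry model: P(behavior a is preferred to b | task)."""
    return sigmoid(rewards[a] - rewards[b])


def info_gain(posterior, tasks, a, b):
    """Mutual information (bits) between the task and the answer to query (a, b)."""
    p_given_task = [pref_prob(r, a, b) for r in tasks]
    p_yes = sum(w * p for w, p in zip(posterior, p_given_task))
    expected_cond = sum(w * binary_entropy(p) for w, p in zip(posterior, p_given_task))
    return binary_entropy(p_yes) - expected_cond


def best_query(posterior, tasks):
    """Greedily pick the pair of behaviors whose answer is most informative."""
    n_behaviors = len(tasks[0])
    return max(combinations(range(n_behaviors), 2),
               key=lambda q: info_gain(posterior, tasks, *q))


def update(posterior, tasks, a, b, prefers_a):
    """Bayes rule: reweight each candidate task by the likelihood of the answer."""
    likelihoods = [pref_prob(r, a, b) if prefers_a else 1.0 - pref_prob(r, a, b)
                   for r in tasks]
    unnorm = [w * l for w, l in zip(posterior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: 3 candidate tasks, each a reward vector over 3 behaviors.
TASKS = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 2.0]]
TRUE_TASK = 2  # unknown to the learner; used only to simulate the human's answers
posterior = [1.0 / len(TASKS)] * len(TASKS)
for _ in range(6):
    a, b = best_query(posterior, TASKS)
    # Noiseless simulated human: prefers whichever behavior the true task rewards more.
    prefers_a = TASKS[TRUE_TASK][a] > TASKS[TRUE_TASK][b]
    posterior = update(posterior, TASKS, a, b, prefers_a)
print([round(p, 3) for p in posterior])
```

After a few greedy queries the posterior mass should concentrate on the task the simulated human is answering for. Greedy mutual-information selection is a common heuristic for active preference queries; the paper's actual query design may differ.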
Abstract
Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence.
Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot …