BriefGPT.xyz
May, 2023
Reward-Machine-Guided, Self-Paced Reinforcement Learning
Cevahir Koprulu, Ufuk Topcu
TL;DR
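This work proposes a reward-machine-guided, self-paced learning algorithm that improves the data efficiency of reinforcement learning by automatically generating curricula of probability distributions over contexts, and that reliably attains optimal behavior on long-horizon planning tasks.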
Abstract
Self-paced reinforcement learning (RL) aims to improve the data efficiency of learning by automatically creating sequences, namely curricula, of probability distributions over contexts. However, existing techniqu