May 2022
A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning
Archit Sharma, Rehaan Ahmad, Chelsea Finn
TL;DR
Proposes a new method, MEDAL, which trains a backward policy to match the state distribution in the provided demonstrations. This keeps the agent close to task-relevant states, supplying the forward policy with a mix of easy and difficult starting states, and it matches or outperforms prior methods on continuous control tasks while making fewer assumptions than earlier work.
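
The core mechanism can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch sketch of the state-distribution-matching idea: a classifier is trained to distinguish demonstration states from states visited by the backward policy, and its output is used as the backward policy's reward, so maximizing reward pushes the backward policy's state distribution toward the demonstrations. The network sizes, the StateDiscriminator name, and the exact reward shaping are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class StateDiscriminator(nn.Module):
    """Classifies whether a state comes from the demonstrations (label 1)
    or from the backward policy's own experience (label 0)."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # raw logit
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)

def discriminator_loss(disc: StateDiscriminator,
                       demo_states: torch.Tensor,
                       policy_states: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy: demo states -> 1, backward-policy states -> 0."""
    bce = nn.BCEWithLogitsLoss()
    demo_logits = disc(demo_states)
    policy_logits = disc(policy_states)
    return (bce(demo_logits, torch.ones_like(demo_logits)) +
            bce(policy_logits, torch.zeros_like(policy_logits)))

def backward_policy_reward(disc: StateDiscriminator,
                           states: torch.Tensor) -> torch.Tensor:
    """Reward for the backward policy: higher where the classifier believes
    the state looks like a demonstration state. At classifier optimality,
    maximizing this reward reduces the mismatch between the backward
    policy's state distribution and the demonstration state distribution.
    (The precise reward transform, e.g. a log-ratio, is a design choice.)"""
    with torch.no_grad():
        return torch.sigmoid(disc(states)).squeeze(-1)

In training, the discriminator and the backward policy would be updated in alternation: collect states with the backward policy, refit the classifier against the fixed demonstration states, then train the backward policy with any off-the-shelf RL algorithm on the classifier-derived reward.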
Abstract
While reinforcement learning (RL) provides a framework for learning through trial and error, translating RL algorithms into the real world has remained challenging. A major hurdle to real-world application arises from the fact that algorithms are typically developed in an episodic setting, where the environment is reset after every trial, whereas real-world environments are continual and non-episodic.