BriefGPT.xyz
Jul, 2019
Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning
Srinivas Venkattaramanujam, Eric Crawford, Thang Doan, Doina Precup
TL;DR
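For reinforcement learning problems that are decomposed using subgoals, this paper proposes methods for learning an appropriate distance to determine whether a goal has been achieved, presents solutions for three different settings, and also proposes a goal-generation mechanism.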
Abstract
Goal-conditioned policies are used in order to break down complex reinforcement learning (RL) problems by using subgoals, which can be defined either in state space or in a latent feature space. This can increase
→