Nov, 2018
Unsupervised Control Through Non-Parametric Discriminative Rewards
David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen...
TL;DR
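This paper proposes an unsupervised learning algorithm that trains agents to achieve perceptually specified goals. By learning a goal-conditioned policy together with a goal-achievement reward function, the agent learns to control its environment without hand-crafted rewards or expert data.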
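The TL;DR refers to a learned goal-achievement reward, and the title describes it as non-parametric and discriminative. The sketch below shows one way such a reward can be computed, assuming that observations and goals have already been mapped to embedding vectors by some learned encoder (not shown). The reward is the softmax probability that the achieved state is matched to the true goal rather than to a set of decoy goals, scored by cosine similarity with a temperature-like scale. The names `discriminative_reward`, `decoys`, and `beta` are hypothetical, and this is an illustrative sketch rather than the paper's exact formulation; in particular, it omits how the embedding is trained.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def discriminative_reward(
    achieved: Sequence[float],
    goal: Sequence[float],
    decoys: Sequence[Sequence[float]],
    beta: float = 1.0,
) -> float:
    """Hypothetical reward: probability that `achieved` is matched to `goal`
    rather than to any decoy, via a softmax over scaled cosine similarities.

    No parameters are learned here; the score depends only on the embeddings
    of the achieved state, the goal, and the decoy goals.
    """
    candidates = [goal] + list(decoys)  # index 0 is the true goal
    logits = [beta * cosine(achieved, c) for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return exps[0] / sum(exps)


if __name__ == "__main__":
    goal = [0.9, 0.1]
    decoys = [[0.0, 1.0], [-1.0, 0.0]]
    # Reaching a state close to the goal yields a reward above chance (1/3 here).
    print(discriminative_reward([1.0, 0.0], goal, decoys))
    # Reaching a state that resembles a decoy yields a reward below chance.
    print(discriminative_reward([0.0, 1.0], goal, decoys))
```

Because the reward is a probability relative to decoys, it is bounded in (0, 1) and only rewards states that are more similar to the intended goal than to other plausible goals; raising `beta` makes this discrimination sharper.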
Abstract
Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research. We present an unsupervised learning algorithm to train