BriefGPT.xyz
Apr, 2022
Learning Value Functions from Undirected State-only Experience
Matthew Chang, Arjun Gupta, Saurabh Gupta
TL;DR
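This paper presents a method for learning value functions from undirected state-only experience, i.e., (s,s',r) tuples: state transitions without action labels. The method builds on Q-learning, tying the value function to the discrete latent actions produced by a discrete latent-variable prediction model, and experiments demonstrate its benefits.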
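The idea of running Q-learning over discrete latent actions instead of true actions can be illustrated with a small tabular sketch. This is not the authors' implementation: the function name `latent_action_q_learning`, the toy chain environment, and the hyperparameters are all hypothetical, and the latent action labels are hand-assigned from the sign of the state change as a stand-in for the learned discrete latent-variable prediction model.

```python
from collections import defaultdict


def assign_latent_action(s, s_next):
    # Stand-in (assumption) for a learned latent-variable prediction model:
    # label each transition by the direction of the state change.
    return 0 if s_next > s else 1


def latent_action_q_learning(transitions, n_latent_actions,
                             gamma=0.9, lr=0.5, epochs=200):
    """Tabular Q-learning on (s, s', r) tuples using latent action labels."""
    # Q[s] holds one value per latent action; unseen states default to zeros.
    Q = defaultdict(lambda: [0.0] * n_latent_actions)
    for _ in range(epochs):
        for s, s_next, r in transitions:
            a_hat = assign_latent_action(s, s_next)
            # Standard Q-learning target, maximising over latent actions.
            target = r + gamma * max(Q[s_next])
            Q[s][a_hat] += lr * (target - Q[s][a_hat])
    # The state value is the best latent-action value in each state.
    return {s: max(q) for s, q in Q.items()}


# Toy chain 0 - 1 - 2 - 3 with reward 1 on reaching state 3 (terminal).
# Note that the tuples carry no action labels.
transitions = [
    (0, 1, 0.0),
    (1, 2, 0.0),
    (2, 3, 1.0),
    (1, 0, 0.0),
    (2, 1, 0.0),
]

values = latent_action_q_learning(transitions, n_latent_actions=2)
for state in sorted(values):
    print(state, round(values[state], 3))
```

In this toy setting the recovered values increase toward the rewarding state even though no true action was ever observed, which is the property the TL;DR describes.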
Abstract
This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s,s',r) tuples). We first theoretically characterize the applicabili