BriefGPT.xyz
Nov, 2022
Operator Splitting Value Iteration
Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand
TL;DR
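Introduces Operator Splitting Value Iteration (OS-VI), a planning and reinforcement learning algorithm built on an approximate model of the environment that reaches convergence faster, and proposes a sample-based version, OS-Dyna, for handling model error.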
Abstract
We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment t…
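The operator-splitting idea can be illustrated for policy evaluation with a small NumPy sketch. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a tabular MDP under a fixed policy, with a true transition matrix `P`, an approximate one `P_hat`, a reward vector `r`, and a discount factor `gamma`. The function name `os_vi_policy_evaluation` and all variable names are hypothetical. It rewrites the Bellman equation `(I - gamma * P) V = r` as `(I - gamma * P_hat) V = r + gamma * (P - P_hat) V` and iterates on that form, so each step solves a system in the approximate model and uses the true model only in the correction term.

```python
import numpy as np

def os_vi_policy_evaluation(P, P_hat, r, gamma, num_iters=50):
    """Hypothetical sketch of operator-splitting policy evaluation.

    P      : (n, n) true transition matrix under a fixed policy (assumed known here)
    P_hat  : (n, n) approximate transition matrix
    r      : (n,) expected reward per state
    gamma  : discount factor in [0, 1)
    """
    n = len(r)
    # The system matrix depends only on the approximate model.
    A = np.eye(n) - gamma * P_hat
    V = np.zeros(n)
    for _ in range(num_iters):
        # Correct the reward with the model error applied to the current estimate,
        # then solve the evaluation problem in the approximate model.
        V = np.linalg.solve(A, r + gamma * (P - P_hat) @ V)
    return V

# Usage on a made-up 3-state chain with a slightly perturbed model.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
P_hat = 0.95 * P + 0.05 * np.full((3, 3), 1.0 / 3.0)
r = np.array([1.0, 0.0, 2.0])
V = os_vi_policy_evaluation(P, P_hat, r, gamma=0.9, num_iters=200)
print(V)
```

When `P_hat` equals `P`, the correction term vanishes and a single iteration returns the exact value function; the closer the approximate model is to the true one, the fewer iterations this sketch needs.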