BriefGPT.xyz
Feb, 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi
TL;DR
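Studies the finite-horizon offline reinforcement learning problem and proposes an algorithm based on estimating the impact of actions, which can outperform conventional pessimistic approaches on statistically simple instances.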
Abstract
We study the finite-horizon offline reinforcement learning (RL) problem. Since actions at any state can affect next-state distributions, the related …