Selective Uncertainty Propagation in Offline Reinforcement Learning

Feb, 2023

Selective Uncertainty Propagation in Offline RL

Sanath Kumar Krishnamurthy, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi

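TL;DR: This paper studies the finite-horizon offline reinforcement learning problem and proposes an algorithm based on estimating the impact of actions, which can outperform conventional pessimistic methods on statistically easy instances.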

Abstract

We study the finite-horizon offline reinforcement learning (RL) problem. Since actions at any state can affect next-state distributions, the related →