Feb, 2024
A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs
Kihyuk Hong, Ambuj Tewari
TL;DR
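This paper proposes an offline reinforcement learning algorithm for low-rank Markov decision processes (MDPs). The algorithm achieves low sample complexity in the discounted infinite-horizon setting and also supports the offline constrained reinforcement learning setting.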
Abstract
Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general function approximation has been wid