BriefGPT.xyz
May, 2023
Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure
Xumei Xi, Christina Lee Yu, Yudong Chen
TL;DR
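This paper proposes an offline policy evaluation algorithm that exploits an underlying low-rank structure to estimate the values of state-action pairs not covered by the data. It also provides an offline policy optimization algorithm with non-asymptotic performance guarantees.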
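To illustrate why low-rank structure helps with uncovered state-action pairs, here is a minimal sketch. It is not the paper's algorithm: it swaps in a simple CUR-style completion, and it assumes a noiseless value matrix of exact rank 2 in which a few "anchor" states and actions are fully observed. The function name `complete_low_rank` and the toy factors `U` and `V` are hypothetical, chosen only for this example.

```python
import numpy as np


def complete_low_rank(q_obs, anchor_rows, anchor_cols):
    """Fill in a low-rank matrix from fully observed anchor rows and columns.

    Illustrative CUR-style completion, NOT the method from the paper.
    Assumes every entry in the anchor rows and anchor columns is observed
    and that the anchor block has the same rank as the full matrix.
    """
    cols = q_obs[:, anchor_cols]                      # shape (S, k)
    rows = q_obs[anchor_rows, :]                      # shape (k, A)
    core = q_obs[np.ix_(anchor_rows, anchor_cols)]    # shape (k, k)
    return cols @ np.linalg.pinv(core) @ rows


# Toy rank-2 "Q matrix": 6 states by 5 actions (hypothetical numbers).
U = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 1]], dtype=float)
V = np.array([[1, 0, 1, 2, 1], [0, 1, 1, 1, 3]], dtype=float)
true_q = U @ V

# The behavior data covers only two states (rows) and two actions (columns).
anchor_rows, anchor_cols = [0, 1], [0, 1]
q_obs = np.full_like(true_q, np.nan)
q_obs[anchor_rows, :] = true_q[anchor_rows, :]
q_obs[:, anchor_cols] = true_q[:, anchor_cols]

# Uncovered entries are NaN in q_obs but are filled in by the completion.
q_hat = complete_low_rank(q_obs, anchor_rows, anchor_cols)
max_error = float(np.max(np.abs(q_hat - true_q)))
print(max_error)
```

In this idealized setting the uncovered entries are determined by the observed rows and columns, which is why the completion matches the true matrix. With noisy or irregularly sampled data this exact recovery no longer holds and a proper matrix estimation procedure is needed.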
Abstract
We consider offline reinforcement learning (RL), where the agent does not interact with the environment and must rely on offline data collected using a behavior policy. Previous works provide policy evaluation gu
→