BriefGPT.xyz
May, 2024
Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data
Kishan Panaganti, Adam Wierman, Eric Mazumdar
TL;DR
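Within the robust φ-regularized Markov decision process (RRMDP) framework, the key contribution is a set of model-free algorithms that learn an optimal robust policy from historical data and online sampling, with theoretical guarantees for high-dimensional systems.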
Abstract
The robust $\phi$-regularized Markov decision process (RRMDP) framework focuses on designing control policies that are robust against para