Model-Free Robust Average-Reward Reinforcement Learning

May 2023

Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou

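TL;DR: This work studies how model uncertainty affects Markov decision processes. It proposes two model-free algorithms and examines several commonly used uncertainty sets.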

Abstract

Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust →