BriefGPT.xyz
Jun, 2023
Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures
Hao Liang, Zhi-quan Luo
TL;DR
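This study applies Lipschitz dynamic risk measures to finite-horizon Markov decision processes, proposes two model-based algorithms, establishes regret upper and lower bounds, and confirms the theoretical results with numerical experiments.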
Abstract
We study finite episodic Markov decision processes incorporating dynamic risk measures to capture risk sensitivity. To this end, we present two model-based algorithms applied to \emph{