BriefGPT.xyz
Nov, 2021
Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang
TL;DR
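This study investigates risk-sensitive reinforcement learning based on the entropic risk measure. It develops a new risk-sensitive feedback mechanism so that the supervision process can guide improvements to the agent's policy more effectively and thereby raise its performance.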
Abstract
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential g