BriefGPT.xyz
Jul, 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal, Shipra Agrawal
TL;DR
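We propose an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption on the underlying MDP: for all policies, the expected time to visit some frequent state s0 is finite and upper bounded by H.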
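To make the assumption in the TL;DR concrete, the sketch below computes the expected time to reach a designated state s0 for a single fixed policy, using the standard linear system for hitting times in a Markov chain. It is only an illustration, not the paper's algorithm: the function name `expected_hitting_times` and the 3-state transition matrix are hypothetical, and the assumption in the paper concerns all policies, so H would have to bound this quantity for every policy rather than for the one chain shown here.

```python
import numpy as np

def expected_hitting_times(P, s0):
    """Expected number of steps to first reach s0 from each state.

    P is the transition matrix induced by one fixed policy (hypothetical input).
    Convention used here: the hitting time from s0 itself is 0.
    """
    n = P.shape[0]
    others = [s for s in range(n) if s != s0]
    # Restrict the chain to the states other than s0.
    Q = P[np.ix_(others, others)]
    # Solve h = 1 + Q h, i.e. (I - Q) h = 1, on those states.
    h_others = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    h = np.zeros(n)
    h[others] = h_others
    return h

# Hypothetical 3-state chain under some fixed policy; state 0 plays the role of s0.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.50, 0.00, 0.50],
])
h = expected_hitting_times(P, s0=0)
print(h)        # expected steps to reach state 0 from each state
print(h.max())  # smallest H that works for this one policy
```

For this toy chain the expected hitting times are 0, 3, and 2 steps, so any H of at least 3 would satisfy the bound for this particular policy. Checking the assumption for an actual MDP would require taking the maximum over all policies, which this sketch does not attempt.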
Abstract
We present an optimistic Q-learning algorithm for regret minimization in average reward reinforcement learning under an additional assumption …