BriefGPT.xyz
Feb, 2024
监控的马尔可夫决策过程
Monitored Markov Decision Processes
HTML
PDF
Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling
TL;DR
在本文中,我们提出了一种新的强化学习框架-监控马尔可夫决策过程(Monitored MDPs),该框架解决了强化学习中奖励无法被完全观测到的问题,并讨论了该设置的理论和实践后果,提出了相应的算法。
Abstract
In
reinforcement learning
(RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that
rewards
are always ob
→