BriefGPT.xyz
Mar, 2023
Reinforcement Learning with Exogenous States and Rewards
George Trimponias, Thomas G. Dietterich
TL;DR
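This paper proposes decomposing an MDP into an exogenous Markov reward process and an endogenous Markov decision process, so that only the endogenous reward is optimized. The goal is to remove the interference that exogenous state variables and rewards cause in reinforcement learning on MDPs. The paper also presents algorithms that discover, online, exogenous and endogenous state that are mixed together in the state space, which makes reinforcement learning more efficient.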
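The following sketch shows why optimizing only the endogenous reward can be safe. It assumes a factored structure with state s = (e, x), where the exogenous part x evolves independently of the endogenous part e and of the action a, the reward splits additively, and the discount is 0 ≤ γ < 1. The notation (R_exo, R_end, V_exo, Q_end) is illustrative and not necessarily the paper's own. Because the law of x_t depends only on x_0, the exogenous term does not depend on the action or the policy, so it cannot change which action is best.

```latex
% Assumed factored dynamics and additive reward (illustrative notation)
P(e', x' \mid e, x, a) = P(x' \mid x)\, P(e' \mid e, x, a), \qquad
R(e, x, a) = R_{\text{exo}}(x) + R_{\text{end}}(e, x, a)

% The Q-function splits into an action-independent exogenous value plus an endogenous Q-function
Q^{\pi}(e, x, a)
  = \underbrace{\mathbb{E}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} R_{\text{exo}}(x_t) \,\Big|\, x_0 = x\Big]}_{V_{\text{exo}}(x)}
  + \underbrace{\mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} R_{\text{end}}(e_t, x_t, a_t) \,\Big|\, e_0 = e,\, x_0 = x,\, a_0 = a\Big]}_{Q^{\pi}_{\text{end}}(e, x, a)}

% Hence the optimal actions coincide
Q^{*}(e, x, a) = V_{\text{exo}}(x) + Q^{*}_{\text{end}}(e, x, a)
  \;\Longrightarrow\;
  \arg\max_{a} Q^{*}(e, x, a) = \arg\max_{a} Q^{*}_{\text{end}}(e, x, a)
```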
Abstract
Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes …
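A toy two-action problem can illustrate the abstract's claim that exogenous rewards slow learning by adding uncontrolled variation. This is a hypothetical sketch, not the paper's algorithm. The constants and the function names `sample_reward` and `greedy_accuracy` are made up for illustration. The split into endogenous and exogenous components is also simply given here, whereas the paper has to discover it. With the same sample budget, greedy selection on noisy total rewards picks the better action much less reliably than selection on the endogenous component alone.

```python
import random
import statistics

# Hypothetical two-action problem: the endogenous reward depends on the action,
# while the exogenous reward is large zero-mean noise that no action can influence.
ENDO_MEAN = {0: 1.0, 1: 1.2}  # action 1 is truly better
ENDO_STD = 0.1
EXO_STD = 5.0


def sample_reward(rng: random.Random, action: int) -> tuple[float, float]:
    """Return (endogenous, exogenous) reward components for one step."""
    endo = rng.gauss(ENDO_MEAN[action], ENDO_STD)
    exo = rng.gauss(0.0, EXO_STD)  # drawn independently of the action
    return endo, exo


def greedy_accuracy(rng: random.Random, n_per_action: int, trials: int,
                    use_exo: bool) -> float:
    """Fraction of trials in which greedy choice on sample means picks action 1."""
    correct = 0
    for _ in range(trials):
        means = {}
        for a in (0, 1):
            rewards = []
            for _ in range(n_per_action):
                endo, exo = sample_reward(rng, a)
                # Learn from the total reward, or from the endogenous part only.
                rewards.append(endo + exo if use_exo else endo)
            means[a] = statistics.fmean(rewards)
        if means[1] > means[0]:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    rng = random.Random(0)
    total = [sum(sample_reward(rng, 0)) for _ in range(10_000)]
    endo_only = [sample_reward(rng, 0)[0] for _ in range(10_000)]
    print(f"reward variance, total:      {statistics.variance(total):.3f}")
    print(f"reward variance, endogenous: {statistics.variance(endo_only):.3f}")
    print(f"greedy accuracy, total reward:      {greedy_accuracy(rng, 20, 500, True):.2f}")
    print(f"greedy accuracy, endogenous reward: {greedy_accuracy(rng, 20, 500, False):.2f}")
```

Because the exogenous term has zero mean and does not depend on the action, it leaves the expected ranking of actions unchanged. It only inflates the variance of every estimate, so more samples are needed to reach the same decision quality.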