BriefGPT.xyz
Sep, 2019
Joint Inference of Reward Machines and Policies for Reinforcement Learning
Zhe Xu, Ivan Gavran, Yousef Ahmad, Rupak Majumdar, Daniel Neider...
TL;DR
Studies how to combine reward machines with Q-learning through an iterative algorithm, so that policies can be optimized quickly on complex tasks.
Abstract
Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of