BriefGPT.xyz
Feb, 2024
透过部分监督强化学习学习后见可观测部分可解释策略
Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning
HTML
PDF
Michael Lanier, Ying Xu, Nathan Jacobs, Chongjie Zhang, Yevgeniy Vorobeychik
TL;DR
通过融合监督学习和无监督学习,部分监督强化学习(PSRL)框架能够提供更可解释的策略和丰富的潜在洞察力,从而在奖励和收敛速度等方面保持并大大超越传统方法的性能基准。
Abstract
deep reinforcement learning
has demonstrated remarkable achievements across diverse domains such as video games, robotic control, autonomous driving, and drug discovery. Common methodologies in
partially-observable doma
→