Beyond Optimism: Exploration With Partially Observable Rewards

June 2024

Beyond Optimism: Exploration With Partially Observable Rewards

Simone Parisi, Alireza Kazemipour, Michael Bowling

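TL;DR: We propose a new exploration strategy that overcomes the limitations of existing methods and guarantees convergence to an optimal policy even when rewards are not always observable. We also introduce a suite of tabular environments for benchmarking exploration in reinforcement learning, with and without unobservable rewards, and show that our method outperforms existing ones.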

Abstract

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative …