Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

Jul, 2021

Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

Ingmar Schubert, Ozgur S. Oguz, Marc Toussaint

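TL;DR: This paper proposes Final-Volume-Preserving Reward Shaping (FV-RS), a method for addressing the exploration problem in reinforcement learning when the state space is high-dimensional. Compared with earlier potential-based reward shaping (PB-RS) methods, FV-RS relaxes the strict guarantee that the optimal solution is always preserved, which makes it better suited to improving the sample efficiency of reinforcement learning algorithms. It achieves significant improvements on simulated robotic manipulation tasks.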
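For context, the PB-RS baseline mentioned above can be sketched in a few lines. This is a minimal illustration of the standard potential-based form, in which the shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward; it is not the FV-RS method itself, whose details are not given in this summary. The potential function, the goal position, and the discount factor below are all hypothetical choices made for illustration.

```python
import math

GAMMA = 0.99        # discount factor (assumed value, not from the paper)
GOAL = (4.0, 4.0)   # hypothetical goal position in a 2-D state space


def potential(state):
    """Hypothetical potential Phi(s): negative Euclidean distance to the goal."""
    return -math.dist(state, GOAL)


def shaping_term(state, next_state, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state) - potential(state)


def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Environment reward plus the potential-based shaping term."""
    return reward + shaping_term(state, next_state, gamma)


if __name__ == "__main__":
    # A step toward the goal receives a positive bonus, a step away a penalty.
    print(shaped_reward(0.0, (0.0, 0.0), (1.0, 1.0)))
    print(shaped_reward(0.0, (1.0, 1.0), (0.0, 0.0)))
```

With γ = 1 the shaping terms telescope along any trajectory to Φ(s_T) − Φ(s_0), which is the intuition behind why this form leaves the optimal policy unchanged; the TL;DR states that FV-RS relaxes that guarantee.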

Abstract

In high-dimensional state spaces, the usefulness of reinforcement learning (RL) is limited by the problem of exploration. This issue has been addressed using potential-based reward shaping (PB-RS) previously. In