In this work, we investigate how to leverage pre-trained vision-language
models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on
sparse-reward tasks with pre-defined textual task descriptions. We first
identify the problem of reward misalignment that arises when VLMs are used as
reward models in RL tasks. To address this issue, we introduce a lightweight
fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward
alignment and relay RL.
Specifically, we improve the performance of SAC/DrQ baseline agents on
sparse-reward tasks by fine-tuning VLM representations and by using relay RL to
avoid local minima. Extensive experiments on Meta-World benchmark tasks
demonstrate the efficacy of the proposed method. Code is available at
{\footnotesize https://github.com/fuyw/FuRL}.
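As an illustration only, a VLM-based reward is often computed as the cosine similarity between an embedding of the current observation image and an embedding of the textual task description, then added with a small weight to the sparse task reward. The sketch assumes that form. The projection matrices `W_img` and `W_txt` (standing in for a lightweight fine-tuned head), the weight `rho`, and every function name are hypothetical and are not taken from the FuRL code.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def project(W: Matrix, v: Vector) -> list:
    """Hypothetical linear projection head: returns W @ v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def shaped_reward(
    task_reward: float,
    image_emb: Vector,
    text_emb: Vector,
    W_img: Matrix,
    W_txt: Matrix,
    rho: float = 0.05,  # hypothetical weight on the VLM term
) -> float:
    """Sparse task reward plus a weighted VLM similarity term (assumed form)."""
    vlm_reward = cosine_similarity(project(W_img, image_emb), project(W_txt, text_emb))
    return task_reward + rho * vlm_reward


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Failed step (task reward 0) whose image embedding matches the text embedding.
    print(shaped_reward(0.0, [1.0, 0.0], [1.0, 0.0], identity, identity))
```

With identity projections and identical embeddings the similarity is 1, so a step with zero task reward receives only the dense VLM term `rho * 1.0`. Because this term can be high for states that do not lead to success, an unaligned VLM reward can mislead the agent; this is the misalignment that fine-tuning the projection is meant to reduce.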
