Online reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to use such data effectively in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that, without information on the dynamics shift, general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses a state-of-the-art online RL baseline.

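This work addresses the challenge of using historical data to improve sample efficiency in online reinforcement learning. It proposes a hybrid transfer reinforcement learning (HTRL) setting that uses offline data from a source environment with shifted dynamics to make learning more efficient. Experimental results show that the proposed HySRL algorithm outperforms conventional online RL methods in sample complexity, and it has the potential to significantly improve learning efficiency across different environments.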

Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data