CORL is an open-source library that provides single-file implementations of
Deep Offline Reinforcement Learning algorithms. It emphasizes a simple
development experience with a straightforward codebase and a modern analysis
and tracking tool. In CORL, we isolate each method's implementation in its own
single file, making performance-relevant details easier to recognize. Additionally,
an experiment tracking feature is available to help log metrics,
hyperparameters, dependencies, and more to the cloud. Finally, we have ensured
the reliability of the implementations by benchmarking them on the commonly
used D4RL benchmark. The source code can be found at this https URL

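CORL is an open-source library of single-file implementations of deep offline reinforcement learning algorithms. It emphasizes a simple development experience and a modern analysis and tracking tool: isolating each method's implementation in its own file makes performance-relevant details easier to recognize, an experiment tracking feature logs metrics, hyperparameters, dependencies, and more to the cloud, and benchmarking on the commonly used D4RL benchmark ensures that the implementations are reliable.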

CORL: Research-oriented Deep Offline Reinforcement Learning Library
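
As a rough illustration of the single-file layout and experiment tracking that the CORL abstract describes, the sketch below keeps the configuration, the tracker, and the training loop in one file. It is not CORL's code: `TrainConfig`, `LocalTracker`, and `train` are hypothetical names, the tracker stores records in memory instead of sending them to a cloud service, and the "update" is a placeholder rather than a real offline RL algorithm.

```python
import random
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:
    # Hypothetical hyperparameters for illustration; not CORL's actual fields.
    env_name: str = "toy-env"
    learning_rate: float = 3e-4
    batch_size: int = 256
    num_steps: int = 5
    seed: int = 0


class LocalTracker:
    """In-memory stand-in for a cloud experiment tracker (an assumption)."""

    def __init__(self, config: TrainConfig) -> None:
        self.config = asdict(config)  # hyperparameters recorded once per run
        self.history = []             # one metrics record per logged step

    def log(self, metrics: dict, step: int) -> None:
        self.history.append({"step": step, **metrics})


def train(config: TrainConfig) -> LocalTracker:
    rng = random.Random(config.seed)
    tracker = LocalTracker(config)
    loss = 1.0
    for step in range(config.num_steps):
        # Placeholder update: a real method would compute a loss on a batch
        # drawn from an offline dataset. Here the loss simply shrinks.
        loss *= 0.9 + 0.01 * rng.random()
        tracker.log({"loss": loss}, step)
    return tracker
```

Keeping everything in one file means that a reader can follow a run from hyperparameters to logged metrics without jumping between modules, which is the property the abstract highlights.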

Recent advances in deep offline reinforcement learning (RL) have made it
possible to train strong robotic agents from offline datasets. However,
depending on the quality of the trained agents and the application being
considered, it is often desirable to fine-tune such agents via further online
interactions. In this paper, we observe that state-action distribution shift
may lead to severe bootstrap error during fine-tuning, which destroys the good
initial policy obtained via offline RL. To address this issue, we first propose
a balanced replay scheme that prioritizes samples encountered online while also
encouraging the use of near-on-policy samples from the offline dataset.
Furthermore, we leverage multiple Q-functions trained pessimistically offline,
thereby preventing overoptimism concerning unfamiliar actions at novel states
during the initial training phase. We show that the proposed method improves
sample-efficiency and final performance of the fine-tuned robotic agents on
various locomotion and manipulation tasks. Our code is available at:
this https URL

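This paper proposes a balanced replay scheme and the use of multiple Q-functions to address state-action distribution shift in deep offline reinforcement learning, improving the sample efficiency and final performance of robotic agents on various locomotion and manipulation tasks.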
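
The two ideas in the abstract above, balanced replay and multiple pessimistically trained Q-functions, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_balanced` and `ensemble_q` are hypothetical names, each offline transition is assumed to arrive with a precomputed score for how close it is to on-policy data, and averaging the Q-estimates is an assumed way to combine them, since the abstract does not say how they are combined.

```python
import random


def sample_balanced(offline, online, batch_size, rng, online_priority=1.0):
    """Draw a batch from both buffers with probability proportional to priority.

    offline: list of (transition, score) pairs; score > 0 is a hypothetical
             estimate of how close the transition is to on-policy data.
    online:  list of transitions collected during fine-tuning.
    """
    items = [transition for transition, _ in offline] + list(online)
    # Online samples get a fixed priority; offline samples use their score,
    # so near-on-policy offline data is drawn more often than the rest.
    weights = [score for _, score in offline] + [online_priority] * len(online)
    return rng.choices(items, weights=weights, k=batch_size)


def ensemble_q(q_functions, state, action):
    """Combine several Q-estimates; taking the mean is an assumption here."""
    values = [q(state, action) for q in q_functions]
    return sum(values) / len(values)
```

For example, with `offline = [("off_far", 0.05), ("off_near", 0.6)]` and `online = ["on_0", "on_1"]`, a large batch drawn with `sample_balanced(offline, online, 1000, random.Random(0))` should contain mostly online transitions, a fair number of `"off_near"`, and only a few `"off_far"`.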