Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

从先前记录的数据中学习策略是实现真实世界机器人任务的一个有前景的方向，我们提出了一个基准，其中包括：使用能力强大的强化学习代理在模拟中训练的两个任务的熟练操纵平台的大量离线学习数据的收集，在真实世界机器人系统和模拟中执行学习策略的选项以进行高效调试。我们评估了知名的开源离线强化学习算法，并为真实系统上的离线强化学习提供了可重现的实验设置。

在真实机器人硬件上进行离线强化学习的基准测试