Offline Reinforcement Learning (RL) has shown promising results in learning a
task-specific policy from a fixed dataset. However, successful offline RL often
relies heavily on the coverage and quality of the given dataset. In scenarios
where the dataset for a specific task is limited, a natural approach is to
improve offline RL with datasets from other tasks, namely, to conduct
Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from
other tasks exacerbates the distribution shift in offline RL. In this paper, we
propose an uncertainty-based MTDS approach that shares the entire dataset
without data selection. Given ensemble-based uncertainty quantification, we
perform pessimistic value iteration on the shared offline dataset, which
provides a unified framework for single- and multi-task offline RL. We further
provide theoretical analysis, which shows that the optimality gap of our method
depends only on the expected data coverage of the shared dataset, thus
resolving the distribution shift issue in data sharing. Empirically, we release
an MTDS benchmark and collect datasets from three challenging domains. The
experimental results show that our algorithm outperforms previous
state-of-the-art methods on challenging MTDS problems. See
this https URL for the datasets and code.
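
To make the idea of ensemble-based pessimism concrete, the following is a minimal sketch, not the paper's implementation. It assumes that uncertainty is measured as the standard deviation across an ensemble of Q-value estimates and that the pessimistic value is the ensemble mean minus a multiple of that deviation. The function names, the coefficient `beta`, and the toy numbers are all hypothetical.

```python
from statistics import mean, pstdev


def pessimistic_value(ensemble_q, beta=1.0):
    """Lower-confidence-bound estimate for one (state, action) pair.

    ensemble_q: Q-value estimates from each ensemble member (assumed setup).
    beta: hypothetical coefficient controlling how strongly disagreement
          among ensemble members is penalized.
    """
    return mean(ensemble_q) - beta * pstdev(ensemble_q)


def pessimistic_target(reward, next_ensemble_qs, gamma=0.99, beta=1.0):
    """Bellman target built from the best pessimistic next-action value.

    next_ensemble_qs: dict mapping each next action to its list of
                      ensemble Q-value estimates.
    """
    best_next = max(
        pessimistic_value(qs, beta) for qs in next_ensemble_qs.values()
    )
    return reward + gamma * best_next


# Toy example: both actions have the same ensemble mean (1.0), but the
# ensemble disagrees strongly on the action that the data covers poorly.
next_qs = {
    "well_covered": [1.0, 1.1, 0.9, 1.0],
    "poorly_covered": [3.0, -1.0, 2.5, -0.5],
}
target = pessimistic_target(reward=0.0, next_ensemble_qs=next_qs,
                            gamma=0.9, beta=1.0)
```

In this sketch the poorly covered action is penalized by its large ensemble disagreement, so the target is driven by the well-covered action. This illustrates how shared data that is out of distribution for a task can be kept in the dataset without being trusted blindly.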
