Recent studies on online reinforcement learning (RL) have demonstrated the advantages of learning multiple behaviors from a single task, for example in few-shot adaptation to a new environment. Although this approach is expected to yield similar benefits in offline RL, appropriate methods for learning multiple solutions have not been fully investigated in previous studies. In this study, we therefore address the problem of finding multiple solutions from a single task in offline RL. We propose algorithms that can learn multiple solutions in offline RL and empirically investigate their performance. Our experimental results show that the proposed algorithms learn multiple solutions that are qualitatively and quantitatively distinct in offline RL.

Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning