Reinforcement learning (RL) methods with a high replay ratio (RR) and
regularization have gained interest due to their superior sample efficiency.
However, these methods have mainly been developed for dense-reward tasks. In
this paper, we aim to extend these RL methods to sparse-reward goal-conditioned
tasks. We use Randomized Ensemble Double Q-learning (REDQ) (Chen et al., 2021),
an RL method with a high RR and regularization. To apply REDQ to sparse-reward
goal-conditioned tasks, we make the following modifications to it: (i) using
hindsight experience replay and (ii) bounding target Q-values. We evaluate REDQ
with these modifications on 12 sparse-reward goal-conditioned tasks of Robotics
(Plappert et al., 2018), and show that it achieves about $2 \times$ better
sample efficiency than previous state-of-the-art (SoTA) RL methods.
Furthermore, we reconsider the necessity of specific components of REDQ and
simplify it by removing unnecessary ones. The simplified REDQ with our
modifications achieves $\sim 8 \times$ better sample efficiency than the SoTA
methods on 4 Fetch tasks of Robotics.
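
To make the two modifications concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a sparse reward of 0 on success and -1 otherwise, the "final" relabeling strategy for hindsight experience replay, and a target bound of [-1/(1 - gamma), 0] implied by that reward convention. The function names, the tolerance `tol`, and the episode dictionary layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Assumed reward convention: 0 when within `tol` of the goal, else -1."""
    dist = np.linalg.norm(achieved_goal - goal, axis=-1)
    return np.where(dist <= tol, 0.0, -1.0)


def relabel_with_final_goal(episode):
    """Hindsight relabeling ("final" strategy, an assumption of this sketch).

    `episode["achieved_goal"]` has shape (T, d): the goal achieved after each
    transition. The last achieved goal replaces the original goal, and the
    rewards are recomputed, so a failed episode still yields a success signal.
    """
    achieved = episode["achieved_goal"]
    goals = np.broadcast_to(achieved[-1], achieved.shape).copy()
    rewards = sparse_reward(achieved, goals)
    return {**episode, "goal": goals, "reward": rewards}


def bounded_target(reward, next_q, gamma=0.98):
    """Bootstrapped target clipped to the range reachable with rewards in {-1, 0}."""
    q_min = -1.0 / (1.0 - gamma)  # return if every future reward were -1
    target = reward + gamma * next_q
    return np.clip(target, q_min, 0.0)


# Hypothetical failed episode: the agent never reaches the goal at x = 5.
episode = {
    "achieved_goal": np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]),
    "goal": np.full((3, 2), [5.0, 0.0]),
    "reward": np.array([-1.0, -1.0, -1.0]),
}
relabeled = relabel_with_final_goal(episode)
targets = bounded_target(np.array([-1.0, 0.0]), np.array([-100.0, 5.0]))
```

In this sketch the relabeled episode ends with a reward of 0, and bootstrapped targets that fall outside the feasible range are clipped back into it, which limits the effect of over- or underestimated Q-values when many updates are performed per environment step.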

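Overall, the work focuses on how to combine a high replay ratio (RR) with
regularization in reinforcement learning methods in order to address
sparse-reward goal-conditioned tasks and improve sample efficiency. The authors
modify Randomized Ensemble Double Q-learning (REDQ) and apply it to
sparse-reward goal-conditioned tasks. Evaluated on 12 Robotics tasks, it shows
about $2 \times$ the sample efficiency of previous state-of-the-art RL methods.
The authors also reduce the complexity of REDQ, and the simplified method
reaches about $8 \times$ the sample efficiency of previous methods on 4 Fetch
robot tasks.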