In multi-goal reinforcement learning, an agent learns a policy for achieving multiple goals from experiences gathered by interacting with its environment. Under a sparse binary reward, training is particularly challenging because successful experiences are scarce. Hindsight experience replay (HER) addresses this problem by generating successful experiences from unsuccessful ones. However, generating them without considering the properties of the achieved goals is less efficient. This paper proposes a novel cluster-based sampling strategy that exploits the properties of achieved goals. The strategy groups episodes according to their achieved goals and then samples experiences in the manner of HER. The K-means clustering algorithm is used for the grouping, and the cluster centroids are obtained from the distribution of failed goals, defined as the original goals that were not achieved. The method is validated on three robotic control tasks from OpenAI Gym. The experiments show that it significantly reduces the number of epochs required for convergence in two of the three tasks and marginally increases the success rate in the remaining one. The method can also be combined with other sampling strategies for HER.
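
The sampling pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption: the episode layout (a dict with `goal`, `achieved`, `success`), the use of the final achieved goal to assign an episode to a cluster, the round-robin choice of clusters, the "future" relabeling rule, the 0/-1 reward with tolerance `tol`, and the toy 2-D data. K-means is written in plain Python so the example has no dependencies.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's K-means over a list of equal-length tuples."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(clusters)
        ]
    return centroids


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse binary reward: 0 within `tol` of the goal, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def group_episodes(episodes, k, rng):
    """Fit centroids on failed goals, then bin episodes by final achieved goal."""
    failed_goals = [ep["goal"] for ep in episodes if not ep["success"]]
    centroids = kmeans(failed_goals, k, rng=rng)
    groups = [[] for _ in range(k)]
    for ep in episodes:
        groups[nearest(ep["achieved"][-1], centroids)].append(ep)
    return [g for g in groups if g]  # drop empty clusters


def sample_her_batch(groups, batch_size, rng):
    """Visit clusters round-robin and relabel each transition HER-style."""
    batch = []
    for i in range(batch_size):
        ep = rng.choice(groups[i % len(groups)])
        t = rng.randrange(len(ep["achieved"]) - 1)            # transition t -> t+1
        future = rng.randrange(t + 1, len(ep["achieved"]))    # a later time step
        new_goal = ep["achieved"][future]                     # hindsight goal
        reward = sparse_reward(ep["achieved"][t + 1], new_goal)
        batch.append({"t": t, "goal": new_goal, "reward": reward})
    return batch


def make_toy_episodes(n, rng, steps=5):
    """Hypothetical 2-D episodes that drift toward the goal but fall short."""
    episodes = []
    for _ in range(n):
        goal = (rng.uniform(1, 2), rng.uniform(1, 2))
        pos, achieved = (0.0, 0.0), [(0.0, 0.0)]
        for _ in range(steps):
            pos = (pos[0] + 0.15 * goal[0] + rng.uniform(-0.02, 0.02),
                   pos[1] + 0.15 * goal[1] + rng.uniform(-0.02, 0.02))
            achieved.append(pos)
        episodes.append({"goal": goal, "achieved": achieved,
                         "success": sparse_reward(pos, goal) == 0.0})
    return episodes


rng = random.Random(0)
episodes = make_toy_episodes(40, rng)
groups = group_episodes(episodes, k=3, rng=rng)
batch = sample_her_batch(groups, batch_size=8, rng=rng)
print(len(groups), "non-empty clusters;", len(batch), "relabeled transitions")
```

Drawing from every cluster in turn is one simple way to keep a minibatch from being dominated by episodes whose achieved goals fall in the same region; the paper's actual selection rule may differ.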

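This work proposes a cluster-based sampling strategy that groups trajectories according to the properties of their achieved goals and samples experiences from the resulting groups, in order to address the sparse-reward problem in multi-goal reinforcement learning. Experimental results show that the method yields clear improvements on three robotic control tasks, shortening model convergence time and raising success rates.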

Application of Cluster-Based Hindsight Experience Replay to Robotic Control