In the same way that the computer vision (CV) and natural language processing
(NLP) communities have developed self-supervised methods, reinforcement
learning (RL) can be cast as a self-supervised problem: learning to reach any
goal, without requiring human-specified rewards or labels. However, building a
self-supervised foundation for RL in practice raises important challenges.
Building on prior contrastive approaches to this RL problem, we conduct careful
ablation experiments and discover that a shallow and wide architecture,
combined with careful weight initialization and data augmentation, can
significantly boost the performance of these contrastive RL approaches on
challenging simulated benchmarks. Additionally, we demonstrate that, with these
design decisions, contrastive approaches can solve real-world robotic
manipulation tasks, with tasks being specified by a single goal image provided
after training.

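In summary: building on prior contrastive approaches to this RL problem, we find that a shallow and wide architecture, combined with careful weight initialization and data augmentation, significantly improves the performance of these contrastive RL methods on challenging simulated benchmarks, and that these design decisions make it possible to solve real-world robotic manipulation tasks.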