Deep visuomotor policy learning, which aims to map raw visual observations to
actions, has achieved promising results in control tasks such as robotic
manipulation and autonomous driving. However, it requires a huge number of
online interactions with the training environment, which limits its real-world
application. Compared with the widely studied unsupervised feature learning for visual
recognition, feature pretraining for visuomotor control tasks has been much less
explored. In this work, we aim to pretrain policy representations for driving
tasks by watching hours-long uncurated YouTube videos. Specifically, we train
an inverse dynamics model with a small amount of labeled data and use it to
predict action labels for all the YouTube video frames. A new contrastive
policy pretraining method is then developed to learn action-conditioned
features from the video frames with pseudo action labels. Experiments show that
the resulting action-conditioned features yield substantial improvements on
downstream reinforcement learning and imitation learning tasks,
outperforming weights pretrained with previous unsupervised learning
methods as well as ImageNet-pretrained weights. Code, model weights, and data are
available at: this https URL
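
The two-stage pipeline described above (pseudo-labeling frames with an inverse dynamics model, then contrastive learning conditioned on those labels) can be illustrated with a minimal NumPy sketch. This is not the released implementation. The function names, the scalar (steering-like) action, the `action_tol` threshold for deciding which frames count as positives, and the supervised-contrastive form of the loss are all assumptions made here for illustration.

```python
import numpy as np


def pseudo_label(frame_pairs, inverse_dynamics_fn):
    """Assign a pseudo action to each pair of consecutive frames.

    `inverse_dynamics_fn` is a hypothetical stand-in for a trained
    inverse dynamics model: it takes (frame_t, frame_t_plus_1) and
    returns a scalar action.
    """
    return np.array([inverse_dynamics_fn(a, b) for a, b in frame_pairs])


def action_conditioned_contrastive_loss(features, actions,
                                        temperature=0.1, action_tol=0.05):
    """Contrastive loss where positives share a similar pseudo action.

    Assumption: two frames are a positive pair when their scalar pseudo
    actions differ by less than `action_tol`; all other frames in the
    batch act as negatives.
    """
    features = np.asarray(features, dtype=float)
    actions = np.asarray(actions, dtype=float)

    # L2-normalize features so dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(z)
    not_self = ~np.eye(n, dtype=bool)

    # Positive mask from pseudo action labels (self-pairs excluded).
    diff = np.abs(actions[:, None] - actions[None, :])
    positives = (diff < action_tol) & not_self

    # Log-softmax over every other sample in the batch.
    logits = np.where(not_self, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average log-probability of positives, skipping anchors without any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy usage: "frames" are scalars and the stand-in model returns their difference.
toy_actions = pseudo_label([(0.0, 1.0), (1.0, 3.0)], lambda a, b: b - a)

batch_actions = np.array([0.0, 0.0, 1.0, 1.0])
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_aligned = action_conditioned_contrastive_loss(aligned, batch_actions)
loss_misaligned = action_conditioned_contrastive_loss(misaligned, batch_actions)
```

In this toy batch, features that cluster by pseudo action give a lower loss than features that cluster against it, which is the behavior an action-conditioned objective is meant to encourage.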

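This paper proposes a method based on an inverse dynamics model and contrastive policy pretraining to pretrain policy models for autonomous driving tasks, using uncurated YouTube videos as the data source, which significantly improves the accuracy and efficiency of downstream tasks such as reinforcement learning and imitation learning.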