When in a new situation or geographical location, human drivers have an
extraordinary ability to watch others and learn maneuvers that they themselves
may have never performed. In contrast, existing techniques for learning to
drive preclude such a possibility as they assume direct access to an
instrumented ego-vehicle with fully known observations and expert driver
actions. However, such measurements cannot be directly accessed for the non-ego
vehicles when learning by watching others. Therefore, in an application where
data is regarded as a highly valuable asset, current approaches completely
discard the vast portion of the training data that can be potentially obtained
through indirect observation of surrounding vehicles. Motivated by this key
insight, we propose the Learning by Watching (LbW) framework which enables
learning a driving policy without requiring full knowledge of neither the state
nor expert actions. To increase its data, i.e., with new perspectives and
maneuvers, LbW makes use of the demonstrations of other vehicles in a given
scene by (1) transforming the ego-vehicle's observations to their points of
view, and (2) inferring their expert actions. Our LbW agent learns more robust
driving policies while enabling data-efficient learning, including quick
adaptation of the policy to rare and novel scenarios. In particular, LbW drives
robustly even with a fraction of available driving data required by existing
methods, achieving an average success rate of 92% on the original CARLA
benchmark with only 30 minutes of total driving data and 82% with only 10
minutes.

提出了一个名为 Learning by Watching (LbW) 的框架，通过间接观察周围车辆的演示来增加驾驶策略的数据量和新颖性，从而实现更加鲁棒的驾驶，快速适应新场景，并且只需要 10 分钟的数据即可达到 82% 的成功率。

观察学习

Learning by Watching

Reinforcement learning is considered as a promising direction for driving
policy learning. However, training autonomous driving vehicle with
reinforcement learning in real environment involves non-affordable
trial-and-error. It is more desirable to first train in a virtual environment
and then transfer to the real environment. In this paper, we propose a novel
realistic translation network to make model trained in virtual environment be
workable in real world. The proposed network can convert non-realistic virtual
image input into a realistic one with similar scene structure. Given realistic
frames as input, driving policy trained by reinforcement learning can nicely
adapt to real world driving. Experiments show that our proposed virtual to real
(VR) reinforcement learning (RL) works pretty well. To our knowledge, this is
the first successful case of driving policy trained by reinforcement learning
that can adapt to real world driving data.

本文提出了一种虚拟到现实的转换网络，使得在虚拟环境中训练的强化学习驾驶策略可在现实世界中适应，实验证明此方法效果显著且为首次成功的案例。