Autonomous driving is a multi-agent setting where the host vehicle must apply
sophisticated negotiation skills with other road users when overtaking, giving
way, merging, taking left and right turns and while pushing ahead in
unstructured urban roadways. Since there are many possible scenarios, manually
tackling all possible cases will likely yield a too simplistic policy.
Moreover, one must balance between unexpected behavior of other
drivers/pedestrians and at the same time not to be too defensive so that normal
traffic flow is maintained.
In this paper we apply deep reinforcement learning to the problem of forming
long term driving strategies. We note that there are two major challenges that
make autonomous driving different from other robotic tasks. First, is the
necessity for ensuring functional safety - something that machine learning has
difficulty with given that performance is optimized at the level of an
expectation over many instances. Second, the Markov Decision Process model
often used in robotics is problematic in our case because of unpredictable
behavior of other agents in this multi-agent scenario. We make three
contributions in our work. First, we show how policy gradient iterations can be
used without Markovian assumptions. Second, we decompose the problem into a
composition of a Policy for Desires (which is to be learned) and trajectory
planning with hard constraints (which is not learned). The goal of Desires is
to enable comfort of driving, while hard constraints guarantees the safety of
driving. Third, we introduce a hierarchical temporal abstraction we call an
"Option Graph" with a gating mechanism that significantly reduces the effective
horizon and thereby reducing the variance of the gradient estimation even
further.

本文介绍了一种利用深度强化学习解决自动驾驶问题的方案，不同于其他机器人任务，自动驾驶需要确保功能安全和在多个智能体情境下执行正确的决策，其中的主要挑战包括如何处理多个智能体的不确定行为，以及如何在 “Desires” 策略和难以控制的路径规划之间实现平衡。