In the typical autonomous driving stack, planning and control are two of the
most crucial components: they use the data retrieved by sensors and processed
by perception algorithms to implement safe and comfortable self-driving
behavior. In particular, the planning module predicts the path the autonomous
car should follow when executing the correct high-level maneuver, while control
systems perform a sequence of low-level actions, controlling steering angle,
throttle and brake. In this work, we propose a model-free Deep Reinforcement
Learning Planner that trains a neural network to predict both acceleration and
steering angle, thus obtaining a single module able to drive the vehicle using
the data processed by the localization and perception algorithms on board the
self-driving car. The system, trained entirely in simulation, is able to drive
smoothly and safely in obstacle-free environments both in simulation and in a
real-world urban area of the city of Parma, showing that it generalizes well
even to areas outside the training scenarios. Moreover, in order to deploy the
system on board the real self-driving car and to reduce the gap between
simulated and real-world performance, we also develop a module, a tiny neural
network, that reproduces the dynamic behavior of the real vehicle during
training in simulation.

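A neural network trained with deep reinforcement learning serves as a single planning and control module in the autonomous driving stack; the resulting system drives smoothly and safely in obstacle-free environments in simulation and also generalizes well in a real-world urban area.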

Tackling Real-World Autonomous Driving using Deep Reinforcement Learning

This paper considers the problem of learning a model in model-based
reinforcement learning (MBRL). We examine how the planning module of an MBRL
algorithm uses the model, and propose that the model learning module should
incorporate the way the planner is going to use the model. This is in contrast
to conventional model learning approaches, such as those based on maximum
likelihood estimation, which learn a predictive model of the environment without
explicitly considering the interaction of the model and the planner. We focus
on policy-gradient planning algorithms and derive new loss functions
for model learning that incorporate how the planner uses the model. We call
this approach Policy-Aware Model Learning (PAML). We theoretically analyze a
generic model-based policy gradient algorithm and provide a convergence
guarantee for the optimized policy. We also empirically evaluate PAML on some
benchmark problems, showing promising results.
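
As a hedged illustration of the contrast drawn in this abstract, the two kinds of objective can be written side by side. The notation is an assumption introduced here, not taken from the abstract: $P^*$ is the true dynamics, $\mathcal{D}$ a dataset of observed transitions, $\pi_\theta$ the policy, and $J(\pi_\theta; P)$ the expected return of $\pi_\theta$ under dynamics $P$. A maximum-likelihood learner fits the model to transitions without reference to the policy, whereas a policy-aware loss of the kind described would penalize the error that the model induces in the policy gradient. The exact loss used in the paper may differ from this sketch.

```latex
\begin{align}
\text{maximum likelihood:}\quad
  & \hat{P} \in \arg\max_{P} \sum_{(s,a,s') \in \mathcal{D}} \log P(s' \mid s, a) \\
\text{policy-aware (sketch):}\quad
  & \hat{P} \in \arg\min_{P}
    \left\| \nabla_\theta J(\pi_\theta; P^*) - \nabla_\theta J(\pi_\theta; P) \right\|_2^2
\end{align}
```

Under the second objective, model errors that do not change the policy gradient are not penalized, which is one way to read "incorporate how the planner uses the model".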

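This paper studies model learning in model-based reinforcement learning and proposes Policy-Aware Model Learning (PAML), whose loss functions account for how the planner uses the model; the approach shows promising results on some benchmark problems.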

Policy-Aware Model Learning for Policy Gradient Methods

We introduce the value iteration network (VIN): a fully differentiable neural
network with a 'planning module' embedded within. VINs can learn to plan, and
are suitable for predicting outcomes that involve planning-based reasoning,
such as policies for reinforcement learning. Key to our approach is a novel
differentiable approximation of the value-iteration algorithm, which can be
represented as a convolutional neural network, and trained end-to-end using
standard backpropagation. We evaluate VIN-based policies on discrete and
continuous path-planning domains, and on a natural-language-based search task.
We show that by learning an explicit planning computation, VIN policies
generalize better to new, unseen domains.
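
The link between value iteration and convolution mentioned above can be illustrated with a minimal sketch. This is not the VIN architecture itself: in a VIN the kernels and the reward map are learned, whereas here they are fixed by hand. The grid world, the four one-hot kernels, the unit step cost, and names such as `value_iteration_conv` and `OFF_GRID` are all assumptions made for illustration. Each iteration applies one 3x3 kernel per action to the value map (giving one Q "channel" per action) and then takes a max over the action channel, which mirrors a convolution layer followed by channel-wise max-pooling.

```python
import numpy as np

OFF_GRID = -1e6  # assumed large negative value for cells outside the grid

# One hand-written 3x3 one-hot kernel per action; each selects one neighbor.
KERNELS = {
    "up":    np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float),
    "down":  np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0]], dtype=float),
    "left":  np.array([[0, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=float),
    "right": np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float),
}


def correlate3x3(x, kernel, pad_value):
    """Cross-correlate a 2-D array with a 3x3 kernel, keeping the input size."""
    h, w = x.shape
    padded = np.pad(x, 1, constant_values=pad_value)
    out = np.zeros((h, w), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out


def value_iteration_conv(shape, goal, gamma=1.0, iters=50, step_cost=1.0):
    """Run value iteration on a deterministic grid using convolution + max."""
    value = np.zeros(shape, dtype=float)
    for _ in range(iters):
        # One Q channel per action: step cost plus discounted neighbor value.
        q = np.stack([
            -step_cost + gamma * correlate3x3(value, k, OFF_GRID)
            for k in KERNELS.values()
        ])
        value = q.max(axis=0)  # max over the action channel
        value[goal] = 0.0      # the goal is absorbing and costs nothing
    return value


values = value_iteration_conv(shape=(5, 6), goal=(2, 3))
```

With a unit step cost and no discounting, the converged value of each cell is the negative Manhattan distance to the goal, which gives a simple check that the convolutional form computes the same thing as ordinary value iteration.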

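The paper introduces the value iteration network (VIN), a fully differentiable neural network with an embedded 'planning module' that can learn to plan and to predict outcomes involving planning-based reasoning, such as reinforcement learning policies. The key is a novel differentiable approximation of the value-iteration algorithm that can be represented as a convolutional neural network and trained end-to-end with standard backpropagation. The authors evaluate VIN policies on discrete and continuous path-planning domains and on a natural-language-based search task, and show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.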