Reinforcement learning (RL) methods work in discrete time. In order to apply
RL to inherently continuous problems like robotic control, a specific time
discretization needs to be defined. This is a choice between sparse time
control, which may be easier to train, and finer time control, which may allow
for better ultimate performance. In this work, we propose SusACER, an
off-policy RL algorithm that combines the advantages of different time
discretization settings. Initially, it operates with sparse time discretization
and gradually switches to a fine one. We analyze the effects of the changing
time discretization in robotic control environments: Ant, HalfCheetah, Hopper,
and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
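
One way to picture the sparse-to-fine schedule: the environment always runs at the fine time step, but the agent repeats ("sustains") its previous action with some probability, and that probability is annealed towards zero during training. The sketch below is an illustration under stated assumptions, not the SusACER implementation. The linear schedule, the geometric sustain rule, and the names `expected_sustain` and `SustainedActionSampler` are all hypothetical.

```python
import random


def expected_sustain(step, total_steps, start=8.0, end=1.0):
    """Anneal the expected sustain length linearly from `start` to `end`.

    The linear schedule is an assumption made for illustration only.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


class SustainedActionSampler:
    """Wrap a policy so that actions are held over several fine time steps."""

    def __init__(self, policy, total_steps, start=8.0, seed=0):
        self.policy = policy            # callable: state -> action
        self.total_steps = total_steps
        self.start = start
        self.rng = random.Random(seed)
        self.step = 0
        self.prev_action = None

    def act(self, state):
        mean_len = expected_sustain(self.step, self.total_steps, self.start)
        # Continuing with probability p gives runs of mean length 1 / (1 - p),
        # so p = 1 - 1 / mean_len yields the desired expected sustain length.
        keep_prob = 1.0 - 1.0 / mean_len
        self.step += 1
        if self.prev_action is not None and self.rng.random() < keep_prob:
            return self.prev_action     # sustain: coarse effective time step
        self.prev_action = self.policy(state)
        return self.prev_action         # fresh action: fine time step
```

With `start=8.0`, the agent initially holds each action for about eight fine steps on average; by the end of the schedule it chooses a fresh action at every step.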

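This work proposes SusACER, an off-policy reinforcement learning algorithm that combines the advantages of different time discretization settings: it starts with a sparse time discretization and gradually switches to a fine one. An analysis in robotic control environments confirms that the algorithm outperforms the state of the art on Ant, HalfCheetah, Hopper, and Walker2D.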

Actor-Critic with variable time discretization via sustained actions

Deep reinforcement learning (DRL) has been demonstrated to be effective for
several complex decision-making applications such as autonomous driving and
robotics. However, DRL is notoriously limited by its high sample complexity and
its lack of stability. Prior knowledge, e.g. in the form of expert demonstrations, is often
available but challenging to leverage to mitigate these issues. In this paper,
we propose General Reinforced Imitation (GRI), a novel method which combines
benefits from exploration and expert data and is straightforward to implement
over any off-policy RL algorithm. We make one simplifying hypothesis: expert
demonstrations can be seen as perfect data whose underlying policy gets a
constant high reward. Based on this assumption, GRI introduces the notion of an
offline demonstration agent. This agent sends expert data, which are processed
both concurrently and indistinguishably with the experiences coming from the
online RL exploration agent. We show that our approach enables major
improvements on vision-based autonomous driving in urban environments. We
further validate the GRI method on MuJoCo continuous control tasks with
different off-policy RL algorithms. Our method ranked first on the CARLA
Leaderboard and outperforms World on Rails, the previous state of the art, by
17%.
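
The mechanism described above (expert transitions tagged with a constant high reward and mixed indistinguishably with online experience) can be sketched as a shared replay buffer. This is a minimal illustration under assumptions, not the authors' code: the class name `MixedReplayBuffer`, the value of `DEMO_REWARD`, and the uniform sampling are hypothetical choices.

```python
import random
from collections import deque

# Assumed constant "high" reward for expert data; the right value depends
# on the reward scale of the task and is not taken from the paper.
DEMO_REWARD = 1.0


class MixedReplayBuffer:
    """Single buffer fed by an online agent and an offline demonstration agent."""

    def __init__(self, capacity=100_000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add_online(self, state, action, reward, next_state, done):
        # Experience from the exploring RL agent keeps its environment reward.
        self.buffer.append((state, action, reward, next_state, done))

    def add_demo(self, state, action, next_state, done):
        # Expert data carry no reward signal, so a constant one is assigned.
        # The tuple layout matches online data, so the learner cannot tell
        # the two sources apart.
        self.buffer.append((state, action, DEMO_REWARD, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over the mixed buffer (an assumption).
        return self.rng.sample(list(self.buffer), batch_size)
```

Because both sources share one transition format, any off-policy learner that consumes `(state, action, reward, next_state, done)` batches can be trained on the mixture without modification.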

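This work proposes GRI, a new method that combines the benefits of exploration and expert data and is simple to implement. It introduces the notion of an offline demonstration agent, whose expert data are processed concurrently with the experience of the online exploration agent. The method is shown to be effective on vision-based autonomous driving in urban environments and on MuJoCo continuous control tasks, and it ranked first on the CARLA Leaderboard.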

GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust
to hyperparameterization, implementation details, or small environment changes
(Henderson et al., 2017; Zhang et al., 2018). Overcoming such sensitivity is key
to making DRL applicable to real world problems. In this paper, we identify
sensitivity to time discretization in near continuous-time environments as a
critical factor; this covers, e.g., changing the number of frames per second,
or the action frequency of the controller. Empirically, we find that
Q-learning-based approaches such as Deep Q-learning (Mnih et al., 2015) and
Deep Deterministic Policy Gradient (Lillicrap et al., 2015) collapse with small
time steps. Formally, we prove that Q-learning does not exist in continuous
time. We detail a principled way to build an off-policy RL algorithm that
yields similar performances over a wide range of time discretizations, and
confirm this robustness empirically.
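
The collapse at small time steps can be illustrated on a toy problem. The sketch below assumes a per-step reward equal to a reward rate times `dt` and a per-step discount of `gamma ** dt`. The one-dimensional dynamics, the reward rate, and the stand-in value function `toy_value` are hypothetical and not taken from the paper. It shows that the difference between the one-step backups of two actions shrinks roughly in proportion to `dt`, so the action-dependent part of Q becomes harder to learn as the discretization gets finer.

```python
def q_backup(state, action, dt, value_fn, gamma=0.9):
    """One-step backup: reward_rate * dt + gamma**dt * V(next_state)."""
    reward_rate = -(state ** 2) - 0.1 * action ** 2  # hypothetical reward rate
    next_state = state + action * dt                  # Euler step of dx/dt = a
    return reward_rate * dt + gamma ** dt * value_fn(next_state)


def action_gap(state, dt, value_fn):
    """Absolute difference between the backups for actions 1.0 and 0.0."""
    push = q_backup(state, 1.0, dt, value_fn)
    idle = q_backup(state, 0.0, dt, value_fn)
    return abs(push - idle)


def toy_value(x):
    """Arbitrary smooth stand-in for a state-value function."""
    return -(x ** 2)


for dt in (1.0, 0.1, 0.01, 0.001):
    gap = action_gap(0.5, dt, toy_value)
    print(f"dt={dt:<6} action gap={gap:.6f}")
```

Both the reward term and the change in the next state scale with `dt`, so every action's backup converges to the same value as `dt` goes to zero. This is the informal intuition behind the statement that Q-learning does not exist in continuous time.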

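This work proves that Q-learning does not exist in continuous time, identifies sensitivity to time discretization as a critical factor in the robustness of deep reinforcement learning, and proposes a model-free reinforcement learning algorithm that works robustly across different time discretizations.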