Deep reinforcement learning (RL) has achieved great empirical successes in
various domains. However, the large search space of neural networks requires a
large amount of data, which makes current RL algorithms sample
inefficient. Motivated by the fact that many environments with continuous state
spaces have smooth transitions, we propose to learn a policy that behaves
smoothly with respect to states. We develop a new framework -- \textbf{S}mooth
\textbf{R}egularized \textbf{R}einforcement \textbf{L}earning
($\textbf{SR}^2\textbf{L}$), where the policy is trained with
smoothness-inducing regularization. Such regularization effectively constrains
the search space, and enforces smoothness in the learned policy. Moreover, our
proposed framework can also improve the robustness of the policy against
measurement error in the state space, and can be naturally extended to
the distributionally robust setting. We apply the proposed framework to both
on-policy (TRPO) and off-policy (DDPG) algorithms. Through extensive
experiments, we demonstrate that our method achieves improved sample efficiency
and robustness.

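Summary: the paper proposes a new deep RL framework, $\textbf{SR}^2\textbf{L}$, which introduces smoothness-inducing regularization so that the learned policy is smooth with respect to transitions in a continuous state space, improving robustness to perturbations and sample efficiency. Experiments on TRPO and DDPG show that the method improves performance.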
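
The abstract does not spell out the regularizer, so the following is only a minimal sketch of one plausible form of smoothness-inducing regularization: penalize the largest change in the policy output when the state is perturbed within a small ball. Everything here is an assumption made for illustration, not the authors' implementation: the toy `policy`, the names `smoothness_penalty` and `regularized_loss`, the radius `eps`, the weight `lam`, the squared-difference discrepancy, and the use of random search for the inner maximization.

```python
import math
import random


def policy(theta, state):
    """Hypothetical deterministic policy: tanh of a linear map of the state."""
    return math.tanh(sum(w * s for w, s in zip(theta, state)))


def smoothness_penalty(theta, states, eps=0.1, n_samples=32, seed=0):
    """Approximate the mean over states of
    max_{||s' - s||_inf <= eps} (policy(s) - policy(s'))^2.

    The inner maximum is approximated by random search over the eps-ball,
    so the result is only a lower bound on the true worst case.
    """
    rng = random.Random(seed)
    total = 0.0
    for s in states:
        base = policy(theta, s)
        worst = 0.0
        for _ in range(n_samples):
            # Perturb every state coordinate independently within [-eps, eps].
            perturbed = [x + rng.uniform(-eps, eps) for x in s]
            gap = (policy(theta, perturbed) - base) ** 2
            worst = max(worst, gap)
        total += worst
    return total / len(states)


def regularized_loss(task_loss, theta, states, lam=1.0):
    """Task loss plus the weighted smoothness penalty (lam is assumed)."""
    return task_loss + lam * smoothness_penalty(theta, states)
```

Under these assumptions, a policy whose output changes sharply with the state receives a larger penalty than a flatter one, so minimizing `regularized_loss` pushes training toward policies that vary slowly with the state. A gradient-based inner maximization could replace the random search; that choice is likewise an assumption here.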