Robustness is a central concern in deep reinforcement learning (DRL), and
randomized smoothing has emerged as a key technique for improving it. However,
current smoothed DRL agents perform poorly: they often achieve low clean
rewards and weak robustness. To address this gap, we introduce algorithms for
training effective smoothed robust DRL agents. We propose S-DQN and S-PPO,
which substantially improve clean rewards, empirical robustness, and
robustness guarantees across standard RL benchmarks. Notably, our S-DQN and
S-PPO agents not only significantly
outperform existing smoothed agents by an average factor of $2.16\times$ under
the strongest attack, but also surpass previous robustly-trained agents by an
average factor of $2.13\times$. Furthermore, we introduce Smoothed Attack,
which is $1.89\times$
more effective in decreasing the rewards of smoothed agents than existing
adversarial attacks.

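We propose S-DQN and S-PPO, which significantly improve clean rewards, empirical robustness, and robustness guarantees on standard RL benchmarks, outperforming existing smoothed agents and previous robustly-trained agents by average factors of $2.16\times$ and $2.13\times$, respectively. In addition, we introduce Smoothed Attack, which is $1.89\times$ more effective at reducing the rewards of smoothed agents than existing adversarial attacks.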
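
For readers unfamiliar with randomized smoothing, the following is a minimal sketch of the general idea applied to a Q-function: the agent averages Q-values over Gaussian-perturbed copies of the observation and acts greedily on the average. It is not the S-DQN or S-PPO training procedure, which this section does not describe. The names `smoothed_q_values`, `smoothed_action`, and `toy_q`, as well as the values of `sigma` and `n_samples`, are assumptions chosen for illustration.

```python
import numpy as np


def smoothed_q_values(q_fn, state, sigma=0.1, n_samples=64, rng=None):
    """Monte Carlo estimate of E[q_fn(state + eps)] with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    state = np.asarray(state, dtype=float)
    # Draw n_samples independent Gaussian perturbations of the observation.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + state.shape)
    # Evaluate the base Q-function on each noisy observation.
    q_samples = np.stack([q_fn(state + eps) for eps in noise])
    # Average over the noise samples to obtain the smoothed Q-values.
    return q_samples.mean(axis=0)


def smoothed_action(q_fn, state, **kwargs):
    """Greedy action with respect to the smoothed Q-values."""
    return int(np.argmax(smoothed_q_values(q_fn, state, **kwargs)))


# Hypothetical base Q-function: linear in a 2-D state, with 3 actions.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def toy_q(state):
    return W @ state


rng = np.random.default_rng(0)
state = np.array([2.0, 0.5])
q_smooth = smoothed_q_values(toy_q, state, sigma=0.1, n_samples=256, rng=rng)
action = smoothed_action(toy_q, state, sigma=0.1, n_samples=256, rng=rng)
```

In this toy setting the base Q-function is linear, so the smoothed Q-values stay close to the unperturbed ones; with a nonlinear network the averaging is what damps the effect of small adversarial changes to the observation. Larger `sigma` generally trades clean reward for robustness, which is the tension the abstract refers to.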