Deep Reinforcement Learning (DRL) suffers from uncertainties and inaccuracies in the observation signal in real-world applications. Adversarial attacks are an effective way to evaluate the robustness of DRL agents. However, existing attack methods that target individual sampled actions have limited impact on the overall policy distribution, particularly in continuous action spaces. To address these limitations, we propose the Distribution-Aware Projected Gradient Descent attack (DAPGD). DAPGD uses distribution similarity as the gradient perturbation input to attack the policy network, so it leverages the entire policy distribution rather than relying on individual samples. We use the Bhattacharyya distance in DAPGD to measure policy similarity, which enables sensitive detection of subtle but critical differences between probability distributions. Our experimental results demonstrate that DAPGD achieves state-of-the-art results compared to the baselines in three robot navigation tasks, with an average reward drop 22.03% higher than that of the best baseline.
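To make the idea in the abstract concrete, the following is a minimal sketch of a distribution-level projected gradient attack, not the authors' implementation. It assumes a diagonal Gaussian policy, for which the Bhattacharyya distance has a closed form, and an L-infinity perturbation budget. The linear policy `toy_policy`, the helper names, and the values of `eps`, `alpha`, and `steps` are all hypothetical. Finite differences stand in for backpropagation through a real policy network.

```python
import numpy as np

def bhattacharyya_diag_gauss(mu1, std1, mu2, std2):
    """Closed-form Bhattacharyya distance between two diagonal Gaussians."""
    var_avg = 0.5 * (std1 ** 2 + std2 ** 2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    cov_term = 0.5 * np.sum(np.log(var_avg / (std1 * std2)))
    return float(mean_term + cov_term)

def toy_policy(state, W, log_std):
    """Hypothetical linear-Gaussian policy: returns action mean and std."""
    return W @ state, np.exp(log_std)

def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient (assumption: replaces autograd)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

def distribution_pgd(state, W, log_std, eps=0.1, alpha=0.02, steps=10, seed=0):
    """Perturb the observation to push the policy distribution away from
    the clean one, staying inside an L-infinity ball of radius eps."""
    rng = np.random.default_rng(seed)
    mu0, std0 = toy_policy(state, W, log_std)

    def distance(s_adv):
        mu, std = toy_policy(s_adv, W, log_std)
        return bhattacharyya_diag_gauss(mu0, std0, mu, std)

    # Random start: the distance is minimal (zero gradient) at the clean state.
    adv = state + rng.uniform(-eps, eps, size=state.shape)
    for _ in range(steps):
        grad = numerical_grad(distance, adv)
        adv = adv + alpha * np.sign(grad)              # gradient ascent step
        adv = state + np.clip(adv - state, -eps, eps)  # project onto the ball
    return adv, distance(adv)

# Usage on a hypothetical 2-D observation and 2-D action space.
W = np.array([[1.0, -2.0], [0.5, 1.5]])
state = np.array([0.3, -0.1])
adv, dist = distribution_pgd(state, W, log_std=np.zeros(2))
print("perturbation:", adv - state, "Bhattacharyya distance:", dist)
```

In this toy policy the standard deviation does not depend on the state, so only the mean term of the distance contributes. A policy with a state-dependent variance would also exercise the covariance term.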

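This work addresses the problems that deep reinforcement learning faces in real-world applications due to uncertainty and inaccuracy in the observation signal. The paper proposes a novel Distribution-Aware Projected Gradient Descent attack (DAPGD), which uses distribution similarity as the gradient perturbation input and thereby leverages the entire policy distribution rather than relying only on individual samples. Experimental results show that DAPGD performs strongly on three robot navigation tasks, with an average reward drop 22.03% higher than that of the best baseline.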

Rethinking Adversarial Attacks in Reinforcement Learning from a Policy Distribution Perspective