Deep neural networks have demonstrated remarkable effectiveness across a wide range of tasks such as semantic segmentation. Nevertheless, these networks are vulnerable to adversarial attacks that add imperceptible perturbations to the input image, leading to false predictions. This vulnerability is particularly dangerous in safety-critical applications like automated driving. While adversarial examples and defense strategies are well-researched in the context of image classification, there is comparatively less research focused on semantic segmentation. Recently, we proposed an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation. We observed that uncertainty, as measured by the entropy of the output distribution, behaves differently on clean versus adversarially perturbed images, and we utilize this property to differentiate between the two. In this extended version of our work, we conduct a detailed analysis of uncertainty-based detection of adversarial attacks, covering a diverse set of adversarial attacks and various state-of-the-art neural networks. Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method, which is lightweight and operates as a post-processing step, i.e., it requires neither model modifications nor knowledge of the adversarial example generation process.

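This work examines in depth the vulnerability of deep neural networks to adversarial attacks in semantic segmentation and proposes an uncertainty-based method for detecting such attacks. The study finds that clean and perturbed images differ markedly in the uncertainty (entropy) of the output distribution. Exploiting this property, the proposed detection method can effectively identify adversarial examples without modifying the model or knowing how the attack was generated.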
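
The entropy signal described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pixelwise_entropy`, `mean_entropy`, `is_flagged`), the use of the image-wide mean as the aggregate, and the single fixed threshold are all assumptions made for illustration. In particular, the direction of the comparison (higher mean entropy treated as suspicious) is assumed here; in practice both the direction and the threshold would need to be calibrated on clean and attacked data.

```python
import numpy as np


def pixelwise_entropy(probs, eps=1e-12):
    """Shannon entropy per pixel for softmax output of shape (C, H, W)."""
    # Clip to avoid log(0); eps is a hypothetical numerical-stability constant.
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=0)


def mean_entropy(probs):
    """Aggregate the entropy heatmap to one scalar per image (assumed aggregate)."""
    return float(pixelwise_entropy(probs).mean())


def is_flagged(probs, threshold):
    """Flag an image whose mean entropy exceeds a calibrated threshold.

    The 'greater than' direction is an illustrative assumption.
    """
    return mean_entropy(probs) > threshold


if __name__ == "__main__":
    num_classes, height, width = 4, 2, 3

    # Maximally uncertain prediction: uniform distribution at every pixel.
    uniform = np.full((num_classes, height, width), 1.0 / num_classes)

    # Fully confident prediction: all mass on class 0 at every pixel.
    confident = np.zeros((num_classes, height, width))
    confident[0] = 1.0

    print(mean_entropy(uniform), mean_entropy(confident))
    print(is_flagged(uniform, 0.5), is_flagged(confident, 0.5))
```

Because the sketch only reads the softmax output, it fits the post-processing setting described in the abstract: no access to model internals or to the attack procedure is needed.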

Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis