Despite their numerous successes, adversarial risk metrics do not provide an
appropriate measure of robustness in many scenarios. For example,
test-time perturbations may occur in a probabilistic manner rather than being
generated by an explicit adversary, while the poor train--test generalization
of adversarial metrics can limit their usage to