We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application of unlearning methods is removing the effects of training on poisoned data. We show experimentally that, although existing unlearning methods have proven effective in a number of evaluation settings (e.g., alleviating membership inference attacks), they fail to remove the effects of data poisoning across a variety of poisoning attacks (indiscriminate, targeted, and a newly introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget. To characterize unlearning efficacy precisely, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of evaluations, is required to avoid a false sense of confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, while unlearning methods show some signs of being useful for efficiently removing poisoned datapoints without retraining, our work suggests that these methods are not yet "ready for prime time" and currently provide limited benefit over retraining.

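We revisit the efficacy of several approximate machine unlearning methods for large-scale deep learning. Although existing unlearning methods have proven effective in some evaluation settings, we show experimentally that they fail to remove the effects of data poisoning across a variety of poisoning attacks and models. We introduce unlearning evaluation metrics based on data poisoning, and our results indicate that a broader perspective is needed to avoid false confidence in machine unlearning procedures for deep learning without provable guarantees. Moreover, although unlearning methods show some signs of efficiently removing poisoned datapoints without retraining, our work suggests that they are not yet "ready for prime time" and currently offer limited benefit over retraining.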

Machine Unlearning Fails to Remove Data Poisoning Attacks