
Feb, 2024

Recovering the Pre-Fine-Tuning Weights of Generative Models

Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

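TL;DR: Given only a few low-rank (LoRA) fine-tuned models, our method, Spectral DeTuning, recovers the pre-fine-tuning weights of the model. This exposes a new vulnerability that can be exploited against large-scale models.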
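The sketch below shows the recovery problem for a single weight matrix. It is a minimal NumPy example built on stated assumptions and is not the authors' implementation. It assumes each fine-tuned matrix has the form W_i = W_P + M_i with rank(M_i) <= r. It then estimates the shared W_P by alternating between two steps: a rank-r truncated SVD for each M_i, and an average for W_P. The helper names (`truncated_svd`, `recover_base_weights`) and the synthetic data are hypothetical. The scheme is a plain alternating minimization in the spirit of Spectral DeTuning and omits any refinements the full method may use.

```python
import numpy as np


def truncated_svd(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of `matrix` (Eckart-Young), computed via SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def recover_base_weights(fine_tuned: list, rank: int, num_iters: int = 300) -> np.ndarray:
    """Hypothetical helper: estimate the shared base matrix W_P.

    Assumes every fine-tuned matrix is W_i = W_P + M_i with rank(M_i) <= rank, and
    alternately minimizes sum_i ||W_i - W_P - M_i||_F^2 over the M_i and over W_P.
    """
    w_p = np.mean(fine_tuned, axis=0)  # initial guess: plain average of the fine-tuned copies
    for _ in range(num_iters):
        # Step 1: with W_P fixed, the optimal rank-r M_i is the truncated SVD of W_i - W_P.
        deltas = [truncated_svd(w_i - w_p, rank) for w_i in fine_tuned]
        # Step 2: with the M_i fixed, the optimal W_P is the mean of W_i - M_i.
        w_p = np.mean([w_i - m_i for w_i, m_i in zip(fine_tuned, deltas)], axis=0)
    return w_p


# Synthetic check: one 32x32 base matrix plus five rank-2 LoRA-style updates B_i @ A_i.
rng = np.random.default_rng(0)
dim, rank, num_models = 32, 2, 5
w_true = rng.standard_normal((dim, dim))
fine_tuned = [
    w_true + rng.standard_normal((dim, rank)) @ rng.standard_normal((rank, dim))
    for _ in range(num_models)
]

w_hat = recover_base_weights(fine_tuned, rank)
naive_error = np.linalg.norm(np.mean(fine_tuned, axis=0) - w_true)
recovered_error = np.linalg.norm(w_hat - w_true)
print(f"naive average error: {naive_error:.3e}, alternating-update error: {recovered_error:.3e}")
```

Each step exactly minimizes the objective over one block of variables, so sum_i ||W_i - W_P - M_i||_F^2 never increases across iterations. This does not guarantee convergence to the true W_P in general. The synthetic setup is deliberately overdetermined, with five models each contributing only a rank-2 update.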

Abstract

The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via →