BriefGPT.xyz
Feb, 2024
Recovering the Pre-Fine-Tuning Weights of Generative Models
Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
TL;DR
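We propose Spectral DeTuning, a method that recovers the weights of the pre-fine-tuning model from a small number of low-rank (LoRA) fine-tuned models, and we use this newly identified vulnerability to attack large-scale models.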
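The summary does not spell out the algorithm, so the following is only a minimal sketch of one way such a recovery could work. It assumes that each fine-tuned weight matrix equals a shared base matrix plus an update of known rank, and it alternates between fitting each update by truncated SVD and averaging what remains. The function names, the toy dimensions, and the fixed iteration count are hypothetical choices made for illustration; they are not taken from the paper.

```python
import numpy as np


def low_rank_approx(matrix, rank):
    """Best rank-`rank` approximation of `matrix` via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def recover_base_weights(finetuned, rank, n_iters=100):
    """Estimate a shared base matrix W from matrices W_i = W + (rank-r update).

    Assumption (illustrative): every W_i differs from W by an update of
    rank at most `rank`, as in LoRA fine-tuning.
    """
    stacked = np.stack(finetuned)   # shape: (n_models, d_out, d_in)
    w = stacked.mean(axis=0)        # naive starting guess: plain average
    for _ in range(n_iters):
        # Step 1: with W fixed, fit each update as the best low-rank
        # approximation of the residual W_i - W.
        updates = np.stack([low_rank_approx(wi - w, rank) for wi in stacked])
        # Step 2: with the updates fixed, W is the average of W_i - update_i.
        w = (stacked - updates).mean(axis=0)
    return w


def make_toy_problem(n_models=10, dim=16, rank=1, seed=0):
    """Build a synthetic base matrix and LoRA-style fine-tuned copies of it."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((dim, dim))
    finetuned = [
        base + rng.standard_normal((dim, rank)) @ rng.standard_normal((rank, dim))
        for _ in range(n_models)
    ]
    return base, finetuned
```

On the synthetic problem, `recover_base_weights(finetuned, rank=1)` can be compared against `base` with `np.linalg.norm` to see how much closer it gets than the plain average of the fine-tuned matrices. Real checkpoints would need this applied per layer, and the rank of the updates may not be known in advance.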
Abstract
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via