Oct, 2024
Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi, Xuefeng Du, Yixuan Li
TL;DR
This work addresses the safety risks that arise when fine-tuning large language models by proposing a novel Safety-Aware Fine-Tuning (SAFT) framework that automatically detects and removes potentially harmful data samples. Experiments show that the framework reduces harmfulness by up to 27.8% across a range of language models and data contamination rates, demonstrating broad applicability and practical value.
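To make the filter-then-fine-tune idea concrete, below is a minimal sketch of score-based data filtering. The `harmfulness_scores` function, its projection-based score, and the `contamination_rate` parameter are illustrative assumptions for this sketch, not the paper's exact SAFT formulation; the full text defines the actual scoring function.

```python
import numpy as np

def harmfulness_scores(embeddings: np.ndarray) -> np.ndarray:
    # Hypothetical score: magnitude of each sample's projection onto the
    # top singular direction of the centered embedding matrix. This is a
    # stand-in for SAFT's scoring function, not the paper's formulation.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(centered @ vt[0])

def filter_dataset(samples, embeddings, contamination_rate=0.1):
    # Drop the highest-scoring fraction of samples before fine-tuning.
    scores = harmfulness_scores(embeddings)
    cutoff = np.quantile(scores, 1.0 - contamination_rate)
    return [s for s, sc in zip(samples, scores) if sc <= cutoff]

# Usage: in practice, embeddings would come from the base LLM's hidden
# states; random vectors are used here only to keep the sketch runnable.
samples = ["example A", "example B", "example C", "example D"]
emb = np.random.default_rng(0).normal(size=(4, 8))
clean = filter_dataset(samples, emb, contamination_rate=0.25)
```

The design point is that filtering happens once, on the candidate fine-tuning set, so the downstream training loop is unchanged; only the data it sees is cleaned.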
Abstract
Fine-Tuning Large Language Models (LLMs) has emerged as a common practice for tailoring models to individual needs and preferences. The choice of datasets for …