BriefGPT.xyz
Jul, 2024
Q-Adapter: Training Your LLM Adapter as a Residual Q-Function
Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia...
TL;DR
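This paper introduces Q-Adapter, a new method that customizes a pre-trained large language model (LLM) by learning, on downstream preference data, a module that approximates a residual Q-function. In experiments across multiple tasks and on safety alignment tasks, it performs strongly both at preventing forgetting and at learning new preferences.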
Abstract
We consider the problem of adapting large language models (LLMs) pre-trained with Reinforcement Learning from Human Feedback (RLHF) to downstream ...