In this paper, we present Delta-LoRA, a novel parameter-efficient approach to fine-tuning large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $\bA$ and $\bB$, but also propagates the learning to the pre-trained weights $\bW$ via updates that use the delta of the product of the two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$). This strategy addresses the limitation that incrementally updating the low-rank matrices alone is inadequate for learning representations suited to downstream tasks. Moreover, because updating $\bW$ requires neither computing the gradients of $\bW$ nor storing their momentum terms, Delta-LoRA has memory requirements and computational costs comparable to those of LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
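The update rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`matmul`, `axpy`, `delta_lora_step`), the plain SGD step on the low-rank factors, the toy shapes, and the made-up gradient values are all assumptions chosen for illustration, and any scaling factor or warm-up schedule applied to the delta is omitted. The point it shows is that $\bW$ is updated only from the change in the product of the two factors, so no gradient or optimizer state for $\bW$ is needed.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def axpy(X, Y, alpha):
    """Return X + alpha * Y, elementwise."""
    return [[x + alpha * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def delta_lora_step(W, A, B, grad_A, grad_B, lr=0.1):
    """One illustrative step (assumed plain SGD on A and B).

    W receives only the delta of the product, A_new @ B_new - A @ B;
    no gradient with respect to W is computed or stored.
    """
    A_new = axpy(A, grad_A, -lr)  # gradient step on A
    B_new = axpy(B, grad_B, -lr)  # gradient step on B
    delta = axpy(matmul(A_new, B_new), matmul(A, B), -1.0)
    W_new = axpy(W, delta, 1.0)   # propagate the delta to the pre-trained weights
    return W_new, A_new, B_new


# Toy example: a 3x2 weight matrix with rank-1 factors (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
A = [[1.0], [2.0], [0.0]]
B = [[0.5, -1.0]]
grad_A = [[1.0], [0.0], [-1.0]]  # made-up gradients, for illustration only
grad_B = [[0.0, 2.0]]

W_new, A_new, B_new = delta_lora_step(W, A, B, grad_A, grad_B, lr=0.5)
```

In this sketch the only extra work relative to a LoRA-style step is forming the two low-rank products and adding their difference to the weight matrix, which is consistent with the abstract's claim of comparable memory and compute.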

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices