Fine-tuning is the primary methodology for tailoring pre-trained large
language models to specific tasks. As model scale and task diversity
grow, parameter-efficient fine-tuning methods become critically
important. One of the most widely used families of methods is low-rank
adaptation (LoRA) and its variants. LoRA encodes the weight update as the product
of two low-rank matrices. Despite its advantages, LoRA falls short of
full-parameter fine-tuning in terms of generalization error for certain tasks.
We introduce Chain of LoRA (COLA), an iterative optimization framework
inspired by the Frank-Wolfe algorithm, to bridge the gap between LoRA and
full-parameter fine-tuning without incurring additional computational cost or
memory overhead. COLA employs a residual learning procedure in which it merges
learned LoRA modules into the pre-trained language model parameters and
re-initializes optimization for newly created LoRA modules. We provide theoretical
convergence guarantees as well as empirical results to validate the
effectiveness of our algorithm. Across various models (OPT and Llama-2) and
seven benchmarking tasks, we demonstrate that COLA can consistently outperform
LoRA without additional computational or memory costs.

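Using a gradient projection approach, we propose COLA, a new iterative optimization framework. COLA merges the learned chain of LoRA modules into the pre-trained language model parameters and re-initializes optimization for each newly created LoRA module, bridging the gap between LoRA and full-parameter fine-tuning without additional computational or memory cost.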
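
The merge-and-restart procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy quadratic objective (squared Frobenius distance to a hypothetical target weight matrix), plain gradient descent, and tiny matrices stored as Python lists so that it runs without dependencies. The names train_lora and chain_of_lora are made up for this illustration. Each LoRA module is a product B A with B initialized to zero, and each chain link is merged into the frozen weights before a fresh module is trained on the residual.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, scale=1.0):
    """Return X + scale * Y elementwise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def loss(W, target):
    """Toy objective (an assumption): half the squared Frobenius distance."""
    return 0.5 * sum((w - t) ** 2 for rw, rt in zip(W, target) for w, t in zip(rw, rt))

def train_lora(W, target, rank, steps, lr, rng):
    """Fit one LoRA module (B, A) on top of frozen W; return the update B @ A."""
    d, k = len(W), len(W[0])
    B = [[0.0] * rank for _ in range(d)]  # zero init, so B @ A starts at 0
    A = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(rank)]
    for _ in range(steps):
        R = add(add(W, matmul(B, A)), target, scale=-1.0)  # residual W + BA - target
        grad_B = matmul(R, transpose(A))  # d(loss)/dB = R A^T
        grad_A = matmul(transpose(B), R)  # d(loss)/dA = B^T R
        B = add(B, grad_B, scale=-lr)
        A = add(A, grad_A, scale=-lr)
    return matmul(B, A)

def chain_of_lora(W0, target, rank, links, steps_per_link, lr, seed=0):
    """Train `links` LoRA modules in sequence, merging each into W before the next."""
    rng = random.Random(seed)
    W = [row[:] for row in W0]
    for _ in range(links):
        delta = train_lora(W, target, rank, steps_per_link, lr, rng)
        W = add(W, delta)  # merge; the next link starts from a fresh (B, A)
    return W

# Toy problem: the ideal update has rank 3, but each LoRA module has rank 1.
W0 = [[0.0] * 4 for _ in range(4)]
target = [[1.0, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 0.25, 0], [0, 0, 0, 0]]

# Same total number of gradient steps for both runs.
single = chain_of_lora(W0, target, rank=1, links=1, steps_per_link=1500, lr=0.05)
chained = chain_of_lora(W0, target, rank=1, links=3, steps_per_link=500, lr=0.05)
print("initial loss:", loss(W0, target))
print("single LoRA :", loss(single, target))
print("chain of 3  :", loss(chained, target))
```

In this toy setting a single rank-1 module cannot represent the rank-3 target update, whereas three merged rank-1 links can accumulate a higher-rank update with the same step budget and the same number of trainable parameters at any one time. Real fine-tuning involves a non-convex loss and a stateful optimizer, so the sketch only conveys the structure of the procedure, not its empirical behavior.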