Large Language Models (LLMs) have had a remarkable impact across a wide spectrum of natural language processing tasks. Fine-tuning these pre-trained models on downstream datasets yields further significant performance gains, but the process is challenging because of its extraordinary resource requirements. Existing efforts therefore focus on parameter-efficient fine-tuning, which, unfortunately, fails to capitalize on the full potential of full-parameter fine-tuning. In this work, we propose QFT, a novel Quantized Full-parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance. Our framework incorporates two novel ideas: (i) we adopt the efficient Lion optimizer, which keeps track of only the momentum and has consistent update magnitudes for each parameter, an inherent advantage for robust quantization; and (ii) we quantize all model states and store them as integer values, and present a gradient flow and parameter update scheme for the quantized weights. As a result, QFT reduces the model state memory to 21% of the standard solution while achieving comparable performance; e.g., tuning a LLaMA-7B model requires less than 30GB of memory, which fits on a single A6000 GPU.
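
To make idea (i) concrete, the following is a minimal NumPy sketch of one Lion update step, together with a simple symmetric int8 quantizer of the kind that could be used to store the momentum as integers. The Lion rule follows the optimizer's published form; the function names, the default hyperparameters, and the per-tensor quantization scheme are illustrative assumptions and are not taken from QFT's actual quantizers or update scheme.

```python
import numpy as np


def lion_step(theta, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update. Returns (new_theta, new_momentum)."""
    # Interpolate momentum and gradient, then keep only the sign,
    # so every parameter moves by the same magnitude lr.
    update = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    new_theta = theta - lr * (update + weight_decay * theta)
    # The momentum is the only optimizer state Lion keeps.
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return new_theta, new_momentum


def quantize_int8(x):
    """Symmetric per-tensor int8 quantization (assumed scheme, for illustration)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to float32 values."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    theta = np.zeros(4, dtype=np.float32)
    grad = np.array([1.0, -2.0, 0.5, -0.1], dtype=np.float32)
    momentum = np.zeros(4, dtype=np.float32)

    theta, momentum = lion_step(theta, grad, momentum)
    q, scale = quantize_int8(momentum)  # store the state as integers
    print(theta, q, scale)
```

Because the update is a sign, each parameter changes by exactly the learning rate (absent weight decay) regardless of the gradient's scale, which is the "consistent update magnitudes" property the abstract cites as an advantage for robust quantization.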

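Summary: The paper proposes QFT, a novel quantized full-parameter tuning framework that enables memory-efficient fine-tuning without harming performance. The framework adopts the efficient Lion optimizer, stores all model states as quantized integer values, and provides a gradient flow and parameter update scheme for the quantized weights. The results show that QFT reduces model state memory to 21% of the standard solution while achieving comparable performance; for example, tuning a LLaMA-7B model requires less than 30GB of memory, which a single A6000 GPU can provide.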

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources