Fine-tuning large language models (LLMs) has achieved remarkable performance
across various natural language processing tasks, yet it demands more and more
memory as model sizes keep growing. To address this issue, the recently
proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs
using only forward passes, thereby avoiding the need for a backpropagation
graph. However, significant performance drops and a high risk of divergence
have limited their widespread adoption. In this paper, we propose the Adaptive
Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed
to improve the performance and convergence of the ZO methods. To enhance
dimension-dependent ZO estimation accuracy, we introduce a fast-forward,
low-parameter tensorized adapter. To tackle the frequently observed divergence
issue in large-scale ZO fine-tuning tasks, we propose an adaptive query number
schedule that guarantees convergence. Detailed theoretical analysis and
extensive experimental results on Roberta-Large and Llama-2-7B models
substantiate the efficacy of our AdaZeta framework in terms of accuracy, memory
efficiency, and convergence speed.

通过提出 Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) 框架，本文致力于改进 ZO 方法的性能和收敛性，主要关注的问题包括维度相关的 ZO 估计准确性、大规模 ZO 微调任务中的发散问题，通过详细的理论分析和实验结果论证了 AdaZeta 框架在准确性、内存效率和收敛速度方面的有效性。