In the era of large language models, efficient use of computational
resources has become critically important. Although
parameter-efficient fine-tuning techniques have achieved results comparable to
full fine-tuning, their application during the pre-training phase poses
significant challenges. Specifically, employing parameter-efficient strategies
at the onset of pre-training can severely compromise efficiency, especially in
larger models. In this paper, building upon the fine-tuning method LoRA, we
introduce a novel parameter-efficient training technique that frequently alters
the trainable part of the parameters, facilitating effective pre-training. Our method
not only achieves memory reductions and computational overhead comparable to
current state-of-the-art parameter-efficient algorithms during the pre-training
phase but also maintains accuracy levels comparable to those of full
pre-training. We provide both theoretical analyses and empirical evidence to
demonstrate the effectiveness of our approach.
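
To make the memory argument concrete, here is a minimal sketch of the standard LoRA parameterization that the abstract builds on: a frozen weight W plus a trainable low-rank product B A. The function names (`lora_trainable_params`, `lora_effective_weight`) and the `scale` argument are illustrative assumptions, not the authors' code. The abstract does not say how the trainable part is switched during pre-training, so that step is deliberately not modeled here.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full training updates all d_out * d_in entries of W.
    LoRA trains only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def lora_effective_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), the weight actually used in the forward pass."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    # Hypothetical layer size chosen only for illustration.
    full, lora = lora_trainable_params(4096, 4096, 8)
    print(full, lora, full // lora)
```

For a 4096 x 4096 matrix at rank 8, the low-rank factors hold 65,536 trainable values against 16,777,216 for the full matrix, a 256-fold reduction in trainable parameters for that layer.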

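In the era of large language models, the need to use computational resources efficiently has become critically important. Building on the LoRA fine-tuning method, this paper introduces a novel parameter-efficient training technique that frequently changes which part of the parameters is trainable, enabling effective pre-training. During the pre-training phase, our method achieves memory reductions and computational overhead comparable to current state-of-the-art parameter-efficient algorithms while maintaining accuracy comparable to full pre-training. We provide theoretical analysis and empirical evidence to demonstrate the effectiveness of our method.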

Revolutionizing Large Language Model Training through Dynamic Parameter Adjustment

Multimodal Large Language Models (MLLMs) are widely regarded as crucial in
the exploration of Artificial General Intelligence (AGI). The core of MLLMs
lies in their capability to achieve cross-modal alignment. To attain this goal,
current MLLMs typically follow a two-phase training paradigm: the pre-training
phase and the instruction-tuning phase. Despite their success, there are
shortcomings in the modeling of alignment capabilities within these models.
Firstly, during the pre-training phase, the model usually assumes that all
image-text pairs are uniformly aligned, but in fact the degree of alignment
between different image-text pairs is inconsistent. Secondly, the instructions
currently used for fine-tuning cover a variety of tasks, and the instructions
for different tasks usually require different levels of alignment capabilities, but
previous MLLMs overlook these differentiated alignment needs. To tackle these
issues, we propose a new multimodal large language model, AlignGPT. In the
pre-training stage, instead of treating all image-text pairs equally, we assign
different levels of alignment capabilities to different image-text pairs. Then,
in the instruction-tuning phase, we adaptively combine these different levels
of alignment capabilities to meet the dynamic alignment needs of different
instructions. Extensive experimental results show that our model achieves
competitive performance on 12 benchmarks.
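
The two-stage idea above can be illustrated with a small sketch. The abstract does not say how alignment levels are assigned or combined, so everything here is an assumption made for illustration: each image-text pair is taken to have a precomputed similarity score in [0, 1], levels are equal-width bins over that score, each level owns one vector, and the combination is a softmax-weighted sum. The names `alignment_level`, `combine_levels`, and `gate_logits` are hypothetical.

```python
import math


def alignment_level(score, num_levels):
    """Map a similarity score in [0, 1] to a level in 0..num_levels-1.

    Assumption: equal-width bins; the source does not specify the binning.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # A score of exactly 1.0 is clamped into the top bin.
    return min(int(score * num_levels), num_levels - 1)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_levels(level_vectors, gate_logits):
    """Weighted sum of per-level vectors, with weights softmax(gate_logits).

    Assumption: in instruction tuning, the gate logits would depend on the
    instruction; here they are passed in directly.
    """
    weights = softmax(gate_logits)
    dim = len(level_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, level_vectors))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Pre-training stage: bucket a pair by its (hypothetical) score.
    print(alignment_level(0.49, 4))
    # Instruction-tuning stage: mix two level vectors with equal gates.
    print(combine_levels([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With equal gate logits the combination reduces to a plain average of the level vectors; unequal logits shift the mix toward the levels a given instruction needs most.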

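AlignGPT, a new multimodal large language model, assigns different levels of alignment capability to different image-text pairs during the pre-training phase and adaptively combines these levels during the instruction-tuning phase to meet the dynamic alignment needs of different instructions, achieving competitive performance on 12 benchmarks.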