Prompting and context-based fine-tuning methods, which we call Prefix
Learning, have been proposed to enhance the performance of language models on
various downstream tasks, with results that can match full-parameter
fine-tuning. However, theoretical understanding of how these methods work
remains limited. In this paper, we aim to address this limitation by studying
the learning ability of Prefix Learning from the perspective of prefix length.
In particular, we
approximate the optimization process of infinitely long Prefix Learning using
the Neural Tangent Kernel (NTK) technique. We formulate and solve it as the
problem of learning an infinitely long prefix in a one-layer attention network.
Our results confirm the over-parameterization property of infinitely long
Prefix Learning in attention and guarantee convergence to an arbitrarily small
loss. On the implementation side, we propose NTK-Attention, a method that is
"equivalent" to attention computation with arbitrary prefix length but can be
computed efficiently. Its time
complexity is dominated by a term that is sub-quadratic in the input length
(excluding the prefix), and our method requires only $d^2 + d$ extra parameters
for representation, where $d$ is the feature dimension. In addition, we conduct
experiments that compare NTK-Attention with full-parameter fine-tuning,
LoRA, and P-Tuning V2 across vision and natural language datasets. The
results indicate that our approach may be a promising parameter-efficient
fine-tuning method, since it demonstrates superior performance in numerous
scenarios. Our code can be found at
https://github.com/ChristianYang37/chiwun/tree/main/src/NTK-Attention.
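
As a rough illustration of why $d^2 + d$ extra parameters can suffice regardless of prefix length, the following NumPy sketch assumes that the prefix's contribution to the attention weights is replaced by a kernel of the form $\phi(q)^\top \phi(k)$. The feature map `phi`, the function names, and the summaries `Z` and `k` are illustrative assumptions, not the formulation or the code from the repository above.

```python
import numpy as np

def prefix_attention(Q, K, V, K_p, V_p):
    """Exact softmax attention over the concatenation [prefix; input]."""
    K_all = np.concatenate([K_p, K], axis=0)
    V_all = np.concatenate([V_p, V], axis=0)
    scores = np.exp(Q @ K_all.T)                  # unnormalized weights, (n, m + n)
    return (scores @ V_all) / scores.sum(axis=1, keepdims=True)

def phi(X):
    """Hypothetical positive feature map R^d -> R^d (a placeholder, not from the paper)."""
    return np.where(X > 0, X + 1.0, np.exp(np.minimum(X, 0.0)))

def compressed_prefix_attention(Q, K, V, Z, k):
    """Attention whose prefix is summarized by Z (d x d) and k (d,)."""
    A = np.exp(Q @ K.T)                           # input-to-input weights, (n, n)
    F = phi(Q)                                    # query features, (n, d)
    numerator = A @ V + F @ Z                     # (n, d)
    denominator = A.sum(axis=1) + F @ k           # (n,)
    return numerator / denominator[:, None]

rng = np.random.default_rng(0)
n, m, d = 4, 1000, 8                              # input length, prefix length, feature dim
Q, K, V = (0.1 * rng.standard_normal((n, d)) for _ in range(3))
K_p, V_p = (0.1 * rng.standard_normal((m, d)) for _ in range(2))

# Step 1: exact prefix attention splits into an input term and a prefix term.
A = np.exp(Q @ K.T)
P = np.exp(Q @ K_p.T)                             # prefix weights, (n, m)
exact = prefix_attention(Q, K, V, K_p, V_p)
split = (A @ V + P @ V_p) / (A.sum(axis=1) + P.sum(axis=1))[:, None]

# Step 2: under the assumed kernel phi(q) . phi(k), the prefix term collapses
# into two summaries whose size does not depend on the prefix length m.
Z = phi(K_p).T @ V_p                              # (d, d)
k = phi(K_p).sum(axis=0)                          # (d,)
n_extra = Z.size + k.size                         # d * d + d

# Reference: the same kernel applied token by token over all m prefix entries.
W_p = phi(Q) @ phi(K_p).T                         # (n, m)
ref = (A @ V + W_p @ V_p) / (A.sum(axis=1) + W_p.sum(axis=1))[:, None]
out = compressed_prefix_attention(Q, K, V, Z, k)
```

Here `Z` and `k` are built from an explicit prefix only to check the algebra; in a fine-tuning setting they would presumably be treated as trainable parameters directly, so the cost no longer grows with the prefix length.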

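This work studies the learning ability of Prefix Learning by formulating and solving the problem of learning an infinitely long prefix in a one-layer attention network. It confirms the over-parameterization property of infinitely long Prefix Learning in attention and its guarantee of convergence to an arbitrarily small loss. It also proposes the NTK-Attention method, which enables attention computation with arbitrary prefix length and shows potential for high parameter efficiency and superior performance in a variety of scenarios.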