Pre-trained large language models (LLMs) based on the Transformer architecture
have demonstrated striking in-context learning (ICL) abilities. With a few
demonstration input-label pairs, they can predict the label for an unseen input
without any parameter updates. In this paper, we show a notable phenomenon:
SVD-based weight pruning can enhance ICL performance and, more surprisingly,
pruning weights in deep layers often yields more stable performance
improvements than pruning in shallow layers. However, the mechanism underlying
these findings remains an open question. To explain these findings, we conduct
an in-depth theoretical analysis by presenting the implicit gradient descent
(GD) trajectories of ICL and deriving mutual-information-based generalization
bounds of ICL via the full implicit GD trajectories. This analysis helps
explain the surprising experimental findings. In addition, based on our
experimental and theoretical insights, we propose a simple, derivative-free
model-compression algorithm for enhancing ICL inference on downstream tasks.
Experiments on benchmark datasets and open-source LLMs demonstrate the
method's effectiveness\footnote{The code is available at
https://github.com/chen123CtrlS/EnhancingICL_SVDPruning}.
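
For orientation, SVD-based weight pruning can be sketched as a low-rank
truncation of a weight matrix. The notation below ($W \in \mathbb{R}^{m \times n}$,
retained rank $r$, singular triplets $\sigma_i, u_i, v_i$) is assumed here for
illustration and is not fixed by the abstract; the exact pruning rule used in
the paper may differ. With singular values ordered
$\sigma_1 \ge \sigma_2 \ge \dots \ge 0$,
\[
  W = \sum_{i=1}^{\min(m,n)} \sigma_i\, u_i v_i^{\top},
  \qquad
  W_r = \sum_{i=1}^{r} \sigma_i\, u_i v_i^{\top},
  \qquad r \le \mathrm{rank}(W).
\]
By the Eckart--Young theorem, $W_r$ is the best rank-$r$ approximation of $W$
in Frobenius norm, and the discarded energy is
\[
  \lVert W - W_r \rVert_F^{2} = \sum_{i > r} \sigma_i^{2}.
\]
Under this reading, pruning replaces $W$ by $W_r$ in a chosen layer, and
neither gradients nor parameter updates are needed to form $W_r$.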
