The traditional notion of "junk DNA" has long been linked to the non-coding segments of the human genome, which make up roughly 98% of it. However, recent research has revealed that some of these seemingly non-functional DNA sequences play critical roles in cellular processes. Intriguingly, the weights of deep neural networks show a similar kind of apparent redundancy. It has been widely believed that the weights of very large models contain excessive redundancy and can be removed without compromising performance. This paper challenges that conventional wisdom. We employ sparsity as a tool to isolate and quantify the significance of low-magnitude weights in pre-trained large language models (LLMs). Viewed from a downstream task-centric angle, our study demonstrates a strong correlation between these weight magnitudes and the knowledge they encapsulate. Backed by our in-depth investigation, we propose the "Junk DNA Hypothesis": while small-magnitude weights may appear "useless" for simple tasks and suitable for pruning, they actually encode crucial knowledge necessary for solving more difficult downstream tasks. Removing these seemingly insignificant weights can lead to irreversible knowledge forgetting and performance damage on difficult tasks. These findings offer fresh insights into how LLMs encode knowledge in a task-sensitive manner, suggest future research directions in model pruning, and open avenues for task-aware conditional computation during inference.
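
For readers unfamiliar with the operation the abstract refers to, the following is a minimal sketch of one-shot unstructured magnitude pruning: the fraction of weights with the smallest absolute values is set to zero. It is an illustration under stated assumptions, not the procedure used in the paper. The function name `magnitude_prune`, the flat list of weights, and the sparsity level of 0.5 are all hypothetical choices made for this example.

```python
import random


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration: `weights` is a flat list of floats,
    `sparsity` is the fraction in [0, 1] of entries to remove.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Toy stand-in for a weight tensor (assumption: Gaussian values, fixed seed).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]
pruned = magnitude_prune(weights, 0.5)

# The surviving weights are the large-magnitude ones; the zeroed entries are
# the "low-magnitude" weights whose importance the abstract is concerned with.
kept = [w for w in pruned if w != 0.0]
print(f"kept {len(kept)} of {len(weights)} weights")
```

In a task-centric study of the kind described above, one would apply such a mask to a pre-trained model at increasing sparsity levels and compare the resulting performance drop across downstream tasks of differing difficulty.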

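Using sparsity as a tool, this study shows a strong correlation between the low-magnitude weights of pre-trained large language models and the knowledge they carry, in support of our in-depth investigation of the "Junk DNA Hypothesis". It reveals that removing these seemingly unimportant weights can cause irreversible knowledge forgetting and performance damage, offers new insights into how LLMs encode knowledge in a task-sensitive manner, suggests future research directions in model pruning, and opens the way to task-aware conditional computation during inference.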

Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity