In this work, we systematically investigate the efficacy of dynamic activation mechanisms in the LLaMA family of language models. Although dynamic activation methods can reduce computation and increase speed in models that use the ReLU activation function, our empirical findings uncover several inherent pitfalls in current dynamic activation schemes. Through extensive experiments across a range of dynamic activation strategies, we demonstrate that LLaMA models usually underperform their ReLU counterparts, particularly in scenarios that demand high sparsity ratios. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting the activated heads and neurons; 2) the inadequate sparsity produced by the activation functions; and 3) the insufficient preservation of information caused by KV cache skipping. Our analysis not only sheds light on the limitations of dynamic activation in large-scale LLaMA models but also proposes roadmaps for the design of future sparsity schemes.

Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study