In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods, which extend model width, our approach targets model depth, addressing the redundancy observed across layer representations for different input samples. Our framework is integrated into the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves results comparable to those of densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance.
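
The abstract does not specify how the routing policy is built, so the following is only a minimal sketch of the underlying idea, under the assumption that a layer is a candidate for skipping when its output points in nearly the same direction as its input. The names `cosine_similarity`, `redundancy_labels`, and `skip_threshold`, as well as the toy layers, are hypothetical and are not taken from the DLO implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def redundancy_labels(hidden, layers, skip_threshold=0.99):
    """Run every layer in order and flag those whose output is nearly parallel to its input.

    Assumption for illustration: a high input/output similarity marks a layer
    as redundant for this particular input, so a router could learn to skip it.
    Returns a list of (similarity, skippable) pairs, one per layer.
    """
    labels = []
    for layer in layers:
        output = layer(hidden)
        similarity = cosine_similarity(hidden, output)
        labels.append((similarity, similarity >= skip_threshold))
        hidden = output  # the next layer sees this layer's output
    return labels


# Toy two-dimensional "layers" (hypothetical stand-ins for transformer blocks).
def identity_layer(h):
    return list(h)


def rotate_layer(h):
    return [h[1], -h[0]]


labels = redundancy_labels([1.0, 2.0], [identity_layer, rotate_layer])
skippable = [flag for _, flag in labels]
print(skippable)
```

In this toy setting the identity layer leaves the representation unchanged and is flagged as skippable, while the rotation changes its direction and is not. In practice such labels would more plausibly supervise a lightweight router during SFT than be computed at inference time, since computing them requires running the layer in the first place.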

DLO: Dynamic Layer Operations for Efficient Vertical Scaling of LLMs