Transformer models have revolutionized natural language processing and computer vision, but training them remains resource-intensive and time-consuming. This paper investigates whether the early-bird ticket hypothesis can be applied to improve the training efficiency of Transformer models. We propose a methodology that combines iterative pruning, mask distance calculation, and selective retraining to identify early-bird tickets in several Transformer architectures, including ViT, Swin-T, GPT-2, and RoBERTa. Our experimental results show that early-bird tickets consistently emerge within the first few epochs of training or fine-tuning, enabling substantial resource savings without compromising performance. The pruned models obtained from early-bird tickets achieve accuracy comparable to, or even better than, their unpruned counterparts while substantially reducing memory usage. Furthermore, our comparative analysis indicates that the early-bird ticket phenomenon generalizes across different Transformer models and tasks. This research contributes to the development of efficient training strategies for Transformer models, making them more accessible and less resource-hungry. By leveraging early-bird tickets, practitioners can reduce the computational burden of training Transformer models and thereby speed up the development of natural language processing and computer vision applications.
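The abstract names the mask distance step without spelling it out. Below is a minimal sketch of how an early-bird check is commonly set up, assuming unstructured global magnitude pruning over a flattened weight vector, a normalized Hamming distance between the binary pruning masks of consecutive epochs, and a stopping rule that fires once every distance in a sliding window falls below a threshold. The names `magnitude_mask`, `mask_distance`, and `EarlyBirdDetector`, and the defaults `eps=0.1` and `window=5`, are hypothetical illustrations, not the paper's implementation.

```python
from collections import deque
from typing import List, Optional, Sequence


def magnitude_mask(weights: Sequence[float], prune_ratio: float) -> List[int]:
    """Unstructured magnitude pruning (assumed criterion): 1 = kept, 0 = pruned."""
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    n_prune = int(round(prune_ratio * len(weights)))
    # Indices sorted by ascending |w|; the smallest n_prune entries are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def mask_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized Hamming distance: fraction of positions where two masks differ."""
    if len(a) != len(b) or not a:
        raise ValueError("masks must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


class EarlyBirdDetector:
    """Hypothetical helper: flags an early-bird ticket once masks stop changing."""

    def __init__(self, prune_ratio: float, eps: float = 0.1, window: int = 5):
        self.prune_ratio = prune_ratio
        self.eps = eps                              # distance threshold (assumed)
        self.window = window                        # number of recent distances checked
        self.prev_mask: Optional[List[int]] = None
        self.distances = deque(maxlen=window)       # FIFO of consecutive-epoch distances

    def update(self, weights: Sequence[float]) -> bool:
        """Call once per epoch; returns True when the early-bird criterion is met."""
        mask = magnitude_mask(weights, self.prune_ratio)
        if self.prev_mask is not None:
            self.distances.append(mask_distance(mask, self.prev_mask))
        self.prev_mask = mask
        # Require a full window so that a single stable epoch is not enough.
        return len(self.distances) == self.window and max(self.distances) < self.eps


if __name__ == "__main__":
    det = EarlyBirdDetector(prune_ratio=0.5, eps=0.1, window=2)
    for epoch, w in enumerate([[1, 2, 3, 4], [4, 3, 2, 1], [4, 3, 2, 1], [4, 3, 2, 1]]):
        print(epoch, det.update(w))
```

In a real run, `update` would be called once per epoch with the flattened prunable weights (for example, the attention and MLP matrices); once it returns `True`, the model would be pruned with the final mask and retrained. A structured variant that scores attention heads or neurons instead of individual weights fits the same loop by changing only how the mask is computed.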


Early Transformers: Efficient Training of Transformer Models through Early-Bird Lottery Tickets