August 2023
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, et al.
TL;DR: Vision Transformers (ViTs) can be trained more effectively with a modified Mixture-of-Experts (MoE) training scheme: during training, certain parts of the ViT are replaced with MoEs, and for inference the experts are averaged back into the original ViT architecture. This improves performance without increasing inference cost.
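The core "experts weights averaging" step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each expert is a dense weight matrix of the same shape, and simply averages them into a single matrix that replaces the MoE layer at inference time.

```python
import numpy as np

def average_experts(expert_weights):
    """Collapse a list of same-shaped expert weight matrices into one
    dense weight matrix by element-wise averaging."""
    return np.mean(np.stack(expert_weights), axis=0)

# Toy example (shapes are illustrative): 4 experts, each an 8x16 FFN weight.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((8, 16)) for _ in range(4)]

# The averaged matrix has the same shape as a single expert, so it can
# drop back into the original (non-MoE) ViT layer for inference.
merged = average_experts(experts)
```

Because the merged matrix matches the original layer's shape, the inference-time model is structurally identical to a plain ViT, which is why this scheme adds no inference cost.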