Aug 2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

TL;DR: Vision Transformers (ViTs) can be trained more effectively using a modified Mixture-of-Experts (MoE) training scheme, in which MoEs replace certain parts of the ViT during training and are converted back into the original ViT for inference, improving performance without increasing inference cost.
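
Below is a minimal PyTorch sketch of the core idea described in the TL;DR: a feed-forward block is swapped for a mixture of expert FFNs during training, and the experts' weights are averaged back into a single dense FFN so the deployed model keeps the original ViT architecture and cost. The module and function names (`SimpleMoEFFN`, `collapse_to_dense_ffn`) and the top-1 router are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """Standard ViT feed-forward block: Linear -> GELU -> Linear."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class SimpleMoEFFN(nn.Module):
    """Training-time replacement: several expert FFNs with a token-wise top-1 router (assumed routing scheme)."""
    def __init__(self, dim, hidden_dim, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(DenseFFN(dim, hidden_dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):
        # x: (batch, tokens, dim); send each token to its highest-scoring expert.
        scores = self.router(x)          # (B, T, num_experts)
        top1 = scores.argmax(dim=-1)     # (B, T)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


def collapse_to_dense_ffn(moe: SimpleMoEFFN) -> DenseFFN:
    """Average the experts' parameters into one dense FFN for inference."""
    dim = moe.experts[0].fc1.in_features
    hidden = moe.experts[0].fc1.out_features
    dense = DenseFFN(dim, hidden)
    with torch.no_grad():
        for name, param in dense.named_parameters():
            # Stack the matching parameter from every expert and take the mean.
            stacked = torch.stack([dict(e.named_parameters())[name] for e in moe.experts])
            param.copy_(stacked.mean(dim=0))
    return dense


if __name__ == "__main__":
    moe = SimpleMoEFFN(dim=192, hidden_dim=768, num_experts=4)
    dense = collapse_to_dense_ffn(moe)
    x = torch.randn(2, 16, 192)
    print(moe(x).shape, dense(x).shape)  # both (2, 16, 192)
```

The key point of the conversion step is that the inference-time model contains only the averaged dense FFN, so it has exactly the parameter count and FLOPs of the original ViT; the extra experts and router exist only during training.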