The Mixture-of-Experts (MoE) model has achieved notable success in Deep Learning (DL). However, its architecture is complex, and its advantages over dense models in image classification remain unclear. In previous studies, MoE performance has often been reported to be sensitive to noise and outliers in the input space. Some