Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, their adoption in Computer Vision remains limited and often requires large-scale datasets comprising billions of samples. In this study, we investigate t