Few-shot Learning aims to learn and distinguish new categories with a very limited number of available images, presenting a significant challenge in the realm of deep learning. Recent researchers have sought to leverage the additional textual or linguistic information of these rare categories with a pre-trained language model to facilitate learning, thus partially alleviating the problem of insufficient supervision signals. However, the full potential of the textual information and pre-trained language model have been underestimated in the few-shot learning till now, resulting in limited performance enhancements. To address this, we propose a simple but effective framework for few-shot learning tasks, specifically designed to exploit the textual information and language model. In more detail, we explicitly exploit the zero-shot capability of the pre-trained language model with the learnable prompt. And we just add the visual feature with the textual feature for inference directly without the intricate designed fusion modules in previous works. Additionally, we apply the self-ensemble and distillation to further enhance these components. Our extensive experiments conducted across four widely used few-shot datasets demonstrate that our simple framework achieves impressive results. Particularly noteworthy is its outstanding performance in the 1-shot learning task, surpassing state-of-the-art methods by an average of 3.0\% in classification accuracy. \footnote{We will make the source codes of the proposed framework publicly available upon acceptance. }.

为了解决深度学习中少样本学习的挑战，我们提出了一个简单而有效的框架，专门设计用于利用文本信息和语言模型，通过学习可调的提示来显式地利用预训练的语言模型的零样本能力，并且直接将视觉特征和文本特征进行推断而无需复杂设计的融合模块，进一步运用自集成和蒸馏来增强这些组件，在四个广泛使用的少样本数据集上进行了大量实验证明我们的简单框架取得了令人印象深刻的结果，特别值得注意的是，在1-shot学习任务中，我们的分类准确率平均超过基准方法3.0%。

少即是多: 多模态少样本学习的深入研究