In recent years, few-shot action recognition has attracted increasing attention, and it generally adopts the meta-learning paradigm. With only limited samples, overcoming overlapping class distributions and outliers remains a challenging problem in this field. We believe that combining multi-modal and multi-view information can alleviate this problem by exploiting their complementary information. Therefore, we propose a method of Multi-view Distillation based on Multi-modal Fusion. First, we construct a Probability Prompt Selector for the query. It generates a probability prompt embedding from the comparison scores between the prompt embeddings of the support samples and the visual embedding of the query. Second, we establish multiple views. In each view, we fuse the prompt embedding, which serves as consistent information, with the visual embedding and the global or local temporal context to overcome overlapping class distributions and outliers. Third, we fuse the distances from the multiple views and perform mutual distillation of matching ability from one view to another, making the model more robust to distribution bias. Our code is available at \url{https://github.com/cofly2014/MDMF}.
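The abstract describes the pipeline only at a high level. The following Python sketch illustrates one plausible reading of three of its steps, under explicit assumptions. First, the Probability Prompt Selector is assumed to weight the support prompt embeddings by a temperature-scaled softmax over their cosine scores against the query's visual embedding. Second, distance fusion is assumed to be a weighted average of per-view query-to-class distances. Third, mutual distillation is assumed to be a symmetric KL divergence between the class distributions that two views induce, a deep-mutual-learning-style loss swapped in for illustration. All function names, temperatures, and weights are hypothetical and are not taken from the MDMF repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def probability_prompt(query_visual: Vector, support_prompts: List[Vector],
                       temperature: float = 0.1) -> Vector:
    """Hypothetical Probability Prompt Selector (assumption): weight each
    support-class prompt embedding by the softmax of its cosine score against
    the query's visual embedding, then sum the weighted prompts."""
    weights = softmax([cosine(query_visual, p) for p in support_prompts], temperature)
    dim = len(support_prompts[0])
    return [sum(w * p[d] for w, p in zip(weights, support_prompts)) for d in range(dim)]


def fuse_distances(view_distances: List[List[float]],
                   view_weights: List[float]) -> List[float]:
    """Assumed distance fusion: weighted average of per-view
    query-to-class distances (one distance list per view)."""
    total = sum(view_weights)
    n_classes = len(view_distances[0])
    return [sum(w * dists[c] for w, dists in zip(view_weights, view_distances)) / total
            for c in range(n_classes)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; epsilon guards against log(0)."""
    return sum(pi * math.log((pi + 1e-12) / (qi + 1e-12)) for pi, qi in zip(p, q))


def mutual_distillation_loss(dist_a: Sequence[float], dist_b: Sequence[float],
                             temperature: float = 1.0) -> float:
    """Assumed symmetric distillation between two views: each view's class
    distribution (softmax over negative distances) is pulled toward the other's."""
    p_a = softmax([-d for d in dist_a], temperature)
    p_b = softmax([-d for d in dist_b], temperature)
    return kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a)


if __name__ == "__main__":
    prompts = [[1.0, 0.0], [0.0, 1.0]]  # toy prompt embeddings for two support classes
    query = [0.9, 0.1]                   # toy visual embedding of one query video
    print(probability_prompt(query, prompts))  # leans strongly toward the first prompt

    global_view = [0.2, 0.8]  # distances from a hypothetical global-context view
    local_view = [0.4, 0.6]   # distances from a hypothetical local-context view
    print(fuse_distances([global_view, local_view], [0.5, 0.5]))  # approximately [0.3, 0.7]
    print(mutual_distillation_loss(global_view, local_view))     # positive; 0 if views agree
```

Averaging distances rather than probabilities keeps the fused score on the same scale as each view's metric. The symmetric KL term is zero only when both views induce the same class distribution, which is one simple way to let each view act as a teacher for the other.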

在最近几年，少量样本行为识别引起了越来越多的关注。该领域通常采用元学习的范式。在有限样本的基础上，克服类别的重叠分布和异常值仍然是一个具有挑战性的问题。我们相信多模态和多视角相结合可以改善这个问题，取决于信息的互补性。因此，我们提出了一种基于多模态融合的多视角蒸馏方法。首先，构建一个用于查询的概率提示选择器，根据支持样本的提示嵌入和查询的视觉嵌入之间的比较分数生成概率提示嵌入。其次，在每个视角中，我们将提示嵌入与视觉嵌入以及全局或局部时间上下文融合，克服类别的重叠分布和异常值。第三，我们对多视角进行距离融合，并进行互相之间的匹配能力蒸馏，使模型对分布偏差更加鲁棒。我们的代码可在以下网址找到：https://github.com/cofly2014/MDMF。

A Multi-view Teacher Distillation Method Based on Multi-modal Fusion for Few-shot Action Recognition