Class confusion and forgetting of object classes are central challenges in Few-Shot Object Detection (FSOD). To overcome these pitfalls in metric-learning-based FSOD techniques, we introduce a novel Submodular Mutual Information Learning (SMILe) framework, which adopts combinatorial mutual information functions to enforce the creation of tighter and more discriminative feature clusters in FSOD. Our proposed approach generalizes to several existing FSOD approaches, independent of the backbone architecture, and demonstrates performance gains across them. The paradigm shift from instance-based objective functions to combinatorial objectives in SMILe naturally preserves the diversity within an object class, resulting in reduced forgetting when only a few training examples are available. Furthermore, applying mutual information between the already learnt (base) and newly added (novel) objects ensures sufficient separation between base and novel classes, minimizing the effect of class confusion. Experiments on popular FSOD benchmarks, PASCAL-VOC and MS-COCO, show that our approach generalizes to State-of-the-Art (SoTA) approaches, improving their novel class performance by up to 5.7% (3.3 mAP points) on the 10-shot setting of VOC (split 3) and by up to 5.4% (2.6 mAP points) on the 30-shot setting of COCO. Our experiments also demonstrate better retention of base class performance and up to 2x faster convergence over existing approaches, regardless of the underlying architecture.
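
To make the phrase "combinatorial mutual information functions" concrete, the following is a minimal sketch of a submodular mutual information, I_f(A; B) = f(A) + f(B) - f(A ∪ B), computed between a base-class and a novel-class feature set. The abstract does not state which submodular function SMILe uses, so the facility-location function, the clipped cosine similarity, the toy 2-D features, and all function names below are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def similarity_matrix(features):
    """Cosine similarity clipped to [0, 1] (an assumed kernel; non-negative
    similarities keep the facility-location function monotone)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, None)


def facility_location(sim, subset):
    """f(A) = sum over every ground-set item i of max_{j in A} sim[i, j]."""
    subset = sorted(subset)
    if not subset:
        return 0.0
    return float(sim[:, subset].max(axis=1).sum())


def submodular_mutual_information(sim, a, b):
    """I_f(A; B) = f(A) + f(B) - f(A union B)."""
    a, b = set(a), set(b)
    return (facility_location(sim, a)
            + facility_location(sim, b)
            - facility_location(sim, a | b))


def smi_between(base_feats, novel_feats):
    """Hypothetical helper: mutual information between two feature sets,
    using their union as the ground set."""
    feats = np.vstack([base_feats, novel_feats])
    sim = similarity_matrix(feats)
    n_base = len(base_feats)
    base_idx = range(n_base)
    novel_idx = range(n_base, len(feats))
    return submodular_mutual_information(sim, base_idx, novel_idx)


# Toy 2-D features: one base class and two candidate novel-class clusters.
base = np.array([[1.0, 0.0], [0.9, 0.1]])
novel_separated = np.array([[0.0, 1.0], [0.1, 0.9]])
novel_confused = np.array([[1.0, 0.1], [0.9, 0.2]])

smi_separated = smi_between(base, novel_separated)
smi_confused = smi_between(base, novel_confused)
print(f"separated: {smi_separated:.3f}, confused: {smi_confused:.3f}")
```

Under these assumptions, the novel cluster that overlaps the base cluster yields a larger mutual information than the well-separated one, so a loss that penalizes this quantity would push base and novel features apart. That is the intuition behind using mutual information to reduce class confusion.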

SMILe: Robust Few-Shot Object Detection Based on Submodular Mutual Information