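TL;DR: This paper proposes a new loss function called "Polarity loss". It uses metric learning over a "semantic vocabulary" to refine noisy semantic embeddings, which improves visual-semantic alignment in zero-shot object detection. It also explicitly maximizes the gap between positive and negative predictions, giving better discrimination between them.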
Abstract
Conventional object detection models require large amounts of training data.
In comparison, humans can recognize previously unseen objects by merely knowing
their semantic description. To mimic similar behaviour, zero-shot object
detection aims to recognize and localize 'unseen' object instances by using
only their semantic information. The model is first tr