Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is well known in the field that a semantic class does not necessarily correspond to a single visual class (e.g. the inside and the outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model even a visual class that appears consistent to the human eye. These problems have motivated three lines of work: 1) unsupervised or supervised clustering as a preprocessing step, which identifies the visual subclasses to be used in a mixture-of-experts learning regime; 2) latent-variable models, such as the part-based model of Felzenszwalb et al., in which the mixture assignment is a latent variable optimized during learning; and 3) highly non-linear classifiers, which are inherently capable of modelling a multi-modal input space but are inefficient at test time. In this work, we promote an incremental view of the recognition of semantic classes with varied appearances. As a first attempt, we take a new incremental optimization approach and, within a regularized risk minimization framework, find maximal visual subclasses that can be modelled using simple classifiers (e.g. a linear SVM) while preventing over-fitting (e.g. through a large margin). Following this approach, we report qualitatively and quantitatively significant results on the PASCAL VOC object detection task in comparison with state-of-the-art methods.

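This paper proposes an incremental learning method that unifies the clustering and classification steps in a single algorithm in order to discover maximal visual subclasses within a regularized risk minimization framework. This improves the recognition of visually distinct subclasses within the semantic categories of computer vision tasks. The work also finds that object detection methods such as DPM fail to exploit 50% of the training samples in these visual subclasses.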
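
To make the idea of growing a subclass incrementally more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a nearest-centroid linear rule as a stand-in for the linear SVM, uses exact separation of the training points as a crude proxy for the regularized risk criterion, and assigns each positive to exactly one subclass, so it does not model shared samples. All function names (`fit_linear`, `grow_subclass`, `discover_subclasses`) and the toy data are hypothetical.

```python
# Illustrative sketch only: a nearest-centroid linear rule stands in for the
# linear SVM, and every name below is hypothetical.

def fit_linear(pos, neg):
    """Return (w, b) of the hyperplane halfway between the two class means."""
    dim = len(pos[0])
    mu_p = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mu_n = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = -sum(wd * (p + n) / 2 for wd, p, n in zip(w, mu_p, mu_n))
    return w, b

def score(model, x):
    """Signed linear score; positive means 'belongs to the subclass'."""
    w, b = model
    return sum(wd * xd for wd, xd in zip(w, x)) + b

def separates(model, pos, neg):
    """True if the model puts all pos on the positive side and all neg on the negative side."""
    return (all(score(model, x) > 0 for x in pos)
            and all(score(model, x) < 0 for x in neg))

def grow_subclass(seed, candidates, positives, negatives):
    """Greedily add the best-scoring candidate while one linear rule still fits."""
    members = [seed]
    pool = [i for i in candidates if i != seed]
    model = fit_linear([positives[i] for i in members], negatives)
    while pool:
        best = max(pool, key=lambda i: score(model, positives[i]))
        trial = members + [best]
        trial_model = fit_linear([positives[i] for i in trial], negatives)
        if not separates(trial_model, [positives[i] for i in trial], negatives):
            break  # the subclass can no longer be modelled by one linear rule
        members, model = trial, trial_model
        pool.remove(best)
    return sorted(members)

def discover_subclasses(positives, negatives):
    """Repeatedly seed and grow subclasses until every positive is assigned."""
    unassigned = list(range(len(positives)))
    subclasses = []
    while unassigned:
        members = grow_subclass(unassigned[0], unassigned, positives, negatives)
        subclasses.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return subclasses

# Toy data: two positive clusters on either side of the negatives, so no
# single linear rule can separate all positives from the negatives.
positives = [(-3.0, 0.0), (-3.5, 0.5), (-2.5, -0.5),
             (3.0, 0.0), (3.5, 0.5), (2.5, -0.5)]
negatives = [(0.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.5, 0.0), (-0.5, 0.0)]

subclasses = discover_subclasses(positives, negatives)
print(subclasses)
```

On this toy data the two positive clusters end up as separate subclasses, because adding a point from the opposite side of the negatives breaks linear separability and stops the growth. A real system would replace the centroid rule with a margin-based classifier and a risk-based acceptance test.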

Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach