Unsupervised approaches to learning in neural networks are of substantial interest for furthering artificial intelligence, both because they would enable the training of networks without the need for large numbers of expensive annotations, and because they would be better models of the kind of general-purpose learning deployed by humans. However, unsupervised networks have long lagged behind the performance of their supervised counterparts, especially in the domain of large-scale visual recognition. Recent developments in training deep convolutional embeddings to maximize non-parametric instance separation and clustering objectives have shown promise in closing this gap. Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. This aggregation metric is dynamic, allowing soft clusters of different scales to emerge. We evaluate our procedure on several large-scale visual recognition datasets, achieving state-of-the-art unsupervised transfer learning performance on object recognition in ImageNet, scene recognition in Places 205, and object detection in PASCAL VOC.

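This paper studies how to train neural networks with unsupervised learning. By optimizing a metric of local aggregation, the method draws similar data instances together in the embedding space, which enables unsupervised transfer learning for large-scale visual recognition. It achieves state-of-the-art unsupervised performance on object recognition in ImageNet, scene recognition in Places 205, and object detection in PASCAL VOC.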
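
The text above does not spell out the aggregation objective, so the following is only a minimal sketch of one plausible form, not the authors' implementation. It assumes that each instance has a set of "close" neighbors (instances that should move toward it) and a larger "background" set (nearby instances used as the reference scale), and it scores how much of the softmax similarity mass inside the background set falls on the close neighbors. The function name `local_aggregation_loss`, the arguments `close_idx` and `background_idx`, and the `temperature` value are all hypothetical.

```python
import numpy as np

def local_aggregation_loss(embeddings, anchor, close_idx, background_idx,
                           temperature=0.07):
    """Illustrative loss for one anchor (assumed form, not from the source).

    Returns -log of the fraction of similarity mass within the background
    set that lies on close neighbors. Minimizing it pulls close neighbors
    toward the anchor relative to the rest of the background set.
    """
    # Project embeddings onto the unit sphere so dot products are cosines.
    v = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = v @ v[anchor] / temperature
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    # The softmax normalizer cancels in the ratio, so raw weights suffice.
    background = np.asarray(background_idx)
    close_in_background = np.intersect1d(close_idx, background)
    return -np.log(weights[close_in_background].sum()
                   / weights[background].sum())

# Toy data: instance 1 is near instance 0, instances 2 and 3 are far away.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
loss_near = local_aggregation_loss(toy_embeddings, 0, [1], [1, 2, 3])
loss_far = local_aggregation_loss(toy_embeddings, 0, [3], [1, 2, 3])
print(loss_near, loss_far)
```

In this toy setup the loss is small when the designated close neighbor already sits next to the anchor and large when it sits on the opposite side of the sphere. How the two neighbor sets are chosen, and how they are updated during training, is left open here.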

Local Aggregation for Unsupervised Learning of Visual Embeddings