Self-supervised learning has achieved remarkable success in acquiring high-quality representations from unlabeled data. The widely adopted contrastive learning framework aims to learn invariant representations by minimizing the distance between positive views originating from the same image. However, existing techniques for constructing positive views rely heavily on hand-crafted transformations, which limits diversity and can produce false positive pairs. To tackle these challenges, we present GenView, a controllable framework that increases the diversity of positive views by leveraging pretrained generative models while preserving semantics. We develop an adaptive view generation method that dynamically adjusts the noise level during sampling, so that essential semantic meaning is preserved while variability is introduced. Additionally, we introduce a quality-driven contrastive loss, which assesses the quality of positive pairs by considering both foreground similarity and background diversity. This loss prioritizes the high-quality positive pairs we construct while reducing the influence of low-quality pairs, thereby mitigating potential semantic inconsistencies introduced by generative models and aggressive data augmentation. With the improved positive view quality and the quality-driven contrastive loss, GenView significantly improves self-supervised learning across various tasks. For instance, GenView improves MoCov2 performance by 2.5%/2.2% on ImageNet linear/semi-supervised classification. Moreover, GenView performs much better than naively augmenting the ImageNet dataset with Laion400M or ImageNet21K. Code is available at https://github.com/xiaojieli0903/genview.
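The idea of a quality-driven contrastive loss can be illustrated with a minimal sketch. This is not the GenView implementation: the quality score (foreground cosine similarity minus background cosine similarity), the batch-softmax reweighting, the temperatures, and all function names are hypothetical assumptions. Foreground and background feature vectors for each view are assumed to be given, for example by pooling backbone features inside and outside a foreground mask. The base loss is standard InfoNCE, where key i is the positive for query i and the other keys act as negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / max(nu * nv, 1e-12)


def pair_quality(fg_a: Vector, fg_b: Vector, bg_a: Vector, bg_b: Vector) -> float:
    """Hypothetical quality score: reward similar foregrounds, reward diverse backgrounds."""
    return cosine(fg_a, fg_b) - cosine(bg_a, bg_b)


def quality_weights(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over the batch, rescaled so the weights average to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def info_nce_per_anchor(queries: List[Vector], keys: List[Vector], tau: float = 0.2) -> List[float]:
    """Standard InfoNCE loss for each query; keys[i] is the positive for queries[i]."""
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / tau for k in keys]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])  # -log softmax of the positive
    return losses


def quality_weighted_loss(queries: List[Vector], keys: List[Vector], qualities: List[float],
                          tau: float = 0.2, weight_temp: float = 1.0) -> float:
    """Mean InfoNCE where each positive pair is reweighted by its (hypothetical) quality."""
    weights = quality_weights(qualities, weight_temp)
    losses = info_nce_per_anchor(queries, keys, tau)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

With equal quality scores every weight is 1 and the loss reduces to plain InfoNCE. A pair whose foregrounds match but whose backgrounds differ receives a larger weight, while low-scoring pairs are down-weighted rather than discarded, which matches the stated goal of reducing, not eliminating, the influence of low-quality pairs.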

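Self-supervised learning has achieved remarkable success by acquiring high-quality representations from unlabeled data. GenView is a controllable framework that leverages the capabilities of pretrained generative models to increase the diversity of positive views while preserving semantics. The work introduces an adaptive view generation method that adjusts the sampling noise level to preserve essential semantic meaning while introducing variability. In addition, it introduces a quality-driven contrastive loss that evaluates the quality of positive pairs by considering foreground similarity and background diversity. GenView markedly improves self-supervised learning performance across various tasks; for example, on ImageNet linear/semi-supervised classification, GenView improves MoCov2 performance by 2.5%/2.2%. Furthermore, GenView outperforms simply augmenting the ImageNet dataset with Laion400M or ImageNet21K.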

GenView: Improving View Quality for Self-Supervised Learning with Pretrained Generative Models