Object-centric architectures can learn to extract distinct object representations from visual scenes, enabling downstream applications at the object level. Like autoencoder-based image models, object-centric approaches have been trained with an unsupervised reconstruction loss on images encoded in the RGB color space. In our work, we challenge the common assumption that RGB is the optimal color space for unsupervised learning in computer vision. We argue conceptually and show empirically that other color spaces, such as HSV, have characteristics that are essential for object-centric representation learning, such as robustness to lighting conditions. We further show that models improve when they are required to predict additional color channels. Specifically, we propose to transform the prediction targets to the RGB-S space, which extends RGB with HSV's saturation component and leads to markedly better reconstruction and disentanglement on five common evaluation datasets. Composite color spaces can be implemented with virtually no computational overhead, are agnostic to the model architecture, and are applicable across a wide range of visual computing tasks and training types. Our findings encourage further investigation of computer vision tasks beyond object-centric learning.
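As an illustration of the RGB-S target described above, the following is a minimal sketch, not the authors' implementation. It assumes RGB values are floats in [0, 1], uses the standard HSV definition of saturation, S = (max - min) / max with S = 0 for black pixels, and represents an image as nested lists of pixels. The names `saturation` and `rgb_to_rgbs` are hypothetical and chosen for this example only.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBS = Tuple[float, float, float, float]


def saturation(r: float, g: float, b: float) -> float:
    """HSV saturation of one pixel with channels in [0, 1]."""
    c_max = max(r, g, b)
    c_min = min(r, g, b)
    if c_max == 0.0:
        # Black pixel: saturation is defined as 0 to avoid division by zero.
        return 0.0
    return (c_max - c_min) / c_max


def rgb_to_rgbs(image: List[List[RGB]]) -> List[List[RGBS]]:
    """Append saturation as a fourth channel to every pixel (assumed RGB-S layout)."""
    return [
        [(r, g, b, saturation(r, g, b)) for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    # A 2x2 image: pure red, mid gray, black, and a pale red.
    image = [
        [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
        [(0.0, 0.0, 0.0), (1.0, 0.5, 0.5)],
    ]
    for row in rgb_to_rgbs(image):
        print(row)
```

In a training pipeline, such a transform would presumably be applied to the reconstruction targets as a vectorized tensor operation, so that the decoder predicts four channels instead of three; how the loss weights the extra channel is not specified here and is left open in this sketch.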

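This study addresses the problem that the RGB color space is not necessarily optimal for unsupervised learning in computer vision. We propose a new approach to object-centric representation learning based on the HSV color space and show that predicting additional color channels can significantly improve reconstruction and disentanglement performance. Our findings have broad application potential for visual computing tasks and motivate further research on other computer vision tasks.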

Exploiting Color Channel Independence to Improve Unsupervised Object Detection