BriefGPT.xyz
Jan, 2024
看见看不见的:语义放置的视觉常识
Seeing the Unseen: Visual Common Sense for Semantic Placement
HTML
PDF
Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng...
TL;DR
通过自动化处理网络数据生成一组数据集,其中包括了有和没有对象的图像,并使用该数据集构建了一个名为CLIP-UNet的SP预测模型,该模型在真实和模拟图像上超过了现有的视觉语义模型和基线,并证明了其在室内环境中构建整洁机器人等下游应用的潜力。
Abstract
computer vision
tasks typically involve describing what is present in an image (e.g. classification, detection, segmentation, and captioning). We study a
visual common sense
task that requires understanding what
→