BriefGPT.xyz
Jun, 2020
学习视觉通识以实现强健场景图生成
Learning Visual Commonsense for Robust Scene Graph Generation
HTML
PDF
Alireza Zareian, Haoxuan You, Zhecan Wang, Shih-Fu Chang
TL;DR
论文提出了一种通过获取视觉常识来改善场景图生成模型的鲁棒性的方法,并使用 Transformer 模型结合场景图结构训练了 GLAT 模型,该模型可以纠正明显的错误。通过实验证明,该模型比其他方法更好地学习了视觉常识,并提高了最先进场景图生成模型的准确性。
Abstract
scene graph
generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output
→