BriefGPT.xyz
Nov, 2020
Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh, Robert J. Ross, John D. Kelleher
TL;DR
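This paper proposes a new method for predicting when to advance the region pointer by introducing a NEXT-token into the language structure. The method improves accuracy and significantly increases the effective vocabulary size.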
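As a rough illustration of the idea, the sketch below shows how a NEXT-token inside a generated token sequence could move a pointer from one image region to the next. It is not the authors' model, which predicts the token with a neural decoder. The token spelling `<NEXT>`, the function names, and the region labels are all assumptions made for this example.

```python
NEXT = "<NEXT>"  # hypothetical spelling of the NEXT-token


def split_by_region(tokens, regions, next_token=NEXT):
    """Assign each generated word to the region the pointer is on.

    The pointer starts at the first region and advances by one each
    time `next_token` appears in the sequence. The token itself is
    not part of the visible caption.
    """
    pointer = 0
    aligned = [(region, []) for region in regions]
    for tok in tokens:
        if tok == next_token:
            pointer += 1
            if pointer >= len(regions):
                break  # no regions left to describe
            continue
        aligned[pointer][1].append(tok)
    return aligned


def caption_text(tokens, next_token=NEXT):
    """Return the caption with the NEXT-tokens stripped out."""
    return " ".join(t for t in tokens if t != next_token)


if __name__ == "__main__":
    tokens = ["a", "dog", NEXT, "chasing", "a", "ball", NEXT]
    regions = ["region_dog", "region_ball"]  # illustrative region labels
    print(split_by_region(tokens, regions))
    print(caption_text(tokens))
```

Because the NEXT-token is treated as an ordinary vocabulary item in this sketch, the timing of pointer advancement is decided in the same step as word choice rather than by a separate gating mechanism.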
Abstract
Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption.