BriefGPT.xyz
May, 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo, Zsolt Kira
TL;DR
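This work uses different image encoding methods to improve the quality and data efficiency of image captioning. It proposes a contrastive loss across the encoded views to improve encoding quality, and a hierarchical decoder that adaptively weighs the value of each encoded view, yielding significant performance gains.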
Abstract
A great deal of progress has been made in image captioning, driven by research into how to encode the image using pre-trained models. This includes visual encodings (e.g. image grid features or detected objects)