Image Captioning for Brazilian Portuguese Using the GRIT Model

Feb, 2024

Image captioning for Brazilian Portuguese using GRIT model

Rafael Silva de Alencar, William Alberto Cruz Castañeda, Marcellus Amadeus

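TL;DR This study presents the early development of an image captioning model for Brazilian Portuguese. We adopt the GRIT (Grid- and Region-based Image captioning Transformer) model for this work. GRIT is a Transformer-only neural architecture that makes effective use of two kinds of visual features, grid features and region features, to generate better captions. The GRIT method was proposed as a more efficient way to generate image captions. In this work, we adapt the GRIT model so that it can be trained on a Brazilian Portuguese dataset, yielding an image captioning method for Brazilian Portuguese.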

Abstract

This work presents the early development of an image captioning model for the Brazilian Portuguese language. We used the GRIT (Grid- and Region-based →