Deep Caption Generation with Multimodal Recurrent Neural Networks (m-RNN)

Dec 2014


Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan Yuille

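TL;DR: This paper proposes a model based on a multimodal recurrent neural network for generating image captions and validates its effectiveness on four benchmark datasets.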

Abstract

In this paper, we present a multimodal recurrent neural network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an