BriefGPT.xyz
Nov, 2014
多模态神经语言模型统一视觉-语义嵌入
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
HTML
PDF
Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel
TL;DR
本文提出了一种多模态学习的编码器-解码器模型,学习图像和文本的多模态联合嵌入空间和现代语言模型。使用LSTM进行句子编码,该模型在Flickr8K和Flickr30K数据集上表现出色。同时,该模型通过线性编码器捕捉到了空间算术中的多模态规律。
Abstract
Inspired by recent advances in
multimodal learning
and machine translation, we introduce an
encoder-decoder
pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel langu
→