BriefGPT.xyz
Jan, 2023
多模态生成:将语言模型与图像相结合
Grounding Language Models to Images for Multimodal Generation
HTML
PDF
Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried
TL;DR
该研究提出了一种有效的方法,将预训练的纯文本语言模型转移到视觉领域,使其能够处理和生成任意交错的图像和文本数据,并在上下文图像检索和多模态对话等方面实现了强有力的效果。
Abstract
We propose an efficient method to ground pretrained text-only
language models
to the
visual domain
, enabling them to process and generate arbitrarily interleaved image-and-text data. Our method leverages the abil
→