BriefGPT.xyz
Oct, 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen...
TL;DR
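Kosmos-G is a model that leverages the visual perception capabilities of Multimodal Large Language Models (MLLMs) to generate images from generalized vision-language inputs, especially those involving multiple images.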
Abstract
Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the