BriefGPT.xyz
May, 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow...
TL;DR
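A new approach for multimodal LLMs uses Morph-Tokens to resolve the conflict between visual comprehension and generation, achieving SOTA results in both multimodal comprehension and generation.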
Abstract
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. Th