May, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM

Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow...

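TL;DR: A new approach for multimodal LLMs uses morph-tokens to resolve the conflict between visual comprehension and visual generation, achieving SOTA results on both multimodal comprehension and generation.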

Abstract

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. Th