BriefGPT.xyz
Oct, 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn...
TL;DR
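By introducing MAGVIT-v2 as the visual tokenizer, this paper shows that large language models (LLMs) outperform diffusion models on image and video generation, and that the tokenizer surpasses the previously best-performing video tokenizers on video compression and action recognition tasks.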
Abstract
While large language models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models
→