BriefGPT.xyz
Jul, 2023
Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord
TL;DR
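The unified UnIVAL model effectively supports tasks across multiple modalities, including image, text, video, and audio. It also enables merging of multimodal models through weight interpolation, and demonstrates out-of-distribution generalization in specific domains.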
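As a rough illustration of what merging models by weight interpolation can look like, here is a minimal sketch. It assumes plain linear interpolation between two models that share the same architecture and parameter names. The summary does not specify the exact scheme, and `interpolate_weights`, `theta_a`, `theta_b`, and `lam` are hypothetical names chosen for this example.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(theta_a: Weights, theta_b: Weights, lam: float) -> Weights:
    """Return lam * theta_a + (1 - lam) * theta_b, parameter by parameter.

    Assumption: both models share one architecture, so their parameter
    names and shapes match exactly.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_a.keys() != theta_b.keys():
        raise ValueError("models must have identical parameter names")

    merged: Weights = {}
    for name, a in theta_a.items():
        b = theta_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation of the two weight vectors.
        merged[name] = [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]
    return merged
```

With `lam = 1.0` the result equals the first model's weights, and with `lam = 0.0` it equals the second's; intermediate values blend the two.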
Abstract
Large language models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities…