February 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek...
TL;DR
This paper presents a method for the highly efficient and stable training of a 22B-parameter Vision Transformer (ViT-22B) and reports extensive experiments on the resulting model. ViT-22B demonstrates the potential for LLM-like scaling in the vision domain and provides some of the key steps towards achieving it.
Abstract
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been scaled nearly as successfully; the largest dense ViT to date contains 4B parameters. We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
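The training recipe itself is not reproduced on this page. One stabilization technique the full ViT-22B paper describes is normalizing queries and keys before the attention softmax, which keeps attention logits from growing unboundedly at scale. Below is a minimal NumPy sketch of that idea; it is illustrative only, not the authors' implementation, and the helper names, shapes, and the omission of learned LayerNorm parameters are simplifications made here.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension. The paper's version uses a
    # learned LayerNorm; this sketch omits the scale/bias parameters.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def qk_normalized_attention(q, k, v):
    # LayerNorm on queries and keys before the dot product bounds the
    # attention logits, which the ViT-22B report credits with preventing
    # the divergent training loss observed at large scale.
    q, k = layer_norm(q), layer_norm(k)
    d = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: a single head over 4 tokens with head size 8.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = qk_normalized_attention(q, k, v)
print(out.shape)  # (4, 8)
```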