March 2023
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, et al.
TL;DR
This paper proposes a training approach for large language models based on sparse pre-training followed by dense fine-tuning, reducing pre-training FLOPs by up to 2.5x while matching the downstream task accuracy of the dense baseline. The approach offers a practical direction for training large-scale GPT models.
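A minimal PyTorch sketch of the two phases described above, using torch.nn.utils.prune to simulate static unstructured weight sparsity during pre-training and then dropping the masks for dense fine-tuning. The 75% sparsity level, the toy model, and the sparsify/densify helpers are illustrative assumptions, not the paper's implementation.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def sparsify(model: nn.Module, sparsity: float = 0.75) -> None:
    """Sparse pre-training phase: freeze a random subset of weights at zero."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Reparametrizes weight as weight_orig * mask, so masked
            # entries stay zero and receive no updates during pre-training.
            prune.random_unstructured(module, name="weight", amount=sparsity)

def densify(model: nn.Module) -> None:
    """Dense fine-tuning phase: remove the masks so every weight can train."""
    for module in model.modules():
        if isinstance(module, nn.Linear) and prune.is_pruned(module):
            prune.remove(module, "weight")

# Toy stand-in for a GPT-style network (illustrative only).
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

sparsify(model, sparsity=0.75)
# ... pre-train on a large corpus with sparse weights ...
densify(model)
# ... fine-tune all weights densely on the downstream task ...
```

Note that masking only simulates sparsity: the FLOP reduction is realized in practice only on hardware or kernels that can skip the zeroed weights.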
Abstract
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of di…