BriefGPT.xyz
Jun, 2023
Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon, Joel Jang, Sungdong Kim, Minjoon Seo
TL;DR
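This paper finds that post-training pretrained language models with gradient ascent can enhance their zero-shot generalization. In particular, Gradient Ascent Post-training (GAP) lets language models reach performance comparable to models 2-3 times larger across 12 different NLP tasks, improving LM generalization without any task-specific fine-tuning.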
Abstract
In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization…
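To make "a few steps of gradient ascent" concrete, here is a minimal sketch on a toy model. It is not the authors' setup: it assumes the quantity being ascended is the usual mean negative log-likelihood of the text, and it swaps the pretrained LMs for a hypothetical four-word unigram softmax model so that it runs with the standard library alone. The names `toy_logits`, `toy_tokens`, `lr`, and `steps` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits, tokens):
    """Mean negative log-likelihood of `tokens` under a unigram softmax model."""
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)


def nll_grad(logits, tokens):
    """Gradient of the mean NLL w.r.t. the logits: softmax minus empirical frequencies."""
    probs = softmax(logits)
    freqs = [0.0] * len(logits)
    for t in tokens:
        freqs[t] += 1.0 / len(tokens)
    return [p - f for p, f in zip(probs, freqs)]


def gradient_ascent_steps(logits, tokens, lr=0.1, steps=3):
    """Take a few steps that increase the loss; return new logits and the loss trace."""
    logits = list(logits)  # copy so the caller's parameters are untouched
    losses = [nll(logits, tokens)]
    for _ in range(steps):
        grad = nll_grad(logits, tokens)
        # '+' makes this ascent; ordinary training (descent) would use '-'.
        logits = [w + lr * g for w, g in zip(logits, grad)]
        losses.append(nll(logits, tokens))
    return logits, losses


# Hypothetical toy parameters and "unlabeled text" (token ids), for illustration only.
toy_logits = [2.0, 0.5, -1.0, 0.0]
toy_tokens = [0, 0, 1, 0, 3, 0, 2, 0]
new_logits, loss_trace = gradient_ascent_steps(toy_logits, toy_tokens)
```

The only difference from an ordinary training step is the sign of the update, so `loss_trace` rises on the sampled text instead of falling. The sketch says nothing about why this would help zero-shot generalization; that is the empirical claim of the paper.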