Gradient Ascent Post-training Enhances Language Model Generalization

Jun, 2023

Gradient Ascent Post-training Enhances Language Model Generalization

Dongkeun Yoon, Joel Jang, Sungdong Kim, Minjoon Seo

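TL;DR: This paper finds that post-training a pretrained language model with gradient ascent can enhance its zero-shot generalization. In particular, Gradient Ascent Post-training (GAP) lets language models perform comparably to models 2-3 times their size across 12 different NLP tasks, improving generalization without any task-specific fine-tuning.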
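The core mechanic can be illustrated with a minimal sketch. It assumes a toy unigram softmax model standing in for a real pretrained LM, and it is not the authors' implementation: the function names, learning rate, step count, vocabulary, and token sequence are all hypothetical. The only point it shows is that a gradient ascent step moves the parameters in the direction that increases the language modeling loss on the given text, the opposite sign of an ordinary training step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lm_loss(logits, tokens):
    """Mean negative log-likelihood of `tokens` under a unigram softmax model."""
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in tokens) / len(tokens)


def loss_grad(logits, tokens):
    """Analytic gradient of lm_loss w.r.t. the logits:
    softmax(logits) minus the empirical token frequencies."""
    probs = softmax(logits)
    freqs = [0.0] * len(logits)
    for t in tokens:
        freqs[t] += 1.0 / len(tokens)
    return [p - f for p, f in zip(probs, freqs)]


def gradient_ascent_post_train(logits, tokens, lr=0.1, steps=3):
    """Take a few steps that increase the LM loss (note the `+` sign).
    Returns new logits; the input list is not modified."""
    for _ in range(steps):
        grad = loss_grad(logits, tokens)
        logits = [w + lr * g for w, g in zip(logits, grad)]
    return logits


# Hypothetical "pretrained" logits over a 4-token vocabulary, and a toy text.
pretrained = [2.0, 1.0, 0.5, 0.1]
text = [0, 0, 1, 0, 2, 0, 1, 3]

before = lm_loss(pretrained, text)
after = lm_loss(gradient_ascent_post_train(pretrained, text), text)
print(f"loss before: {before:.4f}, loss after ascent: {after:.4f}")
```

With a real model, the same sign flip would apply to the gradient of the causal LM loss over all model parameters; the sketch says nothing about why or when this helps downstream tasks, which is the paper's empirical claim.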

Abstract

In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of gradient ascent post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalizati