Sep, 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma...
TL;DR
This paper proposes two parameter-reduction techniques, combined with a self-supervised loss that models inter-sentence coherence, and achieves better performance than BERT with fewer parameters on benchmarks including GLUE, RACE, and SQuAD.
Abstract
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times.