BriefGPT.xyz
May, 2022
Simple Recurrence Improves Masked Language Models
Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh
TL;DR
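This paper studies whether introducing a recurrent module into the Transformer architecture can improve performance. The experiments show that adding the recurrent module improves the stability and performance of Transformer models without requiring low-level performance optimizations, while keeping the parameter count unchanged.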
Abstract
In this work, we explore whether modeling recurrence into the Transformer architecture can both be beneficial and efficient, by building an extremely simple recurrent module into the Transformer. We compare our m