BriefGPT.xyz
Sep, 2021
On the Prunability of Attention Heads in Multilingual BERT
Aakriti Budhraja, Madhura Pande, Pratyush Kumar, Mitesh M. Khapra
TL;DR
By pruning mBERT, we quantify its robustness and interpret the layer-wise importance of its components. The results show that the reduced attention capacity of the multilingual model does not affect its robustness to pruning. On the cross-lingual task XNLI, however, pruning leads to drops in accuracy, indicating lower robustness in cross-lingual transfer. In addition, the importance of the encoder layers depends on the language family and the size of the pre-training corpus.
Abstract
Large multilingual models, such as mBERT, have shown promise in crosslingual transfer. In this work, we employ pruning to quantify the robustness and interpret the layer-wise importance of mBERT.
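
As a rough illustration of attention-head pruning (a minimal sketch, not the authors' exact experimental setup), the Hugging Face Transformers library exposes a `prune_heads()` method on pretrained BERT-family models; the layer and head indices below are arbitrary placeholders chosen for illustration:

```python
# Sketch: pruning attention heads from multilingual BERT (mBERT) with
# Hugging Face Transformers. Layer/head choices here are hypothetical.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-multilingual-cased")

# Map each encoder layer index to the list of attention heads to remove,
# e.g. two heads in layer 0 and one head in layer 11.
heads_to_prune = {0: [0, 1], 11: [5]}
model.prune_heads(heads_to_prune)

# The pruned model can then be fine-tuned or evaluated on downstream tasks
# (e.g., GLUE or XNLI) to measure the accuracy drop relative to the full model.
print(model.config.pruned_heads)  # records which heads were removed per layer
```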