Masked language modeling (MLM) is a widely used self-supervised pretraining objective in which a model predicts the original token that has been replaced with a mask, given its context. Although simpler and more computationally efficient pretraining objectives, e.g., predicting the first character of a masked token, have recently shown results comparable to MLM, no objective with a masking scheme actually outperforms it on downstream tasks. Motivated by the assumption that the lack of complexity in these simpler objectives plays a vital role in the degradation, we test whether more complex masked objectives can achieve better results and investigate how much complexity they need in order to perform comparably to MLM. Our results on the GLUE, SQuAD, and Universal Dependencies benchmarks demonstrate that more complex objectives tend to yield better downstream results, and that at least half of the MLM complexity is needed to perform comparably to MLM. Finally, we discuss how a model should be pretrained with a masked objective from the perspective of task complexity.

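This study investigates whether more complex masked objectives can achieve better results, and how much complexity they need in order to perform comparably to masked language modeling (MLM). The results show that more complex masked objectives tend to give better downstream results, and that at least half of the MLM complexity is needed to reach comparable performance. Finally, we discuss how to pretrain a model from the perspective of task complexity.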

How does the task complexity of masked pretraining objectives affect downstream performance?
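To make the notion of objective complexity concrete, the following is a minimal sketch that contrasts the label space of MLM with that of first-character prediction on a toy sentence. It is only an illustration and not the implementation used in this work: the helper names (`mask_tokens`, `mlm_labels`, `first_char_labels`), the whitespace tokenization, the toy vocabulary, and the use of the number of output classes as a proxy for complexity are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumed for this sketch)


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK with probability mask_prob.

    Returns the masked sequence and the list of masked positions.
    """
    rng = random.Random(seed)
    masked, positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            positions.append(i)
        else:
            masked.append(tok)
    return masked, positions


def mlm_labels(tokens, positions, vocab):
    """MLM target: the vocabulary id of each original (masked) token."""
    return [vocab[tokens[i]] for i in positions]


def first_char_labels(tokens, positions, char_classes):
    """Simpler target: the class id of each masked token's first character."""
    return [char_classes[tokens[i][0]] for i in positions]


# Toy data: whitespace tokens stand in for subword tokens.
tokens = "the cat sat on the mat then some".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
char_classes = {c: i for i, c in enumerate(sorted({t[0] for t in tokens}))}

masked, positions = mask_tokens(tokens, mask_prob=0.5, seed=0)
print("masked input:     ", masked)
print("MLM labels:       ", mlm_labels(tokens, positions, vocab))
print("first-char labels:", first_char_labels(tokens, positions, char_classes))
print("classes (MLM vs first-char):", len(vocab), "vs", len(char_classes))
```

In this toy setting the MLM head must discriminate among every vocabulary entry, whereas the first-character head only separates a handful of character classes. With a realistic subword vocabulary the gap is far larger, which is the sense in which the simpler objective is "less complex" here.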