BriefGPT.xyz
Aug, 2021
LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization
Weidong Guo, Mingjun Zhao, Lusheng Zhang, Di Niu, Jinwen Luo...
TL;DR
This paper proposes a simple yet effective pre-training method named LICHEE, which efficiently incorporates multi-grained information of the input text to enhance the representation capability of various pre-trained language models. Experimental results show that the method achieves comprehensive improvements on a wide range of NLU tasks, and the best ensemble model achieves state-of-the-art performance on the CLUE benchmark.
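As a rough illustration of the idea of fusing multi-grained token information, the sketch below combines fine-grained (character-level) embeddings with the embedding of the coarse-grained token (word) containing each position. The element-wise max-pooling fusion and all names here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def fuse_multigrained(fine_emb, word_ids, coarse_emb):
    """Hypothetical fusion: for each fine-grained position i, combine its
    embedding with the embedding of the coarse-grained token (word) that
    contains it, via element-wise max pooling (an assumed choice)."""
    return np.maximum(fine_emb, coarse_emb[word_ids])

rng = np.random.default_rng(0)
fine = rng.normal(size=(5, 4))        # 5 characters, embedding dim 4
coarse = rng.normal(size=(2, 4))      # 2 words
word_ids = np.array([0, 0, 1, 1, 1])  # character -> containing word
out = fuse_multigrained(fine, word_ids, coarse)
print(out.shape)  # (5, 4)
```

The fused sequence keeps the fine-grained length, so it can feed a standard Transformer encoder unchanged while carrying word-level signal at every position.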
Abstract
Language model pre-training based on large corpora has achieved tremendous success in terms of constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) …