Pseudo-Labels Are All You Need

Aug 2022

Bogdan Kostić, Mathis Lucka, Julian Risch

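TL;DR: This paper describes our submission to the Text Complexity DE Challenge 2022, whose goal is to predict the complexity of German sentences for learners of German at level B. Our approach relies on more than 220,000 pseudo-labels, generated from German Wikipedia and other corpora, to train Transformer-based models. The pseudo-labeling approach achieves excellent results, requires no feature engineering or additional labeled data, and is easy to adapt to other domains and tasks.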
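As a rough illustration of the general pseudo-labeling idea summarized above, the following minimal sketch trains a "teacher" on a small labeled set, uses it to score unlabeled sentences, and then trains a "student" on the union. It is a generic recipe under stated assumptions, not necessarily the authors' exact pipeline: the function names `fit_word_count_regressor` and `pseudo_label_training` are hypothetical, and the toy word-count regressor merely stands in for fine-tuning a Transformer-based model (with such a simple linear model the student just reproduces the teacher; any benefit in practice would come from a higher-capacity model and a large unlabeled corpus).

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is just a function mapping a sentence to a complexity score.
Model = Callable[[str], float]
FitFn = Callable[[Sequence[str], Sequence[float]], Model]


def fit_word_count_regressor(texts: Sequence[str], scores: Sequence[float]) -> Model:
    """Toy stand-in (assumption) for fine-tuning a Transformer regressor:
    an ordinary least-squares line on the whitespace token count."""
    xs = [float(len(t.split())) for t in texts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = 0.0 if var_x == 0 else cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda text: intercept + slope * len(text.split())


def pseudo_label_training(
    labeled_texts: Sequence[str],
    labeled_scores: Sequence[float],
    unlabeled_texts: Sequence[str],
    fit: FitFn = fit_word_count_regressor,
) -> Tuple[Model, List[float]]:
    """Generic pseudo-labeling loop (hypothetical helper)."""
    # 1. Train a teacher on the small human-labeled set.
    teacher = fit(labeled_texts, labeled_scores)
    # 2. Let the teacher assign pseudo-labels to the unlabeled corpus.
    pseudo_scores = [teacher(t) for t in unlabeled_texts]
    # 3. Train a student on human labels plus pseudo-labels.
    student = fit(
        list(labeled_texts) + list(unlabeled_texts),
        list(labeled_scores) + pseudo_scores,
    )
    return student, pseudo_scores
```

Passing a different `fit` callable, for example one that wraps an actual fine-tuning routine, leaves the loop itself unchanged, which is one reason this kind of recipe is easy to carry over to other domains and tasks.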

Abstract

Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts with an appropriate complexity level to language learners or supporting the evaluation of text simpl