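TL;DR: This study examines input size as a limiting factor and shows that classifiers trained on Big Bird embeddings clearly outperform linguistic feature engineering models on the Reddit-L2 dataset. The method's effectiveness and computational efficiency make it a promising avenue for future NLI research.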
Abstract
Native language identification (NLI) aims to classify an author's native
language based on their writing in another language. Historically, the task has
heavily relied on time-consuming linguistic feature engineering