Authorship attribution is the task of identifying the author of a given text.
The key challenge is finding representations that can differentiate between authors.
Existing approaches typically rely on manually designed features that capture a
dataset's content and style, but these approaches are dataset-dependent.
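In this paper, we propose a new task, topic confusion, which distinguishes errors caused by an insufficient ability to capture writing style from errors caused by topic shift. We show that stylometric features based on part-of-speech tags are the least sensitive to topic changes, and that combining them with other features significantly reduces topic confusion and improves attribution accuracy. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task and are far outperformed by simple features such as word-level n-grams.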
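To make the word-level n-gram baseline concrete, here is a minimal sketch under stated assumptions. Lowercased whitespace tokenization, bigrams (`n=2`), cosine similarity, and a per-author "profile" built from each author's concatenated training text are all illustrative choices, not the classifier or preprocessing used in the paper. The helper names `word_ngrams`, `cosine`, and `attribute` are hypothetical.

```python
from collections import Counter


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count word-level n-grams (assumption: lowercased whitespace tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(query: str, profiles: dict, n: int = 2) -> str:
    """Hypothetical profile-based attribution: pick the candidate author whose
    training-text n-gram profile is most similar to the query text."""
    q = word_ngrams(query, n)
    return max(profiles, key=lambda author: cosine(q, word_ngrams(profiles[author], n)))


profiles = {
    "alice": "the cat sat on the mat and the cat slept",
    "bob": "stock prices rose sharply as markets rallied today",
}
print(attribute("the cat sat quietly", profiles))  # prints "alice"
```

Because content-bearing word n-grams also encode topic, a baseline like this can latch onto subject matter rather than style. That is the kind of error the topic confusion task is designed to expose.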