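TL;DR: This paper studies the performance of hate speech detection models. It constructs a large-scale tweet dataset covering five hate domains and, using Transformer-based and other algorithms, obtains performance gains of at least 5% in English and 10% in Turkish, with strong scalability across training sizes and strong cross-domain transfer ability.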
Abstract
The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect