Mar, 2022
On the Robustness of Offensive Language Classifiers
Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan
TL;DR
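This study systematically analyzes the robustness of machine-learning-based offensive language classifiers deployed on social media platforms. It shows that attacks combining greedy and attention-based word selection with context-aware embeddings for word replacement can reduce the accuracy of these classifiers by more than 50% while preserving the readability and meaning of the modified text.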
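To make the idea of greedy word selection concrete, here is a minimal sketch, not the authors' implementation. It assumes a leave-one-out scheme: each word is removed in turn, and positions are ranked by how much the classifier's offensiveness score drops. The names `greedy_word_ranking`, `toy_score`, and `TOY_LEXICON` are hypothetical, and the lexicon-based scorer is only a stand-in for a real classifier. The replacement step with context-aware embeddings is not shown.

```python
from typing import Callable, List

# Hypothetical toy lexicon standing in for a trained classifier's knowledge.
TOY_LEXICON = {"stupid", "idiot"}


def toy_score(words: List[str]) -> float:
    """Toy offensiveness score in [0, 1]: 0.5 per lexicon word, capped at 1."""
    hits = sum(1 for w in words if w.lower() in TOY_LEXICON)
    return min(1.0, 0.5 * hits)


def greedy_word_ranking(
    words: List[str], score: Callable[[List[str]], float]
) -> List[int]:
    """Rank word positions by the score drop caused by removing each word.

    Assumed leave-one-out scheme: positions whose removal lowers the score
    the most come first, so an attacker would perturb those words first.
    """
    base = score(words)
    drops = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]
        drops.append((base - score(reduced), i))
    # Largest drop first; ties are broken by earlier position.
    drops.sort(key=lambda d: (-d[0], d[1]))
    return [i for _, i in drops]


sentence = "you are a stupid idiot".split()
ranking = greedy_word_ranking(sentence, toy_score)
print([sentence[i] for i in ranking[:2]])  # the two most influential words
```

In this toy setting the two lexicon words rank first, which is the set of positions a word-replacement attack would target before any others.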
Abstract
Social media platforms are deploying machine learning based offensive language classification systems to combat hateful, racist, and other forms of offensive speech at scale. However, despite their real-world dep…