BriefGPT.xyz
Nov, 2023
Automatic Textual Normalization for Hate Speech Detection
Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Nguyet Thi Nguyen, Khanh Thanh-Duy Ho, Kiet Van Nguyen
TL;DR
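Social media data is a valuable resource for research, but non-standard words are an obstacle to running NLP tools on it. We adopt a simple sequence-to-sequence model: in text normalization experiments it reaches an accuracy of nearly 70%, and normalization also improves accuracy on the hate speech detection task by about 2%, showing its potential to improve performance on complex NLP tasks.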
Abstract
Social media data is a valuable resource for research, yet it contains a wide range of non-standard words (NSW). These irregularities hinder the effective operation of NLP tools. Current state-of-the-art methods
→