Employing language models to generate explanations for an incoming implicit
hate post is an active area of research. The explanation is intended to make
explicit the underlying stereotype and aid content moderators. Training
often incorporates the top-k relevant knowledge graph (KG) tuples to provide world
knowledge and improve performance on standard metrics. Interestingly, our study
presents conflicting evidence for the role of the quality of KG tuples in
generating implicit explanations. Consequently, simpler models incorporating
external toxicity signals outperform KG-infused models. Compared to the
KG-based setup, we observe comparable performance on the SBIC and LatentHatred
datasets, with variations of +0.44 (+0.49), +1.83 (-1.56), and -4.59 (+0.77)
in BLEU, ROUGE-L, and BERTScore, respectively (LatentHatred values in
parentheses). Further human evaluation and
error analysis reveal that our proposed setup produces more precise
explanations than zero-shot GPT-3.5, highlighting the intricate nature of the
task.
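
Of the metrics named above, ROUGE-L is the simplest to illustrate: it scores a
generated explanation by the longest common subsequence (LCS) of tokens it
shares with a reference explanation. The following is a minimal sketch only,
assuming lowercased whitespace tokenization and a balanced F1 score; the
function names are hypothetical, and this is not the evaluation code used in
the paper. Published scores normally come from an established package and may
apply stemming or a different F-measure weighting, so values will differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (assumed: whitespace tokens, beta = 1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Calling `rouge_l_f1(generated, reference)` on each pair and averaging over a
test set gives a corpus-level figure; a difference such as the +1.83 reported
above would then be the gap between two such averages, usually on a 0-100
scale.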

Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Content moderators play a key role in keeping the conversation on social
media healthy. While the high volume of content they need to judge is a
bottleneck in the moderation pipeline, no studies have explored how models
could help them make faster decisions. There is, by now, a vast body of
research into detecting hate speech, sometimes explicitly motivated by a desire
to help improve content moderation, but published research using real content
moderators is scarce. In this work we investigate the effect of explanations on
the speed of real-world moderators. Our experiments show that while generic
explanations do not affect their speed and are often ignored, structured
explanations lower moderators' decision-making time by 7.4%.