BriefGPT.xyz
May, 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu, Hayoung Jung, Anjali Singh, Monojit Choudhury, Tanushree Mitra
TL;DR
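An evaluation of covert harms in LLM-generated conversations finds that seven LLMs express some malicious views, most noticeably when non-Western concepts such as caste are involved, and that they do so in seemingly neutral language that easily evades detection by existing methods.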
Abstract
Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their u