Revealing the Vulnerabilities of Large Language Models: Adversarial Scam Detection and Performance Analysis

December 2024

Exposing LLM Vulnerabilities: Adversarial Scam Detection and Performance

Chen-Wei Chang, Shailik Sarkar, Shutonu Mitra, Qi Zhang, Hossein Salemi...

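TL;DR: This study addresses the vulnerability of large language models (LLMs) to adversarial scam messages in the scam detection task. It builds a comprehensive dataset containing both original and adversarial scam messages, and extends traditional binary scam classification to finer-grained scam types. The study finds that LLMs misclassify adversarial examples at a high rate, and it proposes strategies for improving model robustness.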

Abstract

Can we trust Large Language Models (LLMs) to accurately predict scams? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages in the scam detection task. We addressed this is