Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities. However, these models are inherently prone to various biases stemming from their training data. These include selection, linguistic, and confirmation biases, along with common stereotypes related to gender, ethnicity, sexual orientation, religion, socioeconomic status, disability, and age. This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability. We also investigate how known prompt engineering techniques can be exploited to effectively reveal hidden biases of LLMs, testing their adversarial robustness against jailbreak prompts specially crafted for bias elicitation. Extensive experiments are conducted using the most widespread LLMs at different scales, confirming that LLMs can still be manipulated to produce biased or inappropriate responses, despite their advanced capabilities and sophisticated alignment processes. Our findings underscore the importance of enhancing mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.

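This study examines the biases present in recent large language models and analyzes their impact on fairness and reliability. It also investigates how known prompt engineering techniques can be used to reveal hidden biases in these models, and tests their adversarial robustness against jailbreak prompts designed specifically for bias elicitation. Extensive experiments on the most widely used large language models at different scales confirm that, despite their advanced capabilities and sophisticated alignment processes, these models can still be manipulated into producing biased or inappropriate responses. The findings underscore the importance of strengthening mitigation techniques to address these safety issues, toward a more sustainable and inclusive artificial intelligence.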

Are Large Language Models Really Bias-Free? Assessing Adversarial Robustness to Bias Elicitation with Jailbreak Prompts