Large Language Models (LLMs) are central to a multitude of applications but
struggle with significant risks, notably in generating harmful content and
biases. Drawing an analogy to the human psyche's conflict between evolutionary
survival instincts and societal norm adherence elucidated in Freud's
psychoanalysis theory, we argue that LLMs suffer a similar fundamental
conflict, arising between their inherent desire for syntactic and semantic
continuity, established during the pre-training phase, and the post-training
alignment with human values. This conflict renders LLMs vulnerable to
adversarial attacks, wherein intensifying the models' desire for continuity can
circumvent alignment efforts, resulting in the generation of harmful
information. Through a series of experiments, we first validated the existence
of the desire for continuity in LLMs, and further devised a straightforward yet
powerful technique, such as incomplete sentences, negative priming, and
cognitive dissonance scenarios, to demonstrate that even advanced LLMs struggle
to prevent the generation of harmful information. In summary, our study
uncovers the root of LLMs' vulnerabilities to adversarial attacks, hereby
questioning the efficacy of solely relying on sophisticated alignment methods,
and further advocates for a new training idea that integrates modal concepts
alongside traditional amodal concepts, aiming to endow LLMs with a more nuanced
understanding of real-world contexts and ethical considerations.

我们的研究揭示了大型语言模型在面临对抗性攻击时的脆弱性的根源，质疑仅仅依赖复杂的对齐方法的有效性，并进一步主张将模态概念与传统的非模态概念相结合，为大型语言模型赋予对现实世界环境以及伦理考虑更细致的理解。