Trust in AI is undermined by the fact that there is no science that predicts, or that can explain to the public, when the output of an LLM (e.g. ChatGPT) is likely to tip mid-response to become wrong, misleading, irrelevant or dangerous. With deaths and trauma already being blamed on LLMs, this uncertainty is even pushing people to treat their 'pet' LLM more politely to 'dissuade' it (or its future Artificial General Intelligence offspring) from suddenly turning on them. Here we address this acute need by deriving from first principles an exact formula for when a Jekyll-and-Hyde tipping point occurs at LLMs' most basic level. Requiring only secondary school mathematics, it shows the cause to be the AI's attention spreading so thin it suddenly snaps. This exact formula provides quantitative predictions for how the tipping point can be delayed or prevented by changing the prompt and the AI's training. Tailored generalizations will provide policymakers and the public with a firm platform for discussing any of AI's broader uses and risks, e.g. as a personal counselor, medical advisor, or decision-maker for when to use force in a conflict situation. It also meets the need for clear and transparent answers to questions like "should I be polite to my LLM?"

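This study addresses the current lack of a scientific way to predict the reliability of large language model (LLM) output. It derives an exact formula from first principles showing that a "Jekyll-and-Hyde" tipping point can occur in an LLM when its attention is spread to the limit. The formula provides quantitative predictions, helps policymakers and the public discuss AI's broader uses and risks, and supports clear answers to questions such as "should I be polite to my LLM?"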

The Jekyll-and-Hyde Tipping Point in AI Behavior