Instruction-tuned Large Language Models (LLMs) have achieved breakthrough
results, opening up many new possibilities for practical applications.
However, LLMs lack elementary safety features that are established norms in
other areas of computer science, such as the separation between instructions
and data. This can cause them to malfunction or leave them vulnerable to
manipulation and interference by third parties, e.g., via indirect
prompt/command injection. Worse still, there is so far no established
definition of what precisely such a separation would mean or how its violation
could be tested. In this work, we aim to close this gap. We introduce a formal
measure to quantify the phenomenon of instruction-data separation as well as an
empirical variant of the measure that can be computed from a model's black-box
outputs. We also introduce a new dataset, SEP (Should it be Executed or
Processed?), which allows estimating the measure, and we report results on
several state-of-the-art open-source and closed LLMs. Finally, we
quantitatively demonstrate that, according to our measure, none of the
evaluated LLMs achieves a high degree of separation. The source code and SEP dataset
are openly accessible at
this https URL
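To make the idea of an empirical, black-box separation measure more concrete, here is a minimal sketch built on our own simplifying assumptions rather than on the paper's exact definitions. We assume that each test example pairs a task with a probe instruction whose execution produces a known "witness" string; that the probe is inserted once into the instruction part of the prompt and once into the data part; and that a probe counts as executed if the witness appears in the model's output. The score below is then the fraction of probes the model executes as instructions but not as data. The names `ProbeResult`, `witness_in`, and `empirical_separation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical per-example record for a SEP-style evaluation."""
    executed_as_instruction: bool  # witness found when the probe sits in the instruction part
    executed_as_data: bool         # witness found when the same probe sits in the data part


def witness_in(output: str, witness: str) -> bool:
    """Black-box check: only the generated text is inspected (case-insensitive)."""
    return witness.lower() in output.lower()


def empirical_separation(results: Sequence[ProbeResult]) -> float:
    """Share of probes executed as instructions that are NOT executed as data.

    Examples where the model ignores the probe even in the instruction
    position are skipped, so a model cannot score well merely by never
    following probes at all.
    """
    executed = [r for r in results if r.executed_as_instruction]
    if not executed:
        raise ValueError("no probe was executed in the instruction position")
    separated = sum(1 for r in executed if not r.executed_as_data)
    return separated / len(executed)


# Usage with toy outcomes for four examples (not real measurements).
results = [
    ProbeResult(witness_in("The capital of France is Paris.", "Paris"), False),
    ProbeResult(True, True),    # probe also executed from the data part: no separation
    ProbeResult(False, False),  # probe ignored in both positions: skipped
    ProbeResult(True, False),
]
print(round(empirical_separation(results), 3))  # → 0.667
```

Conditioning on examples that are executed in the instruction position is one possible design choice; an alternative would be to report the two execution rates separately.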

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Adaptive Cruise Control (ACC) is a widely used driver assistance feature for
maintaining a desired speed and a safe distance from the leading vehicle. This paper
evaluates the security of deep neural network (DNN)-based ACC systems under
stealthy perception attacks that strategically inject perturbations into camera
data to cause forward collisions. We present a combined
knowledge-and-data-driven approach to design a context-aware strategy for the
selection of the most critical times for triggering the attacks and a novel
optimization-based method for the adaptive generation of image perturbations at
run-time. We evaluate the effectiveness of the proposed attack using an actual
driving dataset and a realistic simulation platform with the control software
from a production ACC system and a physical-world driving simulator while
considering interventions by the driver and safety features such as Automatic
Emergency Braking (AEB) and Forward Collision Warning (FCW). Experimental
results show that the proposed attack achieves a 142.9x higher success rate in
causing accidents than random attacks and is mitigated 89.6% less by the safety
features while being stealthy and robust to real-world factors and dynamic
changes in the environment. This study provides insights into the role of human
operators and basic safety interventions in preventing attacks.
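The abstract does not say how the "most critical times" for triggering an attack are selected. Purely as an illustration of the kind of knowledge-driven signal such a strategy might draw on, the sketch below uses time-to-collision (TTC), the gap to the leading vehicle divided by the closing speed, and flags frames where TTC falls below a threshold, i.e. moments where a perception error leaves the least time for the driver, AEB, or FCW to react. The `Frame` type, the function names, and the 3-second threshold are our assumptions; this is not the paper's context-aware strategy, which combines knowledge with a data-driven component.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Frame:
    """Hypothetical per-frame state of an ego vehicle following a lead vehicle."""
    t: float           # timestamp [s]
    gap_m: float       # distance to the lead vehicle [m]
    ego_speed: float   # ego speed [m/s]
    lead_speed: float  # lead vehicle speed [m/s]


def time_to_collision(f: Frame) -> float:
    """Gap divided by closing speed; infinite if the ego is not closing in."""
    closing = f.ego_speed - f.lead_speed
    if closing <= 0.0:
        return float("inf")
    return f.gap_m / closing


def critical_times(frames: Sequence[Frame], ttc_threshold_s: float = 3.0) -> List[float]:
    """Timestamps whose TTC is strictly below the (assumed) threshold."""
    return [f.t for f in frames if time_to_collision(f) < ttc_threshold_s]


frames = [
    Frame(0.0, 40.0, 25.0, 25.0),  # equal speeds: TTC is infinite
    Frame(0.1, 30.0, 25.0, 15.0),  # TTC = 30 / 10 = 3.0 s, not below threshold
    Frame(0.2, 20.0, 25.0, 10.0),  # TTC = 20 / 15, about 1.33 s
]
print(critical_times(frames))  # → [0.2]
```

A fixed TTC threshold is deliberately simplistic: it ignores context such as driver behaviour, road conditions, and how the ACC controller itself reacts.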
