Large Language Models (LLMs) are susceptible to Jailbreaking attacks, which
aim to extract harmful information by subtly modifying the attack query. As
defense mechanisms evolve, directly obtaining harmful information becomes
increasingly challenging for Jailbreaking attacks. In this work, inspired by the
human practice of using indirect context to elicit harmful information, we focus
on a new attack form called Contextual Interaction Attack. The idea relies on the
autoregressive nature of the generation process in LLMs. We contend that the
prior context--the information preceding the attack query--plays a pivotal role
in enabling potent Jailbreaking attacks. Specifically, we propose an approach
that leverages preliminary question-answer pairs to interact with the LLM. By
doing so, we guide the responses of the model toward revealing the 'desired'
harmful information. We conduct experiments on four different LLMs and
demonstrate the efficacy of this attack, which is black-box and can also
transfer across LLMs. We believe this can lead to further developments and
understanding of the context vector in LLMs.

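Large language models are vulnerable to jailbreaking attacks. This work proposes an attack based on contextual interaction, which steers the model's responses toward revealing harmful information. Experiments on four different LLMs demonstrate the attack's effectiveness and show that it transfers across LLMs.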

Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks

The autoregressive nature of conventional large language models (LLMs)
inherently limits inference speed, as tokens are generated sequentially. While
speculative and parallel decoding techniques attempt to mitigate this, they
face limitations: either relying on less accurate smaller models for generation
or failing to fully leverage the base LLM's representations.
We introduce a novel architecture, Tandem transformers, to address these
issues. This architecture uniquely combines (1) a small autoregressive model
and (2) a large model operating in block mode (processing multiple tokens
simultaneously). The small model's predictive accuracy is substantially
enhanced by granting it attention to the large model's richer representations.
On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko
demonstrates a 3.3% improvement in next-token prediction accuracy over a
standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter
model with comparable downstream performance. We further incorporate the tandem
model within the speculative decoding (SPEED) framework where the large model
validates tokens from the small model. This ensures that the Tandem of
PaLM2-Bison and PaLM2-Gecko achieves substantial speedup (around 1.14x faster
than using vanilla PaLM2-Gecko in SPEED) while maintaining identical downstream
task accuracy.
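
The draft-then-verify loop behind speculative decoding can be illustrated with a minimal sketch. This is not the Tandem architecture itself: the small model here does not attend to the large model's representations, and the large model checks tokens one at a time rather than in a single block-mode pass. It assumes greedy decoding, and the names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextTokenFn, prompt: List[Token],
                  max_new_tokens: int) -> List[Token]:
    """Reference: plain greedy decoding with the large (target) model only."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def speculative_decode(draft_next: NextTokenFn, target_next: NextTokenFn,
                       prompt: List[Token], max_new_tokens: int,
                       block_size: int = 4) -> List[Token]:
    """Greedy speculative decoding: the small model drafts a block of tokens,
    the large model verifies them and corrects the first mismatch."""
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        # 1. The small model proposes up to block_size tokens autoregressively.
        k = min(block_size, target_len - len(out))
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The large model verifies the proposal. A real system would score
        #    all positions in one parallel pass; here it is done sequentially.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the large model's token
                break
        out.extend(accepted)
    return out


# Toy stand-ins for the two models (assumptions, not real LLMs).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except after token 4, where it guesses wrong.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Because every accepted token is one the large model would have chosen itself, the output matches plain greedy decoding with the large model. This is the sense in which verification preserves downstream accuracy; any speedup depends on how often the draft is accepted.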

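The Tandem transformers architecture combines a small autoregressive model with a large model operating in block mode to improve prediction accuracy and speed up inference. On the pretraining dataset, the tandem model shows a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko and a 1.16x speedup over a PaLM2-Otter model with comparable performance. When incorporated into the speculative decoding framework, it achieves a substantial speedup (about 1.14x faster than using PaLM2-Gecko alone) while maintaining identical downstream task accuracy.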

Tandem Transformers for Inference Efficient LLMs

Energy-based models (EBMs) have gained popularity for controlled text
generation due to their high applicability to a wide range of constraints.
However, sampling from EBMs is non-trivial, as it often requires a large number
of iterations to converge to plausible text, which slows down the decoding
process and makes it less practical for real-world applications. In this work,
we propose BOLT, which relies on tunable biases to directly adjust the language
model's output logits. Unlike prior work, BOLT maintains the generator's
autoregressive nature to exert strong control over token-wise conditional
dependencies and overall fluency, and thus converges faster. When compared with
state-of-the-art methods on controlled generation tasks using both soft constraints
(e.g., sentiment control) and hard constraints (e.g., keyword-guided topic
control), BOLT demonstrates significantly improved efficiency and fluency. On
sentiment control, BOLT is 7x faster than competitive baselines, and more
fluent in 74.4% of the evaluation samples according to human judges.
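
The idea of steering generation with tunable biases on the output logits can be sketched for a single decoding step. This is a simplified illustration, not BOLT's actual algorithm: the energy used here (the negative log-probability of a preferred token set) and the names `softmax`, `preferred_mass`, and `tune_bias` are assumptions made for the example, and a real system would tune biases across all positions of a sequence.

```python
import math
from typing import List, Set


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preferred_mass(logits: List[float], bias: List[float],
                   preferred: Set[int]) -> float:
    """Probability mass the biased distribution assigns to preferred tokens."""
    probs = softmax([l + b for l, b in zip(logits, bias)])
    return sum(probs[i] for i in preferred)


def tune_bias(logits: List[float], preferred: Set[int],
              steps: int = 100, lr: float = 0.5) -> List[float]:
    """Gradient descent on a per-token bias added to fixed logits.

    Toy energy (an assumption): E = -log(sum of p_i over preferred tokens).
    Its gradient w.r.t. bias_j is p_j - [j in preferred] * p_j / mass.
    """
    bias = [0.0] * len(logits)
    for _ in range(steps):
        probs = softmax([l + b for l, b in zip(logits, bias)])
        mass = sum(probs[i] for i in preferred)
        for j, p in enumerate(probs):
            grad = p - (p / mass if j in preferred else 0.0)
            bias[j] -= lr * grad
    return bias


# Hypothetical 4-token vocabulary; tokens 2 and 3 satisfy the constraint.
example_logits = [2.0, 1.0, 0.5, 0.0]
example_preferred = {2, 3}
example_bias = tune_bias(example_logits, example_preferred)
```

The model's logits stay fixed and only the bias vector is optimized, so the language model remains the generator and the constraint shifts probability mass toward the preferred tokens.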

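This paper proposes BOLT, a controlled text generation method that directly adjusts the language model's output logits with tunable biases. By preserving the generator's autoregressive nature, it keeps control over token-wise conditional dependencies and overall fluency. It outperforms competitive baselines: on sentiment control it is 7x faster, and human judges rate its output as more fluent.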