Language models are prone to memorizing large parts of their training data,
making them vulnerable to extraction attacks. Existing research on these
attacks remains limited in scope, often studying isolated trends rather than
the real-world interactions with these models. In this paper, we revisit
extraction attacks from an adversarial perspective, exploiting the brittleness
of language models. We find significant churn in extraction attack trends,
i.e., even minor, unintuitive changes to the prompt, or targeting smaller
models and older checkpoints, can exacerbate the risks of extraction by up to
$2$-$4\times$. Moreover, relying solely on the widely accepted verbatim match
underestimates the extent of extracted information, and we provide various
alternatives to more accurately capture the true risks of extraction. We
conclude our discussion with data deduplication, a commonly suggested
mitigation strategy, and find that while it addresses some memorization
concerns, it remains vulnerable to the same escalation of extraction risks
against a real-world adversary. Our findings highlight the necessity of
acknowledging an adversary's true capabilities to avoid underestimating
extraction risks.
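
To illustrate why a verbatim-match criterion can undercount extraction, here is a minimal sketch that contrasts it with a softer token-level similarity check. The similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the function names are assumptions chosen for illustration; the abstract does not say which alternative metrics the paper uses.

```python
from difflib import SequenceMatcher
from typing import Sequence


def verbatim_match(generated: Sequence[str], reference: Sequence[str]) -> bool:
    """True only if the generation reproduces the reference token for token."""
    return list(generated) == list(reference)


def approximate_match(
    generated: Sequence[str],
    reference: Sequence[str],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """True if the token-level similarity ratio reaches the threshold."""
    matcher = SequenceMatcher(None, list(generated), list(reference), autojunk=False)
    return matcher.ratio() >= threshold


# A generation that differs from the training text in a single token.
reference = "the quick brown fox jumps over the lazy dog again".split()
generated = "the quick brown fox jumps over the lazy cat again".split()

print(verbatim_match(generated, reference))     # strict criterion
print(approximate_match(generated, reference))  # relaxed criterion
```

In this toy case the strict criterion rejects a generation that leaks nine of ten tokens, while the relaxed one counts it as extracted.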

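Exploiting the brittleness of language models, we revisit extraction attacks from an adversarial perspective and find that even minor, unintuitive changes to the prompt, or targeting smaller models and older checkpoints, can increase extraction risk by 2 to 4 times. Moreover, relying solely on the widely accepted verbatim match underestimates the true extent of extracted information, and we provide alternatives that capture extraction risk more accurately. We conclude with data deduplication, a common mitigation strategy, and find that while it addresses some memorization concerns, it remains vulnerable to the same escalation of extraction risk against a real-world adversary. Our findings highlight the need to acknowledge an adversary's true capabilities to avoid underestimating extraction risk.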

Towards More Realistic Extraction Attacks: An Adversarial Perspective

The immense datasets used to develop Large Language Models (LLMs) often
include copyright-protected content, typically without the content creator's
consent. Copyright traps, injected into the original content, have been
proposed to improve content detectability in newly released LLMs. Traps,
however, rely on the exact duplication of a unique text sequence, leaving them
vulnerable to commonly deployed data deduplication techniques. Here we propose
generating fuzzy copyright traps, which feature slight modifications across
duplicates. When injected into the fine-tuning data of a 1.3B LLM, we show fuzzy
trap sequences to be memorized nearly as well as exact duplicates.
Specifically, the Membership Inference Attack (MIA) ROC AUC only drops from
0.90 to 0.87 when 4 tokens are replaced across the fuzzy duplicates. We also
find that selecting replacement positions to minimize the exact overlap between
fuzzy duplicates leads to similar memorization, while making fuzzy duplicates
highly unlikely to be removed by any deduplication process. Lastly, we argue
that the fact that LLMs memorize across fuzzy duplicates challenges the study
of LLM memorization relying on naturally occurring duplicates. Indeed, we find
that the commonly used training dataset, The Pile, contains significant amounts
of fuzzy duplicates. This introduces a previously unexplored confounding factor
in post-hoc studies of LLM memorization, and questions the effectiveness of
(exact) data deduplication as a privacy protection technique.
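
As a rough illustration of fuzzy duplication, the sketch below creates several copies of a trap sequence, each with a fixed number of tokens replaced. The helper name `make_fuzzy_duplicates`, the uniform random choice of positions, and the toy replacement vocabulary are all assumptions; the abstract does not specify how replacement positions or tokens are selected.

```python
import random
from typing import List, Sequence


def make_fuzzy_duplicates(
    tokens: Sequence[str],
    n_copies: int,
    n_replace: int,
    vocab: Sequence[str],
    seed: int = 0,
) -> List[List[str]]:
    """Return n_copies of tokens, each with n_replace positions swapped out."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        copy = list(tokens)
        # Assumption: positions are drawn uniformly at random, without repeats.
        for pos in rng.sample(range(len(tokens)), n_replace):
            # Only consider replacements that differ from the original token.
            candidates = [t for t in vocab if t != tokens[pos]]
            copy[pos] = rng.choice(candidates)
        copies.append(copy)
    return copies


trap = "a b c d e f g h i j".split()   # toy trap sequence
vocab = ["x", "y", "z"]                # toy replacement vocabulary
copies = make_fuzzy_duplicates(trap, n_copies=5, n_replace=4, vocab=vocab)
for copy in copies:
    print(" ".join(copy))
```

Because each copy differs from the original in a few positions, a deduplication step that only removes exact repeats of the full sequence would not collapse a copy into the original.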

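LLMs memorize across fuzzy duplicates of copyright traps, which introduces a previously unexplored confounding factor into studies of LLM memorization and calls into question the effectiveness of (exact) data deduplication as a privacy protection technique.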

Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models

This paper presents this http URL, a pre-trained German BERT model specifically
designed for the German medical domain. The model has been trained on a large
corpus of 4.7 million German medical documents and has been shown to achieve
new state-of-the-art performance on eight different medical benchmarks covering
a wide range of disciplines and medical document types. In addition to
evaluating the overall performance of the model, this paper also conducts a
more in-depth analysis of its capabilities. We investigate the impact of data
deduplication on the model's performance, as well as the potential benefits of
using more efficient tokenization methods. Our results indicate that
domain-specific models such as this http URL are particularly useful for longer
texts, and that deduplication of training data does not necessarily lead to
improved performance. Furthermore, we found that efficient tokenization plays
only a minor role in improving model performance, and attribute most of the
improved performance to the large amount of training data. To encourage further
research, the pre-trained model weights and new benchmarks based on
radiological data are made publicly available for use by the scientific
community.

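This paper presents a pre-trained German BERT model for the German medical domain, trained on a large corpus of 4.7 million German medical documents, which achieves new state-of-the-art performance on eight medical benchmarks. Beyond evaluating overall performance, the paper analyzes the model's capabilities in more depth, examining the impact of data deduplication on model performance and the potential benefits of more efficient tokenization. It shows that domain-specific models are particularly useful for longer texts, finds that deduplicating training data does not necessarily improve performance, and attributes most of the improvement to the large amount of training data. It further finds that efficient tokenization plays only a minor role in improving performance, and releases the model weights and new benchmarks based on radiological data to the scientific community to encourage further research.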