Recent statements about the impressive capabilities of large language models
(LLMs) are usually supported by evaluating on open-access benchmarks.
Considering the vast size and wide-ranging sources of LLMs' training data, it
could explicitly or implicitly include test data, leading to LLMs being more
susceptible to data contamination. However, due to the opacity of training
data, the black-box access of models, and the rapid growth of synthetic
training data, detecting and mitigating data contamination for LLMs faces
significant challenges. In this paper, we propose CDD, which stands for
Contamination Detection via output Distribution for LLMs. CDD necessitates only
the sampled texts to detect data contamination, by identifying the peakedness
of LLM's output distribution. To mitigate the impact of data contamination in
evaluation, we also present TED: Trustworthy Evaluation via output
Distribution, based on the correction of LLM's output distribution. To
facilitate this study, we introduce two benchmarks, i.e., DetCon and ComiEval,
for data contamination detection and contamination mitigation evaluation tasks.
Extensive experimental results show that CDD achieves the average relative
improvements of 21.8\%-30.2\% over other contamination detection approaches in
terms of Accuracy, F1 Score, and AUC metrics, and can effectively detect
contamination caused by the variants of test data. TED significantly mitigates
performance improvements up to 66.9\% attributed to data contamination across
24 settings and 21 contamination degrees. In real-world applications, we reveal
that ChatGPT exhibits a high potential to suffer from data contamination on
HumanEval benchmark.

我们提出了基于 LLMs 输出分布的数据污染检测方法 CDD，并通过修正 LLMs 输出分布的方法 TED，有效地检测和减轻数据污染的影响。实验结果表明，CDD 在准确度、F1 得分和 AUC 指标方面相对其他方法平均提升了 21.8％-30.2％，TED 在 24 种设置和 21 种污染程度下成功地减轻数据污染引起的性能下降高达 66.9％。实际应用中，我们发现 ChatGPT 在 HumanEval 基准中存在受数据污染的高风险。