How capable are diffusion models of generating synthetic text? Recent research shows their strengths, with performance reaching that of auto-regressive LLMs. But are they also good at generating synthetic data when trained under differential privacy? Here the evidence is missing, yet the promise shown by private image generation looks strong. In this paper we address this open question through extensive experiments. At the same time, we critically assess (and reimplement) previous work on private synthetic text generation with LLMs and reveal some unmet assumptions that might have led to violations of the differential privacy guarantees. Our results partly contradict previous non-private findings and show that fully open-source LLMs outperform diffusion models in the privacy regime. Our complete source code, datasets, and experimental setup are publicly available to foster future research.

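This study examines in depth the ability of diffusion models to generate synthetic text under differential privacy. Through extensive experiments, we find that assumptions made in previous work on private synthetic text generation with LLMs were not met, which may have compromised the differential privacy guarantees. In addition, our results show that fully open-source LLMs outperform diffusion models in the privacy regime, providing an important reference for future research.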

Private Synthetic Text Generation with Diffusion Models