Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but it often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model with data synthesized by a small pre-trained LM and rigorously filtered, applies DP finetuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating the grammar and spelling errors commonly associated with DPSGD. It also reduces inconsistencies found in non-private models, such as hallucinated details and misattributed quotes. We find that small models like GPT-2 can be effective for initialization and distillation, highlighting their potential for enabling scalable and efficient deployment of privacy-preserving language models.
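For readers unfamiliar with the baseline, the sketch below shows one step of the standard DPSGD update. Each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. This is an illustrative sketch under stated assumptions, not DPRefine's implementation. The toy least-squares model, the function names `clip` and `dpsgd_step`, and the hyperparameters `lr`, `max_norm`, and `noise_multiplier` are all hypothetical. The sketch also omits privacy accounting (computing the resulting epsilon), which a real DP training run requires.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One DPSGD update for a toy least-squares linear model (hypothetical setup).

    Per-example gradients are clipped to max_norm and summed. Gaussian noise with
    std noise_multiplier * max_norm is added to the sum, which is then divided by
    the batch size before the gradient step.
    """
    total = [0.0] * len(weights)
    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(weights, x))
        # Gradient of 0.5 * (pred - y)^2 with respect to the weights.
        grad = [(pred - y) * xi for xi in x]
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / len(batch) for t in total]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Usage: fit y = 2x + 1 (bias as first feature) with noisy, clipped updates.
rng = random.Random(0)
data = [([1.0, x], 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w = [0.0, 0.0]
for _ in range(200):
    w = dpsgd_step(w, rng.sample(data, 4), lr=0.1, max_norm=1.0,
                   noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can move the update, and the noise masks whatever contribution remains. This is why DPSGD protects privacy, and it is also a plausible source of the utility and fluency losses that the abstract attributes to vanilla DPSGD.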

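This work addresses the loss of utility, diversity, and linguistic quality that differentially private stochastic gradient descent (DPSGD) causes when training language models. It proposes DPRefine, a three-phase method that initializes the model through data synthesis with a small pre-trained language model, performs differentially private finetuning on private data, and applies self-distillation to refine the outputs. The results show that DPRefine effectively reduces linguistic errors, cutting the grammar and spelling errors common under DPSGD. Its generations are also preferred in 78.4% of cases across all datasets, which demonstrates its potential for deploying privacy-preserving language models.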

Differentially Private Learning Needs Better Model Initialization and Self-Distillation