Synthetic data generation is increasingly popular across computer vision applications. State-of-the-art face recognition models are trained on large-scale face datasets that are crawled from the Internet, which raises privacy and ethical concerns. To address these concerns, several works have proposed generating synthetic face datasets to train face recognition models. However, these methods rely on generative models that are themselves trained on real face images. In this work, we design a simple yet effective membership inference attack to systematically study whether existing synthetic face recognition datasets leak information from the real data used to train the generator model. We provide an extensive study of six state-of-the-art synthetic face recognition datasets and show that every one of them leaks several samples from the original real dataset. To our knowledge, this paper is the first to show leakage from the training data of generator models into the generated synthetic face recognition datasets. Our study demonstrates privacy pitfalls in synthetic face recognition datasets and paves the way for future work on generating responsible synthetic face datasets.

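This study systematically analyzes potential privacy leakage in synthetic face datasets and finds that existing synthetic face recognition datasets contain samples leaked from the real data. We design a simple yet effective membership inference attack and show, for the first time, leakage from the training data of generator models. The results reveal privacy risks in synthetic face recognition datasets and lay the groundwork for future research on responsible synthetic dataset generation.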

Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities