Evaluating the quality of synthetic data remains a key challenge for ensuring privacy and utility in data-driven research. In this work, we present an evaluation framework that quantifies how well synthetic data replicates the distributional properties of the original data while preserving privacy. The proposed approach employs a holdout-based benchmarking strategy that enables quantitative assessment through low- and high-dimensional distribution comparisons, embedding-based similarity measures, and nearest-neighbor distance metrics. The framework supports various data types and structures, including sequential and contextual information, and provides interpretable quality diagnostics through a set of standardized metrics. These contributions aim to support reproducibility and methodological consistency in the benchmarking of synthetic data generation techniques. The code for the framework is available at https://github.com/mostly-ai/mostlyai-qa.
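
To make the idea of a holdout-based nearest-neighbor check concrete, the following is a minimal sketch and not the mostlyai-qa API. It assumes purely numeric, already-scaled features and Euclidean distance, and the function names `nearest_distances` and `dcr_share` are hypothetical. For each synthetic record it compares the distance to the closest training record with the distance to the closest holdout record. A share close to 0.5 suggests that the synthetic data is no closer to the training data than to unseen data, while a share close to 1 suggests that training records were memorized.

```python
import numpy as np

# Illustrative sketch only: names and the exact metric definition are
# assumptions, not taken from the framework's code base.

def nearest_distances(queries, reference):
    """Euclidean distance from each query row to its nearest reference row."""
    # Broadcasting gives an array of shape (n_queries, n_reference, n_features).
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def dcr_share(synthetic, training, holdout):
    """Share of synthetic rows closer to training than to holdout (ties count half)."""
    d_train = nearest_distances(synthetic, training)
    d_hold = nearest_distances(synthetic, holdout)
    closer = (d_train < d_hold).sum() + 0.5 * (d_train == d_hold).sum()
    return closer / len(synthetic)

rng = np.random.default_rng(0)
training = rng.normal(size=(500, 4))   # data a generator would be fitted on
holdout = rng.normal(size=(500, 4))    # data the generator never sees
synthetic = rng.normal(size=(500, 4))  # stand-in for a well-behaved generator
copied = training[:200].copy()         # stand-in for a generator that memorizes

share_independent = dcr_share(synthetic, training, holdout)
share_copied = dcr_share(copied, training, holdout)
print(f"independent sample: {share_independent:.2f}")
print(f"copied records:     {share_copied:.2f}")
```

The brute-force distance computation keeps the sketch dependency-free, but it needs memory proportional to the product of the two sample sizes, so larger tables would call for an indexed nearest-neighbor search instead.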

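This study addresses the key challenge of evaluating synthetic data quality in order to safeguard privacy and utility in data-driven research. We propose an evaluation framework that uses a holdout-based benchmarking strategy to quantify how well synthetic data replicates the distributional properties of the original data, supports multiple data types, and provides interpretable quality diagnostics. These contributions aim to promote reproducibility and methodological consistency across synthetic data generation techniques.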

Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework