We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection (1) approximately recovers previous results on Gaussians over $\mathbb R^d$; and (2) leads to new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for agnostic and distribution-shift-resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.
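
For concreteness, the requirement that privacy hold only with respect to the private samples can be sketched as follows. The symbols $\mathcal A$ (the learner), $\tilde X$ and $X$ (the public and private datasets), $m$ and $n$ (their sizes), $\mathcal X$ (the sample domain), $\varepsilon$ (the privacy parameter), and $\alpha$ (the accuracy target) are notation assumed here for illustration; the abstract does not fix them, and the paper's formal definition may differ in its details.

$$
\Pr\big[\mathcal A(\tilde X, X) \in S\big] \;\le\; e^{\varepsilon}\, \Pr\big[\mathcal A(\tilde X, X') \in S\big]
$$

for every fixed public dataset $\tilde X \in \mathcal X^m$, every pair of private datasets $X, X' \in \mathcal X^n$ that differ in a single entry, and every measurable set of outputs $S$. This is the standard definition of pure ($\varepsilon$-) differential privacy, applied to the map $X \mapsto \mathcal A(\tilde X, X)$ with the public data held fixed. Under the same assumed notation, the accuracy goal would read: for every $p \in \mathcal Q$, when $\tilde X \sim p^m$ and $X \sim p^n$, the output satisfies $\mathrm{TV}\big(\mathcal A(\tilde X, X),\, p\big) \le \alpha$ with high probability.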

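This work studies private distribution learning with access to public data: the learner uses public and private samples to output an estimate of the distribution $p$ while satisfying pure differential privacy. The results show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$ and to an intermediate notion called list learning. Leveraging this connection recovers previous results on Gaussians and yields new ones, including sample complexity upper bounds for $k$-mixtures of Gaussians, results for agnostic and distribution-shift-resistant learners, and closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, the results show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.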

Private Distribution Learning with Public Data: The View from Sample Compression