We prove that the empirical risk of most well-known loss functions factors into a linear term that aggregates all labels and a label-free term that can further be expressed by sums of the loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator, the focal quantity of this work, which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application to weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (through the mean operator) form of noise robustness, in contrast with known negative results.

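This paper proves that the empirical risk of most well-known loss functions factors into a linear term that aggregates all labels and a label-free term, and that it can further be expressed by sums of the loss. This holds for non-smooth, non-convex losses in any RKHS. The work characterizes the mean operator as the sufficient statistic in this factorization and, once the mean operator has been estimated, applies it to weakly supervised learning. Finally, the paper shows that most losses enjoy a data-dependent (through the mean operator) form of noise robustness.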

Loss factorization, weakly supervised learning and label noise robustness
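
The factorization described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code, and it rests on several assumptions that do not come from the text above: a linear model, the logistic loss (whose odd part is -z/2, giving the constant a = -1/2), labels in {-1, +1}, and known class-conditional flip rates `p_plus` and `p_minus`. All function and variable names are hypothetical. The noise-corrected estimator is the reweighting obtained by requiring unbiasedness under that noise model, and the paper's own notation may differ.

```python
import numpy as np

def logistic_loss(z):
    # log(1 + exp(-z)), computed in a numerically stable way
    return np.logaddexp(0.0, -z)

def empirical_risk(theta, X, y):
    # Standard empirical risk: average loss of the margins y_i * <theta, x_i>
    return logistic_loss(y * (X @ theta)).mean()

def mean_operator(X, y):
    # mu = (1/m) * sum_i y_i x_i, the only label-dependent quantity
    return (y[:, None] * X).mean(axis=0)

def factored_risk(theta, X, mu, a=-0.5):
    # Label-free term: each example counted once with each label, halved
    z = X @ theta
    label_free = 0.5 * (logistic_loss(z) + logistic_loss(-z)).mean()
    # Linear term: a * <theta, mu>, with a = -1/2 for the logistic loss
    return label_free + a * (theta @ mu)

def noisy_mean_operator(X, y_noisy, p_plus, p_minus):
    # Assumed noise model: a +1 label flips w.p. p_plus, a -1 label w.p. p_minus.
    # The reweighting below has conditional expectation equal to the clean label.
    w = (y_noisy - (p_minus - p_plus)) / (1.0 - p_minus - p_plus)
    return (w[:, None] * X).mean(axis=0)

rng = np.random.default_rng(0)

# 1) The two ways of computing the risk agree.
X = rng.normal(size=(200, 5))
y = rng.choice([-1.0, 1.0], size=200)
theta = rng.normal(size=5)
direct = empirical_risk(theta, X, y)
factored = factored_risk(theta, X, mean_operator(X, y))

# 2) The mean operator can be estimated from asymmetrically flipped labels.
n, p_plus, p_minus = 200_000, 0.2, 0.3
Xn = rng.normal(size=(n, 3))
yn = np.sign(Xn[:, 0] + 0.5 * rng.normal(size=n))
flip_prob = np.where(yn > 0, p_plus, p_minus)
y_noisy = np.where(rng.random(n) < flip_prob, -yn, yn)
mu_clean = mean_operator(Xn, yn)
mu_hat = noisy_mean_operator(Xn, y_noisy, p_plus, p_minus)
```

In this sketch the labels enter `factored_risk` only through `mu`, so a learner that minimizes it (for instance by gradient steps on `theta`) could in principle be run with `mu_hat` in place of the clean mean operator.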