We propose a new formulation of robust regression that integrates over all realizations of the uncertainty set and takes an averaged approach to obtain the optimal solution of the ordinary least-squares regression problem. We show that this formulation, surprisingly, recovers ridge regression and establishes the missing link between robust optimization and mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets (ellipsoidal, box, diamond, and budget) and provide closed-form expressions for the penalty term as a function of the sample size, the number of features, and the perturbation protection strength. We then show, on synthetic datasets with different levels of perturbation, that the averaged formulation consistently improves out-of-sample performance over the existing worst-case formulation. Importantly, the improvement grows as the perturbation level increases, confirming our method's advantage in high-noise environments. We report similar out-of-sample improvements on real-world regression problems drawn from UCI datasets.
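
The link between averaging and ridge regression can be illustrated numerically. The sketch below is a simplified illustration, not the paper's construction: it assumes each entry of the feature perturbation is drawn independently and uniformly from a box [-rho, rho], in which case the expected squared loss equals the ordinary least-squares loss plus a ridge penalty with weight n * rho^2 / 3. The function names, the sampling scheme, and this particular penalty weight are assumptions made for the example.

```python
import numpy as np

def averaged_loss_closed_form(X, y, beta, rho):
    """Expected squared loss under i.i.d. uniform[-rho, rho] entrywise
    perturbations of X (an assumption of this sketch):
    ||y - X beta||^2 + (n * rho^2 / 3) * ||beta||^2."""
    n = X.shape[0]
    lam = n * rho**2 / 3.0  # n times the variance of uniform[-rho, rho]
    resid = y - X @ beta
    return resid @ resid + lam * (beta @ beta)

def averaged_loss_monte_carlo(X, y, beta, rho, draws=5000, seed=0):
    """Estimate the same expectation by sampling perturbations from the box."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(draws):
        delta = rng.uniform(-rho, rho, size=X.shape)
        resid = y - (X + delta) @ beta
        total += resid @ resid
    return total / draws

def ridge_solution(X, y, lam):
    """Minimizer of ||y - X beta||^2 + lam * ||beta||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (hypothetical sizes and coefficients, for illustration only).
rng = np.random.default_rng(42)
n, p, rho = 50, 3, 0.5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = n * rho**2 / 3.0
beta_avg = ridge_solution(X, y, lam)
exact = averaged_loss_closed_form(X, y, beta_avg, rho)
approx = averaged_loss_monte_carlo(X, y, beta_avg, rho)
print("closed form:", exact, "Monte Carlo:", approx)
```

Under these assumptions the two printed values should agree up to sampling error, and the minimizer of the averaged loss is the ridge solution with penalty weight `lam`. Other uncertainty sets would change the penalty weight but not the structure of the argument.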

Robust Regression over Averaged Uncertainty