In contextual optimization, a decision-maker observes historical samples of
uncertain variables and associated concurrent covariates, without knowing their
joint distribution. Given an additional covariate observation, the goal is to
choose a decision that minimizes an operational cost. A prevalent issue here
is covariate shift, where the marginal distribution of the new covariate
differs from that of the historical samples, so that decisions based on
nonparametric or parametric estimators can vary in performance. To address
this, we propose a distributionally robust approach whose ambiguity set is the
intersection of two Wasserstein balls, each centered on a typical
nonparametric or parametric distribution estimator. Computationally, we
establish a tractable reformulation of this distributionally robust
optimization problem.
Statistically, we provide guarantees for our Wasserstein ball intersection
approach under covariate shift by analyzing the measure concentration of the
estimators. Furthermore, to reduce computational complexity, we employ a
surrogate objective that maintains similar generalization guarantees. Through
synthetic and empirical case studies on income prediction and portfolio
optimization, we demonstrate the strong empirical performance of our proposed
models.
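
To make the ambiguity set concrete, the following minimal sketch checks whether a candidate distribution lies in the intersection of two Wasserstein balls. It is an illustration under simplifying assumptions, not the paper's formulation: distributions are one-dimensional empirical samples of equal size, the type-1 Wasserstein distance is used, and the names `wasserstein1_1d`, `in_ball_intersection`, `center_nonparam`, `center_param`, and both radii are hypothetical.

```python
def wasserstein1_1d(xs, ys):
    """Type-1 Wasserstein distance between two equal-size 1-D empirical samples.

    For equal-size samples with uniform weights on the real line, the optimal
    coupling matches sorted points, so the distance is the mean absolute gap.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_ball_intersection(candidate, center_nonparam, center_param,
                         radius_nonparam, radius_param):
    """Return True if `candidate` is inside both Wasserstein balls.

    The two centers stand in for a nonparametric and a parametric estimate of
    the distribution (assumed inputs, not computed here).
    """
    return (wasserstein1_1d(candidate, center_nonparam) <= radius_nonparam
            and wasserstein1_1d(candidate, center_param) <= radius_param)


# Toy usage: a candidate halfway between the two centers.
center_nonparam = [0.0, 1.0, 2.0]
center_param = [1.0, 2.0, 3.0]
candidate = [0.5, 1.5, 2.5]
inside = in_ball_intersection(candidate, center_nonparam, center_param, 0.6, 0.6)
```

Shrinking either radius below the candidate's distance to that center excludes it, which is how the intersection can be less conservative than either ball alone.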

Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls

We introduce a distributionally robust approach that enhances the reliability
of offline policy evaluation in contextual bandits under general covariate
shifts. Our method aims to deliver robust policy evaluation results in the
presence of discrepancies in both context and policy distribution between
logging and target data. Central to our methodology is the application of
robust regression, a distributionally robust technique tailored here to improve
the estimation of conditional reward distribution from logging data. Utilizing
the reward model obtained from robust regression, we develop a comprehensive
suite of policy value estimators, by integrating our reward model into
established evaluation frameworks, namely direct methods and doubly robust
methods. Through theoretical analysis, we further establish that the proposed
policy value estimators admit a finite-sample upper bound on the bias,
providing a clear advantage over traditional methods, especially when the shift
is large. Finally, we design an extensive range of policy evaluation
scenarios, covering diverse magnitudes of shifts and a spectrum of logging and
target policies. Our empirical results indicate that our approach significantly
outperforms baseline methods, most notably in 90% of the cases under the policy
shift-only settings and 72% of the scenarios under the general covariate shift
settings.
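
As a rough illustration of the two evaluation frameworks named above, the sketch below implements the textbook direct-method and doubly robust policy value estimators for a finite action set. It is a sketch under stated assumptions, not the authors' implementation: `reward_model` is a placeholder for any fitted reward model (the abstract's robust regression is not reproduced here), and all function and argument names are hypothetical.

```python
def direct_method(contexts, action_space, target_policy, reward_model):
    """Direct method: average the model's predicted reward under the target policy."""
    total = 0.0
    for x in contexts:
        total += sum(target_policy(a, x) * reward_model(x, a) for a in action_space)
    return total / len(contexts)


def doubly_robust(logged, action_space, target_policy, logging_policy, reward_model):
    """Doubly robust: direct-method baseline plus an importance-weighted residual.

    `logged` is a list of (context, action, reward) tuples collected under
    `logging_policy`; both policies map (action, context) to a probability.
    """
    total = 0.0
    for x, a, r in logged:
        baseline = sum(target_policy(b, x) * reward_model(x, b) for b in action_space)
        weight = target_policy(a, x) / logging_policy(a, x)
        total += baseline + weight * (r - reward_model(x, a))
    return total / len(logged)


# Toy usage: two logged rounds, a uniform logging policy, a target policy that
# always plays action 0, and a deliberately crude constant reward model.
logged = [(0, 0, 1.0), (0, 1, 0.0)]
action_space = [0, 1]
target_policy = lambda a, x: 1.0 if a == 0 else 0.0
logging_policy = lambda a, x: 0.5
reward_model = lambda x, a: 0.5

dm_value = direct_method([x for x, _, _ in logged], action_space,
                         target_policy, reward_model)
dr_value = doubly_robust(logged, action_space, target_policy,
                         logging_policy, reward_model)
```

In this toy case the direct method inherits the reward model's error, while the doubly robust correction uses the logged reward for action 0 to offset it.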

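We introduce a distributionally robust method that improves the reliability of offline policy evaluation in contextual bandits under covariate shift. By applying distributionally robust regression to improve the estimate of the conditional reward distribution, we develop a comprehensive suite of policy value estimators, and theoretical analysis shows that they admit a finite-sample upper bound, an advantage over traditional methods when the shift is large. Across a wide range of policy evaluation scenarios, our empirical results show that the method clearly outperforms baseline methods.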