Optimization models used to make discrete decisions often contain uncertain
parameters that are context-dependent and are estimated through prediction. To
account for the quality of the decision made based on the prediction,
decision-focused learning (end-to-end predict-then-optimize) trains the
predictive model to minimize regret, i.e., the loss incurred by making a
suboptimal decision. Although this loss function may be non-convex and is in
general non-differentiable, effective gradient-based learning approaches have
been proposed to minimize the expected loss, using the empirical loss as a
surrogate. However, empirical regret can be an ineffective surrogate because
uncertainty in the optimization model means that empirical regret does not
equal expected regret in expectation. To illustrate the impact
of this inequality, we evaluate the effect of aleatoric and epistemic
uncertainty on the accuracy of empirical regret as a surrogate. Next, we
propose three robust loss functions that more closely approximate expected
regret. Experimental results show that training two state-of-the-art
decision-focused learning approaches using robust regret losses improves
test-sample empirical regret in general while keeping computational time
equivalent relative to the number of training epochs.
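
The gap between empirical and expected regret can be illustrated with a minimal sketch. It is not the authors' experimental setup: the toy feasible set, the cost vectors, the Gaussian noise, and the function names `best_decision` and `regret` are all assumptions made for illustration. "Expected regret" is read here as regret measured against the mean costs, and "empirical regret" as regret measured against noisy cost realizations.

```python
import numpy as np

def best_decision(costs, feasible):
    """Return the feasible decision (a row of `feasible`) with the lowest linear cost."""
    return feasible[np.argmin(feasible @ costs)]

def regret(predicted, reference, feasible):
    """Cost under `reference` of acting on `predicted`, minus the best achievable cost."""
    chosen = best_decision(predicted, feasible)
    optimal = best_decision(reference, feasible)
    return float(reference @ chosen - reference @ optimal)

# Hypothetical toy problem: choose exactly one of three items at minimum cost.
feasible = np.eye(3)
expected_costs = np.array([1.0, 2.0, 3.0])  # assumed mean costs
predicted = np.array([2.5, 2.0, 3.0])       # assumed prediction; selects item 1

# Aleatoric uncertainty: observed costs are noisy realizations of the mean.
rng = np.random.default_rng(0)
samples = expected_costs + rng.normal(0.0, 1.0, size=(5000, 3))

# Regret against the mean costs versus average regret against the samples.
expected_regret = regret(predicted, expected_costs, feasible)
empirical_regret = float(np.mean([regret(predicted, s, feasible) for s in samples]))

# Even a prediction equal to the mean costs has positive empirical regret,
# because each noisy sample has its own optimal decision.
perfect_empirical = float(np.mean([regret(expected_costs, s, feasible) for s in samples]))

print(expected_regret, empirical_regret, perfect_empirical)
```

In this sketch the empirical regret exceeds the regret against the mean costs, because the per-sample optimum is computed from the noisy realization rather than from the mean.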

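Uncertain parameters in optimization models are estimated through prediction. To assess the quality of decisions based on those predictions, decision-focused learning trains the predictive model to minimize regret. This work proposes three robust loss functions that more closely approximate expected regret. Experiments show that training decision-focused learning methods with robust regret losses improves test-sample empirical regret while keeping computational time equivalent.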

Robust Losses for Decision-Focused Learning

Mean field games facilitate the analysis of multi-armed bandits (MAB) with a
large number of agents by approximating their interactions with an average effect.
Existing mean field models for multi-agent MAB mostly assume a binary reward
function, which leads to tractable analysis but is usually not applicable in
practical scenarios. In this paper, we study the mean field bandit game with a
continuous reward function. Specifically, we focus on deriving the existence
and uniqueness of mean field equilibrium (MFE), thereby guaranteeing the
asymptotic stability of the multi-agent system. To accommodate the continuous
reward function, we encode the learned reward into an agent state, which is in
turn mapped to its stochastic arm playing policy and updated using realized
observations. We show that the state evolution is upper semi-continuous, based
on which the existence of MFE is obtained. Because Markov analysis mainly
addresses discrete states, we transform the stochastic continuous state
evolution into a deterministic ordinary differential equation (ODE). On this
basis, we can characterize a contraction mapping for the ODE to ensure a unique
MFE for the bandit game. Extensive evaluations validate our MFE
characterization, and exhibit tight empirical regret of the MAB problem.
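
The state-to-policy-to-update loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the softmax policy, the congestion-style reward `base * (1 - share)`, the Gaussian noise, the step size, and the names `softmax_policy` and `simulate` are all hypothetical.

```python
import numpy as np

def softmax_policy(states, temperature=1.0):
    """Map each agent's learned-reward state to a stochastic arm-playing policy."""
    z = states / temperature
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def simulate(num_agents=2000, num_arms=3, steps=300, step_size=0.05, seed=0):
    rng = np.random.default_rng(seed)
    base = np.linspace(1.0, 0.5, num_arms)     # assumed base quality of each arm
    states = np.zeros((num_agents, num_arms))  # learned reward per agent and arm
    agents = np.arange(num_agents)
    share = np.full(num_arms, 1.0 / num_arms)
    for _ in range(steps):
        policies = softmax_policy(states)
        # Sample one arm per agent by inverting the cumulative policy.
        cumulative = policies.cumsum(axis=1)
        draws = rng.random((num_agents, 1))
        arms = np.minimum((draws > cumulative).sum(axis=1), num_arms - 1)
        # Mean field: the reward depends only on the population share per arm.
        share = np.bincount(arms, minlength=num_arms) / num_agents
        mean_reward = base * (1.0 - share)     # assumed continuous reward
        rewards = mean_reward[arms] + rng.normal(0.0, 0.1, num_agents)
        # Move the played arm's state toward the realized observation.
        states[agents, arms] += step_size * (rewards - states[agents, arms])
    return states, share

states, share = simulate()
print(share)
```

The final `share` vector gives the fraction of agents on each arm in the last step. In this toy setting it serves only as an informal stand-in for the population distribution that a mean field equilibrium would fix.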

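This work studies the mean field bandit game with a continuous reward function, focusing on deriving the existence and uniqueness of the mean field equilibrium. Extensive evaluations validate the characterization and show tight empirical regret for the MAB problem.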