Online experimentation with interference is a common challenge in modern
applications such as e-commerce and adaptive clinical trials in medicine. For
example, in online marketplaces, the revenue of a good depends on discounts
applied to competing goods. Statistical inference with interference is widely
studied in the offline setting, but far less is known about how to adaptively
assign treatments to minimize regret. We address this gap by studying a
multi-armed bandit (MAB) problem where a learner (e-commerce platform)
sequentially assigns one of $\mathcal{A}$ possible actions (discounts) to each of $N$
units (goods) over $T$ rounds to minimize regret (maximize revenue). Unlike
traditional MAB problems, the reward of each unit depends on the treatments
assigned to other units, i.e., there is interference across the underlying
network of units. With $\mathcal{A}$ actions and $N$ units, minimizing regret
is combinatorially difficult since the action space grows as $\mathcal{A}^N$.
To overcome this issue, we study a sparse network interference model, where the
reward of a unit is only affected by the treatments assigned to $s$ neighboring
units. We use tools from discrete Fourier analysis to develop a sparse linear
representation of the unit-specific reward $r_n: [\mathcal{A}]^N \rightarrow
\mathbb{R} $, and propose simple, linear regression-based algorithms to
minimize regret. Importantly, our algorithms achieve provably low regret both
when the learner observes the interference neighborhood for all units and when
it is unknown. This significantly generalizes prior work on this topic, which
imposes strict conditions on the strength of interference on a known network
and compares regret against a markedly weaker optimal action. Empirically, we
corroborate our theoretical findings via numerical simulations.
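The abstract's core idea is that a unit's reward depends only on its $s$ neighbors. It can therefore be written as a sparse linear model in a Fourier basis and fitted by per-unit regression. The following is a minimal explore-then-commit sketch of that idea under simplifying assumptions that are not taken from the paper:

- actions are binary ($\mathcal{A} = 2$), so the Fourier characters reduce to $\pm 1$ Walsh functions;
- neighborhoods are known and fixed in a hypothetical ring structure;
- rewards are synthetic.

All names (`characters`, `neighbors`, `theta_true`, `T_explore`) are illustrative. This is not the authors' algorithm.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem size: N units, neighborhoods of size s, binary actions (A = 2).
N, s, T_explore = 6, 2, 400

# Hypothetical known neighborhoods: unit n is affected by itself and unit n + 1 (cyclically).
neighbors = [[n, (n + 1) % N] for n in range(N)]


def characters(actions, nbrs):
    """Walsh (Boolean Fourier) features prod_{i in S} (-1)^{a_i} for every subset S of nbrs."""
    feats = []
    for k in range(len(nbrs) + 1):
        for S in itertools.combinations(nbrs, k):
            feats.append(np.prod([(-1) ** int(actions[i]) for i in S]))  # empty S -> 1.0
    return np.array(feats, dtype=float)


# Hypothetical ground truth: unit n's reward is linear in its 2^s characters.
theta_true = rng.normal(size=(N, 2 ** s))


def rewards(actions):
    return np.array([characters(actions, neighbors[n]) @ theta_true[n] for n in range(N)])


# Explore: assign actions uniformly at random and record noisy per-unit rewards.
X = {n: [] for n in range(N)}
y = {n: [] for n in range(N)}
for _ in range(T_explore):
    a = rng.integers(0, 2, size=N)
    r = rewards(a) + rng.normal(scale=0.1, size=N)
    for n in range(N):
        X[n].append(characters(a, neighbors[n]))
        y[n].append(r[n])

# Per-unit least squares on the 2^s features of that unit's neighborhood only.
theta_hat = np.array(
    [np.linalg.lstsq(np.array(X[n]), np.array(y[n]), rcond=None)[0] for n in range(N)]
)


def estimated_total(actions):
    return sum(characters(actions, neighbors[n]) @ theta_hat[n] for n in range(N))


# Commit: brute force over all 2^N joint actions, feasible only because N is tiny here.
joint_actions = list(itertools.product([0, 1], repeat=N))
best = max(joint_actions, key=estimated_total)
true_best = max(joint_actions, key=lambda a: rewards(a).sum())
regret = rewards(true_best).sum() - rewards(best).sum()
print("chosen:", best, "optimal:", true_best, "regret:", round(regret, 4))
```

Each unit's regression uses only $2^s$ features, regardless of $N$. The $\mathcal{A}^N$ blow-up reappears only in the brute-force commit step, which a practical algorithm would have to replace.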

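Studying interference in online experiments, we propose linear-regression-based multi-armed bandit algorithms that minimize regret and assign treatments with provably low regret.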

Multi-Armed Bandits with Network Interference

The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence in great demand.
We initiate the study of data-driven AES selection for online
experimentation services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model that accounts for heteroskedasticity across
experiments and seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
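The claim that the AES "determines the duration of an experiment" follows from the standard power calculation for a two-sided, two-sample z-test. That calculation gives a per-arm sample size of $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \delta^2$ for effect size $\delta$. The sketch below illustrates that textbook relationship, not the paper's GMM or utility-based selection. The metric standard deviation and the daily traffic per arm are hypothetical inputs.

```python
import math
from statistics import NormalDist


def required_n_per_arm(aes, sigma, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided, two-sample z-test detecting a mean difference `aes`."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / aes ** 2)


def duration_days(aes, sigma, daily_units_per_arm, **kw):
    """Days needed to collect the required sample, assuming constant hypothetical traffic."""
    return math.ceil(required_n_per_arm(aes, sigma, **kw) / daily_units_per_arm)


for aes in (0.02, 0.01, 0.005):
    print(aes, required_n_per_arm(aes, sigma=1.0),
          duration_days(aes, sigma=1.0, daily_units_per_arm=10_000))
```

Because $n \propto 1/\delta^2$, halving the assumed effect size roughly quadruples the required duration. This is why a poorly chosen AES is costly at scale.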

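For data-driven AES selection in online experimentation, two methods are proposed. The first is a three-layer Gaussian mixture model that accounts for heterogeneity across experiments to estimate the expected effect size. The second uses utility theory to determine the optimal effect size. Comparisons with baseline methods demonstrate the superior performance of both.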

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

North star metrics and online experimentation play a central role in how
technology companies improve their products. In many practical settings,
however, evaluating experiments based on the north star metric directly can be
difficult. The two most significant issues are 1) low sensitivity of the north
star metric and 2) differences between the short-term and long-term impact on
the north star metric. A common solution is to rely on proxy metrics rather
than the north star in experiment evaluation and launch decisions. Existing
literature on proxy metrics concentrates mainly on the estimation of the
long-term impact from short-term experimental data. In this paper, instead, we
focus on the trade-off between the estimation of the long-term impact and the
sensitivity in the short term. In particular, we propose the Pareto optimal
proxy metrics method, which simultaneously optimizes prediction accuracy and
sensitivity. In addition, we give an efficient multi-objective optimization
algorithm that outperforms standard methods. We applied our methodology to
experiments from a large industrial recommendation system, and found proxy
metrics that are eight times more sensitive than the north star and
consistently move in the same direction, increasing the velocity and the
quality of the decisions to launch new features.
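The sketch below illustrates what "Pareto optimal" means for proxy metrics. It scores candidate proxies, here positive weightings of short-term metrics, on two objectives and keeps only the non-dominated ones. Several choices are hypothetical:

- the synthetic experiment data;
- sensitivity measured as the average $|z|$-score;
- accuracy measured as the correlation with the north-star effect;
- the independence assumption in the proxy's standard error.

The candidates are generated by naive random search over the simplex. This is a simple baseline, not the paper's multi-objective optimization algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: E past experiments, K short-term candidate metrics.
E, K = 200, 3
north_star = rng.normal(size=E)  # long-term north-star effect of each experiment
delta = np.column_stack([
    north_star + rng.normal(scale=0.5, size=E),        # well aligned, noisy
    0.5 * north_star + rng.normal(scale=0.5, size=E),  # moderately aligned
    0.2 * north_star + rng.normal(size=E),             # weakly aligned, precise
])
se = np.ones((E, K)) * np.array([2.0, 0.5, 0.2])  # hypothetical standard errors per metric


def score(w):
    """Return (sensitivity, accuracy) of the proxy metric delta @ w."""
    effect = delta @ w
    proxy_se = np.sqrt((se ** 2) @ (w ** 2))  # assumes independent metric estimates
    sensitivity = np.mean(np.abs(effect / proxy_se))  # average |z|-score
    accuracy = np.corrcoef(effect, north_star)[0, 1]  # agreement with north star
    return np.array([sensitivity, accuracy])


def pareto_front(scores):
    """Indices of rows not dominated by any other row (larger is better in every column)."""
    keep = []
    for i in range(len(scores)):
        ge = np.all(scores >= scores[i], axis=1)
        gt = np.any(scores > scores[i], axis=1)
        if not np.any(ge & gt):
            keep.append(i)
    return keep


weights = rng.dirichlet(np.ones(K), size=300)  # naive random search over the simplex
scores = np.array([score(w) for w in weights])
front = pareto_front(scores)
for i in sorted(front, key=lambda i: scores[i][0]):
    print(np.round(weights[i], 2), np.round(scores[i], 3))
```

Moving along the printed front trades sensitivity for alignment with the north star. Picking a point on it is a product decision rather than a statistical one.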

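The paper proposes the Pareto optimal proxy metrics method, which simultaneously optimizes prediction accuracy and sensitivity. It also gives an efficient multi-objective optimization algorithm for experiment evaluation and launch decisions, substantially improving the speed and quality of decisions to launch new features in an industrial recommendation system.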

Pareto optimal proxy metrics

Firms implementing digital advertising campaigns face a complex problem in
determining the right match between their advertising creatives and target
audiences. Typical solutions to the problem have leveraged non-experimental
methods, or used "split-testing" strategies that have not explicitly addressed
the complexities induced by targeted audiences that can potentially overlap
with one another. This paper presents an adaptive algorithm that addresses the
problem via online experimentation. The algorithm is set up as a contextual
bandit and addresses the overlap issue by partitioning the target audiences
into disjoint, non-overlapping sub-populations. It learns an optimal creative
display policy in the disjoint space, while assessing in parallel which
creative has the best match in the space of possibly overlapping target
audiences. Experiments show that the proposed method is more efficient than
naive "split-testing" or non-adaptive "A/B/n" testing-based methods. We also
describe a testing product we built that uses the algorithm. The product is
currently deployed on the advertising platform of JD.com, an eCommerce company
and a publisher of digital ads in China.
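The following sketch illustrates one plausible way to implement the overlap-handling idea. Several parts are assumptions rather than details from the paper:

- overlapping audiences are partitioned into disjoint cells keyed by each user's audience-membership signature;
- a Beta-Bernoulli Thompson sampler runs per cell, which is a standard contextual-bandit choice;
- cell posteriors are re-aggregated by size to compare creatives within each original, possibly overlapping audience.

The audiences, creatives, and click-through rates are hypothetical.

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical overlapping target audiences (sets of user IDs) and creatives.
audiences = {"A": set(range(0, 600)), "B": set(range(400, 1000))}
creatives = ["c1", "c2"]


def partition(audiences):
    """Group users by membership signature; each signature is one disjoint cell."""
    cells = defaultdict(set)
    for u in set().union(*audiences.values()):
        sig = tuple(sorted(name for name, members in audiences.items() if u in members))
        cells[sig].add(u)
    return dict(cells)


cells = partition(audiences)
cell_keys = list(cells)
sizes = np.array([len(cells[k]) for k in cell_keys], dtype=float)

# Hypothetical true click-through rates per (cell, creative).
true_ctr = {("A",): {"c1": 0.05, "c2": 0.03},
            ("A", "B"): {"c1": 0.04, "c2": 0.06},
            ("B",): {"c1": 0.02, "c2": 0.07}}

# Beta(1, 1) priors for every (cell, creative) pair.
alpha = {(k, c): 1.0 for k in cell_keys for c in creatives}
beta = {(k, c): 1.0 for k in cell_keys for c in creatives}

for _ in range(20_000):
    cell = cell_keys[rng.choice(len(cell_keys), p=sizes / sizes.sum())]  # arriving user's cell
    draws = {c: rng.beta(alpha[(cell, c)], beta[(cell, c)]) for c in creatives}
    c = max(draws, key=draws.get)  # Thompson sampling: show the creative with the best draw
    click = rng.random() < true_ctr[cell][c]
    alpha[(cell, c)] += click
    beta[(cell, c)] += 1 - click


def audience_ctr(name, creative):
    """Size-weighted posterior-mean CTR of `creative` over the cells inside audience `name`."""
    inside = [k for k in cell_keys if name in k]
    w = np.array([len(cells[k]) for k in inside], dtype=float)
    m = np.array([alpha[(k, creative)] / (alpha[(k, creative)] + beta[(k, creative)])
                  for k in inside])
    return float(w @ m / w.sum())


for name in audiences:
    print(name, {c: round(audience_ctr(name, c), 4) for c in creatives})
```

Thompson sampling shows the weaker creative in each cell less often, so its audience-level estimate is noisier. Assessing audience-level matches "in parallel," as the abstract describes, has to manage this tension between exploiting within cells and estimating across audiences.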

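The paper introduces a contextual bandit algorithm that addresses the matching of digital ad creatives to target audiences through online experimentation by partitioning overlapping target audiences into disjoint sub-populations; the algorithm has been deployed on the Chinese e-commerce platform JD.com.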