In many real-world causal inference applications, the primary outcomes
(labels) are often partially missing, especially if they are expensive or
difficult to collect. If the missingness depends on covariates (i.e.,
missingness is not completely at random), analyses based on fully observed
samples alone may be biased. Incorporating surrogates, which are fully observed
post-treatment variables related to the primary outcome, can improve estimation
in this case. In this paper, we study the role of surrogates in estimating
continuous treatment effects and propose a doubly robust method to efficiently
incorporate surrogates in the analysis, which uses both labeled and unlabeled
data and does not suffer from the above selection bias problem. Importantly, we
establish asymptotic normality of the proposed estimator and show possible
variance improvements over methods that use only labeled data. Extensive
simulations show that our method performs well empirically.
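
To illustrate the doubly robust idea in its simplest form, the sketch below estimates a plain outcome mean when some labels are missing. It is not the estimator proposed in the paper: it ignores the continuous treatment entirely, and it assumes that the labeling probabilities `pi` and the surrogate-based outcome predictions `m` are already available. All function and variable names are hypothetical.

```python
def labeled_only_mean(y, r):
    """Average the outcome over labeled units only (r == 1)."""
    observed = [yi for yi, ri in zip(y, r) if ri == 1]
    return sum(observed) / len(observed)


def doubly_robust_mean(y, r, pi, m):
    """Augmented inverse-probability-weighted estimate of the outcome mean.

    y  : outcomes, with None where the label is missing
    r  : 1 if the label is observed, 0 otherwise
    pi : assumed probability that each unit is labeled (given covariates)
    m  : assumed outcome prediction for each unit, e.g. a regression on
         covariates and surrogates fitted on the labeled units
    """
    total = 0.0
    for yi, ri, pi_i, mi in zip(y, r, pi, m):
        # Every unit, labeled or not, contributes its model prediction.
        # Labeled units add an inverse-probability-weighted residual.
        correction = (yi - mi) / pi_i if ri == 1 else 0.0
        total += mi + correction
    return total / len(r)


# Toy data: two labeled and two unlabeled units.
y = [1.0, None, 3.0, None]
r = [1, 0, 1, 0]
pi = [0.5, 0.5, 0.5, 0.5]
m = [1.0, 2.0, 3.0, 4.0]

naive = labeled_only_mean(y, r)
dr = doubly_robust_mean(y, r, pi, m)
```

In this toy data the predictions match the observed labels exactly, so the correction terms vanish and the doubly robust estimate is the average prediction over all four units, whereas the labeled-only mean uses just two of them.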

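In many practical causal inference applications, the primary outcomes (labels) are often partially missing, especially when they are expensive or difficult to collect. This paper studies the role of surrogates in estimating continuous treatment effects and proposes a doubly robust method that efficiently incorporates surrogates into the analysis; the method uses both labeled and unlabeled data and is not affected by selection bias. Importantly, we establish the asymptotic normality of the proposed estimator and show possible variance improvements over methods that use only labeled data. Extensive simulations show that our method has appealing empirical performance.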

Estimating Continuous Treatment Effects Using Surrogate Outcomes

Continuous Treatment Effects with Surrogate Outcomes

We propose a new regret minimization algorithm for episodic sparse linear
Markov decision processes (SMDPs), where the state-transition distribution is a
linear function of observed features. The only previously known algorithm for
SMDPs requires knowledge of the sparsity parameter and oracle access to an
unknown policy. We overcome these limitations by combining the doubly robust
method that allows one to use feature vectors of \emph{all} actions with a
novel analysis technique that enables the algorithm to use data from all
periods in all episodes. The regret of the proposed algorithm is
$\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$
denotes the restrictive minimum eigenvalue of the average Gram matrix of
feature vectors, $s_\star$ is the sparsity parameter, $H$ is the length of an
episode, and $N$ is the number of rounds. We provide a regret lower bound that
matches the upper bound up to logarithmic factors on a newly identified
subclass of SMDPs. Our numerical experiments support our theoretical results
and demonstrate the superior performance of our algorithm.
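
To make the scaling of the stated bound concrete, the following sketch evaluates only its leading term $s_\star H \sqrt{N} / \sigma_{\min}$. Constants and logarithmic factors hidden by $\tilde{O}$ are ignored, so the returned number is not an actual regret value; the function name and the input values are hypothetical.

```python
import math


def regret_rate(sigma_min, s_star, H, N):
    """Leading term of the bound: s_star * H * sqrt(N) / sigma_min.

    Constants and logarithmic factors are deliberately omitted.
    """
    return s_star * H * math.sqrt(N) / sigma_min


# Quadrupling the number of rounds N doubles the leading term,
# and doubling the sparsity parameter also doubles it.
base = regret_rate(sigma_min=0.5, s_star=2, H=10, N=100)
more_rounds = regret_rate(sigma_min=0.5, s_star=2, H=10, N=400)
less_sparse = regret_rate(sigma_min=0.5, s_star=4, H=10, N=100)
```

The term grows linearly in the sparsity parameter and the episode length, and with the square root of the number of rounds.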

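We propose a new regret minimization algorithm for episodic problems with sparse linear Markov decision processes (SMDPs), where the state-transition distribution is a linear function of the observed features.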

A Doubly Robust Method for Sparse Reinforcement Learning

A Doubly Robust Approach to Sparse Reinforcement Learning

Off-policy learning plays a pivotal role in optimizing and evaluating
policies prior to online deployment. However, during real-time serving,
we observe a variety of interventions and constraints that cause inconsistency
between the online and offline settings, which we summarize and term runtime
uncertainty. Such uncertainty cannot be learned from the logged data because it
is abnormal and rare. To ensure a certain level of robustness, we
perturb the off-policy estimators along an adversarial direction in view of the
runtime uncertainty. This allows the resulting estimators to be robust not only
to observed but also to unexpected runtime uncertainties. Leveraging this idea, we
bring runtime-uncertainty robustness to three major off-policy learning
methods: the inverse propensity score method, the reward-model method, and the
doubly robust method. We theoretically justify the robustness of our methods to
runtime uncertainty and demonstrate their effectiveness using both
simulation and real-world online experiments.
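
For reference, the three base estimators named above can be sketched in their standard textbook forms for a logged contextual-bandit dataset. The adversarial perturbation that the paper adds for runtime uncertainty is not shown here. The log layout (keys `a`, `r`, `pi_b`, `pi_e`, `r_hat`) and all function names are assumptions made for this illustration.

```python
def ips_value(logs):
    """Inverse propensity score estimate of the target policy's value."""
    total = 0.0
    for e in logs:
        weight = e["pi_e"][e["a"]] / e["pi_b"]  # importance weight
        total += weight * e["r"]
    return total / len(logs)


def reward_model_value(logs):
    """Reward-model (direct) estimate: average predicted reward under pi_e."""
    total = 0.0
    for e in logs:
        total += sum(p * q for p, q in zip(e["pi_e"], e["r_hat"]))
    return total / len(logs)


def doubly_robust_value(logs):
    """Doubly robust estimate: model prediction plus weighted residual."""
    total = 0.0
    for e in logs:
        model_term = sum(p * q for p, q in zip(e["pi_e"], e["r_hat"]))
        weight = e["pi_e"][e["a"]] / e["pi_b"]
        total += model_term + weight * (e["r"] - e["r_hat"][e["a"]])
    return total / len(logs)


# Each record holds the logged action a, observed reward r, logging
# probability pi_b of that action, the target policy's action
# probabilities pi_e, and a reward model's per-action predictions r_hat.
logs = [
    {"a": 0, "r": 1.0, "pi_b": 0.5, "pi_e": [1.0, 0.0], "r_hat": [0.8, 0.2]},
    {"a": 1, "r": 0.0, "pi_b": 0.5, "pi_e": [1.0, 0.0], "r_hat": [0.6, 0.1]},
]

v_ips = ips_value(logs)
v_rm = reward_model_value(logs)
v_dr = doubly_robust_value(logs)
```

The doubly robust form reduces to the reward-model estimate when the residuals are zero and to the inverse propensity score estimate when the reward model predicts zero everywhere.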

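The paper proposes an offline evaluation method that targets runtime uncertainty. The method makes the resulting estimators robust not only to expected runtime uncertainty but also to observed and unexpected runtime uncertainty, and its robustness is demonstrated in simulation and real-world online experiments.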