Evaluating the causal impacts of possible interventions is crucial for
informing decision-making, especially towards improving access to opportunity.
When causal effects are heterogeneous and predictable from covariates,
personalized treatment decisions can improve individual outcomes and contribute
to both efficiency and equity. In practice, however, causal researchers do not
have a single outcome in mind a priori and often collect multiple outcomes of
interest that are noisy estimates of the true target of interest. For example,
in government-assisted social benefit programs, policymakers collect many
outcomes to understand the multidimensional nature of poverty. The ultimate
goal is to learn an optimal treatment policy that in some sense maximizes
multiple outcomes simultaneously. To address such issues, we present a
data-driven dimensionality-reduction methodology for multiple outcomes in the
context of optimal policy learning with multiple objectives. We learn a
low-dimensional representation of the true outcome from the observed outcomes
using reduced rank regression. We develop a suite of estimates that use the
model to denoise observed outcomes, including commonly used index weightings.
These methods improve estimation error in policy evaluation and optimization,
including on a case study of real-world cash transfer and social intervention
data. Reducing the variance of noisy social outcomes can improve the
performance of algorithmic allocations.
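
The reduced rank regression step mentioned above can be illustrated with a
minimal sketch. It is not the paper's estimator: it assumes the classical
identity-weighted form of reduced rank regression with no intercept, and the
simulated data, the function names `reduced_rank_regression` and `simulate`,
and all parameter values are hypothetical. The "denoised" outcomes here are
simply the rank-constrained fitted values.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Fit Y ~ X @ B subject to rank(B) <= rank (identity-weighted RRR)."""
    # Unconstrained least-squares coefficients, shape (p, k).
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Right singular vectors of the fitted values span the outcome
    # directions that the covariates explain best.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T  # shape (k, rank)
    # Project the OLS coefficients onto the leading `rank` directions.
    return B_ols @ V_r @ V_r.T


def simulate(n=500, p=5, k=8, noise_sd=1.0, seed=0):
    """Hypothetical data: k noisy outcomes driven by one latent outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    latent = X @ rng.normal(size=p)            # true one-dimensional outcome
    loadings = rng.uniform(0.5, 1.5, size=k)   # loading of each observed outcome
    signal = np.outer(latent, loadings)        # noise-free outcomes, shape (n, k)
    Y = signal + noise_sd * rng.normal(size=(n, k))
    return X, Y, signal


X, Y, signal = simulate()
B_rr = reduced_rank_regression(X, Y, rank=1)
denoised = X @ B_rr  # rank-constrained fitted values used as denoised outcomes

mse_raw = np.mean((Y - signal) ** 2)
mse_denoised = np.mean((denoised - signal) ** 2)
print(f"raw outcome MSE: {mse_raw:.3f}, denoised MSE: {mse_denoised:.3f}")
```

In this toy setup the rank constraint pools information across the observed
outcomes, so the fitted values track the noise-free signal more closely than
the raw outcomes do. Real data would also need an intercept and a
data-driven choice of rank.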

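Using a reduced-rank regression model, we propose a data-driven method that,
in the context of multi-objective optimal policy learning, learns a
low-dimensional representation of the true outcome from the observed outcomes.
The method lowers estimation error in policy evaluation and optimization, and
improves the performance of algorithmic allocations by reducing the variance
of noisy social outcomes.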

Reduced-Rank Multi-objective Policy Learning and Optimization

This paper deals with optimal policy learning (OPL) with observational data,
i.e., data-driven optimal decision-making, in multi-action (or multi-arm)
settings, where a finite set of decision options is available. It is organized
in three parts, which discuss, respectively, estimation, risk preference, and
potential failures. The first part provides a brief review of the key
approaches to estimating the reward (or value) function and optimal policy
within this context of analysis. Here, I delineate the identification
assumptions and statistical properties related to offline optimal policy
learning estimators. In the second part, I delve into the analysis of decision
risk. This analysis reveals that the optimal choice can be influenced by the
decision-maker's attitude towards risk, specifically the trade-off between the
conditional mean and the conditional variance of the reward. Here, I present an
application of the proposed model to real data, illustrating that the average
regret of a policy with multi-valued treatment depends on the decision-maker's
attitude towards risk. The third part of the paper discusses
the limitations of optimal data-driven decision-making by highlighting
conditions under which decision-making can falter. This aspect is linked to the
failure of the two fundamental assumptions essential for identifying the
optimal choice: (i) overlap and (ii) unconfoundedness. Some conclusions
end the paper.
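
The mean-variance trade-off described in the second part can be made concrete
with a small sketch. The linear scoring rule (conditional mean minus a
risk-aversion weight times conditional variance) is an assumption chosen for
illustration and is not confirmed by the abstract. The function name
`risk_adjusted_choice`, the parameter `risk_aversion`, and the numbers are all
hypothetical.

```python
import numpy as np


def risk_adjusted_choice(cond_mean, cond_var, risk_aversion):
    """Pick, per unit, the arm maximizing mean - risk_aversion * variance."""
    cond_mean = np.asarray(cond_mean, dtype=float)
    cond_var = np.asarray(cond_var, dtype=float)
    # Assumed scoring rule: penalize each arm's estimated reward variance.
    score = cond_mean - risk_aversion * cond_var
    return score.argmax(axis=1)


# Hypothetical estimates for two units and three arms
# (rows are units, columns are arms).
cond_mean = [[1.0, 1.5, 2.0],
             [0.5, 0.4, 0.3]]
cond_var = [[0.1, 0.5, 3.0],
            [0.2, 0.05, 0.01]]

for lam in (0.0, 1.0, 2.0):
    print(lam, risk_adjusted_choice(cond_mean, cond_var, lam))
```

With `risk_aversion` set to zero the rule reduces to choosing the arm with the
highest estimated mean reward. Larger values shift the choice towards
lower-variance arms, which is the sense in which the selected policy depends
on the decision-maker's attitude towards risk.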

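This paper discusses data-driven optimal decision-making in multi-action (or
multi-arm) settings through optimal policy learning (OPL) with observational
data. It covers three aspects: estimation, risk preference, and potential
failures. It sets out the identification assumptions and statistical
properties of offline optimal policy learning estimators, analyzes decision
risk and how the optimal choice is influenced by the decision-maker's attitude
towards risk, and finally discusses the conditions that limit optimal
data-driven decision-making.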