A key to causal inference with observational data is achieving balance in
predictive features associated with each treatment type. Recent literature has
explored representation learning to achieve this goal. In this work, we discuss
the pitfalls of these strategies, such as a steep trade-off between achieving
balance and predictive power, and present a remedy via the integration of
balancing weights in causal learning. Specifically, we theoretically link
balance to the quality of propensity estimation, emphasize the importance of
identifying a proper target population, and elaborate on the complementary
roles of feature balancing and weight adjustments. Using these concepts, we
then develop an algorithm for flexible, scalable and accurate estimation of
causal effects. Finally, we show how the learned weighted representations may
serve to facilitate alternative causal learning procedures with appealing
statistical features. We conduct an extensive set of experiments on both
synthetic examples and standard benchmarks, and report encouraging results
relative to state-of-the-art baselines.
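
To make the role of balancing weights and of the target population concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes a single covariate, a propensity score that is known exactly, and two standard weighting schemes chosen here for illustration: inverse propensity weights (targeting the whole population) and overlap weights (targeting the region where both treatments are likely). The function names `balancing_weights` and `weighted_mean_difference` are hypothetical.

```python
import math
import random


def balancing_weights(propensity, treated, target="ate"):
    """Return one weight per unit for the chosen target population."""
    weights = []
    for e, t in zip(propensity, treated):
        if target == "ate":
            # Whole population: inverse propensity weights.
            w = 1.0 / e if t else 1.0 / (1.0 - e)
        elif target == "overlap":
            # Overlap population: weight by the probability of the other arm.
            w = (1.0 - e) if t else e
        else:
            raise ValueError(f"unknown target: {target}")
        weights.append(w)
    return weights


def weighted_mean_difference(x, treated, weights):
    """Weighted mean of x among treated units minus that among controls."""
    def wmean(flag):
        num = sum(w * xi for xi, t, w in zip(x, treated, weights) if t == flag)
        den = sum(w for t, w in zip(treated, weights) if t == flag)
        return num / den
    return wmean(1) - wmean(0)


# Toy observational data (an assumption for illustration): units with a
# larger covariate x are more likely to be treated, so the arms are imbalanced.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
propensity = [1.0 / (1.0 + math.exp(-xi)) for xi in x]  # assumed known here
treated = [1 if random.random() < e else 0 for e in propensity]

raw_gap = weighted_mean_difference(x, treated, [1.0] * len(x))
ate_gap = weighted_mean_difference(
    x, treated, balancing_weights(propensity, treated, "ate"))
overlap_gap = weighted_mean_difference(
    x, treated, balancing_weights(propensity, treated, "overlap"))

print(f"unweighted gap: {raw_gap:.3f}")
print(f"ATE-weighted gap: {ate_gap:.3f}")
print(f"overlap-weighted gap: {overlap_gap:.3f}")
```

In practice the propensity score must be estimated, so the balance achieved by the weights depends on the quality of that estimate, which is the link the abstract highlights.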

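This paper presents the use of balancing weights in causal inference to achieve
balance in predictive features, emphasizes the importance of identifying a
proper target population, uses a lemma to link balance to the quality of
propensity estimation, and finally shows how the learned weighted
representations can facilitate alternative causal learning procedures with
appealing statistical features.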

Counterfactual Representation Learning with Balancing Weights

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for
meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas.
First, we show that Q-learning is competitive with state-of-the-art meta-RL
algorithms if given access to a context variable that is a representation of
the past trajectory. Second, a multi-task objective to maximize the average
reward across the training tasks is an effective method to meta-train RL
policies. Third, past data from the meta-training replay buffer can be recycled
to adapt the policy on a new task using off-policy updates. MQL draws upon
ideas in propensity estimation to do so and thereby amplifies the amount of
available data for adaptation. Experiments on standard continuous-control
benchmarks suggest that MQL compares favorably with the state of the art in
meta-RL.
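
The third idea, reusing old replay-buffer data through propensity estimation, can be illustrated with a small sketch. This is an assumption about one common way to do it, not a reproduction of MQL: a logistic classifier is trained to tell new-task samples from old buffer samples, its odds are used as importance weights on the old samples, and a normalized effective sample size summarizes how much of the old data is usable. The one-dimensional features, the helper `fit_logistic`, and all hyperparameters are hypothetical.

```python
import math
import random


def fit_logistic(features, labels, lr=0.5, steps=1000):
    """Fit p(label=1 | x) = sigmoid(a * x + b) by batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy stand-ins for transition features (an assumption for illustration):
# many samples from the old replay buffer, few from the new task.
random.seed(0)
old = [random.gauss(0.0, 1.0) for _ in range(1000)]
new = [random.gauss(1.0, 1.0) for _ in range(200)]

# Propensity model: label 1 means "came from the new task".
a, b = fit_logistic(old + new, [0] * len(old) + [1] * len(new))

# Odds p/(1-p) = exp(a*x + b); rescale by the class ratio so the weights
# estimate the density ratio new(x) / old(x) on the old samples.
ratio = len(old) / len(new)
weights = [math.exp(a * x + b) * ratio for x in old]

# Normalized effective sample size, in (0, 1]: near 1 means the old data
# look like the new task, near 0 means little of it can be reused.
ess = sum(weights) ** 2 / (len(weights) * sum(w * w for w in weights))

old_mean = sum(old) / len(old)
new_mean = sum(new) / len(new)
weighted_old_mean = sum(w * x for w, x in zip(weights, old)) / sum(weights)

print(f"old mean: {old_mean:.3f}, new mean: {new_mean:.3f}")
print(f"reweighted old mean: {weighted_old_mean:.3f}, ESS: {ess:.3f}")
```

After reweighting, statistics computed on the old samples move toward those of the new task, which is what allows off-policy updates to draw on the larger buffer.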

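Meta-Q-Learning (MQL) is a new off-policy algorithm built on three simple
ideas: using a representation of the past trajectory as a context variable
makes Q-learning competitive with state-of-the-art meta-RL algorithms; a
multi-task objective that maximizes the average reward across the training
tasks is an effective way to meta-train RL policies; and past data from the
meta-training replay buffer can be reused to adapt to a new task through
off-policy updates. MQL draws on ideas from propensity estimation to do so,
which increases the amount of data available for adaptation. Experiments
suggest that MQL compares favorably with the state of the art in meta-RL on
standard continuous-control benchmarks.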