Black-box heterogeneous treatment effect (HTE) models are increasingly being
used to create personalized policies that assign individuals to their optimal
treatments. However, they are difficult to understand, and can be burdensome to
maintain in a production environment. In this paper, we present a scalable,
interpretable personalized experimentation system, implemented and deployed in
production at Meta. The system works in a multiple-treatment, multiple-outcome
setting typical at Meta to (1) learn explanations for black-box HTE models and
(2) generate interpretable personalized policies. We evaluate the methods used
in the system on publicly available data and Meta use cases, and discuss
lessons learnt during the development of the system.

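This paper presents a scalable, interpretable personalized experimentation system implemented and deployed in production at Meta. The system learns explanations for black-box heterogeneous treatment effect models and generates interpretable personalized policies. Its methods are evaluated on public data and Meta use cases, and lessons learnt during development are discussed.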
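
To make the idea of an interpretable personalized policy concrete, here is a minimal sketch. It is not the system described in the abstract. It assumes that a black-box HTE model has already produced one predicted outcome per treatment for each individual, and it considers a single outcome and a single feature. The names `greedy_policy`, `fit_threshold_policy`, `age`, and the treatment labels "A" and "B" are hypothetical, and the numbers are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_policy(predicted_outcomes: Sequence[Dict[str, float]]) -> List[str]:
    """Assign each individual the treatment with the highest predicted outcome."""
    return [max(row, key=row.get) for row in predicted_outcomes]


def fit_threshold_policy(
    feature: Sequence[float],
    predicted_outcomes: Sequence[Dict[str, float]],
    treatments: Sequence[str],
) -> Tuple[float, float, str, str]:
    """Exhaustively search one-split rules of the form
    'feature <= t -> left treatment, otherwise right treatment'
    and return (total predicted outcome, t, left, right) for the best rule."""
    best = None
    for t in sorted(set(feature)):
        for left in treatments:
            for right in treatments:
                value = sum(
                    row[left] if x <= t else row[right]
                    for x, row in zip(feature, predicted_outcomes)
                )
                if best is None or value > best[0]:
                    best = (value, t, left, right)
    return best


# Hypothetical data: one feature and model-predicted outcomes per treatment.
age = [20, 25, 40, 50]
predicted_outcomes = [
    {"A": 1.0, "B": 0.2},
    {"A": 0.9, "B": 0.3},
    {"A": 0.1, "B": 0.8},
    {"A": 0.2, "B": 0.7},
]

black_box_assignment = greedy_policy(predicted_outcomes)
rule = fit_threshold_policy(age, predicted_outcomes, ["A", "B"])
_, threshold, left, right = rule
print(f"if age <= {threshold}: assign {left}, else assign {right}")
```

The per-individual greedy assignment is what a black-box policy would do. The one-split rule can be read and audited by a person, at the cost of possibly lower predicted value on less cleanly separated data.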

Interpretable Personalized Experimentation

We consider offline reinforcement learning (RL) with heterogeneous agents
under severe data scarcity, i.e., we only observe a single historical
trajectory for every agent under an unknown, potentially sub-optimal policy. We
find that the performance of state-of-the-art offline and model-based RL
methods degrades significantly given such limited data availability, even for
benchmark settings commonly perceived as "solved", such as "MountainCar" and
"CartPole". To address this challenge, we propose PerSim, a model-based offline
RL approach which first learns a personalized simulator for each agent by
collectively using the historical trajectories across all agents, prior to
learning a policy. We do so by positing that the transition dynamics across
agents can be represented as a latent function of latent factors associated
with agents, states, and actions; subsequently, we theoretically establish that
this function is well-approximated by a "low-rank" decomposition of separable
agent, state, and action latent functions. This representation suggests a
simple, regularized neural network architecture to effectively learn the
transition dynamics per agent, even with scarce, offline data. We perform
extensive experiments across several benchmark environments and RL methods. The
consistent improvement of our approach, measured in terms of both state
dynamics prediction and eventual reward, confirms the efficacy of our framework
in leveraging limited historical data to simultaneously learn personalized
policies across agents.

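This paper proposes PerSim, a model-based offline reinforcement learning method that addresses data scarcity by learning a personalized simulator for each agent, improving performance while learning personalized policies across agents.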
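
The "low-rank" decomposition mentioned in the abstract can be illustrated with a small sketch: the prediction for an (agent, state, action) triple is a sum, over rank components, of the product of an agent factor, a state factor, and an action factor. The code below is an illustration only, not the PerSim implementation. It assumes a scalar prediction, a rank of 2, and hand-picked factor values, whereas the abstract describes learning these factors with a regularized neural network. All names (`agent_factors`, `state_factors`, `action_factors`, `simulate`) are hypothetical.

```python
from typing import Dict, List, Sequence


def low_rank_prediction(
    agent_factor: Sequence[float],
    state_factor: Sequence[float],
    action_factor: Sequence[float],
) -> float:
    """Sum over rank components of the product of the three latent factors."""
    return sum(u * v * w for u, v, w in zip(agent_factor, state_factor, action_factor))


# Hypothetical rank-2 latent factors (in practice these would be learned).
agent_factors: Dict[str, List[float]] = {
    "agent_0": [1.0, 0.5],
    "agent_1": [2.0, -1.0],
}
state_factors: Dict[str, List[float]] = {
    "s0": [1.0, 2.0],
    "s1": [0.0, 4.0],
}
action_factors: Dict[str, List[float]] = {
    "left": [1.0, 1.0],
    "right": [3.0, -1.0],
}


def simulate(agent: str, state: str, action: str) -> float:
    """Personalized prediction: only the agent factor differs between agents."""
    return low_rank_prediction(
        agent_factors[agent], state_factors[state], action_factors[action]
    )


# Same state and action, different agents, different predicted dynamics.
print(simulate("agent_0", "s0", "right"))
print(simulate("agent_1", "s0", "right"))
```

Because the state and action factors are shared across agents, trajectories from every agent contribute to estimating them, while each agent needs only its own small factor. This is one way to read how a single trajectory per agent can still support a personalized simulator.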