Sep, 2023
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits
Yi Shen, Pan Xu, Michael M. Zavlanos
TL;DR
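Proposes a distributionally robust optimization method based on the Wasserstein distance to address environment mismatch, and provides theoretical analysis and empirical validation.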
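A minimal sketch of the quantity the TL;DR relies on. It assumes one-dimensional contexts and two equally sized, equally weighted samples. It computes the Wasserstein-1 distance between logged and deployment contexts, and the pessimistic value bound that a Wasserstein ball of radius eps implies for an L-Lipschitz reward f. By Kantorovich-Rubinstein duality, every Q with W1(P, Q) <= eps satisfies E_Q[f] >= E_P[f] - L * eps. This is a textbook illustration, not the paper's estimator. The function names `wasserstein1_1d` and `robust_lower_bound` and the example data are hypothetical.

```python
from typing import Sequence

# Hypothetical helpers for illustration only; not taken from the paper.


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two 1-D empirical distributions
    with the same number of equally weighted samples.

    In 1-D the optimal coupling pairs the sorted samples, so W1 is the
    mean absolute difference between the sorted values.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def robust_lower_bound(values: Sequence[float], lipschitz: float, radius: float) -> float:
    """Lower bound on E_Q[f] over all Q with W1(P, Q) <= radius.

    `values` holds f evaluated at the samples of the empirical distribution P.
    Assumes f is `lipschitz`-Lipschitz, so by Kantorovich-Rubinstein duality
    E_Q[f] >= E_P[f] - lipschitz * radius.
    """
    return sum(values) / len(values) - lipschitz * radius


if __name__ == "__main__":
    logged = [0.0, 1.0, 2.0]    # hypothetical contexts seen in the offline data
    deployed = [0.5, 1.5, 2.5]  # hypothetical shifted contexts at deployment
    reward = lambda x: 2.0 * x  # hypothetical reward, Lipschitz constant 2

    eps = wasserstein1_1d(logged, deployed)
    print(eps)  # 0.5
    print(robust_lower_bound([reward(x) for x in logged], 2.0, eps))  # 1.0
```

Here the logged mean reward is 2.0 and the shift costs at most 2 * 0.5 = 1.0, so the pessimistic value is 1.0. The true mean reward under the shifted contexts, 3.0, respects that bound. A robust method optimizes this kind of worst-case value instead of the plain empirical mean.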
Abstract
Without direct interaction with the environment. Often, the environment in which the data are collected differs from the