BriefGPT.xyz
Feb, 2024
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
Yuta Saito, Jihan Yao, Thorsten Joachims
TL;DR
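The authors propose POTEC, a new two-stage algorithm for off-policy learning of contextual bandit policies in large discrete action spaces. POTEC clusters the action space and combines a policy-based method with a regression-based method. The results show that POTEC substantially improves off-policy learning, especially in large and structured action spaces.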
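The TL;DR describes a policy that first picks an action cluster and then an action inside that cluster. The following is a minimal Python sketch of such a decomposition for a single context, not the authors' implementation. It assumes fixed cluster assignments, a softmax first-stage policy over clusters, and a second stage that greedily takes the action with the highest estimated reward within the chosen cluster. All names (`two_stage_policy`, `cluster_logits`, `cluster_of`, `reward_hat`, `cluster_weight`) are hypothetical. The overall policy marginalizes over clusters, π(a|x) = Σ_c π1st(c|x) π2nd(a|x,c). The `cluster_weight` helper illustrates a cluster-level importance weight π1st(c|x)/π0(c|x). This weight depends on the logging probability of a whole cluster rather than of a single action. How such a weight enters training is an assumption here.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_members(cluster_of: List[int], n_clusters: int) -> List[List[int]]:
    """Group action indices by cluster; cluster_of[a] is the cluster of action a."""
    members: List[List[int]] = [[] for _ in range(n_clusters)]
    for a, c in enumerate(cluster_of):
        members[c].append(a)
    if any(not m for m in members):
        raise ValueError("every cluster must contain at least one action")
    return members


def two_stage_policy(
    cluster_logits: List[float],  # hypothetical first-stage scores for one context x
    cluster_of: List[int],        # hypothetical fixed action-to-cluster assignment
    reward_hat: List[float],      # hypothetical regression estimates of r(x, a)
) -> List[float]:
    """pi(a|x) = sum_c pi_1st(c|x) * pi_2nd(a|x, c) for a single context x.

    First stage: softmax over clusters. Second stage (assumed greedy here):
    the action with the largest reward_hat inside each cluster.
    """
    pi_1st = softmax(cluster_logits)
    members = cluster_members(cluster_of, len(cluster_logits))
    pi = [0.0] * len(cluster_of)
    for c, actions in enumerate(members):
        best = max(actions, key=lambda a: reward_hat[a])
        pi[best] += pi_1st[c]
    return pi


def cluster_weight(
    c: int,
    cluster_logits: List[float],
    cluster_of: List[int],
    pi0: List[float],  # logging policy pi_0(a|x) over all actions
) -> float:
    """Cluster-level importance weight pi_1st(c|x) / pi_0(c|x).

    pi_0(c|x) is the logging probability of the whole cluster, i.e. the sum
    of pi_0(a|x) over the actions a assigned to c.
    """
    pi0_c = sum(p for a, p in enumerate(pi0) if cluster_of[a] == c)
    if pi0_c <= 0.0:
        raise ValueError("logging policy gives zero probability to this cluster")
    return softmax(cluster_logits)[c] / pi0_c
```

Consider `cluster_of = [0, 0, 1, 1, 2, 2]`, `reward_hat = [0.1, 0.5, 0.3, 0.2, 0.9, 0.4]`, and equal cluster logits. With these inputs, the policy places probability 1/3 on each of actions 1, 2 and 4 (the per-cluster argmax) and zero on the rest.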
Abstract
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods -- most of which rely crucially on reward-regression models or importance-weighted policy gradients …
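The abstract names two families of existing methods: reward regression and importance weighting. The following minimal Python sketch shows the two standard policy-value estimators behind them. It assumes a logged dataset of (context, action, reward, logging propensity) tuples. The names `ips_value`, `dm_value`, `Record` and `q_hat` are hypothetical. Importance-weighted policy-gradient methods differentiate an objective weighted like `ips_value`. Regression-based methods plug a learned reward model into an objective like `dm_value`.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction: (context id x, action a, reward r, logging propensity pi_0(a|x)).
Record = Tuple[int, int, float, float]
Policy = Callable[[int, int], float]       # pi(x, a) -> probability of a given x
RewardModel = Callable[[int, int], float]  # q_hat(x, a) -> estimated reward


def ips_value(logs: Sequence[Record], pi: Policy) -> float:
    """Inverse propensity scoring: mean of pi(a|x) / pi_0(a|x) * r over the logs."""
    return sum(pi(x, a) / p0 * r for x, a, r, p0 in logs) / len(logs)


def dm_value(logs: Sequence[Record], pi: Policy, q_hat: RewardModel, n_actions: int) -> float:
    """Direct method: mean over logged contexts of sum_a pi(a|x) * q_hat(x, a)."""
    total = 0.0
    for x, _, _, _ in logs:
        total += sum(pi(x, a) * q_hat(x, a) for a in range(n_actions))
    return total / len(logs)
```

Consider 1,000 actions and a uniform logger (π0(a|x) = 0.001). A logged record whose action matches a deterministic target policy then gets importance weight 1,000, so a few matched records can dominate the IPS average. The direct-method estimate has no such weights, but its accuracy depends entirely on `q_hat`.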