BriefGPT.xyz
Jun, 2022
Pessimistic Off-Policy Optimization for Learning to Rank
Matej Cief, Branislav Kveton, Michal Kompan
TL;DR
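This paper studies off-policy learning from logged data in recommender systems and proposes a pessimistic off-policy learning-to-rank method based on click models; experiments and analysis show that it is both effective and general.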
Abstract
Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data…
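To make the idea of pessimism under imbalanced logs concrete, here is a minimal sketch in Python. It assumes a Hoeffding-style lower confidence bound on each item's click-through rate; the bound, the item names, and the counts are illustrative assumptions and are not taken from the paper, whose click-model-based estimators may differ.

```python
import math


def lower_confidence_bound(clicks, impressions, delta=0.05):
    """Hoeffding-style lower bound on a click-through rate, clipped at 0.

    Assumption: clicks are i.i.d. Bernoulli per impression; the paper's
    click-model-based bounds may be different.
    """
    if impressions == 0:
        return 0.0
    mean = clicks / impressions
    width = math.sqrt(math.log(1.0 / delta) / (2.0 * impressions))
    return max(0.0, mean - width)


# Hypothetical logged data: item -> (clicks, impressions).
# Item "B" was shown only twice, so its estimate is very uncertain.
logged = {"A": (300, 1000), "B": (1, 2), "C": (100, 500)}


def rank(items, score):
    """Return items sorted by descending score."""
    return sorted(items, key=score, reverse=True)


# Ranking by the empirical mean trusts the rarely logged item.
rank_by_mean = rank(logged, lambda i: logged[i][0] / logged[i][1])

# Ranking by the lower bound penalizes items with little data.
rank_by_lcb = rank(logged, lambda i: lower_confidence_bound(*logged[i]))

print(rank_by_mean)  # ['B', 'A', 'C']
print(rank_by_lcb)   # ['A', 'C', 'B']
```

The rarely logged item tops the ranking by empirical mean but drops to last under the lower bound, which is the behavior a pessimistic policy is meant to produce when some items are logged far more often than others.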