
Jun, 2022

Pessimistic Off-Policy Optimization for Learning to Rank

Matej Cief, Branislav Kveton, Michal Kompan

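TL;DR: This paper studies off-policy learning from logged data in recommender systems. It proposes a pessimistic off-policy learning-to-rank method built on click models, and experiments and analysis demonstrate its advantages and generality.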
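To make "pessimistic" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a position-based click model (PBM) whose per-position examination probabilities are known. The sketch estimates each item's attraction probability from logged clicks, then subtracts a Hoeffding-style confidence width. That width uses the summed examination probabilities as an effective sample size, which is a simplifying heuristic. Finally, it ranks items by the resulting lower confidence bound. The names `pessimistic_rank`, `exam_probs`, `delta`, and the toy logs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical log format: (item ids shown at positions 0..K-1, click indicators 0/1)
LogEntry = Tuple[Sequence[int], Sequence[int]]


def pessimistic_rank(
    logs: List[LogEntry],
    exam_probs: Sequence[float],
    num_items: int,
    k: int,
    delta: float = 0.05,
) -> Tuple[List[int], List[float]]:
    """Rank items by a lower confidence bound (LCB) on attraction probability.

    Assumes a position-based model: P(click | item i at position j)
    = exam_probs[j] * attraction[i], with exam_probs known.
    """
    clicks = [0.0] * num_items
    exposure = [0.0] * num_items  # summed examination probability per item
    for shown, clicked in logs:
        for pos, (item, c) in enumerate(zip(shown, clicked)):
            clicks[item] += c
            exposure[item] += exam_probs[pos]

    lcb = []
    for i in range(num_items):
        if exposure[i] == 0.0:
            lcb.append(0.0)  # never logged: assume the worst
            continue
        mean = clicks[i] / exposure[i]
        # Hoeffding-style width with exposure as effective sample size (heuristic)
        width = math.sqrt(math.log(1.0 / delta) / (2.0 * exposure[i]))
        lcb.append(max(0.0, mean - width))

    # Python's sort is stable even with reverse=True: ties keep item-id order
    ranking = sorted(range(num_items), key=lambda i: lcb[i], reverse=True)
    return ranking[:k], lcb


# Toy logs: item 0 is well observed, item 2 was shown (and clicked) only once
logs = (
    [([0, 1], [1, 0])] * 6
    + [([0, 1], [0, 0])] * 4
    + [([2, 1], [1, 0])]
)
top, lcb = pessimistic_rank(logs, exam_probs=[1.0, 0.5], num_items=4, k=2)
print(top)  # [0, 1]
```

In these toy logs, item 2 has the highest raw click rate, with one click in one impression. Ranking by that mean would put it first. Its confidence width is large, however, so its lower bound drops to 0. The pessimistic ranking therefore places the well-observed item 0 first.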

Abstract

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged dat