Unbiased Offline Evaluation of Multi-Armed-Bandit-Based News Article Recommendation Algorithms

Mar, 2010

An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms

Lihong Li, Wei Chu, John Langford

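TL;DR This paper introduces a data-driven replay method for the offline evaluation of contextual bandit algorithms in online recommendation systems. It addresses a weakness of traditional simulator-based evaluation, where modeling the data is difficult and introduces bias, and it demonstrates the accuracy and efficiency of the offline evaluation on a large-scale news article recommendation dataset from Yahoo!.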
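The replay idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a fixed (non-learning) policy, assumes the logged actions were chosen uniformly at random and independently of the context, and uses hypothetical names (`replay_evaluate`, `Event`) and synthetic data invented for illustration. The evaluator walks through the logged events, keeps only those where the policy under evaluation picks the same arm that was logged, and averages the rewards of the kept events.

```python
import random
from typing import Callable, Hashable, Iterable, Sequence, Tuple

# Hypothetical event layout: (context features, logged arm, observed reward).
Event = Tuple[Sequence[float], Hashable, float]


def replay_evaluate(
    policy: Callable[[Sequence[float]], Hashable],
    events: Iterable[Event],
) -> Tuple[float, int]:
    """Estimate a fixed policy's average reward by replaying logged events.

    Assumes the logged arm of every event was drawn uniformly at random.
    Returns (estimated average reward, number of matched events).
    """
    matched = 0
    total_reward = 0.0
    for context, logged_arm, reward in events:
        # Keep the event only if the policy agrees with the logged choice;
        # otherwise its reward is unknown and the event is skipped.
        if policy(context) == logged_arm:
            matched += 1
            total_reward += reward
    if matched == 0:
        return float("nan"), 0
    return total_reward / matched, matched


# Synthetic log (an assumption for illustration): two arms, arm "a" is
# clicked with probability 0.8 and arm "b" with probability 0.2.
rng = random.Random(0)
logged = []
for _ in range(10_000):
    context = (rng.random(),)
    arm = rng.choice(["a", "b"])  # uniformly random logging policy
    click_prob = 0.8 if arm == "a" else 0.2
    reward = 1.0 if rng.random() < click_prob else 0.0
    logged.append((context, arm, reward))

estimate, n_matched = replay_evaluate(lambda ctx: "a", logged)
print(f"estimated reward of always-'a': {estimate:.3f} from {n_matched} events")
```

Because roughly half of the logged events match a deterministic policy over two arms, the estimate here is built from about half of the log. With more arms the matched fraction shrinks accordingly, so a larger log is needed for the same precision.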

Abstract

Offline evaluation of reinforcement learning algorithms based on collected data (state transitions and rewards) has remained a challenging problem. Common practice is to create a simulator based on collected data and then run the algorithm against this simulator. Such an approach invol