We present a novel recommender systems dataset that records the sequential
interactions between users and an online marketplace. The users are
sequentially presented with both recommendations and search results in the form
of ranked lists of items, called slates, from the marketplace. The dataset
includes the presented slates at each round, whether the user clicked on any of
these items and which item the user clicked on. Although the usage of exposure
data in recommender systems is growing, to our knowledge there is no open
large-scale recommender systems dataset that includes the slates of items
presented to the users at each interaction. As a result, most articles on
recommender systems do not utilize this exposure information. Instead, the
proposed models only depend on the user's click responses, and assume that the
user is exposed to all the items in the item universe at each step, often
called uniform candidate sampling. This assumption is incomplete because it
treats as candidates items the user may never have been exposed to, so those
items can be incorrectly considered as not of interest to the user. Taking
into account the slates that were actually shown allows the models to use a more
natural likelihood, based on the click probability given the exposed set of
items, as is prevalent in the bandit and reinforcement learning literature.
\cite{Eide2021DynamicSampling} shows that likelihoods based on uniform
candidate sampling (and similar assumptions) implicitly assume that the
platform shows only the most relevant items to the user. This causes the
recommender system to implicitly reinforce feedback loops and to be biased
towards items previously exposed to the user.
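To make the contrast between the two likelihoods concrete, the following is a minimal sketch. It assumes a softmax click model over item scores with an explicit "no click" outcome and uses the full-universe softmax as the quantity that uniform candidate sampling approximates. The record layout (`Interaction`), the `NO_CLICK` pseudo-item, and the function names are hypothetical illustrations, not the dataset's schema or the paper's code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

NO_CLICK = -1  # hypothetical pseudo-item id for the "no click" outcome


@dataclass
class Interaction:
    """One round: the slate shown and the clicked item (None = no click).
    Field names are illustrative, not the dataset's actual schema."""
    slate: List[int]
    clicked: Optional[int]


def log_softmax(target: int, candidates: List[int], scores: Dict[int, float]) -> float:
    """log P(target | candidates) under a softmax over the candidates' scores."""
    logits = [scores[c] for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    return scores[target] - (m + math.log(sum(math.exp(x - m) for x in logits)))


def slate_log_lik(x: Interaction, scores: Dict[int, float]) -> float:
    """Exposure-aware likelihood: the user picks one shown item or no click."""
    candidates = x.slate + [NO_CLICK]
    target = NO_CLICK if x.clicked is None else x.clicked
    return log_softmax(target, candidates, scores)


def uniform_log_lik(x: Interaction, scores: Dict[int, float], universe: List[int]) -> float:
    """Full-universe softmax (what uniform candidate sampling approximates):
    every item counts as a competitor, and no-click rounds carry no signal."""
    if x.clicked is None:
        return 0.0
    return log_softmax(x.clicked, universe, scores)
```

With scores {0: 1.0, 1: 0.5, 2: 0.0, 3: 2.0, 4: 2.0} and a no-click score of 0, a click on item 0 from the slate [0, 1, 2] has a log-likelihood of about -0.85 under the slate model but about -2.00 under the full-universe model. Maximizing the latter would push down the scores of items 3 and 4 even though the user never saw them.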

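Presents a dataset of sequential interaction data that includes the slates of items shown to users by the recommender system, whether the user clicked, and which item was clicked, and uses this dataset to show that probabilistic models that use slate data can estimate users' click-through rates more accurately and avoid bias.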

FINN.no Slates Dataset: A new Sequential Dataset Logging Interactions, all Viewed Items and Click Responses/No-Click for Recommender Systems Research

Users of music streaming, video streaming, news recommendation, and
e-commerce services often engage with content in a sequential manner. Providing
and evaluating good sequences of recommendations is therefore a central problem
for these services. Prior reweighting-based counterfactual evaluation methods
either suffer from high variance or make strong independence assumptions about
rewards. We propose a new counterfactual estimator that allows for sequential
interactions in the rewards with lower variance in an asymptotically unbiased
manner. Our method uses graphical assumptions about the causal relationships of
the slate to reweight the rewards in the logging policy in a way that
approximates the expected sum of rewards under the target policy. Extensive
experiments in simulation and on a live recommender system show that our
approach outperforms existing methods in terms of bias and data efficiency for
the sequential track recommendations problem.
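The trade-off between variance and independence assumptions can be illustrated with three reweighting estimators on logged slates: full-slate inverse propensity scoring (unbiased under full support, but high-variance), a per-position estimator that assumes rewards do not interact, and a prefix-weighted estimator that assumes the reward at position k depends only on the items shown at positions 1..k. The following is a minimal sketch; the prefix variant shows how a causal ordering assumption can shorten the importance weights, but it is an assumption of this sketch and not necessarily the exact estimator proposed in the paper. The `LoggedSlate` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LoggedSlate:
    rewards: List[float]  # reward observed at each slate position
    p_log: List[float]    # logging-policy prob. of the item shown at k, given positions < k
    p_tgt: List[float]    # target-policy prob. of that same item, given the same prefix

    def ratios(self) -> List[float]:
        """Per-position importance ratios p_tgt[k] / p_log[k]."""
        return [t / l for t, l in zip(self.p_tgt, self.p_log)]


def slate_ips(data: Sequence[LoggedSlate]) -> float:
    """Full-slate IPS: every reward gets the product of all K ratios.
    Unbiased under full support, but weight variance grows with K."""
    total = 0.0
    for s in data:
        w = 1.0
        for ratio in s.ratios():
            w *= ratio
        total += w * sum(s.rewards)
    return total / len(data)


def independent_ips(data: Sequence[LoggedSlate]) -> float:
    """Per-position IPS: reward k gets only its own ratio. Low variance,
    but assumes rewards at different positions do not interact."""
    total = 0.0
    for s in data:
        total += sum(ratio * r for ratio, r in zip(s.ratios(), s.rewards))
    return total / len(data)


def prefix_ips(data: Sequence[LoggedSlate]) -> float:
    """Prefix-weighted IPS: reward k gets the product of ratios 1..k, which
    stays unbiased if reward k depends only on the items at positions 1..k."""
    total = 0.0
    for s in data:
        w = 1.0
        for ratio, r in zip(s.ratios(), s.rewards):
            w *= ratio
            total += w * r
    return total / len(data)
```

For a single logged slate with rewards [1, 1], logging probabilities [0.5, 0.5] and target probabilities [0.25, 1.0], the three estimators return 2.0, 2.5 and 1.5 respectively. This shows how the independence assumption and the choice of weight truncation change the estimate.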

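This study proposes a method that uses graphical assumptions about causal relationships to reweight the rewards in the logging policy so as to approximate the sum of rewards under the target policy, addressing the sequential-interaction recommendation problem. Extensive experiments in simulation and on a live recommender system show that the method outperforms existing methods in terms of bias and data efficiency.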

Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions

Recommender systems play a crucial role in mitigating the problem of
information overload by suggesting personalized items or services to users. The
vast majority of traditional recommender systems consider the recommendation
procedure as a static process and make recommendations following a fixed
strategy. In this paper, we propose a novel recommender system with the
capability of continuously improving its strategies during the interactions
with users. We model the sequential interactions between users and a
recommender system as a Markov Decision Process (MDP) and leverage
Reinforcement Learning (RL) to automatically learn the optimal strategies by
recommending items in a trial-and-error manner and receiving reinforcement
signals from users' feedback on these items. Users' feedback can be positive or
negative, and both types of feedback have great potential to boost
recommendations. However, negative feedback is far more abundant than positive
feedback; thus incorporating both simultaneously is challenging, since the
positive feedback could be buried by the negative feedback. In this paper, we develop a novel approach to
incorporate them into the proposed deep recommender system (DEERS) framework.
The experimental results based on real-world e-commerce data demonstrate the
effectiveness of the proposed framework. Further experiments have been
conducted to understand the importance of both positive and negative feedback
in recommendations.
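The MDP formulation with separate handling of positive and negative feedback can be sketched with tabular Q-learning. In this sketch, the state keeps the most recently clicked items and the most recently skipped items in two separate histories, so the far more numerous skips cannot overwrite the rarer clicks. This is a hypothetical stand-in for the deep model the abstract describes: the class name, the state encoding, the assumed rewards (1.0 for a click, 0.0 for a skip), and the hyperparameters are all assumptions of this sketch, not the DEERS implementation.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

# State = (recent positively rated items, recent negatively rated items)
State = Tuple[Tuple[int, ...], Tuple[int, ...]]


class PosNegQLearner:
    """Tabular Q-learning over a state with separate positive (clicked) and
    negative (skipped) histories. Hypothetical stand-in for a deep model."""

    def __init__(self, items: Iterable[int], history: int = 2, alpha: float = 0.5,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        assert history >= 1, "history must keep at least one item"
        self.items = list(items)
        self.history = history
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: DefaultDict[Tuple[State, int], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def act(self, state: State) -> int:
        """Epsilon-greedy: explore by trial and error, else exploit current Q."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=lambda a: self.q[(state, a)])

    def next_state(self, state: State, item: int, clicked: bool) -> State:
        """Route the item to the positive or negative history, keeping the last `history`."""
        pos, neg = state
        if clicked:
            pos = (pos + (item,))[-self.history:]
        else:
            neg = (neg + (item,))[-self.history:]
        return (pos, neg)

    def update(self, state: State, item: int, reward: float, nxt: State) -> None:
        """One-step Q-learning update toward reward + gamma * max_a Q(nxt, a)."""
        best_next = max(self.q[(nxt, a)] for a in self.items)
        td_target = reward + self.gamma * best_next
        self.q[(state, item)] += self.alpha * (td_target - self.q[(state, item)])
```

In a simulated session, `act` picks an item, the simulated user's click or skip yields the assumed reward, `next_state` routes the item to the matching history, and `update` applies the Q-learning step. Keeping the two histories apart is the design choice that mirrors the abstract's concern that positive feedback could be buried by negative feedback.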

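The paper proposes a recommender system that uses reinforcement learning to learn optimized strategies during its interactions with users, integrates positive and negative feedback simultaneously into the system through a deep learning framework, and shows that the method improves recommendation accuracy.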