We consider the problem of collaborative filtering in the online
setting, where items are recommended to users over time. At each time step,
the user (selected by the environment) consumes an item (selected by the agent)
and provides a rating of the selected item. In this pa