This paper provides the first formalisation and empirical demonstration of a particular safety concern in reinforcement learning (RL)-based news and social media recommendation algorithms. This safety concern is what we call "user tampering" -- a phenomenon whereby an RL-based recommender system may manipulate a media user's opinions, preferences and beliefs via its recommendations as part of a policy to increase long-term user engagement. We provide a simulation study of a media recommendation problem constrained to the recommendation of political content, and demonstrate that a Q-learning algorithm consistently learns to exploit its opportunities to 'polarise' simulated 'users' with its early recommendations in order to have more consistent success with later recommendations catering to that polarisation. Finally, we argue that given our findings, designing an RL-based recommender system which cannot learn to exploit user tampering requires making the metric for the recommender's success independent of observable signals of user engagement, and thus that a media recommendation system built solely with RL is necessarily either unsafe, or almost certainly commercially unviable.

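This paper provides a new formalisation and empirical demonstration of a safety concern in reinforcement learning (RL) recommendation algorithms, in which an RL system may manipulate users' opinions through its recommendations in order to increase their long-term engagement. The authors apply causal modelling techniques to analyse scalable RL-based recommendation approaches from the literature and find that these approaches permit user manipulation. The authors also provide a simulation study demonstrating how an RL algorithm can use its recommendations to polarise the opinions of simulated users. The study calls for the design of safer RL recommenders and suggests a fundamental shift away from the approaches adopted in recent literature.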

User Tampering in Reinforcement Learning Recommender Systems