Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can strategically misreport their privately observed contexts to the learner. We treat the algorithm design problem as one of mechanism design under uncertainty and propose the Optimistic Grim Trigger Mechanism (OptGTM) that incentivizes the agents (i.e., arms) to report their contexts truthfully while simultaneously minimizing regret. We also show that failing to account for the strategic nature of the agents results in linear regret. However, a trade-off between mechanism design and regret minimization appears to be unavoidable. More broadly, this work aims to provide insight into the intersection of online learning and mechanism design.
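The claim that ignoring strategic behavior leads to linear regret, and the intuition behind a grim trigger, can be illustrated with a toy simulation. The sketch below is not OptGTM. It is a deliberately simplified, hypothetical setup in which the learner already knows the parameter vector `theta`, rewards are noise-free, contexts are fixed across rounds, and one arm follows a fixed inflation strategy. The function name `simulate`, the `tolerance` threshold, and all numeric values are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def simulate(rounds, grim_trigger, tolerance=1.0):
    """Toy two-arm run; returns cumulative regret after `rounds` rounds.

    Simplifying assumptions (not from the paper): theta is known to the
    learner, rewards are noise-free, and contexts do not change over time.
    """
    theta = [1.0, 0.0]
    true_contexts = [[1.0, 0.0],   # arm 0: true expected reward 1.0
                     [0.5, 0.0]]   # arm 1: true expected reward 0.5
    reported = [[1.0, 0.0],        # arm 0 reports truthfully
                [2.0, 0.0]]        # arm 1 inflates its context to get selected
    active = [True, True]
    overclaim = [0.0, 0.0]         # cumulative (claimed - realized) reward per arm
    best = max(dot(theta, x) for x in true_contexts)
    regret = 0.0

    for _ in range(rounds):
        # The learner trusts the reports and picks the best-looking active arm.
        candidates = [i for i in range(len(active)) if active[i]]
        chosen = max(candidates, key=lambda i: dot(theta, reported[i]))

        realized = dot(theta, true_contexts[chosen])
        regret += best - realized

        # Grim trigger: permanently eliminate an arm whose claims have
        # exceeded its realized rewards by more than the tolerance.
        overclaim[chosen] += dot(theta, reported[chosen]) - realized
        if grim_trigger and overclaim[chosen] > tolerance:
            active[chosen] = False

    return regret


print(simulate(100, grim_trigger=False))  # regret grows by 0.5 every round
print(simulate(100, grim_trigger=True))   # the misreporting arm is eliminated early
```

In this toy run the trusting learner selects the inflating arm in every round, so regret grows linearly with the horizon, while the trigger removes that arm once its claims diverge from its realized rewards. The sketch only shows the elimination side. In the mechanism-design view taken above, it is the threat of elimination that is meant to make truthful reporting attractive to the arms in the first place.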

Strategic Linear Contextual Bandits