A Hierarchical Nearest Neighbour Approach to Contextual Bandits

Dec, 2023

A Hierarchical Nearest Neighbour Approach to Contextual Bandits

Stephen Pasteris, Chris Hicks, Vasilios Mavroudis

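TL;DR: In this paper we consider the adversarial contextual bandit problem in metric spaces. The paper "Nearest neighbour with bandit feedback" tackled this problem, but it suffers high regret when many contexts lie near the decision boundary of the comparator policy. We solve this by designing an algorithm in which any set of contexts can be held out when computing the regret term. Our algorithm builds on that of "Nearest neighbour with bandit feedback" and is therefore extremely computationally efficient.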

Abstract

In this paper we consider the adversarial contextual bandit problem in metric spaces. The paper "Nearest neighbour with bandit feedback" tackled this problem but when there are many contexts near the