We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated to the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are H\"older with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.

我们研究了上下文连续性强化学习问题，证明了任何达到次线性静态遗憾的算法都可以扩展到达到次线性动态遗憾，我们提出了一种算法，通过自协调屏障和内点法实现了次线性动态遗憾，并且得出两个关键事实：首先，对于上下文不连续的函数，没有算法可以达到次线性动态遗憾；其次，对于强凸和光滑函数，我们提出的算法达到了最小极大动态遗憾速率的最优值，仅相差对数因子。

上下文连续型强化学习:静态对动态遗憾的比较