In many online sequential decision-making scenarios, a learner's choices affect not only their current costs but also their future ones. In this work, we study one such setting, in which the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision-making under stage-wise constraints. To tackle it, we then propose the Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. The algorithm incorporates adaptive regularizers that depend explicitly on past decisions, which allows us to enforce the stage-wise constraints while establishing tight regret bounds. We also discuss how the length of the history horizon affects the design of no-regret algorithms for our problem, and we present impossibility results for the case in which the history horizon is the full learning horizon.

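In many online sequential decision-making scenarios, a learner's choices affect not only the current cost but also future costs. This paper studies a particular case in which the cost depends on the time average of past decisions. We propose a novel algorithm, Follow-The-Adaptively-Regularized-Leader (FTARL), which dynamically adjusts its regularization term based on past decisions, thereby ensuring minimal regret while satisfying stage-wise constraints. We also discuss how the length of the history horizon affects the design of no-regret algorithms, and we give some impossibility results for the full learning horizon.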

Online Decision-Making with History-Average Dependent Costs