We propose an algorithm for tabular episodic reinforcement learning with
constraints. We provide a modular analysis with strong theoretical guarantees
for settings with concave rewards and convex constraints, and for settings with
hard constraints (knapsacks). Most of the previous work in constrained
reinforcement learning is limited to linear constraints, and the remaining work
focuses on either the feasibility question or settings with a single episode.
Our experiments demonstrate that the proposed algorithm significantly
outperforms these approaches in existing constrained episodic environments.
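
As a hedged illustration of the concave-convex setting (the notation is an assumption for exposition, not taken from the abstract): with per-step reward $r_h$, per-step resource consumption vector $c_h$, horizon $H$, a concave function $f$ and a convex function $g$, the learner seeks a policy $\pi$ that solves
$$\max_{\pi} \; f\Big(\mathbb{E}^{\pi}\Big[\sum_{h=1}^{H} r_h\Big]\Big) \quad \text{subject to} \quad g\Big(\mathbb{E}^{\pi}\Big[\sum_{h=1}^{H} c_h\Big]\Big) \le 0.$$
Under this sketch, the linear-constraint case corresponds to taking $f$ as the identity and $g(x) = x - \xi$ for a budget vector $\xi$. In the knapsack case the constraint is hard: total consumption summed over all episodes must never exceed a fixed budget.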

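We propose an algorithm for tabular episodic reinforcement learning with constraints and provide strong theoretical guarantees for settings with concave rewards and convex constraints, as well as for settings with hard constraints (knapsacks). Our experiments show that the proposed algorithm significantly outperforms previous approaches in existing constrained episodic environments; that previous work is mostly limited to linear constraints or to settings with a single episode.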

Constrained episodic reinforcement learning in concave-convex and knapsack settings

We design a new provably efficient algorithm for episodic reinforcement
learning with generalized linear function approximation. We analyze the
algorithm under a new expressivity assumption that we call "optimistic
closure," which is strictly weaker than assumptions from prior analyses for the
linear setting. With optimistic closure, we prove that our algorithm enjoys a
regret bound of $\tilde{O}(\sqrt{d^3 T})$ where $d$ is the dimensionality of
the state-action features and $T$ is the number of episodes. This is the first
statistically and computationally efficient algorithm for reinforcement
learning with generalized linear functions.
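
To make the stated quantities concrete, here is a hedged sketch in assumed notation (the symbols $\phi$, $\theta$, $f$, $V^{\star}$, and $s_{t,1}$ are illustrative and do not appear in the abstract). Generalized linear function approximation models the optimal action-value function through a known link function $f$ applied to a linear form in $d$-dimensional features:
$$Q^{\star}(s,a) \approx f\big(\langle \phi(s,a), \theta \rangle\big), \qquad \phi(s,a), \theta \in \mathbb{R}^{d}.$$
Regret over $T$ episodes is commonly defined as
$$\mathrm{Regret}(T) = \sum_{t=1}^{T} \Big( V^{\star}(s_{t,1}) - V^{\pi_t}(s_{t,1}) \Big),$$
where $\pi_t$ is the policy played in episode $t$ and $s_{t,1}$ its initial state. Dividing the bound $\tilde{O}(\sqrt{d^3 T})$ by $T$ gives an average per-episode regret of $\tilde{O}(\sqrt{d^3 / T})$, which vanishes as $T$ grows.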

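This paper proposes a new episodic reinforcement learning algorithm based on generalized linear function approximation and analyzes it under the "optimistic closure" assumption. It proves a regret bound of $\tilde{O}(\sqrt{d^3 T})$ and shows that this is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.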