We present a novel algorithm that efficiently computes near-optimal
deterministic policies for constrained reinforcement learning (CRL) problems.
Our approach combines three key ideas: (1) value-demand augmentation, (2)
action-space approximate dynamic programming, and (3) time-space rounding.
Under mild reward assumptions, our algorithm constitutes a fully
polynomial-time approximation scheme (FPTAS) for a diverse class of cost
criteria. This class requires that the cost of a policy can be computed
recursively over both time and (state) space, and it includes the classical
expectation, almost-sure, and anytime constraints. Our work not only provides
provably efficient algorithms to address real-world challenges in
decision-making but also offers a unifying theory for the efficient computation
of constrained deterministic policies.
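To make the "recursive over time and space" requirement concrete, here is a minimal sketch, not the paper's algorithm, that evaluates one fixed deterministic policy under the three classical criteria by backward recursion. It assumes a tabular finite-horizon MDP in which the hypothetical inputs `trans[h][s]` list the (next state, probability) pairs reached under the policy's action and `cost[h][s]` is the immediate cost. The recursions are standard textbook forms of these criteria, assumed here rather than taken from the paper: each one combines the immediate cost with the successors' cost-to-go, where the expectation criterion averages over successors, the almost-sure criterion takes the worst reachable successor, and the anytime criterion tracks the worst cumulative cost reached at any intermediate time.

```python
from typing import Dict, List, Tuple

# Illustrative tabular setup for a FIXED deterministic policy (hypothetical names):
# trans[h][s] = list of (next_state, probability) under the policy's action at (h, s)
# cost[h][s]  = immediate cost c_h(s, pi_h(s))
Transitions = List[Dict[int, List[Tuple[int, float]]]]
Costs = List[Dict[int, float]]


def policy_costs(
    trans: Transitions, cost: Costs, states: List[int]
) -> Tuple[Dict[int, float], Dict[int, float], Dict[int, float]]:
    """Backward recursion over time (h = H-1, ..., 0) and over successor states."""
    # Cost-to-go after the final step is zero for every criterion.
    exp_c = {s: 0.0 for s in states}
    as_c = {s: 0.0 for s in states}
    any_c = {s: 0.0 for s in states}
    for h in reversed(range(len(cost))):
        new_exp: Dict[int, float] = {}
        new_as: Dict[int, float] = {}
        new_any: Dict[int, float] = {}
        for s in states:
            c = cost[h][s]
            succ = [(s2, p) for s2, p in trans[h][s] if p > 0]  # reachable successors
            # Expectation: immediate cost plus expected cost-to-go.
            new_exp[s] = c + sum(p * exp_c[s2] for s2, p in succ)
            # Almost sure: immediate cost plus worst cost-to-go over reachable successors.
            new_as[s] = c + max((as_c[s2] for s2, _ in succ), default=0.0)
            # Anytime: worst cumulative (prefix) cost; max(0, .) covers stopping at time h.
            new_any[s] = c + max(0.0, max((any_c[s2] for s2, _ in succ), default=0.0))
        exp_c, as_c, any_c = new_exp, new_as, new_any
    return exp_c, as_c, any_c


# Toy instance: horizon 2, states {0, 1}.
states = [0, 1]
trans = [
    {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
    {0: [(0, 1.0)], 1: [(1, 1.0)]},
]
cost = [
    {0: 1.0, 1: 2.0},
    {0: 0.0, 1: -2.0},
]
print(policy_costs(trans, cost, states))
```

In this toy instance, every trajectory from state 1 has total cost 0, so its almost-sure value is 0.0. The cumulative cost, however, reaches 2 after the first step, so its anytime value is 2.0. With nonnegative costs the two criteria would coincide.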

Deterministic Policies for Constrained Reinforcement Learning in Polynomial-Time

Boolean Matrix Factorization (BMF) aims to find an approximation of a given
binary matrix as the Boolean product of two low-rank binary matrices. Binary
data is ubiquitous, and representing data by binary matrices is common in
medicine, natural language processing, bioinformatics, and computer graphics,
among many other fields. Unfortunately, BMF is computationally hard, so in
practice heuristic algorithms are used to compute Boolean factorizations.
Very recently, a theoretical breakthrough was obtained independently by two
research groups: Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms
2020) showed that BMF admits an efficient polynomial-time approximation scheme
(EPTAS). However,
despite their theoretical importance, the double-exponential dependence of
the running times on the rank makes these algorithms impractical to implement.
The primary research question motivating our work is whether the
theoretical advances in BMF could lead to practical algorithms.
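As a concrete illustration of the objective, the sketch below (plain Python; the helper names are hypothetical) computes the Boolean product, in which $(U \circ V)_{ij} = \bigvee_k (U_{ik} \wedge V_{kj})$, together with the number of mismatched entries, which is assumed here as the error measure since it is a common choice for BMF. It only evaluates a given factorization; it does not search for one.

```python
from typing import List

Matrix = List[List[int]]  # 0/1 entries


def boolean_product(U: Matrix, V: Matrix) -> Matrix:
    """(U o V)[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    r = len(V)  # inner dimension (the factorization rank)
    n_cols = len(V[0])
    return [
        [int(any(U[i][k] and V[k][j] for k in range(r))) for j in range(n_cols)]
        for i in range(len(U))
    ]


def mismatches(A: Matrix, B: Matrix) -> int:
    """Number of entries where A and B differ (Hamming / l0 error)."""
    return sum(a != b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Rank-2 Boolean factors of a 3x3 binary matrix.
U = [[1, 0], [1, 1], [0, 1]]
V = [[1, 1, 0], [0, 1, 1]]
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(boolean_product(U, V), mismatches(A, boolean_product(U, V)))
```

The middle row of the product is the OR of both rows of `V`, so the result stays binary and reproduces `A` exactly. The ordinary integer product would instead give `[1, 2, 1]` for that row.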
The main conceptual contribution of our work is the following. While the EPTAS
for BMF is a purely theoretical advance, the general approach behind these
algorithms could serve as the basis for designing better heuristics. We also use
this strategy to develop new algorithms for the related $\mathbb{F}_p$-Matrix
Factorization problem. Here, given a matrix $A$ over a finite field GF($p$),
where $p$ is a prime, and an integer $r$, our objective is to find a matrix $B$
over the same field with GF($p$)-rank at most $r$ that minimizes some norm of
$A-B$. Our experiments on synthetic and real-world data demonstrate the
advantage of the new algorithms over previous work on BMF and
$\mathbb{F}_p$-Matrix Factorization.
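The rank constraint in $\mathbb{F}_p$-Matrix Factorization is taken over GF($p$), and this rank can differ from the rank over the reals. The following minimal sketch checks a candidate $B$ using Gaussian elimination modulo $p$. It assumes that $p$ is prime and, purely for illustration, uses the entrywise $\ell_0$ count as the norm of $A-B$, since the abstract only says "some norm". The function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def rank_mod_p(M: Matrix, p: int) -> int:
    """Rank of M over GF(p) (p prime) by Gauss-Jordan elimination modulo p."""
    A = [[x % p for x in row] for row in M]
    n_rows = len(A)
    n_cols = len(A[0]) if n_rows else 0
    rank = 0
    for c in range(n_cols):
        # Look for a pivot (nonzero entry) in column c among the unused rows.
        pivot = next((i for i in range(rank, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][c], p - 2, p)  # inverse via Fermat's little theorem (p prime)
        A[rank] = [(x * inv) % p for x in A[rank]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != rank and A[i][c] != 0:
                f = A[i][c]
                A[i] = [(x - f * y) % p for x, y in zip(A[i], A[rank])]
        rank += 1
    return rank


def l0_error(A: Matrix, B: Matrix, p: int) -> int:
    """Number of nonzero entries of A - B over GF(p) (assumed norm, for illustration)."""
    return sum((a - b) % p != 0 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


p, r = 3, 1
A = [[1, 2, 0], [2, 1, 0], [0, 0, 1]]  # second row = 2 * first row (mod 3)
B = [[1, 2, 0], [2, 1, 0], [0, 0, 0]]  # candidate with GF(3)-rank 1
print(rank_mod_p(A, p), rank_mod_p(B, p) <= r, l0_error(A, B, p))
```

The second row of $A$ is twice the first modulo 3, so $A$ has GF(3)-rank 2 even though its real rank is 3. Zeroing a single entry yields a feasible rank-1 candidate $B$ at cost 1. Over GF(2) the product uses XOR where BMF uses OR, so GF(2)-rank and Boolean rank generally differ.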

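Building on recent theoretical advances in BMF, we propose new heuristic algorithms aimed at practice, including algorithms for $\mathbb{F}_p$-Matrix Factorization over the finite field GF($p$), and demonstrate their advantage through empirical studies on synthetic and real-world data.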