We study the problem of offline decision making, which focuses on learning decisions from datasets only partially correlated with the learning objective. While previous research has extensively studied specific offline decision making problems like offline reinforcement learning (RL) and off-policy evaluation (OPE), a unified framework and theory remain absent. To address this gap, we introduce a unified framework termed Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems including offline RL, OPE, and offline partially observable Markov decision processes (POMDPs). For the DMOF framework, we introduce a hardness measure called the Offline Estimation Coefficient (OEC), which measures the learnability of offline decision making problems and is also reflected in the derived minimax lower bounds. Additionally, we introduce an algorithm called Empirical Decision with Divergence (EDD), for which we establish both an instance-dependent upper bound and a minimax upper bound. The minimax upper bound almost matches the lower bound determined by the OEC. Finally, we show that EDD achieves a fast convergence rate (i.e., a rate scaling as $1/N$, where $N$ is the sample size) for specific settings such as supervised learning and Markovian sequential problems (e.g., MDPs) with partial coverage.

A Theory of Learnability for Offline Decision Making