Oct, 2020

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

Gellert Weisz, Philip Amortila, Csaba Szepesvári

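TL;DR: This paper studies local planning in fixed-horizon and discounted Markov decision processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map available to the planner. It asks whether there exist sound planners that need only a polynomial number of queries, answers this in the negative with exponential lower bounds, and shows that least-squares value iteration can compute a near-optimal policy.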
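To make the least-squares value iteration (LSVI) mentioned above concrete, here is a minimal sketch of fixed-horizon LSVI with a generative model. It works backwards from the last stage: at each design state-action pair it queries the simulator, forms the target r + max_a' phi(s', a')^T theta_{h+1}, and fits theta_h by least squares. Everything here is an illustrative assumption rather than the paper's exact procedure: the simulator `sample(s, a)` returning `(reward, next_state)`, the hand-picked `design` set (the paper's own choice of design and sample sizes is not reproduced), the small ridge term `lam`, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

State = int
Action = int
Vector = List[float]


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X: List[Vector], y: Vector, lam: float) -> Vector:
    """Least squares with a small ridge term: (X^T X + lam I)^{-1} X^T y."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def lsvi(
    phi: Callable[[State, Action], Vector],
    actions: Callable[[State], Sequence[Action]],
    sample: Callable[[State, Action], Tuple[float, State]],  # hypothetical simulator
    design: Sequence[Tuple[State, Action]],  # hypothetical design set
    horizon: int,
    n_samples: int,
    lam: float = 1e-6,
):
    """Backward LSVI: for h = horizon-1, ..., 0 regress bootstrapped targets on phi."""
    d = len(phi(*design[0]))
    thetas = [[0.0] * d for _ in range(horizon + 1)]  # theta_H = 0: episode ends

    def q(h: int, s: State, a: Action) -> float:
        return sum(w * f for w, f in zip(thetas[h], phi(s, a)))

    for h in range(horizon - 1, -1, -1):
        X, y = [], []
        for s, a in design:
            for _ in range(n_samples):
                r, s_next = sample(s, a)  # one generative-model query
                X.append(phi(s, a))
                y.append(r + max(q(h + 1, s_next, b) for b in actions(s_next)))
        thetas[h] = ridge_fit(X, y, lam)

    def policy(h: int, s: State) -> Action:
        """Greedy policy with respect to the fitted action values at stage h."""
        return max(actions(s), key=lambda a: q(h, s, a))

    return thetas[:horizon], policy


# Hypothetical 2-state, 2-action deterministic MDP with one-hot (tabular)
# features, a special case in which q* is trivially linearly realizable.
TABLE = {(0, 0): (0.0, 1), (0, 1): (0.25, 0), (1, 0): (1.0, 1), (1, 1): (0.0, 0)}


def phi(s: State, a: Action) -> Vector:
    v = [0.0] * 4
    v[2 * s + a] = 1.0
    return v


thetas, pi = lsvi(phi, lambda s: [0, 1], lambda s, a: TABLE[(s, a)],
                  design=list(TABLE), horizon=2, n_samples=1)
print([pi(0, s) for s in (0, 1)], [pi(1, s) for s in (0, 1)])  # [0, 0] [1, 0]
```

With tabular features and a design covering every pair, the fit recovers q* exactly (up to the ridge term). The difficulty the paper addresses arises when d is much smaller than the number of state-action pairs, so that only a small design set can be queried and errors can compound across stages.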

Abstract

We consider the problem of local planning in fixed-horizon Markov decision processes (MDPs) with linear function approximation and a