We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^\star$ or (ii) the optimal action-value function $q^\star$ lies in the linear span of some features; or (iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (2021b) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for setting (i), Weisz et al. (2021a) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question of whether polynomial query complexity is possible when $A$ is subexponential in $\min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=\Omega(\min(d^{1/4},H^{1/2}))$, under any of (i), (ii), or (iii). In particular, this implies a perhaps surprising exponential separation in query complexity from the work of Du et al. (2021), who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).
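For concreteness, the three realizability conditions can be written out as follows. This is only a sketch under assumed notation: the stage index $h$, the feature maps $\varphi_h$ (overloaded for states and state-action pairs), the parameter vectors $\theta_h$, the state space $\mathcal{S}$, the action set $\mathcal{A}$, and the reachable set $\mathcal{S}_{\mathrm{reach}}$ are illustrative symbols that do not appear in the text above.

$$
\begin{aligned}
\text{(i)}\quad & v^\star_h(s) = \langle \varphi_h(s), \theta_h \rangle && \text{for all } s \in \mathcal{S},\\
\text{(ii)}\quad & q^\star_h(s,a) = \langle \varphi_h(s,a), \theta_h \rangle && \text{for all } s \in \mathcal{S},\ a \in \mathcal{A},\\
\text{(iii)}\quad & \text{both identities above hold} && \text{for all } s \in \mathcal{S}_{\mathrm{reach}},\ a \in \mathcal{A},
\end{aligned}
$$

where $\theta_h \in \mathbb{R}^d$ for each stage $h$, and the parameter vectors in (i) and (ii) need not coincide. In this notation, condition (iii) restricted to $\mathcal{S}_{\mathrm{reach}}$ is weaker than requiring both identities on all of $\mathcal{S}$, which is the distinction behind the separation from Du et al. (2021).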

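This work studies the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs), with a focus on linear function approximation. Following prior work, we consider broad classes of problems. Earlier work proposed TensorPlan, which achieves a query cost polynomial in all relevant quantities when the number of actions is fixed. In this paper we prove that, under any of (i), (ii), or (iii), the query complexity is exponential already when $A=\Omega(\min(d^{1/4},H^{1/2}))$, that is, without requiring an exponentially large action set. This implies a surprising exponential separation in query complexity from the work of Du et al., whose polynomial upper bound requires (iii) to hold for all states. We also show that the upper bound of TensorPlan extends to (iii) and, for MDPs with deterministic transitions and stochastic rewards, to (ii).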

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions