In this paper we discuss policy iteration methods for the approximate solution of
a finite-state discounted Markov decision problem, with a focus on
feature-based aggregation methods and their connection with deep reinforcement
learning schemes. We introduce features of the states of the original problem,
and we formulate a smaller "aggregate" Markov decision problem, whose states
relate to the features. We discuss properties and possible implementations of
this type of aggregation, including a new approach to approximate policy
iteration. In this approach the policy improvement operation combines
feature-based aggregation with feature construction using deep neural networks
or other calculations. We argue that the cost function of a policy may be
approximated much more accurately by the nonlinear function of the features
provided by aggregation than by the linear function of the features provided
by neural network-based reinforcement learning, thereby potentially leading to
more effective policy improvement.
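
As an illustration of the aggregation idea summarized above, the following is a minimal sketch and not the paper's implementation. It assumes a fixed policy, a hard aggregation scheme in which each original state belongs to exactly one aggregate state (determined here by binning a scalar feature), and uniform disaggregation probabilities within each group. The function name `aggregate_policy_cost`, the toy transition matrix, the costs, the feature values, and the bin edges are all hypothetical. The resulting approximation is constant over each feature bin, so it is a nonlinear (piecewise constant) function of the feature.

```python
import numpy as np

def aggregate_policy_cost(P, g, alpha, groups):
    """Approximate the discounted cost of a fixed policy by hard aggregation.

    P      : (n, n) transition matrix under the policy
    g      : (n,) expected one-stage cost under the policy
    alpha  : discount factor in (0, 1)
    groups : (n,) integer array; groups[i] is the aggregate state of state i
             (assumed to cover 0..m-1 with no empty group)
    """
    n = len(g)
    m = int(groups.max()) + 1
    # Aggregation matrix Phi: state i maps fully to aggregate state groups[i].
    Phi = np.zeros((n, m))
    Phi[np.arange(n), groups] = 1.0
    # Disaggregation matrix D: uniform distribution over each group's states
    # (an assumption made for this sketch).
    D = Phi.T / Phi.sum(axis=0)[:, None]
    # Solve the aggregate equation r = D (g + alpha * P @ Phi @ r).
    A = np.eye(m) - alpha * (D @ P @ Phi)
    r = np.linalg.solve(A, D @ g)
    # Approximate cost of each original state: the value of its aggregate state.
    return Phi @ r, r

# Toy problem (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n, alpha = 6, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # normalize rows into probabilities
g = rng.random(n)

# Hypothetical scalar feature per state, binned into three aggregate states.
feature = np.array([0.1, 0.2, 0.9, 1.1, 2.0, 2.1])
groups = np.digitize(feature, [0.5, 1.5])

J_agg, r = aggregate_policy_cost(P, g, alpha, groups)
J_exact = np.linalg.solve(np.eye(n) - alpha * P, g)  # exact policy cost
print("max approximation error:", np.max(np.abs(J_agg - J_exact)))
```

If every state is placed in its own group, the aggregate equation coincides with the original policy evaluation equation and the sketch returns the exact cost; coarser groupings trade accuracy for a smaller linear system.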
