We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.

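This paper introduces a new approach to value function approximation that combines linear temporal difference reinforcement learning with subspace identification, in the form of a new algorithm called Predictive State Temporal Difference (PSTD) learning. The method linearly projects a state vector containing a large number of features down to a small predictive state vector, and then estimates the value function with a Bellman recursion. We discuss the connection between PSTD and prior approaches in RL and SSID, prove that PSTD is statistically consistent, and demonstrate its potential on a difficult optimal stopping problem.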
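The two stages described above (compress the features, then run a Bellman recursion on the compressed features) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the paper's PSTD algorithm: it assumes a toy fully observed Markov chain, uses the previous step's features as a stand-in for history features, takes the compression operator from an SVD of the current/previous feature cross-covariance, and uses ordinary least-squares TD for the Bellman step. The names `compress_features` and `lstd_weights`, and all toy parameters, are hypothetical.

```python
import numpy as np

def compress_features(phi, k):
    """Return a (d, k) linear compression operator (assumed construction):
    the top-k left singular vectors of the cross-covariance between
    current features and the previous step's features."""
    cov = phi[1:].T @ phi[:-1] / (len(phi) - 1)  # (d, d)
    U, _, _ = np.linalg.svd(cov)
    return U[:, :k]

def lstd_weights(z, rewards, gamma):
    """Least-squares TD on features z: solve A w = b with
    A = sum_t z_t (z_t - gamma * z_{t+1})^T and b = sum_t z_t r_t."""
    z_now, z_next = z[:-1], z[1:]
    A = z_now.T @ (z_now - gamma * z_next)
    b = z_now.T @ rewards[:-1]
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Toy problem (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n_states, d, k, n_steps, gamma = 5, 20, 5, 5000, 0.9

P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
r_state = rng.random(n_states)             # reward received in each state
M = rng.standard_normal((n_states, d))     # each state emits a d-dim feature vector

# Sample one trajectory from the chain.
states = np.empty(n_steps, dtype=int)
states[0] = 0
for t in range(1, n_steps):
    states[t] = rng.choice(n_states, p=P[states[t - 1]])

phi = M[states]                # (n_steps, d) large feature set
rewards = r_state[states]      # (n_steps,)

# Stage 1: compress d features down to k.
U = compress_features(phi, k)
z = phi @ U                    # (n_steps, k) compressed features

# Stage 2: Bellman recursion (LSTD) in the compressed space.
w = lstd_weights(z, rewards, gamma)

v_hat = (M @ U) @ w            # estimated value of each state
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r_state)
print("estimated:", np.round(v_hat, 3))
print("true:     ", np.round(v_true, 3))
```

In this toy setting the features are an exact linear function of the state, so a rank-`k` compression with `k` equal to the number of states loses nothing. The setting the abstract targets, with many marginally useful features and partial observability, is harder, and the paper's construction should be consulted for that case.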

Predictive State Temporal Difference Learning