We propose a novel value approximation method, the Eigensubspace
Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated
by an analysis of the dynamics of Q-value approximation error in the
Temporal-Difference (TD) method, which follows a path defined by the
1-eigensubspace of the transition kernel associated with the Markov Decision
Process (MDP). This analysis reveals a fundamental property of TD learning that
has remained unused in previous deep RL approaches. In ERC, we propose a
regularizer that guides the approximation error toward the
1-eigensubspace, resulting in a more efficient and stable path of value
approximation. Moreover, we theoretically prove the convergence of the ERC
method, and both theoretical analysis and experiments demonstrate that ERC
effectively reduces the variance of value functions. On 20 of the 26 tasks in
the DMControl benchmark, ERC outperforms state-of-the-art methods. It also
shows significant advantages in Q-value approximation and variance
reduction. Our code is available at this https URL
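
The 1-eigensubspace idea can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes a tabular setting, a strictly positive row-stochastic matrix `P`, and a simplified synchronous error update `e <- gamma * P @ e`; all function names are hypothetical. It checks that the all-ones vector is an eigenvector of `P` with eigenvalue 1, and that the direction of the error drifts toward that vector. The helper `eigensubspace_penalty` is one plausible form of a regularizer (the variance of the error around its mean, which is zero exactly when the error lies in the span of the all-ones vector); the abstract does not confirm that ERC uses this form.

```python
import math
import random


def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def random_stochastic_matrix(n, rng):
    """Build a strictly positive matrix whose rows each sum to 1."""
    rows = []
    for _ in range(n):
        raw = [rng.random() + 0.1 for _ in range(n)]
        total = sum(raw)
        rows.append([x / total for x in raw])
    return rows


def cosine_to_ones(v):
    """Cosine similarity between v and the all-ones vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return sum(v) / (norm * math.sqrt(len(v)))


def eigensubspace_penalty(err):
    """Hypothetical regularizer: mean squared deviation of the error from
    its own mean. It is zero when err is a multiple of the all-ones vector."""
    mean = sum(err) / len(err)
    return sum((x - mean) ** 2 for x in err) / len(err)


rng = random.Random(0)
n, gamma = 5, 0.9
P = random_stochastic_matrix(n, rng)
ones = [1.0] * n

# Rows sum to 1, so P @ ones equals ones (eigenvalue 1).
assert all(abs(x - 1.0) < 1e-12 for x in matvec(P, ones))

# Assumed simplified error dynamics: e <- gamma * P @ e.
err = [rng.random() + 0.5 for _ in range(n)]
start_cos = cosine_to_ones(err)
for _ in range(100):
    err = [gamma * x for x in matvec(P, err)]
end_cos = cosine_to_ones(err)

print("cosine to ones before:", start_cos)
print("cosine to ones after: ", end_cos)
```

The error shrinks in magnitude because of `gamma`, while its direction aligns with the all-ones vector; a penalty such as `eigensubspace_penalty` would push the error in that direction explicitly rather than waiting for the dynamics to do so.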

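This work proposes a new value estimation method for deep reinforcement learning, the Eigensubspace Regularized Critic (ERC), which makes value estimation more efficient and more stable. On the DMControl benchmark, ERC outperforms other state-of-the-art methods on 20 tasks and shows significant advantages in Q-value estimation and variance reduction.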

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

In Reinforcement Learning (RL), Convolutional Neural Networks (CNNs) have been
successfully applied as function approximators in Deep Q-Learning algorithms,
which seek to learn action-value functions and policies in various
environments. However, to date, there has been little work on the learning of
symmetry-transformation equivariant representations of the input environment
state. In this paper, we propose the use of Equivariant CNNs to train RL agents
and study their inductive bias for transformation equivariant Q-value
approximation. We demonstrate that equivariant architectures can dramatically
enhance the performance and sample efficiency of RL agents in a highly
symmetric environment while requiring fewer parameters. Additionally, we show
that they are robust to changes in the environment caused by affine
transformations.
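
To make the notion of transformation-equivariant Q-values concrete, here is a minimal sketch under stated assumptions. It does not use an equivariant convolutional layer; it swaps in plain group averaging over the four 90-degree rotations, which is a simpler way to obtain the same symmetry property. The grid state, the four actions (0=up, 1=right, 2=down, 3=left), and all function names are hypothetical. Rotating the state clockwise by 90 degrees is assumed to map each action `a` to `(a + 1) % 4`, so an equivariant Q-function should satisfy `Q(rot(s))[(a + 1) % 4] == Q(s)[a]`.

```python
import random

N_ACTIONS = 4  # assumed action set: 0=up, 1=right, 2=down, 3=left


def rot90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def make_base_q(size, seed=1):
    """Build an arbitrary linear Q-function with no built-in symmetry."""
    rng = random.Random(seed)
    weights = [
        [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
        for _ in range(N_ACTIONS)
    ]

    def base_q(grid):
        return [
            sum(w * x
                for w_row, g_row in zip(weights[a], grid)
                for w, x in zip(w_row, g_row))
            for a in range(N_ACTIONS)
        ]

    return base_q


def symmetrize(base_q):
    """Average base_q over all four rotations, shifting the action index
    by the number of rotations applied to the state."""
    def q_sym(grid):
        totals = [0.0] * N_ACTIONS
        rotated = grid
        for k in range(4):
            q = base_q(rotated)
            for a in range(N_ACTIONS):
                totals[a] += q[(a + k) % N_ACTIONS]
            rotated = rot90(rotated)
        return [t / 4 for t in totals]

    return q_sym


def is_equivariant(q_fn, grid, tol=1e-9):
    """Check Q(rot(s))[(a + 1) % 4] == Q(s)[a] for every action a."""
    q = q_fn(grid)
    q_rot = q_fn(rot90(grid))
    return all(abs(q_rot[(a + 1) % N_ACTIONS] - q[a]) < tol
               for a in range(N_ACTIONS))


state = [[1.0, 0.0, 2.0],
         [0.0, 3.0, 0.0],
         [4.0, 0.0, 0.0]]
base_q = make_base_q(3)
q_sym = symmetrize(base_q)

print("base Q equivariant:       ", is_equivariant(base_q, state))
print("symmetrized Q equivariant:", is_equivariant(q_sym, state))
```

Group averaging costs four forward passes per state, whereas an equivariant architecture builds the constraint into its weights; that weight sharing is consistent with the abstract's point about needing fewer parameters.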

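This paper proposes using Equivariant CNNs to train reinforcement learning agents and studies their inductive bias with respect to symmetry transformations. The results show that, in a highly symmetric environment, Equivariant CNNs can significantly improve an agent's performance and sample efficiency while requiring fewer parameters, and that they are robust to changes in the environment caused by affine transformations.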