The widespread use of deep reinforcement learning (DRL) policy networks in continuous control tasks has raised questions about performance degradation in expanded state spaces, where the norm of the input state is larger than in the training environment. This paper investigates the causes of this degradation using a novel analysis technique called state division. Whereas prior approaches use state division merely as a post-hoc explanatory tool, our method examines the intrinsic characteristics of DRL policy networks. Specifically, we show that expanding the state space drives the $\tanh$ activation function into saturation, which transforms the state division boundary from nonlinear to linear. Our analysis focuses on the double-integrator system and reveals that this gradual shift toward linearity produces control behavior resembling bang-bang control. However, because the division boundary is linear, ideal bang-bang control cannot be attained, and overshoot is unavoidable. Experiments with several RL algorithms show that this phenomenon stems from inherent properties of the DRL policy network and is consistent across optimization algorithms.
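The overshoot argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the method used in the paper: it models a unit-mass double integrator $\ddot{x} = u$ with $|u| \le 1$, and it replaces the trained policy network with two hand-written switching laws. The names `K`, `DT`, `T_END`, `linear_policy`, `ideal_policy`, and `overshoot` are hypothetical and chosen only for illustration. The linear law switches on the line $x + Kv = 0$, standing in for a linear division boundary, while the ideal law switches on the time-optimal parabola $x + v|v|/2 = 0$.

```python
# Minimal sketch: unit-mass double integrator x'' = u with |u| <= 1.
# The switching laws are hand-written stand-ins, not a trained DRL policy.

K = 1.0       # assumed slope parameter of the linear boundary x + K*v = 0
DT = 1e-3     # integration step in seconds (assumption)
T_END = 40.0  # simulated horizon in seconds (assumption)


def sign(z):
    """Return +1.0, -1.0, or 0.0 according to the sign of z."""
    if z > 0:
        return 1.0
    if z < 0:
        return -1.0
    return 0.0


def linear_policy(x, v):
    """Bang-bang action with a linear switching boundary."""
    return -sign(x + K * v)


def ideal_policy(x, v):
    """Time-optimal bang-bang action with a parabolic switching curve."""
    return -sign(x + v * abs(v) / 2.0)


def overshoot(x0, policy):
    """Start at rest at x0 > 0 and return how far x passes below zero."""
    x, v = x0, 0.0
    lowest = x
    for _ in range(int(T_END / DT)):
        u = policy(x, v)
        v += u * DT   # semi-implicit Euler: update velocity first,
        x += v * DT   # then position with the new velocity
        lowest = min(lowest, x)
    return max(0.0, -lowest)


if __name__ == "__main__":
    for x0 in (0.5, 50.0):
        print(f"x0={x0}: linear overshoot={overshoot(x0, linear_policy):.3f}, "
              f"ideal overshoot={overshoot(x0, ideal_policy):.3f}")
```

In this sketch the linear boundary behaves well for a small initial state, because the trajectory slides along the line toward the origin. For a large initial state the switch comes too late to brake in time, so the position passes the origin by a wide margin. The parabolic law stays close to the origin in both cases, up to discretization error.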

Generalization Analysis of Policy Networks: The Case of the Double Integrator