In recent years, deep reinforcement learning has made impressive advances in solving several important benchmark problems for sequential decision making. Many control applications use a generic multilayer perceptron (MLP) for the non-vision parts of the policy network. In this work, we propose a new neural network architecture for the policy network representation that is simple yet effective. The proposed Structured Control Net (SCN) splits the generic MLP into two separate sub-modules: a nonlinear control module and a linear control module. Intuitively, the nonlinear control is for forward-looking and global control, while the linear control stabilizes the local dynamics around the residual of global control. We hypothesize that this will bring together the benefits of both linear and nonlinear policies: improved training sample efficiency, final episodic reward, and generalization of the learned policy, while requiring a smaller network and being generally applicable to different training methods. We validated our hypothesis with competitive results on simulations from OpenAI MuJoCo, Roboschool, Atari, and a custom 2D urban driving environment, with various ablation and generalization tests, trained with multiple black-box and policy gradient training methods. The proposed architecture has the potential to improve performance on a broader range of control tasks by incorporating problem-specific priors into the architecture. As a case study, we demonstrate much improved performance for locomotion tasks by emulating the biological central pattern generators (CPGs) as the nonlinear part of the architecture.
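
The split described in the abstract can be illustrated with a minimal sketch. It assumes that the two modules see the same state and that their outputs are simply summed to form the action; the names make_scn and scn_action, the single tanh hidden layer, the layer sizes, and the initialization range are all hypothetical choices made for illustration, not details taken from the abstract.

```python
import math
import random


def make_scn(state_dim, action_dim, hidden_dim=16, seed=0):
    """Create toy parameters for an SCN-style policy (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        # Small random weights; the range is an arbitrary assumption.
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "K": matrix(action_dim, state_dim),    # linear module gain
        "b": [0.0] * action_dim,               # linear module bias
        "W1": matrix(hidden_dim, state_dim),   # nonlinear module, hidden layer
        "b1": [0.0] * hidden_dim,
        "W2": matrix(action_dim, hidden_dim),  # nonlinear module, output layer
    }


def affine(weights, x, bias):
    """Return weights @ x + bias for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def scn_action(params, state):
    """Sum a linear term and a small MLP term computed from the same state."""
    linear = affine(params["K"], state, params["b"])
    hidden = [math.tanh(h) for h in affine(params["W1"], state, params["b1"])]
    # Assumption: the nonlinear output layer has no bias of its own.
    nonlinear = affine(params["W2"], hidden, [0.0] * len(params["W2"]))
    return [lin + non for lin, non in zip(linear, nonlinear)]


params = make_scn(state_dim=3, action_dim=2)
action = scn_action(params, [0.5, -1.0, 2.0])
```

Because the two terms are additive, either module can be swapped out independently. Under the same assumption, the locomotion case study would correspond to replacing the MLP term with a pattern-generator-style module while leaving the linear term unchanged.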

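This paper proposes a new neural network architecture, the Structured Control Net, which splits the generic MLP into a nonlinear control module and a linear control module in order to combine the advantages of linear and nonlinear policies and to improve training sample efficiency, final reward, and generalization of the learned policy. The architecture achieves competitive results on simulations from OpenAI MuJoCo, Roboschool, Atari, and a custom 2D urban driving environment, and it has the potential to improve a broad range of control tasks by incorporating problem-specific priors into the network architecture.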

Structured Control Nets for Deep Reinforcement Learning