In this paper we propose a hybrid architecture of actor-critic algorithms for
reinforcement learning in parameterized action space, which consists of
multiple parallel sub-actor networks to decompose the structured action space
into simpler action spaces along with a critic network to guide the training of
all sub-actor networks. While this paper mainly focuses on parameterized
action spaces, the proposed architecture, which we call hybrid actor-critic, can
be extended to more general action spaces that have a hierarchical structure.
We present an instance of the hybrid actor-critic architecture based on
proximal policy optimization (PPO), which we refer to as hybrid proximal policy
optimization (H-PPO). Our experiments evaluate H-PPO on a collection of tasks with
parameterized action spaces, on which H-PPO outperforms previous methods for
parameterized-action reinforcement learning.
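A minimal sketch of how a hybrid actor-critic update along these lines might be computed. It assumes linear actors, a softmax discrete sub-actor, a Gaussian continuous sub-actor that outputs parameters for every discrete action at once, and one PPO clipped objective per sub-actor, with both objectives using the same advantage estimate from the critic. The names `hybrid_ppo_objectives`, `W_d` and `W_c`, the linear parameterization, and the choice to score only the chosen action's parameters are illustrative assumptions, not the H-PPO authors' implementation; the critic and advantage estimation are omitted.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gaussian_log_prob(x, mean, log_std):
    # Log-density of a diagonal Gaussian, summed over the last axis.
    var = np.exp(2.0 * log_std)
    return np.sum(
        -0.5 * ((x - mean) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi)),
        axis=-1,
    )


def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    # PPO clipped surrogate (to be maximized), averaged over the batch.
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * adv, clipped * adv)))


def hybrid_ppo_objectives(states, k, params, adv, new, old, log_std, param_dim):
    """Separate clipped objectives for the discrete and continuous sub-actors.

    Hypothetical linear actors: `new`/`old` are dicts with "W_d" (state -> K
    logits) and "W_c" (state -> K * param_dim Gaussian means, i.e. parameters
    for every discrete action). Both sub-actors share the critic's advantage.
    """
    rows = np.arange(states.shape[0])

    def discrete_logp(w):
        # log pi_d(k | s) for the discrete action actually taken.
        return np.log(softmax(states @ w["W_d"])[rows, k])

    def continuous_logp(w):
        # Assumption: score only the parameters of the chosen action k.
        means = (states @ w["W_c"]).reshape(len(rows), -1, param_dim)
        return gaussian_log_prob(params, means[rows, k], log_std)

    obj_d = ppo_clip_objective(discrete_logp(new), discrete_logp(old), adv)
    obj_c = ppo_clip_objective(continuous_logp(new), continuous_logp(old), adv)
    return obj_d, obj_c


# Usage with random data: K = 3 discrete actions, 2 parameters each.
rng = np.random.default_rng(0)
n, state_dim, K, P = 8, 4, 3, 2
states = rng.normal(size=(n, state_dim))
k = rng.integers(0, K, size=n)
params = rng.normal(size=(n, P))
adv = rng.normal(size=n)
old = {"W_d": rng.normal(size=(state_dim, K)),
       "W_c": rng.normal(size=(state_dim, K * P))}
obj_d, obj_c = hybrid_ppo_objectives(states, k, params, adv, old, old, 0.0, P)
```

When the new and old weights coincide, every probability ratio is 1, so each objective reduces to the mean advantage. Keeping two separate objectives, rather than multiplying the two ratios into one joint ratio, is the design choice this sketch assumes for the parallel sub-actors.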

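This work introduces a deep reinforcement learning algorithm with a hybrid architecture consisting of multiple parallel sub-actor networks, which decompose the structured action space into simpler action spaces, and a single critic network, which guides the training of all sub-actor networks. The algorithm performs strongly in parameterized action spaces.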

Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space

We explore deep reinforcement learning in parameterized action spaces.
Specifically, we investigate how to achieve sample-efficient end-to-end
training in these tasks. We propose a new compact architecture for these tasks
in which the parameter policy is conditioned on the output of the discrete
action policy. We also propose two new methods based on the state-of-the-art
algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value
Gradient (SVG) to train such an architecture. We demonstrate that these methods
outperform the state-of-the-art method, Parameterized Action DDPG, on test
domains.
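A minimal sketch of sampling from such a conditioned policy. It assumes linear policies, a softmax over K discrete actions, and a diagonal Gaussian parameter policy whose input is the state concatenated with a one-hot code of the sampled discrete action. The names `sample_hybrid_action`, `W_d` and `W_x`, and the assumption that every discrete action uses a parameter vector of the same size, are illustrative; this is not the paper's implementation, and the TRPO- or SVG-based training is not shown.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def sample_hybrid_action(state, W_d, W_x, log_std, rng):
    """Sample a (k, x) pair where the parameter policy sees the chosen k.

    Hypothetical linear policies: W_d maps the state to K logits; W_x maps the
    concatenation [state, one_hot(k)] to the mean of a diagonal Gaussian over x.
    Returns k, x and the joint log-probability log pi(k, x | s).
    """
    probs = softmax(state @ W_d)
    k = int(rng.choice(len(probs), p=probs))

    # Condition the parameter policy on the discrete choice via a one-hot code.
    one_hot = np.zeros(len(probs))
    one_hot[k] = 1.0
    mean = np.concatenate([state, one_hot]) @ W_x

    std = np.exp(log_std)
    x = mean + std * rng.normal(size=mean.shape)

    # Joint density factorizes: log pi(k, x | s) = log pi_d(k | s) + log pi_x(x | s, k).
    logp_x = np.sum(-0.5 * ((x - mean) / std) ** 2 - log_std - 0.5 * np.log(2.0 * np.pi))
    return k, x, float(np.log(probs[k]) + logp_x)


# Usage: 4-dimensional state, K = 3 discrete actions, 2-dimensional parameter x.
rng = np.random.default_rng(0)
state_dim, K, P = 4, 3, 2
W_d = rng.normal(size=(state_dim, K))
W_x = rng.normal(size=(state_dim + K, P))
state = rng.normal(size=state_dim)
k, x, logp = sample_hybrid_action(state, W_d, W_x, -0.5, rng)
```

Because the one-hot code is part of the parameter policy's input, a single network can produce a different parameter distribution for each discrete choice instead of emitting parameters for all K actions at once; this is one plausible reading of the "compact" architecture, stated here as an assumption.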

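This work proposes a new compact architecture for reinforcement learning in parameterized action spaces and explores how to train it with methods based on existing algorithms (TRPO, SVG); the results show that these methods outperform the current state-of-the-art method, Parameterized Action DDPG, on the test domains.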