Discrete reinforcement learning (RL) algorithms have demonstrated exceptional
performance in solving sequential decision tasks with discrete action spaces,
such as Atari games. However, they scale poorly to continuous control problems,
because the number of joint discrete actions grows exponentially with the
dimensionality of the action space. In
this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture,
which combines soft RL and actor-critic techniques with discrete RL methods to
overcome this limitation. SDPC discretizes each action dimension independently
and employs a shared critic network to maximize the soft $Q$-function. This
novel approach enables SDPC to support two types of policies: decomposed actors
that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed
$Q$-networks that generate Boltzmann soft exploration policies, resulting in
the Soft Decomposed-Critic Q (SDCQ) algorithm. Through extensive experiments,
we demonstrate that our proposed approach outperforms state-of-the-art
continuous RL algorithms in a variety of continuous control tasks, including
MuJoCo's Humanoid and Box2D's BipedalWalker. These empirical results validate
the effectiveness of the SDPC architecture in addressing the challenges
associated with continuous control.
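
To make the per-dimension discretization and the Boltzmann exploration policy described above more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes evenly spaced bins and a fixed temperature, and it takes per-dimension $Q$-values as plain lists instead of producing them with a trained network. The function names (`make_action_bins`, `boltzmann_probs`, `sample_action`), the bin count, and the placeholder $Q$-values are all hypothetical.

```python
import math
import random


def make_action_bins(low, high, n_bins):
    """Discretize each action dimension independently into n_bins evenly spaced values."""
    bins = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (n_bins - 1)
        bins.append([lo + i * step for i in range(n_bins)])
    return bins  # bins[d] holds the candidate values for dimension d


def boltzmann_probs(q_row, temperature=1.0):
    """Softmax over one dimension's Q-values (a Boltzmann distribution)."""
    q_max = max(q_row)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, bins, temperature=1.0, rng=random):
    """Sample one bin per dimension and assemble a continuous action vector."""
    action = []
    for q_row, bin_row in zip(q_values, bins):
        probs = boltzmann_probs(q_row, temperature)
        idx = rng.choices(range(len(bin_row)), weights=probs, k=1)[0]
        action.append(bin_row[idx])
    return action


# Illustrative 4-dimensional action space with bounds [-1, 1] and 5 bins per dimension.
bins = make_action_bins(low=[-1.0] * 4, high=[1.0] * 4, n_bins=5)
# Placeholder Q-values (assumed, not learned): one row per action dimension.
q_values = [[0.0, 0.1, 0.5, 0.1, 0.0] for _ in range(4)]
action = sample_action(q_values, bins, temperature=0.5, rng=random.Random(0))

# Decomposed outputs grow linearly with the dimension; joint actions grow exponentially.
print(len(bins) * len(bins[0]), len(bins[0]) ** len(bins))  # 20 625
```

Under these assumptions, the decomposed representation needs only 20 per-dimension outputs, whereas enumerating every joint action would need 625. The gap widens exponentially as action dimensions are added.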

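This paper introduces the SDPC architecture, which combines soft reinforcement learning and actor-critic techniques with discrete reinforcement learning methods to overcome the challenges of continuous control, and it outperforms current state-of-the-art methods on multiple continuous control tasks.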