Soft actor-critic (SAC) in reinforcement learning is expected to be one of
the next-generation robot control schemes. Its ability to maximize policy
entropy would make a robotic controller robust to noise and perturbation, which
is useful for real-world robot applications. However, the priority of
maximizing the policy entropy is automatically tuned in the current
implementation, and the tuning rule can be interpreted as enforcing an
equality constraint that binds the policy entropy to its specified target
value. The current SAC therefore no longer maximizes the policy entropy,
contrary to our
expectation. To resolve this issue, this paper modifies the SAC
implementation with a slack variable that appropriately handles the
inequality constraint so as to maximize the policy entropy. In the MuJoCo and
PyBullet simulators, the modified SAC achieved higher robustness and more
stable learning than the original while regularizing the norm of the action.
In addition, a real-robot variable impedance task was demonstrated to show
the applicability of the modified SAC to real-world robot control.
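
The equality-constraint reading of the automatic tuning rule can be illustrated with a minimal sketch. It assumes the temperature loss used in common SAC implementations, log_alpha * (entropy - target_entropy). The function name, learning rate, and numeric values are hypothetical and chosen only for illustration. The slack-variable modification itself is not reproduced, since its exact form is not given in this abstract.

```python
import math


def temperature_step(log_alpha, entropy, target_entropy, lr=0.05):
    """One gradient-descent step on the assumed SAC temperature loss.

    Assumed loss: log_alpha * (entropy - target_entropy), whose gradient
    with respect to log_alpha is simply (entropy - target_entropy).
    """
    grad = entropy - target_entropy
    return log_alpha - lr * grad


log_alpha = 0.0          # alpha = 1.0
target_entropy = -1.0    # hypothetical target value

# Entropy below the target: the temperature rises, encouraging more entropy.
below = temperature_step(log_alpha, entropy=-2.0, target_entropy=target_entropy)

# Entropy above the target: the temperature falls, so entropy is pulled
# back down toward the target instead of being maximized further.
above = temperature_step(log_alpha, entropy=0.5, target_entropy=target_entropy)

# Entropy exactly at the target: the temperature does not change.
at_target = temperature_step(log_alpha, entropy=-1.0, target_entropy=target_entropy)

print(math.exp(below), math.exp(above), math.exp(at_target))
```

Because the update is symmetric around the target, the rule's only fixed point is entropy equal to the target, which is the equality-constraint behavior described above. An inequality constraint would instead leave entropy above the target unpenalized.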

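This paper adds a slack variable to soft actor-critic reinforcement learning to appropriately handle the inequality constraint and maximize the policy entropy, thereby achieving higher robustness and more stable learning, and it is applicable to real-world robot control.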