Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose
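TL;DR: This work proposes a normalizing flow policy distribution built on the Soft Actor Critic algorithm, increasing the policy's expressivity to improve stability and to improve exploration in sparse-reward environments.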
Abstract
Deep reinforcement learning (DRL) algorithms for continuous action spaces are
known to be brittle with respect to hyperparameters as well as sample
inefficient. Soft Actor Critic (SAC) proposes an off-policy de