A control problem with multiple conflicting objectives has no single optimal
policy; instead, it has a set of Pareto-optimal policies called the Pareto
set. When a multi-objective control problem is continuous and
complex, traditional multi-objective reinforcement learning (MORL) algorithms
search for many Pareto-optimal deep policies to approximate the Pareto set,
which is resource-intensive. In this paper, we propose a simple and
resource-efficient MORL algorithm that learns a continuous representation of
the Pareto set in a high-dimensional policy parameter space using a single
hypernet. The learned hypernet can directly generate various well-trained
policy networks for different user preferences. We compare our method with two
state-of-the-art MORL algorithms on seven multi-objective continuous robot
control problems. Experimental results show that our method achieves the best
overall performance with the fewest training parameters. An interesting
observation is that the Pareto set is well approximated by a curved line or
surface in a high-dimensional parameter space. This observation may give
researchers insight for designing new MORL algorithms.
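
The idea of a single hypernet that maps a user preference to the parameters of a policy network can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the layer sizes, the tanh MLP architectures, the names `hypernet` and `policy_action`, and the use of untrained random weights. It only shows the data flow from a preference vector to a flat parameter vector to an action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
OBS_DIM, ACT_DIM, HIDDEN = 4, 2, 8
N_OBJ, HYPER_HIDDEN = 2, 16
N_POLICY_PARAMS = OBS_DIM * HIDDEN + HIDDEN + HIDDEN * ACT_DIM + ACT_DIM

# Hypernet weights: the only trainable parameters in this sketch.
# They are randomly initialised here, not trained.
W1 = rng.normal(0.0, 0.1, (N_OBJ, HYPER_HIDDEN))
b1 = np.zeros(HYPER_HIDDEN)
W2 = rng.normal(0.0, 0.1, (HYPER_HIDDEN, N_POLICY_PARAMS))
b2 = np.zeros(N_POLICY_PARAMS)


def hypernet(preference):
    """Map a preference vector to a flat policy parameter vector."""
    h = np.tanh(preference @ W1 + b1)
    return h @ W2 + b2


def policy_action(theta, obs):
    """Run a small tanh MLP policy whose weights are unpacked from theta."""
    i = 0
    w1 = theta[i:i + OBS_DIM * HIDDEN].reshape(OBS_DIM, HIDDEN)
    i += OBS_DIM * HIDDEN
    c1 = theta[i:i + HIDDEN]
    i += HIDDEN
    w2 = theta[i:i + HIDDEN * ACT_DIM].reshape(HIDDEN, ACT_DIM)
    i += HIDDEN * ACT_DIM
    c2 = theta[i:i + ACT_DIM]
    return np.tanh(np.tanh(obs @ w1 + c1) @ w2 + c2)


# Three preferences over two objectives, each summing to 1.
preferences = [np.array([w, 1.0 - w]) for w in (0.0, 0.5, 1.0)]
obs = np.ones(OBS_DIM)
actions = [policy_action(hypernet(p), obs) for p in preferences]
```

With two objectives the preference vector has one degree of freedom, so the parameter vectors produced by such a continuous map trace a curve in the policy parameter space. In this sketch only the hypernet weights would be trained, and each policy is generated on demand rather than stored separately.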

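For multi-objective control problems, we propose a simple and efficient multi-objective reinforcement learning algorithm that uses a single hypernet to learn a continuous Pareto set in a high-dimensional policy parameter space. The hypernet directly generates optimized policy networks for different user preferences, and the method achieves the best performance with the fewest training parameters on several continuous robot control problems.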