Covering option discovery has been developed to improve exploration in
single-agent reinforcement learning with sparse reward signals, by
connecting the most distant states in the embedding space provided by the
Fiedler vector of the state transition graph. However, these option
discovery methods cannot be directly extended to multi-agent scenarios, since
the joint state space grows exponentially with the number of agents in the
system. Thus, existing research on adopting options in multi-agent scenarios
still relies on single-agent option discovery and fails to directly discover the
joint options that can improve the connectivity of the joint state space of
agents. In this paper, we show that it is indeed possible to directly compute
multi-agent options with collaborative exploratory behaviors among the agents,
while still enjoying the ease of decomposition. Our key idea is to approximate
the joint state space as a Kronecker graph -- the Kronecker product of
individual agents' state transition graphs, based on which we can directly
estimate the Fiedler vector of the joint state space using the Laplacian
spectrum of individual agents' transition graphs. This decomposition enables us
to efficiently construct multi-agent joint options by encouraging agents to
connect the sub-goal joint states that correspond to the minimum or
maximum values of the estimated joint Fiedler vector. Evaluation on
multi-agent collaborative tasks shows that the proposed algorithm can
successfully identify multi-agent options and significantly outperforms prior
work using single-agent options or no options, in terms of both faster
exploration and higher cumulative rewards.
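The abstract above describes the single-agent covering-option idea. The rough sketch below illustrates it. It computes the Fiedler vector of the unnormalized Laplacian of a small undirected state graph and picks the two states with the minimum and maximum values. The resulting option is then modeled as a new edge between those states. The chain graph, the helper names (`laplacian`, `fiedler`, `add_covering_option`), and the choice of the unnormalized Laplacian are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A (an assumption; variants exist)."""
    return np.diag(A.sum(axis=1)) - A

def fiedler(A):
    """Second-smallest Laplacian eigenvalue and its eigenvector (the Fiedler vector)."""
    vals, vecs = np.linalg.eigh(laplacian(A))  # eigenvalues in ascending order
    return vals[1], vecs[:, 1]

def add_covering_option(A):
    """Hypothetical helper: link the states at the min and max of the Fiedler vector."""
    lam_before, f = fiedler(A)
    i, j = int(np.argmin(f)), int(np.argmax(f))
    A_new = A.copy()
    A_new[i, j] = A_new[j, i] = 1.0  # the option is modeled as an extra edge
    lam_after, _ = fiedler(A_new)
    return (i, j), lam_before, lam_after

n = 10
A = np.diag(np.ones(n - 1), 1)
A = A + A.T  # undirected chain of states 0, 1, ..., 9
(i, j), before, after = add_covering_option(A)
print((i, j), round(before, 3), round(after, 3))
```

On a 10-state chain, the two extremes are the chain's endpoints. Linking them turns the path into a cycle, which raises the algebraic connectivity (the second-smallest Laplacian eigenvalue) from about 0.098 to about 0.382.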

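This paper proposes a Kronecker-graph-based option discovery method for collaborative multi-agent exploration. It builds multi-agent joint options by encouraging agents to connect the sub-goal joint states that correspond to the minimum or maximum values of the estimated joint Fiedler vector, achieving faster exploration and higher cumulative rewards on multi-agent tasks.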
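The Kronecker decomposition described above can be sketched for two agents under stated assumptions:

- Each agent's state graph is a short chain with self-loops, so an agent may stay put. Without self-loops, the Kronecker product of two chains is disconnected.
- The joint graph is the Kronecker product of the two adjacency matrices.
- The Fiedler vector is taken from the normalized Laplacian `I - D^{-1/2} A D^{-1/2}`.

With these choices, the joint normalized adjacency is exactly the Kronecker product of the individual ones. Its eigenpairs are therefore products of the agents' eigenpairs, and the joint Fiedler vector can be read off without diagonalizing the joint matrix. Whether the paper uses exactly this normalization is an assumption, and all helper names are hypothetical.

```python
import numpy as np

def chain_with_self_loops(n):
    """Hypothetical single-agent graph: n cells in a line; an agent may move or stay."""
    A = np.eye(n)
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def normalized_adjacency(A):
    """D^{-1/2} A D^{-1/2}, where D holds the row sums (degrees) of A."""
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d))

def estimate_joint_fiedler(A1, A2):
    """Fiedler pair of the joint graph kron(A1, A2), computed from the factor spectra only."""
    mu, U = np.linalg.eigh(normalized_adjacency(A1))
    nu, V = np.linalg.eigh(normalized_adjacency(A2))
    # Joint normalized-Laplacian eigenvalues are 1 - mu_i * nu_j,
    # with eigenvectors kron(U[:, i], V[:, j]).
    lam = 1.0 - np.outer(mu, nu)
    flat = np.argsort(lam, axis=None)
    i, j = np.unravel_index(flat[1], lam.shape)  # flat[0] is the trivial eigenvalue 0
    return lam[i, j], np.kron(U[:, i], V[:, j])

n1, n2 = 4, 5
A1, A2 = chain_with_self_loops(n1), chain_with_self_loops(n2)
lam2, f = estimate_joint_fiedler(A1, A2)

# Sanity check against a direct eigendecomposition of the joint Laplacian.
L_joint = np.eye(n1 * n2) - normalized_adjacency(np.kron(A1, A2))
direct = np.linalg.eigvalsh(L_joint)

# Candidate sub-goal joint states (agent-1 state, agent-2 state) at the extremes of f.
subgoals = [tuple(int(s) for s in np.unravel_index(k, (n1, n2)))
            for k in (np.argmin(f), np.argmax(f))]
print(lam2, direct[1], subgoals)
```

Because the two chains have different lengths, the joint Fiedler eigenvalue here is simple. With identical agents it would be repeated, and any vector in that eigenspace would be an equally valid estimate.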

Learning Multi-agent Options for Tabular Reinforcement Learning using Factor Graphs

Representation learning and option discovery are two of the biggest
challenges in reinforcement learning (RL). Proto-value functions (PVFs) are a
well-known approach for representation learning in MDPs. In this paper, we
address the option discovery problem by showing how PVFs implicitly define
options. We do so by introducing eigenpurposes, intrinsic reward functions
derived from the learned representations. The options discovered from
eigenpurposes traverse the principal directions of the state space. They are
useful for multiple tasks because they are discovered without taking the
environment's rewards into consideration. Moreover, different options act at
different time scales, making them helpful for exploration. We demonstrate
features of eigenpurposes in traditional tabular domains as well as in Atari
2600 games.
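The sketch below makes eigenpurposes concrete on a small tabular chain of states. It rests on these assumptions:

- The eigenpurpose has the form `r(s, s') = e[s'] - e[s]` for a PVF `e` under one-hot state features.
- `e` comes from the unnormalized Laplacian of the chain; the paper may use a normalized variant.
- The option policy is solved with value iteration, with `gamma = 0.9` as an arbitrary choice.
- The option terminates wherever no action has positive value.

The function names are hypothetical.

```python
import numpy as np

def chain_laplacian(n):
    """Unnormalized Laplacian L = D - A of an n-state chain (assumed environment graph)."""
    A = np.diag(np.ones(n - 1), 1)
    A = A + A.T
    return np.diag(A.sum(axis=1)) - A

def eigenoption_policy(e, gamma=0.9, iters=500):
    """Greedy option policy for the intrinsic reward r(s, s') = e[s'] - e[s].

    Actions move one step left or right (clipped at the ends); 'T' marks
    states where no action has positive value, so the option terminates.
    """
    n = len(e)
    steps = {"L": -1, "R": +1}
    V = np.zeros(n)
    Q = np.zeros((n, len(steps)))
    for _ in range(iters):
        for s in range(n):
            for a, step in enumerate(steps.values()):
                s2 = min(max(s + step, 0), n - 1)
                Q[s, a] = (e[s2] - e[s]) + gamma * V[s2]
        V = np.maximum(Q.max(axis=1), 0.0)  # value 0 corresponds to terminating
    labels = list(steps)
    return ["T" if Q[s].max() <= 1e-9 else labels[int(Q[s].argmax())] for s in range(n)]

n = 10
_, vecs = np.linalg.eigh(chain_laplacian(n))
pvf = vecs[:, 1]  # first non-constant PVF (the Fiedler vector)
policy = eigenoption_policy(pvf)
print(policy)
```

On this chain the PVF is monotone, so the option runs to one end of the chain and terminates there. Which end it reaches depends on the arbitrary sign of the eigenvector returned by `eigh`.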

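This paper shows how to address the option discovery problem by introducing eigenpurposes, intrinsic reward functions through which learned PVFs implicitly define options. This links two of the major challenges in reinforcement learning: representation learning and option discovery.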