Generative Flow Networks (GFlowNets) are amortized sampling methods for
learning a stochastic policy to sequentially generate compositional objects
with probabilities proportional to their rewards. GFlowNets exhibit a
remarkable ability to generate diverse sets of high-reward objects, in contrast
to standard return-maximization reinforcement learning approaches, which often
converge to a single optimal solution. Recent work has explored learning
goal-conditioned GFlowNets to acquire various useful properties, aiming to
train a single GFlowNet capable of achieving different goals as the task
specifies. However, training a goal-conditioned GFlowNet poses critical
challenges due to extremely sparse rewards, a problem that is further
exacerbated in large state spaces. In this work, we propose a novel method
named Retrospective
Backward Synthesis (RBS) to address these challenges. Specifically, RBS
synthesizes a new backward trajectory based on the backward policy in GFlowNets
to enrich training trajectories with enhanced quality and diversity, thereby
efficiently solving the sparse reward problem. Extensive empirical results show
that our method improves sample efficiency by a large margin and outperforms
strong baselines on various standard evaluation benchmarks.
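
The idea of synthesizing a backward trajectory from a goal can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy environment (states are bit tuples, and a forward action sets one 0 bit to 1), the uniform backward policy, and the hypothetical function names `backward_parents` and `synthesize_backward_trajectory`. The sketch only shows why a trajectory sampled backward from a goal is guaranteed to end at that goal when replayed forward, which is what makes it a useful training signal under sparse rewards.

```python
import random


def backward_parents(state):
    """Return all parent states: each one unsets a single bit that is 1."""
    return [state[:i] + (0,) + state[i + 1:] for i, b in enumerate(state) if b == 1]


def synthesize_backward_trajectory(goal, rng):
    """Walk from the goal back to the all-zero initial state using a uniform
    backward policy (an assumption), then return the states in forward order."""
    state = tuple(goal)
    trajectory = [state]
    while any(state):
        state = rng.choice(backward_parents(state))
        trajectory.append(state)
    trajectory.reverse()  # forward order: initial state first, goal last
    return trajectory


rng = random.Random(0)
goal = (1, 0, 1, 1)
traj = synthesize_backward_trajectory(goal, rng)
```

Because the walk starts at the goal, the replayed trajectory always reaches it, regardless of how rarely the forward policy would find that goal on its own.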

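Using Retrospective Backward Synthesis (RBS), we propose a new approach to the sparse reward problem in training goal-conditioned Generative Flow Networks (GFlowNets); it substantially improves sample efficiency and outperforms strong baselines on various standard evaluation benchmarks.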

Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

Multiagent reinforcement learning (MARL) can solve complex cooperative tasks.
However, the efficiency of existing MARL methods relies heavily on well-defined
reward functions. Multiagent tasks with sparse reward feedback are especially
challenging not only because of the credit distribution problem, but also due
to the low probability of obtaining positive reward feedback. In this paper, we
design a graph network called Cooperation Graph (CG). The Cooperation Graph is
the combination of two simple bipartite graphs, namely, the Agent Clustering
subgraph (ACG) and the Cluster Designating subgraph (CDG). Next, based on this
novel graph structure, we propose a Cooperation Graph Multiagent Reinforcement
Learning (CG-MARL) algorithm, which can efficiently deal with the sparse reward
problem in multiagent tasks. In CG-MARL, agents are directly controlled by the
Cooperation Graph, and a policy neural network is trained to manipulate this
graph, guiding agents to cooperate implicitly. This hierarchical design leaves
room for customized cluster-actions, which serve as an extensible interface
for introducing fundamental cooperation knowledge. In experiments, CG-MARL
shows state-of-the-art
performance in sparse reward multiagent benchmarks, including the anti-invasion
interception task and the multi-cargo delivery task.
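
The two-subgraph structure described above can be sketched as a small data structure. This is a hedged illustration, not the authors' implementation: the class name `CooperationGraph`, its fields, and its methods are hypothetical, and the assumption that clusters are designated to "targets" is ours, since the abstract does not say what the Cluster Designating subgraph points to. In this sketch, a policy would act by editing edges, and each agent's assignment follows by composing the two bipartite mappings.

```python
from dataclasses import dataclass


@dataclass
class CooperationGraph:
    # ACG edges: agent index -> cluster index
    agent_to_cluster: list
    # CDG edges: cluster index -> target index (targets are an assumption)
    cluster_to_target: list

    def reassign_agent(self, agent, cluster):
        """Edit one ACG edge: move an agent into another cluster."""
        self.agent_to_cluster[agent] = cluster

    def redesignate_cluster(self, cluster, target):
        """Edit one CDG edge: point a cluster at another target."""
        self.cluster_to_target[cluster] = target

    def agent_targets(self):
        """Compose ACG and CDG to get the target each agent is sent to."""
        return [self.cluster_to_target[c] for c in self.agent_to_cluster]


cg = CooperationGraph(agent_to_cluster=[0, 0, 1], cluster_to_target=[2, 5])
initial = cg.agent_targets()
cg.reassign_agent(1, 1)
after_reassign = cg.agent_targets()
cg.redesignate_cluster(0, 7)
after_redesignate = cg.agent_targets()
```

Editing a single CDG edge redirects every agent in that cluster at once, which is one way to read the claim that the graph controls agents while the policy only manipulates the graph.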

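This paper proposes a Multiagent Reinforcement Learning algorithm based on the Cooperation Graph structure (CG-MARL), which designs a network structure to handle the sparse reward problem in multiagent settings effectively and shows leading performance across the board in experiments.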