Generative flow networks (GFlowNets) are a family of algorithms that learn a
generative policy to sample discrete objects $x$ with non-negative reward
$R(x)$. Learning objectives guarantee the GFlowNet samples $x$ from the target
distribution $p^*(x) \propto R(x)$ when the loss is globally minimized over all
states or trajectories, but it is unclear how well they perform with practical
limits on training resources. We introduce an efficient evaluation strategy to
compare the learned sampling distribution to the target reward distribution. As
flows can be underdetermined given training data, we clarify the importance of
learned flows to generalization and matching $p^*(x)$ in practice. We
investigate how to learn better flows, and propose (i) prioritized replay
training of high-reward $x$, (ii) relative edge flow policy parametrization,
and (iii) a novel guided trajectory balance objective, which we show can
solve a substructure credit assignment problem. We substantially improve sample
efficiency on biochemical design tasks.
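
As a rough illustration of item (i), prioritized replay of high-reward $x$ can be sketched as drawing each training batch partly from the highest-reward objects seen so far. The abstract does not give the exact scheme, so the function name `prioritized_batch` and the parameters `top_frac` and `top_share`, along with their default values, are assumptions chosen for illustration only.

```python
import random


def prioritized_batch(buffer, batch_size, top_frac=0.1, top_share=0.5, rng=None):
    """Sample a batch that over-represents high-reward objects.

    buffer: list of (x, reward) pairs collected during training.
    top_frac: fraction of the buffer treated as "high reward" (assumed value).
    top_share: fraction of the batch drawn from that group (assumed value).
    """
    rng = rng or random.Random()
    # Rank stored objects from highest to lowest reward.
    ranked = sorted(buffer, key=lambda item: item[1], reverse=True)
    n_top = max(1, int(len(ranked) * top_frac))
    top, rest = ranked[:n_top], ranked[n_top:]
    # If there is no low-reward group, draw the whole batch from the top group.
    k_top = round(batch_size * top_share) if rest else batch_size
    # Sample with replacement from each group.
    batch = [rng.choice(top) for _ in range(k_top)]
    batch += [rng.choice(rest) for _ in range(batch_size - k_top)]
    return batch


# Toy buffer: object i has reward i, so the top 10% are rewards 90..99.
toy_buffer = [(i, float(i)) for i in range(100)]
toy_batch = prioritized_batch(toy_buffer, batch_size=10, rng=random.Random(0))
```

With these assumed defaults, half of each batch comes from the top tenth of the buffer by reward, so high-reward objects are revisited far more often than under uniform replay.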

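This paper studies generative policies based on GFlowNets, investigates how to achieve better sample efficiency and a closer match to the target distribution under practical limits on training resources, and proposes prioritized replay, a relative edge flow policy parametrization, and a new guided trajectory balance objective to improve sample efficiency, addressing a substructure credit assignment problem.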

Towards Understanding and Improving GFlowNet Training

Experience replay is a key technique behind many recent advances in deep
reinforcement learning. Allowing the agent to learn from earlier memories can
speed up learning and break undesirable temporal correlations. Despite its
widespread application, very little is understood about the properties of
experience replay. How does the amount of memory kept affect learning dynamics?
Does it help to prioritize certain experiences? In this paper, we address these
questions by formulating a dynamical systems ODE model of Q-learning with
experience replay. We derive analytic solutions of the ODE for a simple
setting. We show that even in this very simple setting, the amount of memory
kept can substantially affect the agent's performance. Both too much and too
little memory slow down learning. Moreover, we characterize regimes where
prioritized replay harms the agent's learning. We show that our analytic
solutions have excellent agreement with experiments. Finally, we propose a
simple algorithm for adaptively changing the memory buffer size which achieves
consistently good empirical performance.
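
To make the role of memory size concrete, the following is a minimal sketch of a replay buffer whose capacity can be changed during training. The class name `ResizableReplayBuffer` and its methods are hypothetical, and the rule for deciding when to grow or shrink the buffer is deliberately left to the caller, since the abstract does not specify the adaptive algorithm.

```python
import random
from collections import deque


class ResizableReplayBuffer:
    """FIFO experience replay memory with an adjustable capacity."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def add(self, transition):
        self.memory.append(transition)

    def resize(self, capacity):
        # maxlen is read-only, so rebuild the deque; when shrinking,
        # only the most recent `capacity` transitions are kept.
        self.memory = deque(self.memory, maxlen=capacity)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        k = min(batch_size, len(self.memory))
        return rng.sample(list(self.memory), k)


# Usage: store 8 transitions in a buffer of size 5, then shrink it to 3.
buf = ResizableReplayBuffer(capacity=5)
for step in range(8):
    buf.add(step)
buf.resize(3)
```

A training loop could call `resize` whenever its own criterion suggests the memory is too large or too small; any such criterion would be an addition on top of this sketch.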

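This study analyzes experience replay in deep reinforcement learning using an ODE model of Q-learning together with experiments. It finds that an appropriate memory size can speed up learning and improve the agent's performance, whereas a memory that is too large or too small slows learning down, and it shows that prioritized replay does not always help the agent learn. Finally, it proposes an algorithm that adaptively adjusts the memory buffer size and performs well empirically.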