BriefGPT.xyz
May, 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu...
TL;DR
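This work addresses the efficiency of training flow matching models and proposes Flow-GRPO, a method that integrates online reinforcement learning with flow matching models. Using an ODE-to-SDE conversion and a Denoising Reduction strategy, we substantially improve sampling efficiency and generation quality. The method performs well on text-to-image generation tasks and raises GenEval accuracy from 63% to 95%.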
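The summary does not spell out the RL objective. As the name suggests, GRPO-style methods typically score each sample relative to the other samples generated for the same prompt. The sketch below illustrates only that group-relative normalization. The function name, the `eps` constant, the choice of population standard deviation, and the reward values are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    using the population standard deviation of the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images sampled from one prompt.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so no separate learned value function is needed to form a baseline.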
Abstract
We propose Flow-GRPO, the first method integrating online Reinforcement Learning (RL) into Flow Matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic ...