BriefGPT.xyz
Oct, 2023
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
TL;DR
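AlignProp is a method for aligning diffusion models with downstream reward functions by backpropagating the reward gradient through the denoising process. It reaches higher rewards in fewer training steps and is conceptually simpler, which makes it a straightforward choice for optimizing diffusion models against differentiable reward functions of interest.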
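To make the idea of backpropagating a reward gradient through the denoising process concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the one-parameter "denoiser" (`theta`), the quadratic reward, the step size `eta`, and all function names are hypothetical stand-ins chosen so the chain rule can be written out by hand. A real text-to-image model would replace the scalar update with a neural network and use automatic differentiation.

```python
import random


def sample(theta, x_T, num_steps=5, eta=0.1):
    """Run a toy deterministic denoising chain, keeping every intermediate state."""
    xs = [x_T]
    for _ in range(num_steps):
        x = xs[-1]
        eps_hat = theta * x           # toy one-parameter "noise prediction" (assumption)
        xs.append(x - eta * eps_hat)  # one denoising step
    return xs


def reward(x0, target=1.0):
    """Toy differentiable reward: highest when the final sample equals `target`."""
    return -(x0 - target) ** 2


def reward_grad_theta(theta, x_T, num_steps=5, eta=0.1, target=1.0):
    """Backpropagate the reward gradient through every denoising step."""
    xs = sample(theta, x_T, num_steps, eta)
    g = -2.0 * (xs[-1] - target)      # d reward / d x_0
    grad = 0.0
    for x in reversed(xs[:-1]):       # walk the chain from the last step to the first
        grad += g * (-eta * x)        # direct dependence of this step on theta
        g *= 1.0 - eta * theta        # pass the gradient back to the previous state
    return grad


def train(theta=0.0, iters=200, lr=0.05, seed=0):
    """Gradient ascent on the reward, starting each chain from a fresh noisy sample."""
    rng = random.Random(seed)
    for _ in range(iters):
        x_T = rng.gauss(2.0, 0.1)     # toy "initial noise" distribution (assumption)
        theta += lr * reward_grad_theta(theta, x_T)
    return theta
```

Calling `train()` and comparing `reward(sample(theta, 2.0)[-1])` before and after training shows the reward rising as `theta` is updated. The gradient reaches the parameter through every step of the sampling chain, and the backward pass needs the stored intermediate states `xs`.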
Abstract
Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their