BriefGPT.xyz
Jul, 2023
Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang
TL;DR
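This work explores the methodology and theory of reward-directed generation via conditional diffusion models. The generator efficiently learns and samples from the reward-conditioned data distribution, and the generated population shifts toward a user-specified target reward value. Empirical studies validate the theory and examine the relationship between extrapolation strength and sample quality.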
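To make "sampling from a reward-conditioned distribution" concrete, here is a minimal toy sketch. It is not the paper's diffusion model: it assumes standard Gaussian data and a noisy linear reward, for which the conditional distribution is available in closed form. The function name `sample_reward_conditioned` and all parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_reward_conditioned(w, noise_std, target_reward, n_samples, rng):
    """Sample x ~ p(x | y = target_reward) in a toy Gaussian model.

    Assumed model (illustrative only, not from the paper):
        x ~ N(0, I),  y = w . x + eps,  eps ~ N(0, noise_std^2).
    """
    d = w.shape[0]
    var_y = w @ w + noise_std**2          # marginal variance of the reward y
    gain = w / var_y                      # Cov(x, y) / Var(y), with Cov(x, y) = w
    mean = gain * target_reward           # conditional mean E[x | y]
    cov = np.eye(d) - np.outer(gain, w)   # conditional covariance Cov[x | y]
    return rng.multivariate_normal(mean, cov, size=n_samples)

rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, 0.0])             # hypothetical reward direction
samples = sample_reward_conditioned(w, 1.0, 3.0, 20000, rng)
avg_reward = (samples @ w).mean()         # average noiseless reward of the samples
```

In this toy model the average reward of the conditioned samples is (‖w‖² / (‖w‖² + σ²)) times the target, so the generated population moves toward the target reward while being pulled back toward the unconditioned mean of zero.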
Abstract
We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which h