3D generation has accelerated rapidly over the past decade, owing to progress in generative modeling. Score Distillation Sampling (SDS) based rendering has substantially improved 3D asset generation. Further, recent work on Denoising Diffusion Policy Optimization (DDPO) demonstrates that the diffusion process is compatible with policy gradient methods, and that such methods can improve 2D diffusion models using an aesthetic scoring function. We first show that this aesthetic scorer acts as a strong guide for a variety of SDS-based methods and demonstrate its effectiveness in text-to-3D synthesis. Further, we leverage the DDPO approach to improve the quality of the 3D rendering obtained from 2D diffusion models. Our approach, DDPO3D, employs the policy gradient method in tandem with aesthetic scoring. To the best of our knowledge, this is the first method that extends policy gradient methods to 3D score-based rendering, and it shows improvement across SDS-based methods such as DreamGaussian, which are currently driving research in text-to-3D synthesis. Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process. Our project page can be accessed via https://ddpo3d.github.io.

RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation