We study symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression; many of these rules are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, please visit our project site: https://scg-rule-guided-music.github.io/.
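
To make the notion of a non-differentiable rule concrete, the following is a minimal sketch of a note-density rule on a piano roll, together with a selection step that uses only forward evaluations of that rule. The function names (`note_density`, `rule_loss`, `select_candidate`), the onset-counting definition of density, and the pick-the-best-candidate step are illustrative assumptions; this is not the SCG algorithm itself, only an example of scoring candidates without gradients.

```python
import numpy as np


def note_density(piano_roll: np.ndarray) -> int:
    """Count note onsets in a piano roll of shape (pitches, time_steps).

    Thresholding and counting are piecewise constant, so this rule has
    no useful gradient with respect to the piano roll values.
    """
    roll = (piano_roll > 0.5).astype(np.int8)
    # Prepend a silent column so a note starting at t=0 counts as an onset.
    silent = np.zeros((roll.shape[0], 1), dtype=np.int8)
    padded = np.concatenate([silent, roll], axis=1)
    # An onset is a 0 -> 1 transition along the time axis.
    onsets = np.diff(padded, axis=1) == 1
    return int(onsets.sum())


def rule_loss(piano_roll: np.ndarray, target_density: int) -> int:
    """Absolute deviation from the target density (hypothetical loss)."""
    return abs(note_density(piano_roll) - target_density)


def select_candidate(candidates, target_density: int):
    """Return the index of the candidate that best satisfies the rule.

    Only forward evaluations of the rule are used; no gradients.
    """
    losses = [rule_loss(c, target_density) for c in candidates]
    return int(np.argmin(losses)), losses


# Toy data: three candidate piano rolls with 4 pitches and 8 time steps.
roll = np.zeros((4, 8))
roll[0, 0:3] = 1.0  # one sustained note
roll[1, 2] = 1.0    # a short note
roll[1, 4] = 1.0    # another short note on the same pitch
candidates = [np.zeros((4, 8)), roll, np.ones((4, 8))]

best, losses = select_candidate(candidates, target_density=3)
print(best, losses)
```

In this toy setting the candidates are fixed arrays; in a guided sampler they would instead be alternative samples proposed by a pre-trained diffusion model, which is outside the scope of this sketch.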

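We study non-differentiable rule guidance in symbolic music generation and propose a novel guidance method, Stochastic Control Guidance (SCG), which works with pre-trained diffusion models in a plug-and-play way and achieves training-free guidance for non-differentiable rules. We also introduce a latent diffusion architecture with high time resolution that can be composed with SCG in a plug-and-play fashion. Compared with standard baselines in symbolic music generation, this framework shows marked improvements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings.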

Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion