Recent progress in music generation has been remarkably advanced by the
state-of-the-art MusicLM, which comprises a hierarchy of three LMs,
respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet,
sampling with the MusicLM requires processing through these LMs one