Text-to-motion generation has gained increasing attention, but most existing
methods are limited to generating short-term motions that correspond to a
single sentence describing a single action. However, when a text stream
describes a sequence of continuous motions, the generated motio