Autoregressive models (ARMs) currently hold state-of-the-art performance in
likelihood-based modeling of image and audio data. Generally, neural-network-based
ARMs are designed to allow fast inference, but sampling from these models
is impractically slow. In this paper, we introduce th