Recent advances in large language models have shown that autoregressive
modeling can generate complex and novel sequences that have many real-world
applications. However, these models must generate outputs autoregressively,
which becomes time-consuming when dealing with long sequences.