Despite considerable advancements with deep neural language models, the
enigma of neural text degeneration persists when these models are tested as
text generators. The counter-intuitive empirical observation is that even
though the use of likelihood as training objective leads to high