Generating high-quality text with sufficient diversity is essential for a wide range of natural language generation (NLG) tasks. Maximum-likelihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, with their poor performance attributed to