Despite their wide use, neural autoregressive sequence models trained with
maximum likelihood have been shown in recent studies to exhibit unexpected and
undesirable properties, such as an unreasonably high affinity for short
sequences after training and for infinitely long sequences at decoding time.