Introducing variability while maintaining coherence is a core task in
learning to generate utterances in conversation. Standard neural
encoder-decoder models and their extensions using conditional variational
autoencoder often result in either trivial or digressive responses. To overco