Neural dialogue models, despite their successes, still suffer from a lack of relevance, diversity, and, in many cases, coherence in their generated responses. These issues have been attributed to several causes, including (1) short-range model architectures that capture limited temporal dependenci