While end-to-end neural conversation models have led to promising advances in
reducing hand-crafted features and errors induced by the traditional complex
system architecture, they typically require an enormous amount of data due to
the lack of modularity. Previous studies adopted a hy