Ubuntu Dialogue Corpus: A Large Dataset for Research on Unstructured Multi-Turn Dialogue Systems

June 2015

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau

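TL;DR: Introduces the Ubuntu Dialogue Corpus, which contains almost 1 million multi-turn dialogues and can be used to build dialogue managers based on neural language models. The paper also presents two neural learning architectures suited to this dataset and reports benchmark performance on the task of selecting the best next response.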

Abstract

This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building →