Most prior work in dialogue modeling has focused on written conversations, largely because that is what existing data sets contain. However, written dialogues do not fully capture the nature of spoken conversations, nor the speech recognition errors that arise in practical spoken dialogue systems. This work presents a new benchmark of spoken task-oriented conversations, intended for studying multi-domain dialogue state tracking and knowledge-grounded dialogue modeling. As expected, existing state-of-the-art models trained on written conversations do not perform well on our spoken data. Furthermore, we observe improvements in task performance when leveraging n-best speech recognition hypotheses, for example by combining predictions made on the individual hypotheses. Our data set enables speech-based benchmarking of task-oriented dialogue systems.
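
As an illustration of combining predictions over n-best speech recognition hypotheses, the following is a minimal sketch and not the method used in this work. It assumes each hypothesis comes with a log-score from the recognizer, and that some model maps a transcript to a probability distribution over labels. The names `combine_nbest_predictions` and `toy_predict`, the slot labels, and the scores are all hypothetical.

```python
import math
from collections import defaultdict


def combine_nbest_predictions(hypotheses, predict):
    """Combine per-hypothesis predictions into a single distribution.

    hypotheses: list of (transcript, log_score) pairs from a recognizer
                (assumed input format, not taken from the paper).
    predict:    callable mapping a transcript to {label: probability}.
    Returns (best_label, combined_distribution).
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    # Turn recognizer log-scores into normalized weights (softmax).
    max_score = max(score for _, score in hypotheses)
    weights = [math.exp(score - max_score) for _, score in hypotheses]
    total = sum(weights)

    # Weighted average of the per-hypothesis label distributions.
    combined = defaultdict(float)
    for (transcript, _), weight in zip(hypotheses, weights):
        for label, prob in predict(transcript).items():
            combined[label] += (weight / total) * prob

    best_label = max(combined, key=combined.get)
    return best_label, dict(combined)


def toy_predict(transcript):
    """Hypothetical stand-in for a dialogue state tracker (keyword match)."""
    if "cheap" in transcript:
        return {"price=cheap": 1.0}
    return {"price=none": 1.0}


# Hypothetical 3-best list in which one hypothesis has a recognition error.
nbest = [
    ("i need a cheap hotel", -0.1),
    ("i need a sheep hotel", -1.0),
    ("i need a cheap hostel", -1.5),
]
label, distribution = combine_nbest_predictions(nbest, toy_predict)
print(label, distribution)
```

Weighting by recognizer score is only one possible design. Unweighted majority voting over the hypotheses is a simpler alternative when scores are unavailable or poorly calibrated.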

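This work studies dialogue state tracking and knowledge-grounded dialogue modeling for spoken task-oriented conversations. It points out that existing data sets are insufficient, shows that existing models fall short on spoken data, and improves task performance by leveraging n-best speech recognition hypotheses. The resulting data set supports speech-based benchmarking of task-oriented dialogue systems.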

Evaluating the Robustness of Task-Oriented Dialogue Systems on Spoken Conversations: "How Robust Are You?"