Most language understanding models in dialog systems are trained on a small amount of annotated data and evaluated on a small test set drawn from the same distribution. However, these models can cause system failures or undesirable outputs when exposed to natural perturbations in practice. In this paper, we conduct a comprehensive evaluation and analysis of the robustness of natural language understanding models, and introduce three important aspects of language understanding in real-world dialog systems: language variety, speech characteristics, and noise perturbation. We propose LAUG, a model-agnostic toolkit that approximates natural perturbations in order to test the robustness of dialog systems. LAUG assembles four data augmentation approaches covering the three aspects, and these reveal critical robustness issues in state-of-the-art models. The datasets augmented with LAUG can be used to facilitate future research on robustness testing of language understanding in dialog systems.

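This work addresses the variation and perturbation that natural language understanding models readily encounter when deployed in real dialog systems. It proposes LAUG, a model-agnostic toolkit comprising four data augmentation methods that cover three aspects: language variety, speech characteristics, and noise perturbation. These methods reveal serious robustness issues in existing models, and the augmented datasets generated with LAUG offer a way to advance robustness testing of language understanding.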

Robustness Testing of Language Understanding in Task-Oriented Dialog