Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.

DIET架构研究了不同预训练表示对意向和实体预测的有效性，并在多领域NLU数据集上取得了最新的技术水平，没有明显的使用大规模预训练模型的好处，实际上DIET即使在没有预先训练嵌入的情况下，也改进了现有技术水平，效果最佳的模型优于Fine-tuning BERT并且训练速度快六倍。

DIET: 对话系统的轻量级语言理解