This paper introduces a model for incomplete utterance restoration (IUR). Unlike prior studies that work only on either extraction or abstraction datasets, we design a simple but effective model that works for both IUR scenarios. Our design simulates the nature of IUR, where tokens omitted from the context contribute to restoration. Based on this, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label creation methods (soft and hard labels), which work even when the omitted tokens are not annotated. Restoration is performed by a Generator, which is trained jointly with the Picker. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model outperforms the pretrained T5 and non-generative language model methods in both rich and limited training data settings. The code will also be made available.

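This paper presents JET (Joint learning token Extraction and Text generation), a model for incomplete utterance restoration (IUR). We design a simple but effective model that works on both extraction and abstraction datasets. By using a Picker to identify omitted tokens, we build a model that simulates the nature of IUR, in which tokens omitted from the context contribute to restoration. To support the Picker, we design two label creation methods (soft labels and hard labels). Results on four benchmark datasets show that our model outperforms the pretrained T5 and non-generative language model methods in both rich and limited training data settings.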
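
To make the idea of creating Picker labels without token-level annotation more concrete, the following is a minimal sketch of one possible hard-labelling heuristic. It is an assumption made for illustration, not the labelling procedure from the paper: a context token is marked as omitted (label 1) if it appears in the gold restored utterance but not in the incomplete utterance. The function name `hard_labels` and the toy dialogue are hypothetical.

```python
from typing import List


def hard_labels(context: List[str], utterance: List[str], rewrite: List[str]) -> List[int]:
    """Return one 0/1 label per context token.

    Assumed heuristic (not confirmed by the source): a context token counts
    as "omitted" when the gold rewrite contains it but the incomplete
    utterance does not.
    """
    utterance_set = set(utterance)
    rewrite_set = set(rewrite)
    return [
        1 if token in rewrite_set and token not in utterance_set else 0
        for token in context
    ]


# Toy example with whitespace tokenization (hypothetical data).
context = "do you like the new phone".split()
utterance = "yes i like it".split()
rewrite = "yes i like the new phone".split()

labels = hard_labels(context, utterance, rewrite)
print(list(zip(context, labels)))
```

In this toy dialogue, "the", "new", and "phone" receive label 1, since they are present in the restored utterance but missing from the incomplete one. A soft-label variant would presumably assign graded scores rather than 0/1 values; its exact form is not specified here.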

Enhancing Incomplete Utterance Restoration by Joint Learning of Token Extraction and Text Generation