This work demonstrates that substantial gains in zero-shot dialogue state tracking (DST) accuracy can be achieved by increasing the diversity of training data using synthetic data generation techniques. Current DST training resources cover only a small number of application domains and slot types because data collection is costly, which limits how well trained models adapt to new domains. The presented work overcomes this challenge with a novel, fully automatic data generation approach for creating synthetic zero-shot DST training resources. Unlike previous approaches for generating DST data, the presented approach generates entirely new application domains and then generates dialogues for them, complete with silver dialogue state annotations and slot descriptions. This approach is used to create the D0T dataset for training zero-shot DST models, which covers an unprecedented 1,000+ domains. Experiments on the MultiWOZ benchmark indicate that training models on diverse synthetic data yields an improvement of +6.7% Joint Goal Accuracy, achieving results competitive with much larger models.
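
For readers unfamiliar with the metric, Joint Goal Accuracy is commonly defined as the fraction of dialogue turns whose predicted dialogue state matches the gold state exactly, across all slots. The sketch below illustrates that common definition only; it is not the evaluation code used in this work, and the function name, the slot names, and the dictionary representation of a dialogue state are assumptions made for illustration.

```python
from typing import Dict, List

# Assumed representation: a dialogue state maps slot names to values,
# e.g. {"hotel-area": "north"}. Slot names here are illustrative only.
DialogueState = Dict[str, str]


def joint_goal_accuracy(
    predicted: List[DialogueState], gold: List[DialogueState]
) -> float:
    """Return the fraction of turns whose predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches exactly.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
predicted_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # one wrong slot fails the turn
]
score = joint_goal_accuracy(predicted_states, gold_states)
```

Because a single incorrect slot value makes the whole turn count as wrong, the metric is strict, which is one reason a gain of several points is notable.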

Leveraging Diverse Data Generation for Adaptable Zero-Shot Dialogue State Tracking