We introduce a novel setup for low-resource task-oriented semantic parsing that incorporates several constraints that may arise in real-world scenarios: (1) the lack of similar datasets or models from a related domain, (2) the inability to sample useful logical forms directly from a grammar, and (3) privacy requirements on unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches that first generate a set of structured canonical utterances corresponding to logical forms, then simulate corresponding natural-language utterances and filter the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe a 33% relative improvement in top-1 match over a baseline without data augmentation.
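The abstract describes the augmentation pipeline only at a high level (generate canonical utterances, simulate natural language, filter pairs). What follows is a minimal sketch of that three-step shape, not the authors' implementation. The callables `to_canonical`, `paraphrase`, and `parse` are hypothetical stand-ins for the canonical-utterance generator, the natural-language simulator (e.g., a paraphrasing model, which is an assumption), and the filter. We assume the filter is a round-trip check: a simulated utterance is kept only if the current parser maps it back to its source logical form. The toy logical-form syntax `find_event(subject=...)` is invented for illustration and is not SMCalFlow's format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Pair:
    """A (simulated natural utterance, logical form) training pair."""
    utterance: str
    logical_form: str


def augment(
    logical_forms: Iterable[str],
    to_canonical: Callable[[str], str],
    paraphrase: Callable[[str], list],
    parse: Callable[[str], str],
) -> list:
    """Hypothetical pipeline: canonical utterance -> simulated NL -> filter."""
    kept = []
    seen = set()
    for lf in logical_forms:
        canonical = to_canonical(lf)          # step 1: structured canonical utterance
        for nl in paraphrase(canonical):      # step 2: simulated natural utterances
            pair = Pair(nl, lf)
            # step 3 (assumed round-trip filter): keep only pairs whose
            # utterance parses back to the original logical form; skip duplicates
            if parse(nl) == lf and pair not in seen:
                seen.add(pair)
                kept.append(pair)
    return kept


# --- Toy, hypothetical components for illustration only ---
def toy_canonical(lf: str) -> str:
    # "find_event(subject=lunch)" -> "find event with subject lunch"
    name, arg = lf.rstrip(")").split("(")
    key, value = arg.split("=")
    return f"{name.replace('_', ' ')} with {key} {value}"


def toy_paraphrase(canonical: str) -> list:
    # Stand-in for a paraphrase model: one faithful rewrite, one noisy output
    return [
        canonical,
        canonical.replace("find event with subject", "when is my"),
        "something unrelated",
    ]


def toy_parse(nl: str) -> str:
    # Stand-in for the low-resource parser being improved
    words = nl.split()
    if "event" in words or nl.startswith("when is my"):
        return f"find_event(subject={words[-1]})"
    return "unknown"


pairs = augment(
    ["find_event(subject=lunch)", "find_event(subject=standup)"],
    toy_canonical,
    toy_paraphrase,
    toy_parse,
)
for p in pairs:
    print(p.utterance, "->", p.logical_form)
```

With these toy components, the noisy output "something unrelated" fails the round-trip check and is dropped, while the canonical utterance and its rewrite survive for each logical form. Using the parser itself as the filter is one common design choice; the paper's actual filtering criterion is not specified in the abstract.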

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation