Filtering Noisy Dialogue Corpora by Connectivity and Content Relatedness

Apr, 2020

Utterance Pair Scoring for Noisy Dialogue Data Filtering

Reina Akama, Sho Yokoi, Jun Suzuki, Kentaro Inui

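TL;DR: This paper proposes a scoring method based on relatedness and connectivity, two properties identified by both dialogue and linguistics research, to assess the quality of utterance pairs in large-scale dialogue datasets and filter out potentially unacceptable pairs, with the aim of improving the response generation of neural dialogue agents.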

Abstract

Filtering noisy training data is one of the key approaches to improving the quality of neural network-based language generation. The dialogue research community especially suffers from a lack of less-noisy and sufficiently large data. In this work, we propose a scoring function that is specifically designed to identify low-quality utterance-response pairs t