Filtering noisy training data is one of the key approaches to improving the quality of neural-network-based language generation. The dialogue research community in particular suffers from a lack of data that is both clean and sufficiently large. In this work, we propose a scoring function that is specifically designed to identify low-quality utterance--response pairs t