BriefGPT.xyz
Dec, 2022
Tracing and Removing Data Errors in Natural Language Generation Datasets
Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto
TL;DR
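This work proposes a framework that uses a contrast-based algorithm to identify and remove some low-quality examples from the training data, with the goal of reducing hallucinations and unfaithful outputs in natural language generation tasks.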
Abstract
Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in natural language generation (NLG) tasks. Consequently, identifying and removing these examples is…