Datasets that pair knowledge graphs (KGs) with text (KG-T) can be used to train forward and reverse neural models that generate text from a KG and vice versa. However, models trained on datasets whose KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise, and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate the source KG or text is a proxy for the equivalence between the KG and the text in that dataset. Using cyclic evaluation, we find that the manually created WebNLG is much better than the automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text, and we show the impact of each heuristic on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs) and observe that they yield models that perform well on cyclic generation of text but less well on cyclic generation of KGs, probably because they lack a consistent underlying ontology.
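
As a rough illustration of cyclic evaluation on the KG side, the following minimal sketch sends a KG through a forward model (KG to text) and then a reverse model (text to KG), and scores the regenerated triples against the source. The toy models, the function names, and the choice of exact-match triple precision, recall, and F1 are all assumptions made for illustration; they are not the models or metrics used in the paper.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triple_prf(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two sets of triples."""
    gold_set, pred_set = set(gold), set(predicted)
    overlap = len(gold_set & pred_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def cyclic_kg_score(
    kg: Set[Triple],
    forward: Callable[[Set[Triple]], str],
    reverse: Callable[[str], Set[Triple]],
) -> Tuple[float, float, float]:
    """Run KG -> text -> KG and compare the regenerated KG with the source."""
    text = forward(kg)
    regenerated = reverse(text)
    return triple_prf(kg, regenerated)


# Hypothetical stand-ins for trained models: a lossless serializer and parser.
def toy_forward(kg: Set[Triple]) -> str:
    return "; ".join(" | ".join(t) for t in sorted(kg))


def toy_reverse(text: str) -> Set[Triple]:
    return {tuple(part.split(" | ")) for part in text.split("; ") if part}


# Hypothetical forward model that omits one fact, so the text is not
# equivalent to the KG and the cycle loses recall.
def lossy_forward(kg: Set[Triple]) -> str:
    return toy_forward(set(sorted(kg)[:-1]))


kg = {
    ("Alan Turing", "birthPlace", "London"),
    ("Alan Turing", "field", "Computer science"),
}
perfect = cyclic_kg_score(kg, toy_forward, toy_reverse)
lossy = cyclic_kg_score(kg, lossy_forward, toy_reverse)
```

In this toy setting the lossless pair regenerates the KG exactly, while the lossy forward model lowers recall on the cycle. Under the assumptions above, that gap is the kind of signal cyclic evaluation is meant to expose.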

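Datasets that pair knowledge graphs with text can be used to train forward and reverse neural models that generate text and knowledge graphs, but models trained on datasets whose pairs are not equivalent may produce more hallucination and poorer recall. This paper verifies this empirically by generating datasets with different noise levels, and finds through cyclic evaluation that the manually created WebNLG is better than the automatically generated TeKGen and T-REx. Based on these observations, we construct a new, improved dataset named LAGRANGE, using heuristics intended to improve equivalence between knowledge graph and text, and show the effect of each heuristic on cyclic evaluation. We also construct two synthetic datasets using large language models and observe that they help achieve strong performance on cyclic generation of text but are less effective on cyclic generation of knowledge graphs, possibly because they lack a consistent underlying ontology.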

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation