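TL;DR: This work examines source bias in PLM-based retrieval models and shows that it stems from a preference for low-perplexity documents. Using a causal-graph analysis, we propose Causal Diagnosis and Correction (CDC), a new debiasing method that significantly improves retrieval effectiveness and keeps relevance judgments fair with respect to document quality.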
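To make "preference for low-perplexity documents" concrete, here is a minimal Python sketch. Perplexity is the exponential of a document's mean negative log-likelihood per token. The `debias_scores` helper is a hypothetical stand-in, not the authors' CDC procedure. It assumes a single linear effect of log-perplexity on the retriever's relevance score, estimates that effect by ordinary least squares over a calibration set, and subtracts it. Both function names and the per-token log-probability input are assumptions made for illustration.

```python
import math
from statistics import fmean


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_logprobs: natural-log probabilities that some language model
    assigned to each token of a document (hypothetical input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-fmean(token_logprobs))


def debias_scores(scores, log_ppls):
    """Hypothetical linear correction, NOT the paper's CDC method.

    Fits score ~ a + b * log_ppl by ordinary least squares, then removes
    the fitted perplexity term b * log_ppl from every score. A negative b
    means lower-perplexity documents get higher scores (source bias).
    The remaining constant offset does not change the ranking.
    """
    mx, my = fmean(log_ppls), fmean(scores)
    var = sum((x - mx) ** 2 for x in log_ppls)
    if var == 0:
        return list(scores)  # no perplexity variation: nothing to remove
    b = sum((x - mx) * (y - my) for x, y in zip(log_ppls, scores)) / var
    return [y - b * x for x, y in zip(log_ppls, scores)]
```

For example, if scores fall by 0.5 for every unit increase in log-perplexity (`debias_scores([0.5, 0.0, -0.5], [1.0, 2.0, 3.0])`), the correction returns equal scores for all three documents. A real diagnosis would also need to separate perplexity's effect from genuine semantic relevance, which this naive regression does not attempt.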
Abstract
Previous studies have found that PLM-based retrieval models exhibit a preference for LLM-generated content, assigning higher relevance scores to these documents even when their semantic quality is comparable to that of human-written ones. This phenomenon, known as …