Text retrieval plays a crucial role in incorporating factual knowledge for
decision making into language processing pipelines, ranging from chat-based web
search to question answering systems. Current state-of-the-art text retrieval
models leverage pre-trained large language models (LLMs) to achieve competitive
performance, but training LLM-based retrievers via typical contrastive losses
requires intricate heuristics, including selecting hard negatives and using
additional supervision as learning signals. This reliance on heuristics stems
from the fact that the contrastive loss itself is heuristic and does not
directly optimize the downstream metrics of decision quality at the end of the
processing pipeline. To address this issue, we introduce Neural PG-RANK, a
novel training algorithm that learns to rank by instantiating a LLM as a
Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for
end-to-end training of retrieval models as part of larger decision systems via
policy gradient, with little reliance on complex heuristics, and it effectively
unifies the training objective with downstream decision-making quality. We
conduct extensive experiments on various text retrieval benchmarks. The results
demonstrate that when the training objective aligns with the evaluation setup,
Neural PG-RANK yields remarkable in-domain performance improvement, with
substantial out-of-domain generalization to some critical datasets employed in
downstream question answering tasks.

通过利用大规模预训练语言模型，我们引入了一种名为 Neural PG-RANK 的新型训练算法，该算法通过实例化一个语言模型为 Plackett-Luce 排序策略，为检索模型的端到端训练提供了一种合理的方法，并有效地将训练目标与下游决策质量相统一。实验证明，当训练目标与评估设置一致时，Neural PG-RANK 在领域内表现出卓越的性能提升，并在下游问答任务中对一些关键数据集进行了实质性的跨领域泛化。