Recent approaches to passage retrieval have successfully employed representations from pretrained language models (LMs), with large effectiveness gains. However, due to their high computational cost, those approaches are usually limited to re-ranking scenarios. The candidates in such a scenario are typically retrieved by scalable bag-of-words retrieval models such as BM25. Although BM25 has proven to be a decent first-stage ranker, it tends to miss relevant passages. In this context, we propose CoRT, a framework and neural first-stage ranking model that leverages contextual representations from transformer-based language models to complement candidates from term-based ranking functions while causing no significant delay. Using the MS MARCO dataset, we show that CoRT significantly increases first-stage ranking quality and recall by complementing BM25 with missing candidates. Consequently, we found that subsequent re-rankers achieve superior results while requiring fewer candidates to saturate ranking quality. Finally, we demonstrate that with CoRT, representation-focused retrieval at web scale can be realized with latencies as low as those of BM25.

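This paper proposes CoRT, a simple neural first-stage ranking model that uses contextual representations from pretrained language models (such as BERT) to complement term-based ranking functions, improving the recall of the candidate set without affecting query time. Using the MS MARCO dataset, we show that CoRT significantly increases candidate recall, so that subsequent re-rankers achieve better results with fewer candidates. We also show that passage retrieval with CoRT has remarkably low latency.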
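
The idea of complementing a term-based candidate list with neural candidates can be illustrated with a minimal sketch. This is an assumption made for illustration, not the procedure specified in this text: the function name `complement_candidates`, the interleaving merge strategy, and the passage ids are all hypothetical.

```python
from itertools import zip_longest


def complement_candidates(term_based, neural, k):
    """Interleave two ranked lists of passage ids, skip duplicates, keep the top k.

    Hypothetical helper: the merge strategy is an illustrative assumption.
    """
    merged, seen = [], set()
    # Walk both rankings in parallel, one rank position at a time.
    for pair in zip_longest(term_based, neural):
        for pid in pair:
            # zip_longest pads the shorter list with None.
            if pid is not None and pid not in seen:
                seen.add(pid)
                merged.append(pid)
    return merged[:k]


# Made-up passage ids standing in for a BM25 ranking and a neural ranking.
bm25_top = ["p1", "p2", "p3", "p4"]
neural_top = ["p7", "p2", "p9", "p1"]
merged = complement_candidates(bm25_top, neural_top, k=6)
print(merged)
```

In this toy input, passages `p7` and `p9` appear only in the neural ranking, so the merged candidate list contains relevant-looking passages that the term-based list alone would have missed.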

CoRT: Complementary Rankings from Transformers