Recent approaches to Open-domain Question Answering query an external
knowledge base with a retriever model, optionally rerank the retrieved passages
with a separate reranker model, and generate an answer with a third reader model.
Despite performing related tasks, the models have separate parameters and are
weakly coupled during training. We propose casting the retriever and the
reranker as internal passage-wise attention mechanisms applied sequentially
within the transformer architecture and feeding computed representations to the
reader, with the hidden representations progressively refined at each stage.
This allows us to use a single question answering model trained end-to-end,
which is a more efficient use of model capacity and also leads to better
gradient flow. We present a pre-training method to effectively train this
architecture and evaluate our model on the Natural Questions and TriviaQA open
datasets. For a fixed parameter budget, our model outperforms the previous
state-of-the-art model by 1.0 and 0.7 exact match points, respectively.
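
The idea of treating retrieval and reranking as two sequential passage-wise attention steps over progressively refined representations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the scaled dot-product scoring, the top-k selection, and the `refine` function, which stands in for the transformer layers between the two stages.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def passage_attention(question: Vector, passages: Sequence[Vector], keep: int) -> List[int]:
    """Score every passage against the question and return the positions
    of the `keep` highest-scoring passages, in their original order."""
    scale = math.sqrt(len(question))
    scores = [dot(question, p) / scale for p in passages]
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def refine(passage: Vector, question: Vector, mix: float = 0.5) -> Vector:
    """Hypothetical stand-in for intermediate transformer layers:
    moves the passage representation toward the question."""
    return [(1 - mix) * p + mix * q for p, q in zip(passage, question)]


def retrieve_then_rerank(question: Vector, passages: Sequence[Vector],
                         k_retrieve: int, k_rerank: int) -> List[int]:
    # Stage 1 (retriever): attention over all passage representations.
    kept = passage_attention(question, passages, k_retrieve)
    # Representations are refined before the next stage.
    refined = [refine(passages[i], question) for i in kept]
    # Stage 2 (reranker): attention over the refined survivors only.
    positions = passage_attention(question, refined, k_rerank)
    # Map positions back to original passage indices; a reader would
    # consume the representations of these passages.
    return [kept[j] for j in positions]


question = [1.0, 0.0]
passages = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
selected = retrieve_then_rerank(question, passages, k_retrieve=3, k_rerank=2)
print(selected)
```

Because both stages operate on the same stream of representations in this sketch, a trained version would share parameters across the stages rather than keeping separate retriever and reranker models, which is the coupling the abstract argues for.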

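This work proposes internal passage-wise attention mechanisms within the transformer architecture that unify the retriever and reranker models into a single model trained end-to-end, making more efficient use of model capacity and improving gradient flow. For a fixed parameter budget, the model outperforms the previous state-of-the-art model by 1.0 and 0.7 exact match points.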