Sep 2023

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval-Augmented Visual Question Answering

TL;DR: Fine-grained Late-interaction Multi-modal Retrieval (FLMR) significantly improves knowledge retrieval in Retrieval-Augmented Visual Question Answering (RA-VQA) by addressing limitations of the retriever, improving PRRecall@5 by approximately 8%. Equipped with state-of-the-art models, RA-VQA achieves a VQA score of around 61% on the OK-VQA dataset.