BriefGPT.xyz
Nov, 2023
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin...
TL;DR
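Using the synthetic training dataset SWIM-IR, we study the capabilities of multilingual dense retrieval models and evaluate them comprehensively on three retrieval benchmarks, finding that SWIM-IR can serve as a low-cost substitute for expensive human-labeled retrieval training data.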
Abstract
Dense retrieval models have predominantly been studied for English, where models have shown great success, due to the availability of human-labeled training pairs. However, there has been limited success for multilingual...