Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular retrieval models, dataset domains, or query types. To answer this, we conduct the first comprehensive analysis of LM-based expansion. We find a strong negative correlation between retriever performance and gains from expansion: expansion improves scores for weaker models, but generally harms stronger models. We show that this trend holds across a set of eleven expansion techniques, twelve datasets with diverse distribution shifts, and twenty-four retrieval models. Through qualitative error analysis, we hypothesize that although expansions provide extra information (potentially improving recall), they also add noise that makes it difficult to distinguish among the top relevant documents (thus introducing false positives). Our results suggest the following recipe: use expansions for weaker models or when the target dataset differs significantly from the training corpus in format; otherwise, avoid expansions to keep the relevance signal clear.

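Summary: Using large language models for query or document expansion can improve generalization in information retrieval, but it remains unclear whether this is universally beneficial or only effective in specific settings. Through the first comprehensive analysis of LM-based expansion, this study finds a strong negative correlation between retriever performance and gains from expansion. The results suggest using expansions for weaker models or when the target dataset differs significantly from the training corpus in format, and otherwise avoiding expansions to keep the relevance signal clear.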

Title: When Do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets