Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a boosting manner to seek a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, where an anchor branch is first trained to provide insights into the data properties, with a target branch gaining more advanced knowledge to develop optimal features and distance metrics. Concretely, an anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. Building upon this knowledge, a target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments validate that our DBL can achieve impressive and consistent improvements based on various recent state-of-the-art models in the image-text matching field, and outperform related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. Beyond the above, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance under the same computational costs, demonstrating the flexibility and broad applicability of our proposed method. Our code is publicly available at: https://github.com/Paranioar/DBL.
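
To make the anchor/target interplay concrete, the following is a minimal sketch and not the authors' implementation. It assumes similarity scores (higher means better matched), a hinge-style triplet loss, and a hypothetical rule in which the target branch's margin is the anchor branch's observed positive-negative gap plus a small constant `extra`. The function names, `base_margin`, and `extra` are illustrative assumptions and do not come from the paper.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (similarity of matched pair, similarity of unmatched pair)


def hinge_triplet_loss(sim_pos: float, sim_neg: float, margin: float) -> float:
    """Hinge loss that is zero once sim_pos exceeds sim_neg by at least `margin`."""
    return max(0.0, margin - (sim_pos - sim_neg))


def adaptive_margin(anchor_pos: float, anchor_neg: float,
                    base_margin: float, extra: float = 0.05) -> float:
    """Hypothetical rule: ask the target to beat the anchor's gap by `extra`,
    but never use a margin smaller than the fixed base margin."""
    anchor_gap = anchor_pos - anchor_neg
    return max(base_margin, anchor_gap + extra)


def boosting_losses(anchor_sims: List[Pair], target_sims: List[Pair],
                    base_margin: float = 0.2) -> Tuple[float, float]:
    """Return (mean anchor loss, mean target loss) over a batch of triplets.

    The anchor branch uses the fixed margin; the target branch uses a
    per-triplet margin derived from the anchor's scores. In a real training
    loop the anchor scores would be treated as constants (no gradient)
    when they are used to build the target margin.
    """
    anchor_total, target_total = 0.0, 0.0
    for (a_pos, a_neg), (t_pos, t_neg) in zip(anchor_sims, target_sims):
        anchor_total += hinge_triplet_loss(a_pos, a_neg, base_margin)
        margin = adaptive_margin(a_pos, a_neg, base_margin)
        target_total += hinge_triplet_loss(t_pos, t_neg, margin)
    n = len(anchor_sims)
    return anchor_total / n, target_total / n
```

Under this assumed rule, a triplet that the anchor branch already separates well imposes a stricter margin on the target branch, while a triplet the anchor gets wrong falls back to the base margin. This is one simple way to read "more adaptive margin constraints"; the paper's actual formulation may differ.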

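Image-text matching remains a challenging task because of heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Unlike previous approaches, we aim to obtain a more powerful matching model by leveraging knowledge transfer between peer branches in a boosting manner. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, in which an anchor branch is first trained to provide insight into the data properties, while a target branch acquires more advanced knowledge to develop optimal features and distance metrics. Experiments confirm that DBL achieves impressive and consistent improvements on top of various recent state-of-the-art models in the image-text matching field and outperforms related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. In addition, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance at the same computational cost, demonstrating the flexibility and broad applicability of the proposed method. Our code is publicly available at https://github.com/Paranioar/DBL.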

Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching