In the past few years, cross-modal image-text retrieval (ITR) has attracted
growing interest in the research community due to its research value and
broad real-world applications. It is designed for scenarios where
the queries are from one modality and the retrieval ga