text-to-image person re-identification (TIReID) is a compelling topic in the
cross-modal community, which aims to retrieve the target person based on a
textual query. Although numerous TIReID methods have been proposed and achieved
promising performance, they implicitly assume the trai
Composed Image Retrieval (CIR) using zero-shot setting and CLIP encoders can be improved by reducing task discrepancy through novel target-anchored contrastive learning for text encoders.