The key challenge in learning dense correspondences lies in the lack of
ground-truth matches for real image pairs. While photometric consistency losses
provide unsupervised alternatives, they struggle with large
Dense visual correspondence is established between images using Doduo, which learns general dense visual correspondence without ground truth supervision. The method incorporates semantic priors and self-supervised flow training to produce accurate correspondence robust to dynamic scene changes, resulting in superior performance for point-level correspondence estimation.