- Taihong Xiao,
- Sifei Liu,
- Shalini De Mello,
- Zhiding Yu,
- Jan Kautz &
- Ming-Hsuan Yang (ORCID: 0000-0003-4848-2304)
A Correction to this article was published on 13 April 2022.
Abstract
Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: (1) large variations in appearance, scale, and pose exist even for objects from the same category, and (2) labeling pixel-level dense correspondences is labor-intensive and infeasible to scale. Most existing methods focus on designing various matching modules on top of fully-supervised ImageNet-pretrained networks. On the other hand, while a variety of self-supervised approaches have been proposed to explicitly measure image-level similarities, correspondence matching at the pixel level remains under-explored. In this work, we propose a multi-level contrastive learning approach for semantic matching that does not rely on any ImageNet-pretrained model. We show that image-level contrastive learning is a key component for encouraging convolutional features to find correspondences between similar objects, and that performance can be further enhanced by regularizing cross-instance cycle-consistency at intermediate feature levels. Experimental results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets demonstrate that our method performs favorably against state-of-the-art approaches.
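The abstract names two ingredients: an image-level contrastive (InfoNCE-style) objective and a cross-instance cycle-consistency regularizer on intermediate features. As a rough illustration of these two ideas only (not the authors' actual losses, which the full text defines), the following NumPy sketch computes an InfoNCE loss over image embeddings and a simple forward-backward matching check between two feature maps; all function names and the temperature value are illustrative assumptions:

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.07):
    """Image-level InfoNCE: row i of z_a should match row i of z_b,
    with all other rows serving as negatives (tau is illustrative)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                       # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))        # targets are the diagonal

def cycle_inconsistency(f_a, f_b):
    """Fraction of positions in feature map f_a (P, C) whose nearest
    neighbour in f_b (P, C) does not map back to them; a regularizer
    could penalize this forward-backward disagreement."""
    f_a = f_a / np.linalg.norm(f_a, axis=1, keepdims=True)
    f_b = f_b / np.linalg.norm(f_b, axis=1, keepdims=True)
    fwd = (f_a @ f_b.T).argmax(axis=1)               # a -> b matches
    bwd = (f_b @ f_a.T).argmax(axis=1)               # b -> a matches
    idx = np.arange(f_a.shape[0])
    return float(np.mean(bwd[fwd] != idx))
```

A feature map matched against itself is perfectly cycle-consistent (inconsistency of zero); the paper instead enforces such consistency across different instances of the same category at intermediate feature levels.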
Change history
13 April 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11263-022-01614-8
References
Bristow, H., Valmadre, J., & Lucey, S. (2015). Dense semantic correspondence where every pixel is a classifier. IEEE International Conference on Computer Vision (ICCV) pp 4024–4031
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. International Conference on Machine Learning (ICML)
Chen, Y. C., Huang, P. H., Yu, L. Y., Huang, J. B., Yang, M. H., & Lin, Y. Y. (2018). Deep semantic matching with foreground detection and cycle-consistency. In: Asian Conference on Computer Vision (ACCV), Springer, pp 347–362
Choy, C. B., Gwak, J., Savarese, S., & Chandraker, M. (2016). Universal correspondence network. Neural Information Processing Systems (NeurIPS)
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Dale, K., Johnson, M. K., Sunkavalli, K., Matusik, W., & Pfister, H. (2009). Image restoration using online photo collections. IEEE International Conference on Computer Vision (ICCV) pp 2217–2224
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 248–255
Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. IEEE International Conference on Computer Vision (ICCV) pp 1422–1430
Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. In: IEEE International Conference on Computer Vision (ICCV), pp 1792–1799
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2014). The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision (IJCV), 111, 98–136.
Gidaris, S., Singh, P., & Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. International Conference on Learning Representations (ICLR)
Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., Doersch, C., Pires, B. A., Guo, Z. D., & Azar, M. G., et al. (2020). Bootstrap your own latent: A new approach to self-supervised learning. Neural Information Processing Systems (NeurIPS)
Ham, B., Cho, M., Schmid, C., & Ponce, J. (2016). Proposal flow. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3475–3484
Ham, B., Cho, M., Schmid, C., & Ponce, J. (2018). Proposal flow: Semantic correspondences from object proposals. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 40, 1711–1725.
Han, K., Rezende, R. S., Ham, B., Wong, K. Y. K., Cho, M., Schmid, C., & Ponce, J. (2017). SCNet: Learning semantic correspondence. In: IEEE International Conference on Computer Vision (ICCV)
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 770–778
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 9729–9738
Hjelm, R. D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., & Bengio, Y. (2019). Learning deep representations by mutual information estimation and maximization. International Conference on Learning Representations (ICLR)
Huang, S., Wang, Q., Zhang, S., Yan, S., & He, X. (2019). Dynamic context correspondence network for semantic alignment. IEEE International Conference on Computer Vision (ICCV) pp 2010–2019
Hur, J., Lim, H., Park, C., & Chul Ahn, S. (2015). Generalized deformable spatial pyramid: Geometry-preserving dense correspondence estimation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1392–1400
Jabri, A., Owens, A., & Efros, A.A. (2020). Space-time correspondence as a contrastive random walk. Neural Information Processing Systems (NeurIPS)
Jeon, S., Kim, S., Min, D., & Sohn, K. (2018). PARN: Pyramidal affine regression networks for dense semantic correspondence. In: European Conference on Computer Vision (ECCV)
Kanazawa, A., Jacobs, D. W., & Chandraker, M. (2016). WarpNet: Weakly supervised matching for single-view reconstruction. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 3253–3261
Kang, G., Wei, Y., Yang, Y., Zhuang, Y., & Hauptmann, A. G. (2020). Pixel-level cycle association: A new perspective for domain adaptive semantic segmentation. In: Neural Information Processing Systems (NeurIPS)
Kim, J., Liu, C., Sha, F., & Grauman, K. (2013). Deformable spatial pyramid matching for fast dense correspondences. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2307–2314
Kim, S., Min, D., Lin, S., & Sohn, K. (2017). DCTM: Discrete-continuous transformation matching for semantic flow. In: IEEE International Conference on Computer Vision (ICCV), pp 4529–4538
Kim, S., Lin, S., Jeon, S. R., Min, D., & Sohn, K. (2018). Recurrent transformer networks for semantic correspondence. Neural Information Processing Systems (NeurIPS), 31, 6126–6136.
Kim, S., Min, D., Ham, B., Jeon, S., Lin, S., & Sohn, K. (2019). FCSS: Fully convolutional self-similarity for dense semantic correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 41, 581–595.
Lee, J., Kim, D., Ponce, J., & Ham, B. (2019). SFNet: Learning object-aware semantic correspondence. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2278–2287
Li, X., Liu, S., Mello, S. D., Wang, X., Kautz, J., & Yang, M. H. (2019). Joint-task self-supervised learning for temporal correspondence. Neural Information Processing Systems (NeurIPS)
Liu, C., Yuen, J., & Torralba, A. (2011). SIFT Flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33, 978–994.
Liu, P., King, I., Lyu, M. R., & Xu, J. (2019). DDFlow: Learning optical flow with unlabeled data distillation. Association for the Advancement of Artificial Intelligence (AAAI), 33, 8770–8777.
Liu, Y., Zhu, L., Yamada, M., & Yang, Y. (2020). Semantic correspondence as an optimal transport problem. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4463–4472
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal on Computer Vision (IJCV)
Meister, S., Hur, J., & Roth, S. (2018). UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Association for the Advancement of Artificial Intelligence (AAAI)
Min, J., Lee, J., Ponce, J., & Cho, M. (2019a). Hyperpixel flow: Semantic correspondence with multi-layer neural features. IEEE International Conference on Computer Vision (ICCV) pp 3394–3403
Min, J., Lee, J., Ponce, J., & Cho, M. (2019b). SPair-71k: A large-scale benchmark for semantic correspondence. arXiv:1908.10543
Min, J., Lee, J., Ponce, J., & Cho, M. (2020). Learning to compose hypercolumns for visual correspondence. In: European Conference on Computer Vision (ECCV)
Misra, I., Zitnick, C. L., & Hebert, M. (2016). Shuffle and learn: Unsupervised learning using temporal order verification. In: European Conference on Computer Vision (ECCV)
Munkres, J. (1957). Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 10, 196–210.
Noroozi, M., & Favaro, P. (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision (ECCV)
Novotny, D., Larlus, D., & Vedaldi, A. (2017). Anchornet: A weakly supervised network to learn geometry-sensitive features for semantic matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5277–5286
van den Oord, A., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016). Conditional image generation with PixelCNN decoders. Neural Information Processing Systems (NeurIPS)
van den Oord, A., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv:1807.03748
Pathak, D., Girshick, R. B., Dollár, P., Darrell, T., & Hariharan, B. (2017). Learning features by watching objects move. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 6024–6033
Pinheiro, P. O., Almahairi, A., Benmaleck, R. Y., Golemo, F., & Courville, A. (2020). Unsupervised learning of dense visual representations. Neural Information Processing Systems (NeurIPS)
Rocco, I., Arandjelovic, R., & Sivic, J. (2017). Convolutional neural network architecture for geometric matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6148–6157
Rocco, I., Arandjelović, R., & Sivic, J. (2018a). End-to-end weakly-supervised semantic alignment. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 6917–6925
Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., & Sivic, J. (2018b). Neighbourhood consensus networks. Neural Information Processing Systems (NeurIPS)
Seo, P. H., Lee, J., Jung, D., Han, B., & Cho, M. (2018). Attentive semantic alignment with offset-aware correlation kernels. In: European Conference on Computer Vision (ECCV), pp 349–364
Sinkhorn, R. (1967). Diagonal equivalence to matrices with prescribed row and column sums. American Mathematical Monthly, 74, 402.
Taniai, T., Sinha, S. N., & Sato, Y. (2016). Joint recovery of dense correspondence and cosegmentation in two images. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 4246–4255
Tola, E., Lepetit, V., & Fua, P. (2010). DAISY: An efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32, 815–830.
van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel recurrent neural networks. International Conference on Machine Learning (ICML)
Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P. A. (2008). Extracting and composing robust features with denoising autoencoders. In: International Conference on Machine Learning (ICML)
Vondrick, C., Shrivastava, A., Fathi, A., Guadarrama, S., & Murphy, K. (2018). Tracking emerges by colorizing videos. In: European Conference on Computer Vision (ECCV)
Wang, X., & Gupta, A. (2015). Unsupervised learning of visual representations using videos. In: IEEE International Conference on Computer Vision (ICCV), pp 2794–2802
Wang, X., Jabri, A., & Efros, A. A. (2019). Learning correspondence from the cycle-consistency of time. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 2561–2571
Wang, X., Zhang, R., Shen, C., Kong, T., & Li, L. (2020). Dense contrastive learning for self-supervised visual pre-training. arXiv preprint
Xiao, T., Hong, J., & Ma, J. (2018a). DNA-GAN: Learning disentangled representations from multi-attribute images. International Conference on Learning Representations Workshop (ICLRW)
Xiao, T., Hong, J., & Ma, J. (2018b). ELEGANT: Exchanging latent encodings with GAN for transferring multiple face attributes. In: European Conference on Computer Vision (ECCV), pp 172–187
Xie, Z., Lin, Y., Zhang, Z., Cao, Y., Lin, S., & Hu, H. (2021). Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 16684–16693
Yang, H., Lin, W. Y., & Lu, J. (2014). DAISY filter flow: A generalized discrete approach to dense correspondences. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 3406–3413
Zhang, R., Isola, P., & Efros, A. A. (2016). Colorful image colorization. In: European Conference on Computer Vision (ECCV)
Zhou, S., Xiao, T., Yang, Y., Feng, D., He, Q., & He, W. (2017). GeneGAN: Learning object transfiguration and attribute subspace from unpaired data. In: British Machine Vision Conference (BMVC)
Zhou, T., Lee, Y. J., Yu, S. X., & Efros, A. A. (2015a). FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1191–1200
Zhou, T., Krähenbühl, P., Aubry, M., Huang, Q., & Efros, A. A. (2016). Learning dense correspondence via 3d-guided cycle consistency. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 117–126
Zhou, X., Zhu, M., & Daniilidis, K. (2015b). Multi-image matching via fast alternating minimization. In: IEEE International Conference on Computer Vision (ICCV), pp 4032–4040
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International Conference on Computer Vision (ICCV), pp 2223–2232
Acknowledgements
T. Xiao and M.-H. Yang are supported in part by NSF CAREER grant 1149783.
Author information
Authors and Affiliations
University of California, Merced, CA, USA
Taihong Xiao & Ming-Hsuan Yang
Nvidia, Santa Clara, CA, USA
Sifei Liu, Shalini De Mello, Zhiding Yu & Jan Kautz
Yonsei University, Seoul, Korea
Ming-Hsuan Yang
Corresponding author
Correspondence to Ming-Hsuan Yang.
Additional information
Communicated by Bumsub Ham.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xiao, T., Liu, S., De Mello, S., et al. Learning Contrastive Representation for Semantic Correspondence. International Journal of Computer Vision, 130, 1293–1309 (2022). https://doi.org/10.1007/s11263-022-01602-y