- Xavier Boix
- Josep M. Gonfaus
- Joost van de Weijer
- Andrew D. Bagdanov
- Joan Serrat
- Jordi Gonzàlez
Abstract
Hierarchical Conditional Random Field (HCRF) models have been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model, since multiple classes can reasonably be expected to appear within large regions. This simplified model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders the underlying optimization problem tractable. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010 and MSRC-21.
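As a rough illustration of the idea summarized above, the following toy sketch lets a region-level node take any subset of the class labels rather than a single class. The abstract does not give the form of the potential, so everything here is an assumption made for illustration: the function names, the per-class cost `class_cost`, the fixed `penalty` charged for each local label missing from the region's subset, and the candidate restriction that stands in for the sampling strategy. It is not the formulation or the inference procedure used in the article.

```python
from itertools import combinations

def power_set(labels):
    """All subsets of `labels` as frozensets: 2**len(labels) candidates."""
    labels = sorted(labels)
    return [frozenset(c) for r in range(len(labels) + 1)
            for c in combinations(labels, r)]

def consistency_cost(local_labels, region_subset, penalty=1.0):
    """Hypothetical penalty: each local label absent from the subset costs `penalty`."""
    return penalty * sum(1 for lab in local_labels if lab not in region_subset)

def subset_cost(region_subset, class_cost):
    """Hypothetical cost of declaring each class present in the region."""
    return sum(class_cost[c] for c in region_subset)

def best_subset(local_labels, candidates, class_cost, penalty=1.0):
    """Pick the candidate subset with the lowest total cost."""
    return min(candidates,
               key=lambda s: subset_cost(s, class_cost)
               + consistency_cost(local_labels, s, penalty))

# Toy data (assumed): four classes and five labeled local nodes in one region.
classes = ["boat", "cow", "grass", "sky"]
local_labels = ["grass", "grass", "cow", "grass", "sky"]
class_cost = {"boat": 3.0, "cow": 0.5, "grass": 0.5, "sky": 0.5}

# Exhaustive search over the expanded label set (16 subsets here).
full = best_subset(local_labels, power_set(classes), class_cost)

# Stand-in for a sampling strategy: only subsets of classes seen locally (8 subsets).
reduced = best_subset(local_labels, power_set(set(local_labels)), class_cost)

# Single-label baseline: the region may hold exactly one class.
single = best_subset(local_labels, [frozenset([c]) for c in classes], class_cost)

print(sorted(full), sorted(reduced), sorted(single))
```

In this toy setting the subset-valued region node keeps all three classes that occur locally, whereas the single-label baseline must pay a penalty for every node that disagrees with its one class. The exhaustive candidate list grows as 2^n in the number of classes, which is why some restriction of the candidates is needed in practice.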
Author information
Authors and Affiliations
Centre de Visió per Computador, Barcelona, Spain
Xavier Boix, Josep M. Gonfaus, Joost van de Weijer, Andrew D. Bagdanov, Joan Serrat & Jordi Gonzàlez
Department of Computer Science, Universitat Autònoma de Barcelona, Barcelona, Spain
Josep M. Gonfaus, Joost van de Weijer, Joan Serrat & Jordi Gonzàlez
Computer Vision Laboratory, ETH Zurich, Zurich, Switzerland
Xavier Boix
Corresponding author
Correspondence to Xavier Boix.
Additional information
Both authors contributed equally to this work.
About this article
Cite this article
Boix, X., Gonfaus, J.M., van de Weijer, J. et al. Harmony Potentials. Int J Comput Vis 96, 83–102 (2012). https://doi.org/10.1007/s11263-011-0449-8