Abstract

We consider the problem of semantic segmentation, i.e. assigning each pixel in an image to one of a set of pre-defined semantic object categories. State-of-the-art semantic segmentation algorithms typically consist of three components: a local appearance model, a local consistency model and a global consistency model. These three components are generally integrated into a unified probabilistic framework. While this integration enables a joint estimation of the model parameters at training time and ensures a globally consistent labeling of the pixels at test time, it also comes at a high computational cost.

We propose a simple approach to semantic segmentation where the three components are decoupled (this journal submission is an extended version of the following conference paper: G. Csurka and F. Perronnin, “A simple high performance approach to semantic segmentation”, BMVC, 2008). For the local appearance model, we make use of the Fisher kernel. While this framework was shown to lead to high accuracy for image classification, to the best of our knowledge this is its first application to the segmentation problem. The semantic segmentation process is then guided by a low-level segmentation which enforces local consistency. Finally, to enforce image-level consistency we use global image classifiers: if an image as a whole is unlikely to contain an object class, then the corresponding class is not considered in the segmentation pipeline.
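The way the three decoupled components combine can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `label_regions`, the averaging of patch scores within each region, and the fixed `threshold` on the image-level scores are all hypothetical choices, and the per-patch scores are taken as given rather than computed from Fisher kernel representations.

```python
import numpy as np

def label_regions(patch_scores, patch_regions, image_scores, threshold=0.5):
    """Assign one class label per low-level region (illustrative sketch).

    patch_scores  : (n_patches, n_classes) class scores from a local appearance model
    patch_regions : (n_patches,) index of the low-level region containing each patch
    image_scores  : (n_classes,) scores from global image-level classifiers
    threshold     : hypothetical cut-off below which a class is discarded
    """
    n_regions = int(patch_regions.max()) + 1
    n_classes = patch_scores.shape[1]

    # Local consistency: average the patch scores inside each region
    # (averaging is an assumption made for this sketch).
    sums = np.zeros((n_regions, n_classes))
    np.add.at(sums, patch_regions, patch_scores)
    counts = np.bincount(patch_regions, minlength=n_regions)
    region_scores = sums / np.maximum(counts, 1)[:, None]

    # Global consistency: drop classes the image as a whole is unlikely to contain.
    allowed = image_scores >= threshold
    if not allowed.any():
        # Fall back to the single most likely class so a label always exists.
        allowed = image_scores == image_scores.max()
    region_scores[:, ~allowed] = -np.inf

    return region_scores.argmax(axis=1)

# Toy example: 4 patches, 2 regions, 3 classes.
scores = np.array([[0.9, 0.1, 0.0],
                   [0.8, 0.2, 0.0],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.2, 0.6]])
regions = np.array([0, 0, 1, 1])
image_level = np.array([0.9, 0.7, 0.1])  # class 2 is unlikely for the whole image
labels = label_regions(scores, regions, image_level)
```

In the toy example, the second region would be assigned class 2 on local evidence alone, but because the image-level score for class 2 falls below the threshold, that class is removed from consideration and the region receives class 1 instead.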

The decoupling of the components makes our system very efficient both at training and test time. Efficient training makes it possible to estimate the model parameters on large quantities of data. In particular, we explain how our system can leverage weakly labeled data, i.e. images for which we do not have pixel-level labels but only object bounding boxes or even only image-level labels.

We believe that an important contribution of this paper is to show that even a simple decoupled system can provide state-of-the-art performance on the PASCAL VOC 2007, PASCAL VOC 2008 and MSRC 21 datasets.

Author information

Authors and Affiliations

Xerox Research Centre Europe, 6, chemin de Maupertuis, 38240, Meylan, France
Gabriela Csurka & Florent Perronnin

Authors

Gabriela Csurka
Florent Perronnin

Corresponding author

Correspondence to Florent Perronnin.

About this article

Cite this article

Csurka, G., Perronnin, F. An Efficient Approach to Semantic Segmentation. Int J Comput Vis 95, 198–212 (2011). https://doi.org/10.1007/s11263-010-0344-8

Received: 28 August 2009
Accepted: 13 April 2010
Published: 30 April 2010
Issue Date: November 2011
DOI: https://doi.org/10.1007/s11263-010-0344-8
