Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)
Abstract
Image classification is an important topic in computer vision. A key step is encoding local features into a compact image representation, and this step strongly affects the final classification accuracy. Encoding inevitably loses information because of quantization error. The residual vector, defined as the difference between a local image feature and its corresponding visual word, is the main source of this quantization error. Many previous algorithms treat this as a coding issue and focus on reducing the quantization error, either by reconstructing the feature with more than one visual word or by the so-called soft-assignment strategy. In this paper, we approach the problem from a different angle and propose an effective and efficient model, called the Multiple Stage Residual Model (MSRM), which makes full use of the residual vector to generate a multiple stage code. The proposed model is a generic framework that can be built upon many coding algorithms and significantly improves their image classification performance. Experimental results on image classification benchmarks, including the UIUC 8-Sport, Scene-15, and Caltech-101 datasets, confirm the validity of MSRM.
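The idea of coding a residual vector in successive stages can be illustrated with a small sketch. The abstract does not give the details of MSRM, so the code below is only an assumption-laden illustration: it uses plain hard assignment (nearest visual word) as the base coder, one-hot codes, and hypothetical names (`nearest_word`, `multi_stage_code`, `codebooks`) that do not come from the paper. Each stage codes the residual left by the previous stage, and the per-stage codes are concatenated.

```python
import numpy as np

def nearest_word(features, codebook):
    """Return, for each feature, the index of the closest visual word."""
    # Squared Euclidean distances, shape (n_features, n_words).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def multi_stage_code(features, codebooks):
    """Hard-assign features stage by stage, passing residuals forward.

    Assumption: hard assignment stands in for whichever coding
    algorithm the model is built upon. Returns the concatenated
    one-hot codes and the residuals left after the last stage.
    """
    residual = np.asarray(features, dtype=float)
    n = residual.shape[0]
    codes = []
    for codebook in codebooks:
        idx = nearest_word(residual, codebook)
        one_hot = np.zeros((n, codebook.shape[0]))
        one_hot[np.arange(n), idx] = 1.0
        codes.append(one_hot)
        # Residual vector: feature minus its assigned visual word.
        residual = residual - codebook[idx]
    return np.hstack(codes), residual

# Toy example with two 2-D features and two stages (made-up values).
features = np.array([[1.0, 0.0], [0.0, 2.0]])
codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # stage 1 visual words
    np.array([[0.0, 0.0], [0.0, 1.0]]),  # stage 2 codes the residuals
]
codes, final_residual = multi_stage_code(features, codebooks)
print(codes)
print(final_residual)
```

In this toy case the second feature leaves a residual of (0, 1) after the first stage, which the second stage then absorbs exactly. In a classification pipeline the per-feature codes would typically be pooled into one image-level vector before being passed to a classifier.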
Acknowledgement
This work was primarily supported by the National Natural Science Foundation of China (NSFC) (No. 61222308), and in part by NSFC (No. 61173120), the Program for New Century Excellent Talents in University (No. NCET-12-0217), and the Fundamental Research Funds for the Central Universities (No. HUST 2013TS115). X. Wang was supported by the Microsoft Research Asia Fellowship 2012.
Author information
Authors and Affiliations
Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, People’s Republic of China
Song Bai, Xinggang Wang, Cong Yao & Xiang Bai
Corresponding author
Correspondence to Xiang Bai.
Editor information
Editors and Affiliations
Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bai, S., Wang, X., Yao, C., Bai, X. (2015). Multiple Stage Residual Model for Accurate Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer Science, Computer Science (R0)