- Quan Cui,
- Bingchen Zhao,
- Zhao-Min Chen,
- Borui Zhao,
- Renjie Song,
- Boyan Zhou,
- Jiajun Liang &
- Osamu Yoshie
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13686)
Included in the following conference series: European Conference on Computer Vision (ECCV)
Abstract
This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties: discriminability keeps increasing as training progresses, while transferability diminishes sharply in the later training period. From the perspective of information-bottleneck theory, we reveal that the incompatibility between discriminability and transferability is attributed to the over-compression of input information. More importantly, we investigate why and how the InfoNCE loss can alleviate the over-compression, and further present a learning framework, named contrastive temporal coding (CTC), to counteract the over-compression and alleviate the incompatibility. Extensive experiments validate that CTC successfully mitigates the incompatibility, yielding discriminative and transferable representations. Noticeable improvements are achieved on the image classification task and on challenging transfer learning tasks. We hope that this work will raise awareness of the significance of the transferability property in the conventional supervised learning setting.
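The abstract centers on the InfoNCE loss as the tool that counteracts over-compression. As a hedged illustration of the loss itself (not the paper's CTC framework), the sketch below computes InfoNCE for a single anchor with one positive and a set of negatives, using cosine similarity and a temperature, all in plain Python; the toy vectors and the `temperature=0.1` default are assumptions for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the
    positive among the positive and all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# Toy check: a positive aligned with the anchor yields a small loss,
# a positive orthogonal to it (with an aligned negative) a large one.
anchor = [1.0, 0.0]
loss_easy = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_hard = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

In practice this loss is computed over a batch with learned representations; the pairwise toy version above only shows the shape of the objective.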
Q. Cui and B. Zhao contributed equally.
Notes
- 1.
Representations refer to the outputs of the backbone, which are processed with global average pooling in popular models [17].
- 2.
We use the Mutual Information Neural Estimation (MINE) [2] method to calculate the mutual information between continuous variables.
- 3.
Proofs are provided in Appendix A.1.
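Note 2 states that MINE [2] is used to estimate mutual information between continuous variables; MINE does this by training a critic network, which is beyond a short sketch. As a hedged stand-in that illustrates the quantity being estimated, the snippet below computes exact empirical mutual information for discrete toy variables from their joint and marginal frequencies; the toy samples are assumptions for the example.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Exact mutual information I(X; Y) in nats for a list of (x, y)
    samples, using empirical joint and marginal distributions."""
    n = len(pairs)
    pxy = Counter(pairs)                # joint counts over (x, y)
    px = Counter(x for x, _ in pairs)   # marginal counts over x
    py = Counter(y for _, y in pairs)   # marginal counts over y
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * log(p_joint / (p(x) * p(y)))
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi

# Perfectly correlated binary variables: I(X; Y) = H(X) = log 2.
dep = [(0, 0), (1, 1)] * 50
# Independent binary variables: I(X; Y) = 0.
ind = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```

MINE generalizes this idea to continuous variables by maximizing a neural lower bound on the same quantity, which is why it is needed for the representation-level measurements in the paper.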
References
Alemi, A.A., Fischer, I., Dillon, J.V., Murphy, K.: Deep variational information bottleneck. In: ICLR (2017)
Belghazi, M.I., et al.: MINE: mutual information neural estimation. arXiv:1801.04062 (2018)
Chen, C., Zheng, Z., Ding, X., Huang, Y., Dou, Q.: Harmonizing transferability and discriminability for adapting object detectors. In: CVPR (2020)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML (2020)
Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big self-supervised models are strong semi-supervised learners. In: NeurIPS (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning. arXiv:2003.04297 (2020)
Chen, X., Wang, S., Long, M., Wang, J.: Transferability vs. discriminability: batch spectral penalization for adversarial domain adaptation. In: ICML (2019)
Coates, A., Ng, A., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: AISTATS (2011)
Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V.: AutoAugment: learning augmentation strategies from data. In: CVPR (2019)
Darlow, L.N., Crowley, E.J., Antoniou, A., Storkey, A.J.: CINIC-10 is not ImageNet or CIFAR-10. arXiv:1810.03505 (2018)
Feng, Y., Jiang, J., Tang, M., Jin, R., Gao, Y.: Rethinking supervised pre-training for better downstream transferring. In: ICLR (2022)
Fu, J., Zheng, H., Mei, T.: Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In: CVPR (2017)
Furlanello, T., Lipton, Z., Tschannen, M., Itti, L., Anandkumar, A.: Born again neural networks. In: ICML (2018)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR (2020)
He, K., Girshick, R., Dollár, P.: Rethinking ImageNet pre-training. In: ICCV (2019)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv:1503.02531 (2015)
Hjelm, R.D., et al.: Learning deep representations by mutual information estimation and maximization. In: ICLR (2019)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR (2018)
Kang, B., et al.: Decoupling representation and classifier for long-tailed recognition. In: ICLR (2020)
Khosla, P., et al.: Supervised contrastive learning. arXiv:2004.11362 (2020)
Kornblith, S., Chen, T., Lee, H., Norouzi, M.: Why do better loss functions lead to less transferable features? In: NeurIPS (2021)
Kornblith, S., Shlens, J., Le, Q.V.: Do better ImageNet models transfer better? In: CVPR (2019)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: ECCV (2014)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: ICML (2015)
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. arXiv:1306.5151 (2013)
Mao, H., Chen, X., Fu, Q., Du, L., Han, S., Zhang, D.: Neuron campaign for initialization guided by information bottleneck theory. In: CIKM (2021)
Oord, A.V.D., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv:1807.03748 (2018)
Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: ECCV (2020)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NeurIPS (2015)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. In: IJCV (2015)
Sariyildiz, M.B., Kalantidis, Y., Larlus, D., Alahari, K.: Concept generalization in visual representation learning. In: ICCV (2021)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR (2015)
Shao, J., Wen, X., Zhao, B., Xue, X.: Temporal context aggregation for video retrieval with contrastive learning. In: WACV (2021)
Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information. arXiv:1703.00810 (2017)
Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. arXiv:1906.05849 (2019)
Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., Isola, P.: What makes for good views for contrastive learning? In: NeurIPS (2020)
Tishby, N., Zaslavsky, N.: Deep learning and the information bottleneck principle. In: ITW (2015)
Tripuraneni, N., Jordan, M., Jin, C.: On the theory of transfer learning: the importance of task diversity. In: NeurIPS (2020)
Van Horn, G., et al.: The INaturalist species classification and detection dataset. In: CVPR (2018)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)
Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. arXiv:2011.09157 (2020)
Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: ECCV (2016)
Wu, M., Zhuang, C., Mosse, M., Yamins, D., Goodman, N.: On mutual information in contrastive learning for visual representations. arXiv:2005.13149 (2020)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR (2018)
Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: CVPR (2017)
Yalniz, I.Z., Jégou, H., Chen, K., Paluri, M., Mahajan, D.: Billion-scale semi-supervised learning for image classification. arXiv:1905.00546 (2019)
You, K., Liu, Y., Wang, J., Long, M.: LogME: practical assessment of pre-trained models for transfer learning. In: ICML (2021)
Zhao, B., Cui, Q., Song, R., Qiu, Y., Liang, J.: Decoupled knowledge distillation. In: CVPR (2022)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)
Zheng, H., Fu, J., Mei, T., Luo, J.: Learning multi-attention convolutional neural network for fine-grained image recognition. In: ICCV (2017)
Zhou, B., Cui, Q., Wei, X.S., Chen, Z.M.: BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition. In: CVPR (2020)
Zhu, R., Zhao, B., Liu, J., Sun, Z., Chen, C.W.: Improving contrastive learning by visualizing feature transformation. In: ICCV (2021)
Zoph, B., et al.: Rethinking pre-training and self-training. In: NeurIPS (2020)
Acknowledgement
This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ22F020006. We thank anonymous reviewers from ECCV 2022 for insightful comments.
Author information
Authors and Affiliations
MEGVII Technology, Beijing, China
Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, Renjie Song & Jiajun Liang
Waseda University, Tokyo, Japan
Quan Cui & Osamu Yoshie
University of Edinburgh, Edinburgh, UK
Bingchen Zhao
Wenzhou University, Wenzhou, China
Zhao-Min Chen
ByteDance, Beijing, China
Boyan Zhou
Corresponding author
Correspondence to Renjie Song.
Editor information
Editors and Affiliations
Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cui, Q. et al. (2022). Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19808-3
Online ISBN: 978-3-031-19809-0
eBook Packages: Computer Science, Computer Science (R0)