- Zhong Xiang (ORCID: orcid.org/0000-0003-3046-6170)1,
- Chenglin Zhu1,
- Miao Qian1,
- Yujia Shen1 &
- Yizhou Shao1
Abstract
Clothing image segmentation predicts a clothing category label for each pixel of an input image. To reduce the impact of variable camera shots, similarity between clothing categories, and complex boundaries on segmentation accuracy, we developed an improved ResNet50-based semantic segmentation model whose primary structure is an encoder–decoder. An improved spatial pyramid pooling module, combined with a large-convolution-kernel global feature extraction branch, achieves multi-scale feature fusion and strengthens the model's ability to identify clothing and its boundary features across different shots. Furthermore, to balance clothing shape and category information within the model, a spatial and semantic information enhancement module is proposed, which uses cross-stage connections to improve the flow of information between different stages of the network. The model was trained and tested on the DeepFashion2 dataset. Comparison experiments show that the proposed model achieved the highest mIoU and Boundary IoU, at 74.55% and 57.51%, respectively, outperforming DeepLabv3+, PSPNet, and other networks.
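The mIoU figure reported above follows the standard mean Intersection-over-Union definition: per-class IoU averaged over the classes present. The snippet below is a minimal illustrative sketch of that metric, not the authors' evaluation code; the function name `miou` and the toy per-pixel label maps are assumptions for demonstration.

```python
# Illustrative sketch of mean Intersection-over-Union (mIoU), the headline
# metric in the abstract. Label maps are flattened per-pixel class ids; the
# example data below is hypothetical, not from the DeepFashion2 experiments.

def miou(pred, target, num_classes):
    """Mean IoU over classes that occur in either the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred   = [0, 1, 1, 2, 2, 2]   # predicted clothing-class label per pixel
target = [0, 1, 2, 2, 2, 2]   # ground-truth label per pixel
print(miou(pred, target, 3))  # -> 0.75
```

Boundary IoU, the second reported metric, applies the same ratio but restricted to pixels within a narrow band around mask contours, which makes it more sensitive to the boundary errors the paper targets.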
Data availability
The raw and processed data required to reproduce these findings can be obtained as follows: the DeepFashion2 dataset can be downloaded from https://github.com/switchablenorms/DeepFashion2.
Code availability
The code is available at https://github.com/justdoit-lin/FashionSegNet.
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Funding
This work was supported by the National Natural Science Foundation of China [Grant Number 51605443].
Author information
Authors and Affiliations
School of Mechanical Engineering, Zhejiang Sci-Tech University, 928/2 Main Street, Xiasha Higher Education Park, Hangzhou, 310018, Zhejiang, People’s Republic of China
Zhong Xiang, Chenglin Zhu, Miao Qian, Yujia Shen & Yizhou Shao
Contributions
ZX contributed to supervision, conceptualization, methodology, data curation, writing (original draft preparation), and revision. CZ contributed to visualization, investigation, writing, and software. MQ and YS contributed to writing. YS contributed to methodology.
Corresponding author
Correspondence to Zhong Xiang.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiang, Z., Zhu, C., Qian, M. et al. FashionSegNet: a model for high-precision semantic segmentation of clothing images. Vis Comput 40, 1711–1727 (2024). https://doi.org/10.1007/s00371-023-02881-3