FashionSegNet: a model for high-precision semantic segmentation of clothing images

  • Original article
  • Published in The Visual Computer

Abstract

Clothing image segmentation predicts a clothing category label for each pixel of an input image. In this study, we reduce the impact of variable image shots, similar clothing categories, and complex boundaries on segmentation accuracy by developing a ResNet50-based semantic segmentation model whose primary structure is an encoder–decoder. An improved spatial pyramid pooling module, combined with a global feature-extraction branch built on a large convolution kernel, achieves multi-scale feature fusion and improves the model's ability to identify clothing and its boundary features across different shots. Furthermore, to balance clothing shape and category information in the model, we propose a spatial and semantic information enhancement module that strengthens the flow of information between different stages of the network through cross-stage connections. The model was trained and tested on the DeepFashion2 dataset. Comparison experiments show that the proposed model achieves the highest mIoU and Boundary IoU, 74.55% and 57.51% respectively, outperforming DeepLabv3+, PSPNet, and other networks.
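The abstract's improved spatial pyramid pooling module with a large-kernel global branch can be sketched as follows. This is a minimal PyTorch illustration of the general idea (parallel atrous branches plus a decomposed large-kernel branch, fused by a 1×1 convolution); the channel counts, dilation rates, and kernel size are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PyramidPoolingWithGlobalBranch(nn.Module):
    """Sketch of spatial pyramid pooling augmented with a large-kernel
    global branch. Hyperparameters are hypothetical, not the paper's."""

    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18), large_k=15):
        super().__init__()
        # 1x1 branch plus atrous 3x3 branches at several dilation rates
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        # Large-kernel global branch, decomposed into 1xk and kx1 convolutions
        # to keep the parameter count manageable
        self.large = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, large_k), padding=(0, large_k // 2)),
            nn.Conv2d(out_ch, out_ch, (large_k, 1), padding=(large_k // 2, 0)),
        )
        # Fuse all branches back to out_ch channels
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches] + [self.large(x)]
        return self.fuse(torch.cat(feats, dim=1))
```

All branches preserve spatial resolution, so their outputs can be concatenated directly; the large-kernel branch widens the receptive field to capture global context, which the paper credits for robustness to different image shots.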




Data availability

The raw and processed data required to reproduce these findings can be obtained as follows: the DeepFashion2 dataset can be downloaded from https://github.com/switchablenorms/DeepFashion2.
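Models trained on this dataset are scored with the mIoU metric reported in the abstract. The following is a minimal, self-contained illustration of how mIoU is computed over flattened label maps; it is not the authors' evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, given flat label lists.

    Classes absent from both prediction and ground truth are skipped,
    so they do not dilute the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example: 4 pixels, 2 classes.
# Class 0: intersection 1, union 2 -> IoU 0.5
# Class 1: intersection 2, union 3 -> IoU 2/3
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)  # (0.5 + 2/3) / 2
```

Boundary IoU, also reported in the paper, applies the same intersection-over-union idea but restricted to pixels within a thin band around mask boundaries, which makes it more sensitive to the boundary errors the model targets.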

Code availability


Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

Funding

This work was supported by the National Natural Science Foundation of China [Grant Number 51605443].

Author information

Authors and Affiliations

  1. School of Mechanical Engineering, Zhejiang Sci-Tech University, 928/2 Main Street, Xiasha Higher Education Park, Hangzhou, 310018, Zhejiang, People’s Republic of China

    Zhong Xiang, Chenglin Zhu, Miao Qian, Yujia Shen & Yizhou Shao

Authors
  1. Zhong Xiang
  2. Chenglin Zhu
  3. Miao Qian
  4. Yujia Shen
  5. Yizhou Shao

Contributions

ZX contributed to supervision, conceptualization, methodology, data curation, writing—original draft preparation, and revision. CZ contributed to visualization, investigation, writing, and software. MQ and YS performed writing. YS provided methodology.

Corresponding author

Correspondence to Zhong Xiang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
