- Zhong Xiang (ORCID: orcid.org/0000-0003-3046-6170)1,
- Chenglin Zhu1,
- Miao Qian1,
- Yujia Shen1 &
- Yizhou Shao1
Abstract
Clothing image segmentation predicts a clothing category label for each pixel of an input image. To reduce the impact of variable camera shots, similarity between clothing categories, and complex boundaries on segmentation accuracy, we developed an improved ResNet50-based semantic segmentation model whose primary structure is an encoder–decoder. An improved spatial pyramid pooling module, combined with a large-convolution-kernel global feature extraction branch, achieves multi-scale feature fusion and strengthens the model's ability to identify clothing and its boundary features across different shots. Furthermore, to balance clothing shape and category information within the model, a spatial and semantic information enhancement module is proposed, which uses cross-stage connections to improve the flow of information between different stages of the network. The model was trained and tested on the DeepFashion2 dataset. Comparison experiments show that the proposed model achieved the highest mIoU and Boundary IoU, at 74.55% and 57.51%, respectively, outperforming DeepLabv3+, PSPNet, and other networks.
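The mIoU figure reported above follows the standard mean Intersection-over-Union definition: per-class IoU averaged over the classes present. The snippet below is a minimal illustrative sketch of that metric, not the authors' evaluation code; the function name `miou` and the toy per-pixel label maps are assumptions for demonstration.

```python
# Illustrative sketch of mean Intersection-over-Union (mIoU), the headline
# metric in the abstract. Label maps are flattened per-pixel class ids; the
# example data below is hypothetical, not from the DeepFashion2 experiments.

def miou(pred, target, num_classes):
    """Mean IoU over classes that occur in either the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred   = [0, 1, 1, 2, 2, 2]   # predicted clothing-class label per pixel
target = [0, 1, 2, 2, 2, 2]   # ground-truth label per pixel
print(miou(pred, target, 3))  # -> 0.75
```

Boundary IoU, the second reported metric, applies the same ratio but restricted to pixels within a narrow band around mask contours, which makes it more sensitive to the boundary errors the paper targets.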
Data availability
The raw and processed data required to reproduce these findings can be obtained as follows: the DeepFashion2 dataset can be downloaded from https://github.com/switchablenorms/DeepFashion2.
Code availability
The code is available at https://github.com/justdoit-lin/FashionSegNet.
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Funding
This work was supported by the National Natural Science Foundation of China [Grant Number 51605443].
Author information
Authors and Affiliations
School of Mechanical Engineering, Zhejiang Sci-Tech University, 928/2 Main Street, Xiasha Higher Education Park, Hangzhou, 310018, Zhejiang, People’s Republic of China
Zhong Xiang, Chenglin Zhu, Miao Qian, Yujia Shen & Yizhou Shao
Contributions
ZX contributed to supervision, conceptualization, methodology, data curation, writing (original draft preparation), and revision. CZ contributed to visualization, investigation, writing, and software. MQ and YS contributed to writing. YS contributed to methodology.
Corresponding author
Correspondence to Zhong Xiang.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiang, Z., Zhu, C., Qian, M. et al. FashionSegNet: a model for high-precision semantic segmentation of clothing images. Vis Comput 40, 1711–1727 (2024). https://doi.org/10.1007/s00371-023-02881-3