- Chaitanya Modiboyina (ORCID: orcid.org/0009-0002-4611-2334),
- Indrajit Chakrabarti &
- Soumya Kanti Ghosh
Abstract
The U-Net is a popular deep-learning model for semantic segmentation. This paper describes an implementation of the U-Net architecture on a Field Programmable Gate Array (FPGA) for real-time image segmentation. The proposed design uses a parallel-pipelined architecture to achieve high throughput. It also addresses the resource and power constraints of edge devices by compressing Convolutional Neural Network (CNN) models and improving hardware efficiency. To this end, we propose a pruning technique based on parallel quantization that reduces weight storage requirements by quantizing the U-Net layers into a few segments, which in turn yields a lightweight U-Net model. The system requires \(\approx 1.5\) Mb of memory for storing weights. The proposed U-Net architecture is evaluated on the Electron Microscopy Dataset and the BraTS Dataset, where it achieves an Intersection over Union (IoU) of 90.31% and 94.1%, respectively, with 4-bit quantized weights. Additionally, we design a shift-based U-Net accelerator that replaces multiplications with simple shift operations, further improving efficiency. To further reduce power consumption, computations involving zero weights are skipped. The proposed U-Net architecture achieves a 3.5\(\times\) reduction in power consumption and a 35% reduction in area compared to previous architectures. Overall, the present work puts forward an effective method for optimizing CNN models for edge devices while meeting their computational and power constraints.
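The abstract does not spell out how the shift-based accelerator encodes its weights, so the following is only a minimal sketch of the general idea, assuming power-of-two quantization: each weight is approximated as a sign times \(2^{e}\), so a multiplication becomes a bit shift, and zero weights are skipped. The names `quantize_pow2`, `shift_mac`, and `FRAC_BITS`, as well as the exponent range, are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

# Hypothetical parameters: exponents are limited to [-7, 0], and the
# accumulator keeps 7 fractional bits so that every shift is exact.
MIN_EXP = -7
MAX_EXP = 0
FRAC_BITS = 7


def quantize_pow2(w):
    """Approximate a real weight as sign * 2**exp; return (sign, exp).

    Weights too small to represent are mapped to zero, encoded as sign 0.
    """
    if abs(w) < 2.0 ** (MIN_EXP - 1):
        return (0, 0)
    exp = round(math.log2(abs(w)))
    exp = max(MIN_EXP, min(MAX_EXP, exp))
    return (1 if w > 0 else -1, exp)


def shift_mac(activations, qweights):
    """Multiply-accumulate over integer activations using shifts only.

    Returns (acc, skipped): acc is the dot product scaled by 2**FRAC_BITS,
    and skipped counts the zero weights whose computation was omitted.
    """
    acc = 0
    skipped = 0
    for x, (sign, exp) in zip(activations, qweights):
        if sign == 0:
            skipped += 1  # zero weight: no shift, no add
            continue
        term = x << (FRAC_BITS + exp)  # x * 2**exp in fixed point
        acc += term if sign > 0 else -term
    return acc, skipped


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.0, 1.0]
    activations = [4, 8, 100, -3]
    qweights = [quantize_pow2(w) for w in weights]
    acc, skipped = shift_mac(activations, qweights)
    print(qweights, acc / (1 << FRAC_BITS), skipped)
```

In this toy example the weights are already exact powers of two, so the shift-based result matches the floating-point dot product (0.5·4 − 0.25·8 + 0·100 + 1·(−3) = −3), and one of the four multiply-accumulate steps is skipped. In hardware, the same structure would replace a multiplier with a barrel shifter and gate the accumulator on a zero flag.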
Data Availability Statement
The data that support the experimental evaluations in this study are taken from the Electron Microscopy and BraTS online databases, which are duly cited in this paper. The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Electron Microscopy Dataset: https://www.epfl.ch/labs/cvlab/data/data-em/
Funding
No funding has been received for this work.
Author information
Indrajit Chakrabarti and Soumya Kanti Ghosh have contributed equally to this work.
Authors and Affiliations
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India
Chaitanya Modiboyina & Indrajit Chakrabarti
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India
Soumya Kanti Ghosh
Corresponding author
Correspondence to Chaitanya Modiboyina.
Ethics declarations
Conflicts of interests
The authors declare that there are no conflicts of interest or competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Modiboyina, C., Chakrabarti, I. & Ghosh, S.K. Lightweight Low-Power U-Net Architecture for Semantic Segmentation. Circuits Syst Signal Process 44, 2527–2561 (2025). https://doi.org/10.1007/s00034-024-02920-x