- Héctor Martínez,
- Sandra Catalán,
- Carlos García,
- Francisco D. Igual,
- Rafael Rodríguez-Sánchez,
- Adrián Castelló &
- Enrique S. Quintana-Ortí
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15058)
Abstract
Following the recent advances in open hardware in general, and RISC-V architectures in particular, we analyse the performance of transformer encoder inference on three low-power platforms with this type of architecture. For this purpose, we conduct a detailed profile of the inference process for two representative members of the BERT family, identifying the main bottlenecks and opportunities for optimisation on three RISC-V processors equipped with floating-point SIMD (single instruction, multiple data) units: the XuanTie C906, C908, and C910.
Notes
- 1. The differences in the micro-kernel between both versions are minimal: the vle/vse instructions are replaced by their vle32/vse32 counterparts.
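The footnote refers to the change in unit-stride vector load/store mnemonics between RVV 0.7.1 (implemented by the XuanTie C906 and C910) and RVV 1.0 (implemented by the C908). A minimal sketch of an inner micro-kernel loop body illustrates the difference; the register choices and the fused multiply-accumulate are illustrative assumptions, not the authors' actual kernel:

```
# RVV 0.7.1 (XuanTie C906/C910): element width is implicit,
# taken from the SEW configured by vsetvli.
    vsetvli   t0, a0, e32, m1    # request up to VLMAX 32-bit elements
    vle.v     v0, (a1)           # unit-stride load, width = SEW (32 bits here)
    vfmacc.vf v8, ft0, v0        # v8 += ft0 * v0 (illustrative FP work)
    vse.v     v8, (a2)           # unit-stride store, width = SEW

# RVV 1.0 (XuanTie C908): element width is encoded in the mnemonic.
    vsetvli   t0, a0, e32, m1
    vle32.v   v0, (a1)           # explicitly a 32-bit element load
    vfmacc.vf v8, ft0, v0
    vse32.v   v8, (a2)           # explicitly a 32-bit element store
```

Since the element width still has to match the SEW set by vsetvli, porting a micro-kernel between the two vector-extension versions is largely a mechanical substitution of mnemonics, which is why the footnote describes the differences as minimal.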
Acknowledgements
Research funded by projects PID2020-113656RB-C22, PID2021-126576NB-I00, PID2021-123627OB-C52, TED2021-129334B-I00, TED2021-130123B-I00 (MCIN/AEI/10.13039/501100011033), GVA CIPROM/2022/20, UJI-2023-04. H. M. is a POSTDOC_21_00025 fellow supported by Junta de Andalucía. S. C. is supported by grant RYC2021-033973-I, funded by MCIN/AEI/10.13039/501100011033 and the EU "NextGenerationEU"/PRTR, and UJI-2023-04, funded by UJI.
Author information
Authors and Affiliations
Universidad de Córdoba, Córdoba, Spain
Héctor Martínez
Universidad Complutense de Madrid, Madrid, Spain
Carlos García & Francisco D. Igual
Universidad de Castilla-La Mancha, Ciudad Real, Spain
Rafael Rodríguez-Sánchez
Universidad Jaume I de Castellón, Castelló de la Plana, Spain
Sandra Catalán
Universitat Politècnica de València, Valencia, Spain
Adrián Castelló & Enrique S. Quintana-Ortí
Corresponding author
Correspondence to Francisco D. Igual.
Editor information
Editors and Affiliations
University of Edinburgh, Edinburgh, UK
Michèle Weiland
Johannes Gutenberg University Mainz, Mainz, Germany
Sarah Neuwirth
Cerfacs, Toulouse, France
Carola Kruse
Durham University, Durham, UK
Tobias Weinzierl
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martínez, H. et al. (2025). Performance Analysis of BERT on RISC-V Processors with SIMD Units. In: Weiland, M., Neuwirth, S., Kruse, C., Weinzierl, T. (eds) High Performance Computing. ISC High Performance 2024 International Workshops. Lecture Notes in Computer Science, vol 15058. Springer, Cham. https://doi.org/10.1007/978-3-031-73716-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73715-2
Online ISBN: 978-3-031-73716-9
eBook Packages: Computer Science, Computer Science (R0)