Posts tagged MLPerf Inference
Technical Dive into AMD’s MLPerf Inference v5.1 Submission
- 09 September 2025
- Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Neha Mathews, Rajesh Poornachandran, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Zhao Lin, Wei Luo, Bowen Bao, Spandan Tiwari, Niels Zhang, Vinayak Gokhale, Clint Greene, Eliot Li
- English
- Applications & models
- AI/ML, GenAI, Performance, Optimization, MLPerf Inference, MLPerf
In the rapidly evolving landscape of artificial intelligence, the demand for reliable and efficient model inference has never been greater. With advancements in large language models (LLMs) and a growing reliance on real-time applications, benchmarks are critical for evaluating how well AI systems perform under varying conditions. Enter MLPerf Inference: Datacenter v5.1, a significant update to the well-respected benchmarking suite that assesses inference performance across a wide array of models and use cases, catering especially to data centers.
Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance
- 09 September 2025
In this blog, we demonstrate how quantization, intelligent depth pruning, and supervised fine-tuning can dramatically improve the inference performance of Meta’s Llama 3.1 405B model on AMD Instinct MI355X GPUs. By applying quantization and reducing the number of layers from the original 126, we are able to decrease memory requirements and boost token throughput. Additionally, with carefully applied fine-tuning, we maintain high inference accuracy for both RougeL and Exact Match metrics on MLPerf workloads. To see how these optimizations fit into AMD’s broader MLPerf Inference v5.1 efforts, read Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission. For a detailed technical breakdown of other optimizations, check out our Technical Dive into AMD’s MLPerf Inference v5.1 Submission.
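The depth-pruning idea mentioned above (reducing the model from its original 126 decoder layers) can be sketched as a layer-selection step. The exact layers AMD removed are not specified in this teaser, so the uniform keep-strategy and the target layer count below are purely illustrative:

```python
# Illustrative sketch of uniform depth pruning: pick which decoder-layer
# indices to keep when shrinking a 126-layer stack. The selection strategy
# and target count here are hypothetical, not AMD's actual recipe.

def prune_layers(num_layers: int, keep: int) -> list[int]:
    """Return indices of `keep` layers sampled evenly across the stack."""
    if keep >= num_layers:
        return list(range(num_layers))
    if keep <= 1:
        return [0]
    step = (num_layers - 1) / (keep - 1)  # > 1, so no duplicate indices
    return sorted({round(i * step) for i in range(keep)})

# e.g. hypothetically keep 96 of the original 126 layers (~24% shallower);
# the first and last layers are always retained.
kept = prune_layers(126, 96)
```

In practice the kept layers would then be reassembled into a smaller model and recovered with supervised fine-tuning, as the post describes.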
Reproducing the AMD Instinct™ GPUs MLPerf Inference v5.1 Submission
- 09 September 2025
- Meena Arunachalam, Miro Hodak, Poovaiah Palangappa, Wei-Ting Liao, Uma Kannikanti, Fulu Li, Karan Verma, Neha Mathews, Yamini Kamisetty, Chelsea Iluno, Ean Garvey, Kumar Deepak, Yixing Xu, Zhe Li, Guanchen Li, Xuanwu Yin, Dong Li, Clint Greene, Eliot Li
- English
- Applications & models
- AI/ML, GenAI, Performance, Optimization, MLPerf Inference, MLPerf
MLPerf Inference v5.1 marks AMD’s third round of submissions and the most ambitious yet. This round features submissions on AMD Instinct MI325X and MI355X systems, including multi-node inference and models in the MXFP4 datatype. Building upon the success in MLPerf Inference v5.0, AMD has submitted improved results for Llama 2 70B and SDXL on the MI325X platform in this round using new optimization techniques. For a deeper look at these optimizations, see our Technical Dive into AMD’s MLPerf Inference v5.1 Submission. Additionally, explore how we optimized Llama 3.1 405B through pruning and fine-tuning in Slim Down Your Llama: Pruning & Fine-Tuning for Maximum Performance. AMD has also made submissions for the following workloads:
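To make the MXFP4 datatype mentioned above concrete, here is a minimal sketch of microscaling-style block quantization: 4-bit E2M1 element values sharing one power-of-two scale per block. This illustrates the number format only, under the assumption of round-to-nearest element mapping; it is not AMD's kernel or submission code:

```python
# Hedged sketch of MXFP4-style block quantization: each block of values
# shares one power-of-two scale, and each element is stored as a 4-bit
# E2M1 value (sign plus one of the magnitudes below, max 6.0).
import math

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block of floats; return (scale exponent, dequantized values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared scale: a power of two chosen so the block max lands within
    # E2M1's representable range (E2M1 max exponent is 2, since 6 = 1.5 * 2^2).
    exp = math.floor(math.log2(amax)) - 2
    scale = 2.0 ** exp
    out = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)                       # clamp to E2M1 max
        q = min(E2M1_MAGNITUDES, key=lambda v: abs(v - mag))  # nearest code
        out.append(math.copysign(q * scale, x))
    return exp, out
```

A real MXFP4 tensor packs the 4-bit codes and an 8-bit scale exponent per block of 32 elements; the sketch returns dequantized floats instead so the rounding behavior is easy to inspect.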