
Leverage the Speed of OpenCL™ with AMD Math Libraries

This presentation surveys heterogeneous math libraries for OpenCL: the clMathLibraries projects (clBLAS and clFFT), ACML, clMAGMA, and Bolt. It covers their APIs and usage, community contributions such as Python and Julia wrappers, performance tuning (the clBLAS tune tool and clFFT plans), and how ACML 6 uses clBLAS and clFFT as backends so that existing programs can offload computation without source changes.

HETEROGENEOUS MATH LIBRARIES | KENT KNOX | 12/16/2014

AGENDA: LIBRARIES COVERED
A survey of available libraries
• clMATH
  ‒ clBLAS
  ‒ clFFT
• ACML
• clMAGMA
• Bolt

CLMATHLIBRARIES
clMathLibraries is a GitHub organization for OpenCL™ math-related subprojects: https://github.com/clMathLibraries
Currently hosting two subprojects: clBLAS & clFFT

clBLAS (Open Source)

CLBLAS: API - https://github.com/clMathLibraries/clBLAS
• clBLAS implements the NetLib BLAS functionality with OpenCL
  ‒ Level 3: Matrix x Matrix operations, O( N^3 ), compute bound
  ‒ Level 2: Matrix x Vector operations, O( N^2 ), mostly memory bound
  ‒ Level 1: Vector x Vector operations, O( N ), memory bound
• The API is in the same style as NetLib, but appends OpenCL structures
  ‒ clblasStatus clblasSgemm(
        clblasOrder order, clblasTranspose transA, clblasTranspose transB,
        size_t M, size_t N, size_t K,
        cl_float alpha,
        const cl_mem A, size_t offA, size_t lda,
        const cl_mem B, size_t offB, size_t ldb,
        cl_float beta,
        cl_mem C, size_t offC, size_t ldc,
        cl_uint numCommandQueues, cl_command_queue* commandQueues,
        cl_uint numEventsInWaitList, const cl_event* eventWaitList,
        cl_event* events )
• clBLAS assumes that the user is comfortable with OpenCL programming
  ‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
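
To make the meaning of the clblasSgemm arguments concrete, here is a minimal host-only sketch of the same arithmetic, C = alpha * A * B + beta * C, for the row-major, no-transpose case. It is not clBLAS code: the name referenceSgemm and the use of std::vector in place of cl_mem buffers are assumptions made for illustration, and the OpenCL queue and event arguments are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of clBLAS) for row-major, no-transpose
// SGEMM: C = alpha * A * B + beta * C, with A (M x K), B (K x N), C (M x N).
// offX is the index of the first matrix element inside its buffer and
// ldX is the stride, in elements, between consecutive rows.
void referenceSgemm(std::size_t M, std::size_t N, std::size_t K,
                    float alpha,
                    const std::vector<float>& A, std::size_t offA, std::size_t lda,
                    const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                    float beta,
                    std::vector<float>& C, std::size_t offC, std::size_t ldc)
{
    // A row must be at least as long as the number of columns it stores.
    assert(lda >= K && ldb >= N && ldc >= N);

    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[offA + i * lda + k] * B[offB + k * ldb + j];
            }
            float& c = C[offC + i * ldc + j];
            c = alpha * acc + beta * c;
        }
    }
}
```

A reference like this can be useful for checking the output of a device call on small matrices before moving to larger problem sizes.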

CLBLAS: API - https://github.com/clMathLibraries/clBLAS
• A proof-of-concept Python wrapper for clBLAS has been started, but only sgemm is wrapped
  ‒ https://github.com/clMathLibraries/clBLAS/tree/master/src/wrappers/python
  ‒ Based on Cython
  ‒ Works with PyOpenCL to manage OpenCL state
  ‒ Would love help from the community to finish this
• The community wrote a Julia wrapper for clBLAS
  ‒ https://github.com/JuliaGPU/CLBLAS.jl

CLBLAS - https://github.com/clMathLibraries/clBLAS
clBLAS contains a Tune tool for finding better OpenCL kernels
• The user is responsible for running the tool on their machine as a preprocessing step
• The tool creates a kernel database file (.kdb) that contains the best performing kernel for a given BLAS routine
• The .kdb file is specific to an OpenCL device and is named after that device, e.g. tahiti.kdb
• Example
    export CLBLAS_STORAGE_PATH=/usr/local/lib
    ./tune --gemm --double

clFFT (Open Source)

CLFFT: API - https://github.com/clMathLibraries/clFFT
• clFFT implements an FFTW-inspired interface with OpenCL
  ‒ Provides a fast and accurate platform for calculating discrete FFTs
  ‒ Supports 1D, 2D, and 3D transforms with a batch size that can be greater than 1
  ‒ Supports dimension lengths that can be any mix of powers of 2, 3, and 5
  ‒ Supports single and double precision floating point formats
• clFFT assumes that the user is comfortable with OpenCL programming
  ‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
• The community wrote a Python wrapper for clFFT
  ‒ https://github.com/geggo/gpyfft
• The community wrote a Julia wrapper for clFFT
  ‒ https://github.com/JuliaGPU/CLFFT.jl
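
The length restriction above (any mix of powers of 2, 3, and 5) means a dimension length n is usable when n = 2^a * 3^b * 5^c. The sketch below shows one way host code might check a length, or pad it up to the next usable one, before creating a plan. Both helper names are hypothetical and are not part of the clFFT API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when n can be written as 2^a * 3^b * 5^c
// with a, b, c >= 0 (so 1 counts as supported).
bool isRadix235Length(std::size_t n)
{
    if (n == 0) {
        return false;
    }
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t radix : radices) {
        // Strip every factor of this radix.
        while (n % radix == 0) {
            n /= radix;
        }
    }
    // Anything left over is a prime factor other than 2, 3, or 5.
    return n == 1;
}

// Hypothetical helper: smallest supported length that is >= n,
// for callers that are free to zero-pad their input.
std::size_t nextRadix235Length(std::size_t n)
{
    assert(n > 0 && "length must be positive");
    while (!isRadix235Length(n)) {
        ++n;
    }
    return n;
}
```

For example, 1000 = 2^3 * 5^3 is accepted as-is, while a length of 1021 would be padded up to 1024.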

CLFFT - https://github.com/clMathLibraries/clFFT
clFFT contains the concept of ‘plans’, which allows the library to tune OpenCL kernels at runtime
• Users set all FFT state in an FFT plan object when initializing
• Call ‘BakePlan’ using the plan object to tell the library to JIT and compile the kernel outside of performance sensitive loops
• Reuse those plans as much as possible!
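
The advice to bake once and reuse plans can be pictured as a small cache keyed by the transform parameters. The sketch below is only an illustration of that pattern: Plan, PlanCache, and bake are hypothetical stand-ins and do not use the clFFT API. The bake step here merely counts calls, where the real library would generate and compile kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for a library plan; not the clFFT plan handle.
struct Plan {
    std::size_t length;
    std::size_t batch;
    bool baked;
};

// Hypothetical cache that bakes a plan the first time a (length, batch)
// pair is requested and hands back the same plan on every later request.
class PlanCache {
public:
    const Plan& get(std::size_t length, std::size_t batch)
    {
        assert(length > 0 && batch > 0);
        const auto key = std::make_pair(length, batch);
        auto it = plans_.find(key);
        if (it == plans_.end()) {
            Plan plan{length, batch, false};
            bake(plan);  // expensive step, done once per distinct key
            it = plans_.emplace(key, plan).first;
        }
        return it->second;
    }

    // Number of times the expensive bake step actually ran.
    std::size_t bakeCount() const { return bakeCount_; }

private:
    // Stands in for kernel generation and compilation.
    void bake(Plan& plan)
    {
        plan.baked = true;
        ++bakeCount_;
    }

    std::map<std::pair<std::size_t, std::size_t>, Plan> plans_;
    std::size_t bakeCount_ = 0;
};
```

With this shape, a loop that transforms many buffers of the same size pays the baking cost on the first iteration only.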

CLFFT: PERFORMANCE - https://github.com/clMathLibraries/clFFT
• clFFT v2.3.1 is included in ACML v6.1
• This version contains optimizations not yet pushed into the public GitHub repo
• You can use the clFFT.h header file from GitHub to compile your application, then use the binary from ACML
• Benchmark system
  ‒ 64-bit Linux
  ‒ FirePro W9100
  ‒ Catalyst Pro 14.301.1010
  ‒ AMD A10-7850K

ACML 6

ACML 6 INTRODUCES HETEROGENEOUS COMPUTE: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
ACML 6 keeps the same API!
• OpenCL can be a difficult language to learn
  ‒ There exist legacy applications that won’t be ported to OpenCL
  ‒ Their owners might be willing to sacrifice peak performance for program portability
• ACML 6 includes clBLAS & clFFT as new backends
  ‒ ACML hides all OpenCL programming from end users
  ‒ Client programs do not need to change at all; they only relink against ACML 6
• When ACML determines that a particular BLAS or FFT call will benefit from offloading computation, it will do so without the knowledge of the client program

NEW FFTW WRAPPER
• ACML 6 now ships with fftw.h
• FFTW programs could link with ACML 6 to offload computation onto OpenCL devices
• No changes in host code required!

ACMLSCRIPT
ACML includes a new scripting language that expresses the logic ACML uses to offload computation
• The scripting language uses Lua, with custom ACML callback functions
• http://www.lua.org/
• Refer to chapter 7 of the ACML documentation for more information on how to modify or create your own scripts

ACMLSCRIPT: 3-PART VIDEO TUTORIALS
• ACMLScript: Part 1
• ACMLScript: Part 2
• ACMLScript: Part 3
https://www.youtube.com/user/AMDDevCentral

ACML: PERFORMANCE - https://github.com/clMathLibraries/clFFT
• ACML v6.0 sgemm
  ‒ Slightly old at this time
• Notice that the green line is equivalent to Max( blue, red )
  ‒ ACML loads the host processor if the problem is too small to benefit from GPU acceleration
• Benchmark system
  ‒ AMD A10-7850K CPU & GPU
  ‒ 64-bit Linux
  ‒ Catalyst 14.301.1001
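
The behaviour described above, staying on the host for small problems and offloading larger ones, can be illustrated with a simple size-based dispatch. This is only a sketch of the idea: the actual decision logic in ACML lives in its ACMLScript (Lua) files and is not reproduced here. The function name, the Backend enum, and the threshold parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical backend tag; not an ACML type.
enum class Backend { Cpu, Gpu };

// Hypothetical heuristic: estimate the floating point work of an
// M x N x K GEMM (one multiply and one add per inner-loop step) and
// offload only when it reaches a caller-supplied threshold.
Backend chooseBackend(std::size_t M, std::size_t N, std::size_t K,
                      double offloadThresholdOps)
{
    assert(offloadThresholdOps >= 0.0);
    const double ops = 2.0 * static_cast<double>(M)
                           * static_cast<double>(N)
                           * static_cast<double>(K);
    return ops >= offloadThresholdOps ? Backend::Gpu : Backend::Cpu;
}
```

In practice a threshold like this would have to be measured per machine, since the crossover point depends on the CPU, the GPU, and the cost of moving data between them.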

clMAGMA (Open Source)

CLMAGMA
• clMAGMA implements LAPACK functionality with OpenCL acceleration
• https://bitbucket.org/icl/clmagma
• Maintained by the University of Tennessee Knoxville

CLMAGMA: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
v1.3 adds support for Windows and Mac OSX
• The newest v1.3 supports
  ‒ LU, QR and Cholesky factorizations
  ‒ Linear and least squares solvers
  ‒ Reductions to Hessenberg, bidiagonal and tridiagonal forms
  ‒ Eigen and singular value problem solvers
  ‒ Orthogonal transformation routines
• clMAGMA uses clBLAS as the GPU compute backend
  ‒ It currently provides static load balancing between CPU & GPU cores
• Multi-GPU support

Bolt (Open Source)

BOLT
• Bolt implements parallel C++ STL functionality with AMP & OpenCL acceleration
• Bolt on GitHub
• Maintained by AMD

BOLT: PARALLEL STL
Bolt provides containers such as bolt::device_vector<>
• Bolt provides containers and algorithms that enable clients to accelerate C++ code with minimal GPU knowledge
  ‒ Sorts
  ‒ Reductions
  ‒ Transforms
  ‒ Scans
• Through control structures, clients control where data is allocated and computed (minimal knowledge of AMP or OpenCL is helpful here)
• Bolt provides support for both OpenCL & C++ AMP paths

BOLT: EXAMPLE CODE

    #include <bolt/cl/device_vector.h>
    #include <bolt/cl/scan.h>
    #include <vector>
    #include <numeric>

    int main()
    {
        size_t length = 1024;

        // Create device_vector and initialize it to 1
        bolt::cl::device_vector< int > boltInput( length, 1 );

        // Calculate the inclusive_scan of the device_vector
        bolt::cl::inclusive_scan( boltInput.begin( ), boltInput.end( ), boltInput.begin( ) );

        // Create an std vector and initialize it to 1
        std::vector< int > stdInput( length, 1 );

        // Calculate the inclusive_scan of the std vector
        bolt::cl::inclusive_scan( stdInput.begin( ), stdInput.end( ), stdInput.begin( ) );

        return 0;
    }
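
For readers without Bolt installed, the result the example above should produce can be reproduced on the host with the standard library alone: an inclusive scan is a running sum, which std::partial_sum computes. The helper name hostInclusiveScan is an assumption for illustration and is not part of Bolt.

```cpp
#include <numeric>
#include <vector>

// Hypothetical host-only reference: returns the inclusive scan (running
// sum) of the input, i.e. output[i] = input[0] + ... + input[i].
std::vector<int> hostInclusiveScan(const std::vector<int>& input)
{
    std::vector<int> output(input.size());
    std::partial_sum(input.begin(), input.end(), output.begin());
    return output;
}
```

Applied to 1024 ones, as in the slide, the scan yields 1, 2, 3, and so on up to 1024, which gives a simple value to compare device output against.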

Q&A & CONTACT INFO
For more info:
• Follow us on Twitter: @AMDDevCentral
• Visit our forums: http://devgurus.amd.com/welcome
• Visit our website: www.developer.amd.com
• Watch the replay: www.youtube.com/user/AMDDevCentral
• Download the presentation: www.slideshare.net/DevCentralAMD

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

