HETEROGENEOUS MATH LIBRARIES
KENT KNOX
12/16/2014

AGENDA: LIBRARIES COVERED
A survey of available libraries
• clMATH
‒ clBLAS
‒ clFFT
• ACML
• clMAGMA
• Bolt

CLMATHLIBRARIES
• clMathLibraries is a GitHub organization for OpenCL™ math-related subprojects
• https://github.com/clMathLibraries
• Currently hosting two subprojects: clBLAS & clFFT

Open Source: clBLAS

CLBLAS: API
https://github.com/clMathLibraries/clBLAS
• clBLAS implements the NetLib BLAS functionality with OpenCL
‒ Level 3: Matrix x Matrix operations, O( N^3 ), compute bound
‒ Level 2: Matrix x Vector operations, O( N^2 ), mostly memory bound
‒ Level 1: Vector x Vector operations, O( N ), memory bound
• The API is in the same style as NetLib, but appends OpenCL structures
‒ clblasStatus clblasSgemm(
      clblasOrder order, clblasTranspose transA, clblasTranspose transB,
      size_t M, size_t N, size_t K,
      cl_float alpha,
      const cl_mem A, size_t offA, size_t lda,
      const cl_mem B, size_t offB, size_t ldb,
      cl_float beta,
      cl_mem C, size_t offC, size_t ldc,
      cl_uint numCommandQueues, cl_command_queue* commandQueues,
      cl_uint numEventsInWaitList, const cl_event* eventWaitList,
      cl_event* events )
• clBLAS assumes that the user is comfortable with OpenCL programming
‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
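To make the argument list above more concrete, here is a minimal CPU-only sketch of the arithmetic an SGEMM call performs, C = alpha * A * B + beta * C. It is not clBLAS code: `reference_sgemm` is a name made up for this illustration, `std::vector<float>` stands in for the `cl_mem` buffers, and column-major storage without transposition is assumed. The `off*` and `ld*` parameters are used the same way as in the signature on the slide: an element offset into the buffer and the stride between consecutive columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for C = alpha * A * B + beta * C.
// Assumptions of this sketch: column-major storage, no transposition,
// and std::vector<float> in place of the cl_mem buffers of the real API.
// A is M x K, B is K x N, C is M x N. offX is the index of the first
// matrix element in its buffer; ldX is the stride between columns.
void reference_sgemm(std::size_t M, std::size_t N, std::size_t K,
                     float alpha,
                     const std::vector<float>& A, std::size_t offA, std::size_t lda,
                     const std::vector<float>& B, std::size_t offB, std::size_t ldb,
                     float beta,
                     std::vector<float>& C, std::size_t offC, std::size_t ldc)
{
    // A leading dimension must cover at least one full column.
    assert(lda >= M && ldb >= K && ldc >= M);

    for (std::size_t j = 0; j < N; ++j) {
        for (std::size_t i = 0; i < M; ++i) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[offA + i + k * lda] * B[offB + k + j * ldb];
            }
            float& c = C[offC + i + j * ldc];
            c = alpha * acc + beta * c;
        }
    }
}
```

The three nested loops are where the O( N^3 ) cost of Level 3 routines comes from. In the real library the same index arithmetic applies, but the buffers live on the device and the call is enqueued on a command queue supplied by the host code.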

CLBLAS: API
https://github.com/clMathLibraries/clBLAS
• A proof-of-concept Python wrapper for clBLAS has been started, but only sgemm is wrapped so far
‒ https://github.com/clMathLibraries/clBLAS/tree/master/src/wrappers/python
‒ Based on Cython
‒ Works with PyOpenCL to manage OpenCL state
‒ Would love help from the community to finish this
• The community wrote a Julia wrapper for clBLAS
‒ https://github.com/JuliaGPU/CLBLAS.jl

CLBLAS
https://github.com/clMathLibraries/clBLAS
clBLAS contains a Tune tool for finding better OpenCL kernels
• The user is responsible for running the tool on their machine as a preprocessing step
• The tool creates a kernel database file (.kdb) that contains the best performing kernel for a given BLAS routine
• The .kdb file is specific to an OpenCL device and is named after that device, e.g. tahiti.kdb
• Example
    export CLBLAS_STORAGE_PATH=/usr/local/lib
    ./tune --gemm --double

Open Source: clFFT

CLFFT: API
https://github.com/clMathLibraries/clFFT
• clFFT implements an FFTW-inspired interface with OpenCL
‒ Provides a fast and accurate platform for calculating discrete FFTs
‒ Supports 1D, 2D, and 3D transforms with a batch size that can be greater than 1
‒ Supports dimension lengths that can be any mix of powers of 2, 3, and 5
‒ Supports single and double precision floating point formats
• clFFT assumes that the user is comfortable with OpenCL programming
‒ The host code is responsible for detecting/choosing devices, transferring memory and synchronizing operations
• The community wrote a Python wrapper for clFFT
‒ https://github.com/geggo/gpyfft
• The community wrote a Julia wrapper for clFFT
‒ https://github.com/JuliaGPU/CLFFT.jl
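The length restriction above ("any mix of powers of 2, 3, and 5") can be checked on the host before a transform is set up. The following is a minimal sketch in plain C++; the function names are invented for this illustration and are not part of clFFT. The second helper finds the next acceptable length, which could serve as a zero-padding target when the input size does not qualify.

```cpp
#include <cassert>
#include <cstddef>

// True if n factors entirely into 2s, 3s and 5s, i.e. n = 2^a * 3^b * 5^c.
bool is_radix_235_length(std::size_t n)
{
    if (n == 0) return false;
    const std::size_t radices[] = {2, 3, 5};
    for (std::size_t r : radices) {
        while (n % r == 0) n /= r;   // strip every factor of r
    }
    return n == 1;                   // nothing else may remain
}

// Smallest length >= n that passes the check above.
std::size_t next_radix_235_length(std::size_t n)
{
    assert(n > 0);
    while (!is_radix_235_length(n)) ++n;
    return n;
}
```

For example, 1000 = 2^3 * 5^3 qualifies, while 1001 = 7 * 11 * 13 does not and would be rounded up to 1024.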

CLFFT
https://github.com/clMathLibraries/clFFT
clFFT contains the concept of ‘plans’, which allows the library to tune OpenCL kernels at runtime
• Users set all FFT state in an FFT plan object when initializing
• Call ‘BakePlan’ using the plan object to tell the library to JIT and compile the kernel outside of performance sensitive loops
• Reuse those plans as much as possible!
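One way to follow the "reuse those plans" advice is to keep baked plans in a cache keyed by the transform parameters. The sketch below shows only the pattern, in plain C++: `Plan` and `PlanCache` are hypothetical stand-ins, not clFFT types, and the "bake" step is reduced to setting a flag and incrementing a counter so the effect of reuse is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for an FFT plan; this is not the clFFT API.
struct Plan {
    std::size_t length = 0;
    bool baked = false;
};

// Keeps one baked plan per transform length, so the expensive bake step
// runs once per length instead of once per transform.
class PlanCache {
public:
    const Plan& get(std::size_t length)
    {
        assert(length > 0);
        auto it = plans_.find(length);
        if (it == plans_.end()) {
            Plan p;
            p.length = length;
            p.baked = true;   // stands in for kernel generation and compilation
            ++bake_count_;
            it = plans_.emplace(length, p).first;
        }
        return it->second;
    }

    // Number of times the (simulated) bake step has run.
    std::size_t bake_count() const { return bake_count_; }

private:
    std::map<std::size_t, Plan> plans_;
    std::size_t bake_count_ = 0;
};
```

Calling `get(1024)` inside a loop a hundred times bakes exactly once. A real cache would likely need a richer key (dimensions, precision, batch size, layout), since each of those is part of the plan state.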

CLFFT: PERFORMANCE
https://github.com/clMathLibraries/clFFT
• clFFT v2.3.1 included in ACML v6.1
• This version contains optimizations not yet pushed into the public GitHub repo
• You can use the clFFT.h header file from GitHub to compile your application, then use the binary from ACML
• Benchmark system
‒ 64bit Linux
‒ FirePro W9100
‒ Catalyst Pro 14.301.1010
‒ AMD A10-7850K

ACML 6

ACML 6 INTRODUCES HETEROGENEOUS COMPUTE: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
ACML 6 keeps the same API!
• OpenCL can be a difficult language to learn
‒ There exist legacy applications that won’t be ported to OpenCL
‒ Their owners might be willing to sacrifice peak performance for program portability
• ACML 6 includes clBLAS & clFFT as new backends
‒ ACML hides all OpenCL programming from end users
‒ Client programs do not need to change at all; they only relink against ACML 6
• When ACML determines that a particular BLAS or FFT call will benefit from offloading computation, it will do so without knowledge of the client program

NEW FFTW WRAPPER
• ACML 6 now ships with fftw.h
• FFTW programs could link with ACML 6 to offload computation onto OpenCL devices
• No changes in host code required!

ACMLSCRIPT
ACML includes a new scripting language that expresses the logic ACML uses to offload computation
• The scripting language uses Lua, with custom ACML callback functions
• http://www.lua.org/
• Refer to chapter 7 of the ACML documentation for more information on how to modify or create your own scripts

ACMLSCRIPT: 3-PART VIDEO TUTORIALS
• ACMLScript: Part 1
• ACMLScript: Part 2
• ACMLScript: Part 3
https://www.youtube.com/user/AMDDevCentral

ACML: PERFORMANCE
https://github.com/clMathLibraries/clFFT
• ACML v6.0 sgemm
• Slightly old at this time
• Notice that the green line is equivalent to Max( blue, red )
• ACML loads the host processor if the problem is too small to benefit from GPU acceleration
• Benchmark system
‒ AMD A10-7850K CPU & GPU
‒ 64bit Linux
‒ Catalyst 14.301.1001
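The "Max( blue, red )" observation describes a dispatcher that, for each problem size, runs on whichever device is expected to be faster. A minimal sketch of that idea follows. The types, names and throughput numbers are assumptions made for this illustration; ACML's actual decision logic lives in its ACMLScript files and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>

enum class Device { Host, Gpu };

// Hypothetical throughput estimates, in GFLOP/s, for one problem size.
struct Estimate {
    double host_gflops;
    double gpu_gflops;
};

// Pick the device expected to be faster; ties stay on the host,
// which avoids a transfer that would gain nothing.
Device choose_device(const Estimate& e)
{
    assert(e.host_gflops >= 0.0 && e.gpu_gflops >= 0.0);
    return e.gpu_gflops > e.host_gflops ? Device::Gpu : Device::Host;
}

// Throughput the caller observes after dispatch: the upper envelope
// of the two per-device curves.
double dispatched_gflops(const Estimate& e)
{
    return std::max(e.host_gflops, e.gpu_gflops);
}
```

With made-up estimates, a small problem such as `Estimate{20.0, 5.0}` stays on the host, while a large one such as `Estimate{30.0, 400.0}` is offloaded; in both cases the observed rate equals the larger of the two values.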

Open Source: clMagma

CLMAGMA
• clMAGMA implements LAPACK functionality with OpenCL acceleration
• https://bitbucket.org/icl/clmagma
• Maintained by the University of Tennessee Knoxville

CLMAGMA: LEVERAGING CLMATH LIBRARIES TO ACCELERATE WITH OPENCL
v1.3 adds support for Windows and Mac OS X
• The newest v1.3 supports
‒ LU, QR and Cholesky factorizations
‒ Linear and least squares solvers
‒ Reductions to Hessenberg, bidiagonal and tridiagonal forms
‒ Eigen and singular value problem solvers
‒ Orthogonal transformation routines
• clMagma uses clBLAS as the GPU compute backend
‒ It currently provides static load balancing between CPU & GPU cores
• Multi-GPU support

Open Source: Bolt

BOLT
• Bolt implements parallel C++ STL functionality with AMP & OpenCL acceleration
• Bolt on GitHub
• Maintained by AMD

BOLT: PARALLEL STL
Bolt provides containers such as bolt::device_vector<>
• Bolt provides containers and algorithms that enable clients to accelerate C++ code with minimal GPU knowledge
‒ Sorts
‒ Reductions
‒ Transforms
‒ Scans
• Through control structures, clients control where data is allocated and computed (minimal knowledge of AMP or OpenCL is helpful here)
• Bolt provides support for both OpenCL & C++ AMP paths

BOLT: EXAMPLE CODE

    #include <bolt/cl/device_vector.h>
    #include <bolt/cl/scan.h>
    #include <vector>
    #include <numeric>

    int main()
    {
        size_t length = 1024;

        // Create device_vector and initialize it to 1
        bolt::cl::device_vector< int > boltInput( length, 1 );

        // Calculate the inclusive_scan of the device_vector
        bolt::cl::inclusive_scan( boltInput.begin( ), boltInput.end( ), boltInput.begin( ) );

        // Create an std vector and initialize it to 1
        std::vector< int > stdInput( length, 1 );

        // Calculate the inclusive_scan of the std vector
        bolt::cl::inclusive_scan( stdInput.begin( ), stdInput.end( ), stdInput.begin( ) );

        return 0;
    }
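For comparison, the same inclusive scan can be written against the standard library alone, which is handy for checking what result the Bolt calls in the slide's example are expected to produce. This is a host-only sketch and does not use Bolt; the function name is invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-only inclusive scan over a vector of ones, using std::partial_sum.
// Element i of the result holds the sum of elements 0..i of the input,
// so a vector of `length` ones becomes 1, 2, 3, ..., length.
std::vector<int> inclusive_scan_of_ones(std::size_t length)
{
    std::vector<int> values(length, 1);
    std::partial_sum(values.begin(), values.end(), values.begin());
    assert(values.empty() || values.back() == static_cast<int>(length));
    return values;
}
```

Both versions take a begin/end pair plus an output iterator, which is the STL-style interface the deck describes Bolt as following.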

Q&A & CONTACT INFO
For More Info:
• Follow us on Twitter: @AMDDevCentral
• Visit our forums: http://devgurus.amd.com/welcome
• Visit our website: www.developer.amd.com
• Watch the replay: www.youtube.com/user/AMDDevCentral
• Download the presentation: www.slideshare.net/DevCentralAMD

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

Leverage the Speed of OpenCL™ with AMD Math Libraries
