PostgreSQL with OpenCL

PgOpenCL is a new PostgreSQL procedural language that lets developers write OpenCL kernels to harness the parallel processing power of GPUs. It introduces an execution model in which table data is copied to arrays, passed to an OpenCL kernel for parallel operations on the GPU, and the results are copied back to tables. This unlocks the potential for dramatically improved performance on compute-intensive database operations such as joins, aggregations, and sorting.

Introducing PgOpenCL
A New PostgreSQL Procedural Language
Unlocking the Power of the GPU!
By Tim Child

Bio: Tim Child
• 35 years' experience of software development
• Formerly
  – VP, Oracle Corporation
  – VP, BEA Systems Inc.
  – VP, Informix
  – Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years' experience in 3D, CAD, GIS and DBMS

Terminology

Term                 Description
Procedural Language  Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU                  Graphics Processing Unit (highly specialized CPU for graphics)
GPGPU                General Purpose GPU (non-graphics programming on a GPU)
CUDA                 Nvidia's GPU programming environment
APU                  Accelerated Processing Unit (AMD's hybrid CPU & GPU chip)
ISO C99              Modern standard version of the C language
OpenCL               Open Computing Language
OpenMP               Open Multi-Processing (parallelizing compilers)
SIMD                 Single Instruction Multiple Data (vector instructions)
SSE                  x86, x64 (Intel, AMD) Streaming SIMD Extensions
xPU                  Any Processing Unit device (CPU, GPU, APU)
Kernel               Function that executes on an OpenCL device
Work Item            Instance of a kernel
Workgroup            A group of work items
FLOP                 Floating Point Operation (single = SQL real type)
MIC                  Many Integrated Core (Intel's 50+ x86 core chip architecture)

Some Technology Trends Impacting DBMS
• Solid State Storage
  – Reduced access time, lower power, increasing capacity
• Virtualization
  – Server consolidation, specialized VMs, lower direct costs
• Cloud Computing
  – EC2, Azure, …; lowers capital requirements
• Multi-Core
  – 2, 4, 6, 8, 12, … cores; lots of benefits to multi-threaded applications
• xPU (GPU/APU)
  – GPU > 1000 cores
  – > 1 TFLOP/s @ €2500
  – APU = CPU + GPU chip hybrids, due in mid 2011
  – 2 TFLOP/s for $2.10 per hour (AWS EC2)
  – Intel MIC "Knights Corner": > 50 x86 cores

Compute Intensive xPU Database Applications
• Bioinformatics
• Signal/Audio/Image/Video Processing
• Data Mining & Analytics
• Searching
• Sorting
• Spatial Selections and Joins
• Map/Reduce
• Scientific Computing
• Many Others …

GPU vs CPU

Vendor                      NVidia         ATI Radeon     Intel
Architecture                Fermi          Evergreen      Nehalem
Cores                       448 (simple)   1600 (simple)  4 (complex)
Transistors                 3.1 B          2.15 B         731 M
Clock                       1.5 GHz        851 MHz        3 GHz
Peak Float Performance      1500 GFLOP/s   2720 GFLOP/s   96 GFLOP/s
Peak Double Performance     750 GFLOP/s    544 GFLOP/s    48 GFLOP/s
Memory Bandwidth            ~190 GB/s      ~153 GB/s      ~30 GB/s
Power Consumption           250 W          > 250 W        80 W
SIMD / Vector Instructions  Many           Many           SSE4+

Multi-Core Performance
[Chart; source: NVidia]

Future (Mid 2011): APU Based PC
[Diagram; source: AMD. An APU (Accelerated Processing Unit) chip combines CPU cores with an embedded GPU. Labelled parts and links: North Bridge; System RAM, ~20 GB/s; PCIe, ~12 GB/s; Discrete GPU; Graphic RAM, 150 GB/s. Caption: APUs add an embedded GPU.]

Scalar vs. SIMD
• Scalar instruction: C = A + B
    1 + 2 = 3
• SIMD instruction: Vector C = Vector A + Vector B
    [1 3 5 7] + [2 4 6 8] = [3 7 11 15]
• OpenCL vector lengths: 2, 4, 8, 16 for char, short, int, float, double

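The scalar and SIMD cases above can be sketched in plain C. This is only an illustration: `vec4f` is a hypothetical stand-in for OpenCL's `float4`, and the loop models what a SIMD unit would do in a single instruction.

```c
#include <stddef.h>

/* Hypothetical 4-wide vector type standing in for OpenCL's float4. */
typedef struct { float lane[4]; } vec4f;

/* Scalar instruction: one addition per call. */
float scalar_add(float a, float b) {
    return a + b;
}

/* SIMD-style instruction: the same addition applied to every lane.
   A real SIMD unit does this in one instruction; the loop only models it. */
vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f c;
    for (size_t i = 0; i < 4; i++) {
        c.lane[i] = a.lane[i] + b.lane[i];
    }
    return c;
}
```

Calling `vec4f_add` on {1, 3, 5, 7} and {2, 4, 6, 8} reproduces the slide's result, {3, 7, 11, 15}.
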
Summarizing xPU Trends
• Many more xPU cores in our future
• Compute environment becoming hybrid
  – CPUs and GPUs
  – Need the CPU to give access to GPU power
• GPU Capabilities
  – Lots of cores
  – Vector/SIMD instructions
  – Fast memory
• GPU Futures
  – Virtual memory
  – Multi-tasking / pre-emption

Scaling PostgreSQL Queries on xPUs
[Diagram: a Postgres process running PgOpenCL threads on a multi-core CPU and on a many-core GPU. Caption: Using More Transistors.]

Parallel Programming Systems

Category            CUDA     OpenMP      OpenCL
Language            C        C, Fortran  C
Cross Platform      X        √           √
Standard            Vendor   OpenMP      Khronos
CPU                 X        √           √
GPU                 √        X           √
Clusters            X        √           X
Compilation / Link  Static   Static      Dynamic

What is OpenCL?
• OpenCL: Open Computing Language
  – Subset of C99
  – Open specification
  – Proposed by Apple
  – Many companies collaborated on the specification
  – Portable, device agnostic
  – Specification maintained by the Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL procedural language

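System Overview
[Diagram. Labelled components: Web Browser, Web Server (HTTP), App Server, PostgreSQL Client, and a DBMS Server running PostgreSQL. Clients reach PostgreSQL over TCP/IP. Inside the DBMS server, SQL statements call SQL procedures and PgOpenCL procedures; PostgreSQL reads tables via disk I/O and reaches the GPGPU over a PCIe X2 bus.]
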
OpenCL Language
• A subset of ISO C99
  – Without some C99 features such as standard C99 headers, function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
  – Work-items and workgroups
  – Vector types
  – Synchronization
  – Address space qualifiers
• Also includes a large set of built-in functions
  – Image manipulation
  – Work-item manipulation
  – Specialized math routines, etc.

PgOpenCL Components
• New PostgreSQL Procedural Language
  – Language handler
    · Maps arguments
    · Calls function
    · Returns results
  – Language validator
    · Creates function with parameter & syntax checking
    · Compiles function to a binary format
• New data types
  – cl_double4, cl_double8, …
• System Admin Pseudo-Tables
  – Platform, Device, Run-Time, …

PgOpenCL Admin
PgOpenCL Function Declaration

    CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
    AS $BODY$
    #pragma PGOPENCL Platform : ATI Stream
    #pragma PGOPENCL Device : CPU

    /* One work item adds one pair of elements. */
    __kernel __attribute__((reqd_work_group_size(64, 1, 1)))
    void VectorAdd(__global const float *a, __global const float *b, __global float *c)
    {
        int i = get_global_id(0);   /* index of this work item */
        c[i] = a[i] + b[i];
    }
    $BODY$
    LANGUAGE PgOpenCL;

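To make the role of `get_global_id(0)` concrete, here is a minimal host-side sketch in plain C that runs the same kernel body once per work item. The names `vector_add_work_item` and `run_vector_add` are hypothetical; they are not part of PgOpenCL or of the OpenCL API, and a real device would run the work items in parallel rather than in a loop.

```c
#include <stddef.h>

/* Body of the VectorAdd kernel for one work item.
   `gid` plays the role of get_global_id(0). */
static void vector_add_work_item(size_t gid, const float *a, const float *b, float *c) {
    c[gid] = a[gid] + b[gid];
}

/* Hypothetical stand-in for the runtime: launches `global_size` work items.
   On a real device they run in parallel; here they run one after another. */
void run_vector_add(const float *a, const float *b, float *c, size_t global_size) {
    for (size_t gid = 0; gid < global_size; gid++) {
        vector_add_work_item(gid, a, b, c);
    }
}
```

Because each work item touches only its own index, the iterations are independent, which is what lets the device spread them across hundreds or thousands of threads.
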
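PgOpenCL Execution Model
[Diagram, read left to right:]
• Select Table to Array: columns A and B of a table are copied into arrays
• VectorAdd(A, B) runs on the xPU as 100's - 1000's of threads (kernels): A + B = C, returns C
• Unnest Array to Table: the array C is copied back into a table, one value per row
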
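The three stages of the execution model can be sketched in plain C as follows. The row layouts `input_row` and `output_row` are assumptions made for illustration (the slide names only the columns A, B and C), and the kernel step runs sequentially here instead of on an xPU.

```c
#include <stddef.h>

/* Hypothetical row layouts; the slide only names columns A, B and C. */
typedef struct { float a; float b; } input_row;
typedef struct { float c; } output_row;

/* Step 1: copy the table columns into flat arrays ("Select Table to Array"). */
void rows_to_arrays(const input_row *rows, size_t n, float *a, float *b) {
    for (size_t i = 0; i < n; i++) {
        a[i] = rows[i].a;
        b[i] = rows[i].b;
    }
}

/* Step 2: the kernel, applied to every element.
   Sequential here, parallel on an xPU. */
void vector_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Step 3: copy the result array back into rows ("Unnest Array to Table"). */
void array_to_rows(const float *c, size_t n, output_row *rows) {
    for (size_t i = 0; i < n; i++) {
        rows[i].c = c[i];
    }
}
```

Steps 1 and 3 are pure copying; only step 2 does useful arithmetic, which is why the cost of the copies matters so much for simple kernels.
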
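Using Re-Shaped Tables
[Diagram, read left to right:]
• Input is a table of arrays: each row already holds arrays A and B
• VectorAdd(A, B) runs on the xPU as 100's - 1000's of threads (kernels): A + B = C, returns C
• Output is a table of arrays: each row holds an array C
• The arrays are copied to and from the xPU as they are
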
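One reading of the re-shaped layout, sketched in plain C: each row stores whole arrays, so the data is already in the shape the kernel wants and no per-element gather or unnest step is needed. `array_row` and `CHUNK` are hypothetical names, and the array length of 4 is chosen only to keep the example small.

```c
#include <stddef.h>

#define CHUNK 4  /* hypothetical array length per row; real rows would be far longer */

/* One row of a re-shaped table: each column holds an array, not a scalar. */
typedef struct {
    float a[CHUNK];
    float b[CHUNK];
    float c[CHUNK];
} array_row;

/* Apply VectorAdd to every row. The inner loop is the part an xPU
   would run as parallel work items. */
void vector_add_rows(array_row *rows, size_t n_rows) {
    for (size_t r = 0; r < n_rows; r++) {
        for (size_t i = 0; i < CHUNK; i++) {
            rows[r].c[i] = rows[r].a[i] + rows[r].b[i];
        }
    }
}
```
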
Today's GPGPU Challenges
• No pre-emptive multi-tasking
• No virtual memory
• Limited bandwidth to a discrete GPGPU
  – 1 – 8 GB/s over the PCIe bus
• Hard to program
  – New parallel algorithms and constructs
  – "New" C language dialect
• Immature tools
  – Compilers, IDEs, debuggers, profilers are in their early years
• Data organization really matters
  – Types, structure, and alignment
  – SQL needs to shape the data
• Profiling and debugging are not easy
Solves Well for Problem Sets with the Right Shape!

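A rough way to see why the PCIe limit matters is to compare transfer time with compute time. The sketch below is plain arithmetic, not a measurement: it assumes the slide's "G/s" means gigabytes per second, takes 1 GB as 1e9 bytes, and ignores latency and launch overhead.

```c
/* Seconds needed to move `bytes` over a link of `gb_per_s` gigabytes per second
   (1 GB taken as 1e9 bytes). */
double transfer_seconds(double bytes, double gb_per_s) {
    return bytes / (gb_per_s * 1e9);
}

/* Seconds needed to perform `flops` operations at `gflop_per_s` GFLOP/s. */
double compute_seconds(double flops, double gflop_per_s) {
    return flops / (gflop_per_s * 1e9);
}
```

For VectorAdd on one million floats, three 4 MB arrays cross the bus: `transfer_seconds(12e6, 8.0)` is about 1.5 ms even at the 8 GB/s upper bound, while `compute_seconds(1e6, 1500.0)` at the 1500 GFLOP/s peak quoted earlier is under a microsecond. A kernel this simple is dominated by copying, so it takes far more arithmetic per byte to benefit from a discrete GPU.
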
Making a Problem Work for You
• Determine % parallelism possible

    for ( i = 0; i < ∞; i++ )
        for ( j = 0; j < ∞; j++ )
            for ( k = 0; k < ∞; k++ )

• Arrange data to fit available GPU RAM
• Ensure calculation time >> I/O transfer overhead
• Learn about parallel algorithms and the OpenCL language
• Learn new tools
• Carefully choose data types, organization and alignments
• Profile and measure at every stage

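One common way to put a number on "% parallelism possible" is Amdahl's law, which the slide does not name; it is brought in here as an assumption about what the bullet is pointing at. With a parallel fraction p and n processing elements, the best-case speedup is 1 / ((1 - p) + p / n).

```c
/* Amdahl's law: upper bound on speedup when a fraction `p` (0..1) of the work
   can be spread over `n` processing elements and the rest stays serial. */
double amdahl_speedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

The formula shows why the serial part dominates: with p = 0.9 the speedup can never exceed 10, however many cores the GPU has.
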
PgOpenCL System Requirements
• PostgreSQL 9.x
• For GPUs
  – AMD ATI OpenCL Stream SDK 2.x
  – NVidia CUDA 3.x SDK
  – Recent Macs with Mac OS X 10.6
• For CPUs (Pentium M or more recent)
  – AMD ATI OpenCL Stream SDK 2.x
  – Intel OpenCL SDK Alpha Release (x86)
  – Recent Macs with Mac OS X 10.6

PgOpenCL Status
• Today (2010): Prototype
• 1Q 2011: Beta
• Wish List
  – Beta testers
    · Existing OpenCL app?
    · Have a GPU app?
  – Contributors
    · Code server side functions?
  – Sponsors & supporters
    · AMD Fusion Fund?
    · Khronos?

PgOpenCL Future Plans
• Increase platform support
• Scatter/gather functions
• Additional type support
  – Image types
  – Sparse matrices
• Run-Time
  – Asynchronous
  – Events
  – Profiling
  – Debugging

Using the Whole Brain
"You can't be in a parallel universe with a single brain!"
[Diagram: Postgres and PgOpenCL running across the CPU cores and the embedded GPU of an APU chip, with a North Bridge link at ~20 GB/s.]
• Heterogeneous compute environments
• CPUs, GPUs, APUs
• Expect 100's – 1000's of cores
The Future Is Parallel: What's a Programmer to Do?

Summarizing PgOpenCL
• Supports heterogeneous parallel compute environments
  – CPUs, GPUs, APUs
• OpenCL
  – Portable and high-performance framework
    · Ideal for computationally intensive algorithms
    · Access to all compute resources (CPU, APU, GPU)
    · Well-defined computation/memory model
  – Efficient parallel programming language
    · C99 with extensions for task and data parallelism
    · Rich set of built-in functions
  – Open standard for heterogeneous parallel computing
• PgOpenCL
  – Integrates PostgreSQL with OpenCL
  – Provides easy SQL access to xPUs
    · APU, CPU, GPGPU
  – Integrates OpenCL
    · SQL + Web Apps (PHP, Ruby, …)

More Information
• PgOpenCL
  – Twitter: @3DMashUp
• OpenCL
  – www.khronos.org/opencl/
  – www.amd.com/us/products/technologies/stream-technology/opencl/
  – http://software.intel.com/en-us/articles/intel-opencl-sdk
  – http://www.nvidia.com/object/cuda_opencl_new.html
  – http://developer.apple.com/technologies/mac/snowleopard/opencl.html

Q&A
• Using parallel applications?
• Benefits of OpenCL / PgOpenCL?
• Want to collaborate on PgOpenCL?
