PostgreSQL with OpenCL

PgOpenCL is a new PostgreSQL procedural language that lets developers write OpenCL kernels and run them on GPUs from SQL. It introduces an execution model in which tables are copied to arrays, passed to an OpenCL kernel for parallel processing on the GPU, and the results are copied back to tables. This unlocks the potential for large performance gains on compute-intensive database operations such as joins, aggregations, and sorting.


Introducing PgOpenCL
A New PostgreSQL Procedural Language
Unlocking the Power of the GPU!
By Tim Child

Bio
Tim Child
• 35 years' experience in software development
• Formerly
  – VP, Oracle Corporation
  – VP, BEA Systems Inc.
  – VP, Informix
  – Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years' experience in 3D, CAD, GIS and DBMS

Terminology
Term                 Description
Procedure Language   Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU                  Graphics Processing Unit (highly specialized processor for graphics)
GPGPU                General Purpose GPU (non-graphics programming on a GPU)
CUDA                 Nvidia's GPU programming environment
APU                  Accelerated Processing Unit (AMD's hybrid CPU & GPU chip)
ISO C99              Modern standard version of the C language
OpenCL               Open Computing Language
OpenMP               Open Multi-Processing (parallelizing compilers)
SIMD                 Single Instruction Multiple Data (vector instructions)
SSE                  x86, x64 (Intel, AMD) Streaming SIMD Extensions
xPU                  Any Processing Unit device (CPU, GPU, APU)
Kernel               Function that executes on an OpenCL device
Work Item            Instance of a kernel
Workgroup            A group of work items
FLOP                 Floating Point Operation (single precision = SQL real type)
MIC                  Many Integrated Core (Intel's 50+ x86 core chip architecture)

Some Technology Trends Impacting DBMS
• Solid State Storage
  – Reduced access time, lower power, increasing capacity
• Virtualization
  – Server consolidation, specialized VMs, lower direct costs
• Cloud Computing
  – EC2, Azure, … lower capital requirements
• Multi-Core
  – 2, 4, 6, 8, 12, … cores; lots of benefits to multi-threaded applications
• xPU (GPU/APU)
  – GPU > 1000 cores
  – > 1 TFLOP/s @ €2500
  – APU = CPU + GPU chip hybrids due in mid 2011
  – 2 TFLOP/s for $2.10 per hour (AWS EC2)
  – Intel MIC "Knights Corner" > 50 x86 cores

Compute Intensive xPU Database Applications
• Bioinformatics
• Signal/Audio/Image/Video Processing
• Data Mining & Analytics
• Searching
• Sorting
• Spatial Selections and Joins
• Map/Reduce
• Scientific Computing
• Many Others …

GPU vs CPU
Vendor                        NVidia         ATI Radeon     Intel
Architecture                  Fermi          Evergreen      Nehalem
Cores                         448            1600           4
Core type                     Simple         Simple         Complex
Transistors                   3.1 B          2.15 B         731 M
Clock                         1.5 GHz        851 MHz        3 GHz
Peak float performance        1500 GFLOP/s   2720 GFLOP/s   96 GFLOP/s
Peak double performance       750 GFLOP/s    544 GFLOP/s    48 GFLOP/s
Memory bandwidth              ~190 GB/s      ~153 GB/s      ~30 GB/s
Power consumption             250 W          > 250 W        80 W
SIMD / vector instructions    Many           Many           SSE4+

Multi-Core Performance
(Chart not reproduced. Source: NVidia)

Future (Mid 2011): APU Based PC
(Diagram not reproduced. Source: AMD. Labels: an APU (Accelerated Processing Unit) chip holding two CPUs and an embedded GPU ("APU's adds an embedded GPU"); North Bridge; System RAM at ~20 GB/s; PCIe at ~12 GB/s; a discrete GPU with Graphic RAM at 150 GB/s.)

Scalar vs. SIMD
Scalar instruction: C = A + B
    1 + 2 = 3
SIMD instruction: Vector C = Vector A + Vector B
    (1, 3, 5, 7) + (2, 4, 6, 8) = (3, 7, 11, 15)
OpenCL: vector lengths 2, 4, 8, 16 for char, short, int, float, double
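
The scalar and vector additions on this slide can be sketched in plain C. This is only an illustration: the `vec4i` type and both function names are hypothetical stand-ins for OpenCL's built-in `int4` type and its `+` operator, and the lanes are written as a loop because standard C has no vector types.

```c
#include <stddef.h>

/* Hypothetical 4-wide vector type, standing in for OpenCL's built-in int4. */
typedef struct { int s[4]; } vec4i;

/* Scalar add: one operation, one result. */
int scalar_add(int a, int b)
{
    return a + b;
}

/* Vector add: conceptually one SIMD instruction producing four sums.
   Here the four lanes are simply processed in a loop. */
vec4i vec4i_add(vec4i a, vec4i b)
{
    vec4i c;
    for (size_t lane = 0; lane < 4; ++lane)
        c.s[lane] = a.s[lane] + b.s[lane];
    return c;
}
```

Calling `vec4i_add` on the slide's vectors (1, 3, 5, 7) and (2, 4, 6, 8) gives (3, 7, 11, 15), matching the figure.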

Summarizing xPU Trends
• Many more xPU cores in our future
• Compute environment becoming hybrid
  – CPUs and GPUs
  – Need CPU to give access to GPU power
• GPU capabilities
  – Lots of cores
  – Vector/SIMD instructions
  – Fast memory
• GPU futures
  – Virtual memory
  – Multi-tasking / pre-emption

Scaling PostgreSQL Queries on xPU's
(Diagram not reproduced. Labels: a Postgres process; PgOpenCL threads on a multi-core CPU; many more PgOpenCL threads on a many-core GPU; "Using More Transistors".)

Parallel Programming Systems
Category             CUDA      OpenMP       OpenCL
Language             C         C, Fortran   C
Cross Platform       X         √            √
Standard             Vendor    OpenMP       Khronos
CPU                  X         √            √
GPU                  √         X            √
Clusters             X         √            X
Compilation / Link   Static    Static       Dynamic

What is OpenCL?
• OpenCL - Open Computing Language
  – Subset of C99
  – Open specification
  – Proposed by Apple
  – Many companies collaborated on the specification
  – Portable, device agnostic
  – Specification maintained by Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL procedural language

System Overview
(Diagram not reproduced. Labels: Web Browser, HTTP, Web Server, App Server, TCP/IP, PostgreSQL Client; a DBMS Server running PostgreSQL, which receives an SQL statement and calls PgOpenCL SQL procedures; a GPGPU attached over a PCIe X2 bus; Tables reached through disk I/O.)

OpenCL Language
• A subset of ISO C99
  – Without some C99 features such as standard C99 headers, function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
  – Work-items and workgroups
  – Vector types
  – Synchronization
  – Address space qualifiers
• Also includes a large set of built-in functions
  – Image manipulation
  – Work-item manipulation
  – Specialized math routines, etc.

PgOpenCL Components
• New PostgreSQL procedural language
  – Language handler
    • Maps arguments
    • Calls function
    • Returns results
  – Language validator
    • Creates function with parameter & syntax checking
    • Compiles function to a binary format
• New data types
  – cl_double4, cl_double8, …
• System admin pseudo-tables
  – Platform, Device, Run-Time, …
PgOpenCL Admin

PgOpenCL Function Declaration

    CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
    AS $BODY$
    #pragma PGOPENCL Platform : ATI Stream
    #pragma PGOPENCL Device : CPU
    /* Each work item adds one pair of elements. */
    __kernel __attribute__((reqd_work_group_size(64, 1, 1)))
    void VectorAdd(__global const float *a, __global const float *b, __global float *c)
    {
        int i = get_global_id(0);   /* index of this work item */
        c[i] = a[i] + b[i];
    }
    $BODY$
    LANGUAGE PgOpenCL;
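
To make the kernel's behaviour concrete, here is a minimal sketch in plain C that emulates it on the host. It is not part of PgOpenCL: `vector_add_work_item` and `vector_add_emulated` are hypothetical names, and the sequential loop merely stands in for the OpenCL runtime, which would run the work items in parallel and supply the index through `get_global_id(0)`.

```c
#include <stddef.h>

/* Body of the VectorAdd kernel for a single work item. In OpenCL the
   index comes from get_global_id(0); here it is passed in explicitly. */
static void vector_add_work_item(size_t gid, const float *a,
                                 const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Hypothetical sequential stand-in for the runtime: execute n work
   items one after another instead of in parallel. */
void vector_add_emulated(const float *a, const float *b, float *c, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        vector_add_work_item(gid, a, b, c);
}
```

Because every work item touches only its own element, the order of execution does not matter, which is what makes the kernel safe to run in parallel. Note that `reqd_work_group_size(64, 1, 1)` fixes the work-group size at 64, so a real launch under OpenCL 1.x would presumably need a global size that is a multiple of 64; the slides do not say how PgOpenCL handles other array lengths.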

PgOpenCL Execution Model
(Diagram, described as steps:)
• Table with columns A and B: select table to array
• Copy arrays A and B to the xPU
• VectorAdd(A, B) runs as 100's - 1000's of threads (kernels): A + B = C
• Returns C: copy back
• Unnest array C to a table
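
The table-to-array-to-table flow can be sketched in plain C as a change of data layout. Everything here is an illustrative assumption rather than PgOpenCL code: the `input_row` and `output_row` structs stand in for table rows, and `add_arrays` stands in for the kernel launch on the xPU.

```c
#include <stddef.h>

/* Hypothetical row layouts for the input and output tables. */
typedef struct { float a; float b; } input_row;
typedef struct { float c; } output_row;

/* "Select table to array": split rows into one contiguous array per column. */
void table_to_arrays(const input_row *rows, size_t n, float *a, float *b)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = rows[i].a;
        b[i] = rows[i].b;
    }
}

/* Stand-in for the kernel launch: C = A + B, element by element. */
void add_arrays(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* "Unnest array to table": turn the result array back into rows. */
void array_to_table(const float *c, size_t n, output_row *rows)
{
    for (size_t i = 0; i < n; ++i)
        rows[i].c = c[i];
}
```

The two conversion steps are pure copying, so they add overhead without doing useful work; the kernel has to do enough computation per element to pay for them.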
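
Using Re-Shaped Tables
(Diagram, described as steps:)
• Input is a table of arrays: each row holds an array A and an array B
• Copy to the xPU
• VectorAdd(A, B) runs as 100's - 1000's of threads (kernels): A + B = C
• Returns C: copy back
• Output is a table of arrays: each row holds an array C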

Today's GPGPU Challenges
• No pre-emptive multi-tasking
• No virtual memory
• Limited bandwidth to discrete GPGPU
  – 1 – 8 GB/s over PCIe bus
• Hard to program
  – New parallel algorithms and constructs
  – "New" C language dialect
• Immature tools
  – Compilers, IDEs, debuggers, profilers: early years
• Data organization really matters
  – Types, structure, and alignment
  – SQL needs to shape the data
• Profiling and debugging are not easy
Solves well for problem sets with the right shape!

Making a Problem Work for You
• Determine % parallelism possible
    for ( i = 0; i < ∞; i++ )
        for ( j = 0; j < ∞; j++ )
            for ( k = 0; k < ∞; k++ )
• Arrange data to fit available GPU RAM
• Ensure calculation time >> I/O transfer overhead
• Learn about parallel algorithms and the OpenCL language
• Learn new tools
• Carefully choose data types, organization and alignments
• Profile and measure at every stage
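
The rule "calculation time >> I/O transfer overhead" can be turned into a rough back-of-the-envelope check. The sketch below is an assumption-laden illustration, not a PgOpenCL facility: both function names are hypothetical, all inputs are estimates supplied by the caller, and `min_ratio` is an arbitrary margin expressing how much larger than the transfer time the compute time should be.

```c
/* Estimated time to move `bytes` over a bus with the given bandwidth.
   Both values are caller-supplied estimates. */
double transfer_seconds(double bytes, double bus_bytes_per_second)
{
    return bytes / bus_bytes_per_second;
}

/* Returns 1 when the estimated compute time is at least `min_ratio`
   times the estimated transfer time, 0 otherwise. */
int offload_worthwhile(double compute_seconds, double bytes_moved,
                       double bus_bytes_per_second, double min_ratio)
{
    double io = transfer_seconds(bytes_moved, bus_bytes_per_second);
    return compute_seconds >= min_ratio * io;
}
```

For example, moving 1 GB over a 1 GB/s PCIe link (the low end of the range quoted for discrete GPGPUs) takes about one second, so with a margin of 10 the kernel would need at least ten seconds of work before offloading looks attractive under this estimate.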

PgOpenCL System Requirements
• PostgreSQL 9.x
• For GPUs
  – AMD ATI OpenCL Stream SDK 2.x
  – NVidia CUDA 3.x SDK
  – Recent Macs with Mac OS X 10.6
• For CPUs (Pentium M or more recent)
  – AMD ATI OpenCL Stream SDK 2.x
  – Intel OpenCL SDK Alpha Release (x86)
  – Recent Macs with Mac OS X 10.6

PgOpenCL Status
• Today (2010): Prototype
• 1Q 2011: Beta
• Wish list
  – Beta testers
    • Existing OpenCL app?
    • Have a GPU app?
  – Contributors
    • Code server side functions?
  – Sponsors & supporters
    • AMD Fusion Fund?
    • Khronos?

PgOpenCL Future Plans
• Increase platform support
• Scatter/gather functions
• Additional type support
  – Image types
  – Sparse matrices
• Run-time
  – Asynchronous
  – Events
  – Profiling
  – Debugging

Using the Whole Brain
"You can't be in a parallel universe with a single brain!"
(Diagram not reproduced. Labels: an APU chip with Postgres and PgOpenCL running on its CPUs and PgOpenCL running on its embedded GPU; North Bridge at ~20 GB/s.)
• Heterogeneous compute environments
• CPUs, GPUs, APUs
• Expect 100's – 1000's of cores
The Future Is Parallel: What's a Programmer to Do?

Summarizing PgOpenCL
• Supports heterogeneous parallel compute environments
  – CPUs, GPUs, APUs
• OpenCL
  – Portable and high-performance framework
    • Ideal for computationally intensive algorithms
    • Access to all compute resources (CPU, APU, GPU)
    • Well-defined computation/memory model
  – Efficient parallel programming language
    • C99 with extensions for task and data parallelism
    • Rich set of built-in functions
  – Open standard for heterogeneous parallel computing
• PgOpenCL
  – Integrates PostgreSQL with OpenCL
  – Provides easy SQL access to xPUs
    • APU, CPU, GPGPU
  – Integrates OpenCL
    • SQL + web apps (PHP, Ruby, …)

More Information
• PgOpenCL
  – Twitter @3DMashUp
• OpenCL
  – www.khronos.org/opencl/
  – www.amd.com/us/products/technologies/stream-technology/opencl/
  – http://software.intel.com/en-us/articles/intel-opencl-sdk
  – http://www.nvidia.com/object/cuda_opencl_new.html
  – http://developer.apple.com/technologies/mac/snowleopard/opencl.html

Q&A
• Using parallel applications?
• Benefits of OpenCL / PgOpenCL?
• Want to collaborate on PgOpenCL?

