Uploaded by Mark Kilgard
OpenGL 3.2 and More

The document provides a comprehensive overview of OpenGL 3.2 and its features, highlighting enhancements from previous versions, such as increased shader capabilities and modern GPU functionality. It also discusses the compatibility of OpenGL with Direct3D conventions, addressing the differences in pixel processing, vertex ordering, and shading models. Additionally, the document emphasizes Nvidia's commitment to backward compatibility and support for both OpenGL and Direct3D APIs in visual computing applications.

San Jose  |  September 30, 2009  |  Mark J. Kilgard, NVIDIA Corporation OpenGL 3.2 and More
Mark J. Kilgard Principal System Software Engineer OpenGL driver Cg shading language OpenGL Utility Toolkit (GLUT) implementer co-author of  Cg Tutorial
Overview OpenGL 3.2 Available today What’s in it? NVIDIA’s additional functionality Above & beyond OpenGL 3.2
A brief 2-slide review of OpenGL 3.0 & 3.1 Before we get really started… You are already familiar with and using OpenGL 3.1, aren’t you?
For review, OpenGL 3.0 Texturing Integer & floating-point texture formats Compact floating-point formats sRGB color space texture formats 1- and 2-component compressed texture formats 1D and 2D texture array targets Miscellaneous Vertex array objects Conditional rendering Multisample-aware stretch blits Fine control over mapping & flushing buffer sub-ranges Framebuffer functionality Render-to-texture with framebuffer objects sRGB blending Packed depth/stencil formats for render-buffers (and texturing) Per-color-attachment blend enables and color write masks Shader improvements OpenGL Shading Language 1.30
For review, OpenGL 3.1 Texturing Guarantees 16 texture units Texture buffer objects Texture rectangle target: 2D image with [0..width, 0..height] coordinate space Signed normalized texture formats Miscellaneous Fast data copying between buffer objects Primitive restart index for vertex arrays Shader improvements OpenGL Shading Language 1.40 Shader can access uniform values from buffer objects Instanced rendering provides instance counter to vertex shader
OpenGL 3.2 modern GPU functionality, platform portability, API maturity & completeness
From the 1994 OpenGL 1.1 Data Flow… vertex processing rasterization & fragment coloring texture raster operations framebuffer pixel unpack pixel pack vertex puller client memory pixel transfer glReadPixels / glCopyPixels / glCopyTex{Sub}Image glDrawPixels glBitmap glCopyPixels glTex{Sub}Image glCopyTex{Sub}Image glDrawElements glDrawArrays selection / feedback / transform feedback glVertex* glColor* glTexCoord* etc.  blending depth testing stencil testing accumulation storage access operations
… OpenGL 1.0 in detail Vertex processing Pixel processing Texture mapping Image primitive processing Pixel unpacking Pixel packing Vertex assembly texture image specification image rectangles, bitmaps primitive topology, transformed vertex data stenciling, depth testing, blending, accumulation pixel image primitive batch type, vertex attributes primitive batch type, vertex data fragment texture fetches pixel image or texture image specification image and bitmap fragments point, line, and polygon fragments pixels to pack unpacked pixels pixels fragments filtered texels buffer data vertices Legend programmable operations fixed-function operations copy pixels, copy texture image Fragment processing Geometric primitive assembly & processing Raster operations Framebuffer Command parser
… to the 2009 OpenGL 3.2 Data Flow Vertex processing Pixel processing Texture mapping Geometric primitive assembly & processing Image primitive processing Transform feedback Pixel unpacking Pixel packing Vertex assembly pixels in framebuffer object textures texture buffer objects texture image specification image rectangles, bitmaps primitive topology, transformed vertex data vertex texture fetches pixel pack buffer objects pixel unpack buffer objects vertex buffer objects transform feedback buffer objects buffer data, unmap buffer geometry texture fetches primitive batch type, vertex indices, vertex attributes primitive batch type, vertex data fragment texture fetches pixel image or texture image specification map buffer, get buffer data transformed vertex attributes image and bitmap fragments point, line, and polygon fragments pixels to pack unpacked pixels pixels fragments filtered texels buffer data vertices Legend programmable operations fixed-function operations copy pixels, copy texture image Buffer store uniform/ parameters buffer objects Fragment processing stenciling, depth testing, blending, accumulation Raster operations Framebuffer Command parser
Buffer Centric View of OpenGL Vertex Array Buffer Object (VaBO) Transform Feedback Buffer (XBO) Parameter Buffer (PaBO) Pixel Unpack Buffer (PuBO) Pixel Pack Buffer (PpBO) Bindable Uniform Buffer (BUB) Texture Buffer Object (TexBO) Vertex Puller Vertex Shading Geometry Shading Fragment Shading Texturing Array Element Buffer Object (VeBO) Pixel Pipeline vertex data texel data pixel data parameter data ( not ARB functionality yet ) glBegin, glDrawElements, etc. glDrawPixels, glTexImage2D, etc. glReadPixels, etc. Framebuffer
OpenGL 3.2 Functional Overview Direct3D-isms BGRA vertex component ordering Provoking vertex convention Drawing commands allowing modification of the base vertex index Upper-left and lower-left fragment coordinate conventions Geometry shaders Per-primitive programmability Shader improvements OpenGL Shading Language 1.50 Miscellaneous Depth clamping, synchronization, seamless cube map filtering, multisample improvements
Direct3Disms better OpenGL & Direct3D content portability
Direct3Dism Motivation A posteriori “3D content tied to API” scheme Without intending it, 3D application content gets tied to API’s conventions Your OpenGL application OpenGL driver same GPU Direct3D driver Your OpenGL application content Your Direct3D application Your Direct3D application content OpenGL conventions Direct3D conventions content authored to OpenGL conventions content authored to Direct3D conventions OpenGL API Direct3D API hardware interface 3D API interface
NVIDIA Recognizes 3D API Reality You  decide the 3D API best for your application Lots of reasons to pick your API choice Target systems, intended market, cross-platform requirements, software legacy, content creation vs. deployment, etc. Fundamentally, NVIDIA believes in Visual Computing (not APIs) So is  essentially agnostic  about your 3D API choice OpenGL, Direct3D 9/10/11, or OpenGL ES NVIDIA provides best implementations of all options; you pick NVIDIA’s belief in Visual Computing means Your 3D API choice shouldn’t tie down your 3D application or 3D content
Direct3Dism Concept Allows your 3D content to be API agnostic OpenGL supports  both  OpenGL & Direct3D conventions, so support either style Your OpenGL application OpenGL driver GPU Direct3D driver Your OpenGL application content Your Direct3D application Your Direct3D application content OpenGL API Direct3D API content authored to OpenGL conventions content authored to Direct3D conventions OpenGL + Direct3D conventions Direct3D conventions hardware interface 3D API interface Direct3D conventions supported by OpenGL too
OpenGL & Direct3D Conventions
Convention | OpenGL | Direct3D | Addressed by
Shading language syntax | GLSL | HLSL 9, 10, and 11 | Cg
Fragment coordinate origin | Lower-left | Upper-left | OpenGL 3.2
Provoking vertex for flat-shading | Last vertex of primitive (mostly) | First vertex of primitive | OpenGL 3.2
Window origin | Lower-left, pixels at half-integers | Upper-left, pixels on integers (DX9), pixels on half-integers (DX10) | projection matrix & front-facing re-configuration
Clip space | [-1…+1]^3 | [-1…+1]^2 x [0…1] | projection matrix re-configuration
4-byte vertex color | RGBA | BGRA | OpenGL 3.2
Shader bind granularity | Linked (for GLSL), per-domain (for Cg & assembly) | Per-domain | EXT_separate_shader_objects
Object manipulation | Bind-to-edit, bind-to-query | Edit-by-name, query-by-name | EXT_direct_state_access
Dealing with API Convention Differences Innocuous differences API granularity OpenGL fine-grain state vs. Direct3D 10 state blocks OpenGL selectors versus Direct3D direct state access Easily dealt with by reconfiguring existing state Examples: window origin & clip space conventions Formidable differences Format differences Unsupported formats such as 4-byte BGRA vertex colors Inconsistent state management Per-domain shaders vs. monolithic GLSL shaders Shaders coded to a particular shading language syntax GLSL vs. HLSL, achieve commonality via Cg Conventions baked into shaders Fragment coordinate origin as visible from a fragment shader fairly easy to address in your application  difficult to address without 3D API help
Impetus for Direct3Dism Effort Many software companies motivated this effort TransGaming Blizzard Destineer Aspyr CodeWeavers Direct result of feedback from 3D software engineers Yes, you can influence OpenGL’s direction & course
Supporting Direct3Disms  Not  New to OpenGL OpenGL has always supported multiple formats well OpenGL’s plethora of pixel and vertex formats Very first  OpenGL extension:  EXT_bgra Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering Made core functionality by OpenGL 1.3 Many OpenGL extensions have embraced Direct3Disms Secondary color Fog coordinate Point sprites OpenGL 3.0’s fine-grain buffer mapping
BGRA Vertex Array Order Direct3D 9’s most common usage for per-vertex colors is the 32-bit D3DCOLOR data type: Alpha in bits 24:31 Red in bits 16:23 Green in bits 8:15 Blue in bits 0:7 Laid out in little-endian memory, this looks like BGRA byte order OpenGL assumes RGBA order for all vertex arrays However, Direct3D colors not stored in packed unsigned bytes have RGBA order Direct3Dism: the EXT_vertex_array_bgra extension allows: glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer); glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer); glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer); Bit layout, from bit 31 down to bit 0: 8-bit alpha, 8-bit red, 8-bit green, 8-bit blue
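The byte order described on this slide can be checked with a few lines of plain C. The sketch below needs no OpenGL context; the function names are hypothetical helpers introduced only for illustration. It packs a color the way D3DCOLOR does, writes it out least-significant byte first (as a little-endian CPU would store it), and shows the CPU-side swizzle an application would otherwise need when GL_BGRA vertex arrays are unavailable.

```c
#include <stdint.h>

/* Pack 8-bit channels the way D3DCOLOR does: alpha in bits 24:31,
   red in bits 16:23, green in bits 8:15, blue in bits 0:7. */
uint32_t pack_d3dcolor(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Emit the 32-bit value least-significant byte first, which is how a
   little-endian CPU lays it out in memory. The result is B, G, R, A. */
void d3dcolor_memory_bytes(uint32_t color, uint8_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint8_t)(color >> (8 * i));
}

/* The per-vertex swizzle needed to feed such data to an RGBA-only
   vertex array; GL_BGRA support makes this copy unnecessary. */
void bgra_to_rgba(const uint8_t bgra[4], uint8_t rgba[4])
{
    rgba[0] = bgra[2];  /* red   */
    rgba[1] = bgra[1];  /* green */
    rgba[2] = bgra[0];  /* blue  */
    rgba[3] = bgra[3];  /* alpha */
}
```

Doing this swizzle for every vertex of every buffer is the kind of data juggling that passing GL_BGRA as the size argument avoids.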
Provoking Vertex Order Conventions Direct3D uses the “first” vertex of a triangle or line to determine which color is used for flat shading OpenGL uses the “last” vertex for lines, triangles, and quads Except for polygon (GL_POLYGON) mode, which uses the first vertex Direct3D 9 pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT); OpenGL glShadeModel(GL_FLAT); Input triangle strip with per-vertex colors
Configurable Provoking Vertex Easy-to-use API New command  glProvokingVertex // “native” OpenGL convention glProvokingVertex(GL_LAST_VERTEX_CONVENTION); // Direct3D convention glProvokingVertex(GL_FIRST_VERTEX_CONVENTION); OpenGL 3.2 promotion of  EXT_provoking_vertex  extension  Affects fixed-function  glShadeModel flat shaded attributes for fragment shaders geometry shaders that emit flat shaded attributes
Provoking Vertex Details Provoking vertex sounds really obscure Technically shade model is part of “deprecated” feature set of OpenGL However very common mode for real-time strategy games Many, many objects drawn this way Very difficult for application to “juggle” vertex data to match API’s native provoking vertex convention Particularly when using vertex buffer objects Quad behavior may vary Direct3D doesn’t support quadrilateral primitives So “first vertex” provoking vertex convention may or may not apply to quadrilateral primitives GeForce 8 say true for “quads follow the convention” GeForce 7 and earlier say false for “quads follow the convention” Check GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION boolean if you care
Provoking Vertex Behavior
Primitive type of polygon i | First vertex convention | Last vertex convention
GL_POINTS | i | i (same)
GL_LINES | 2i-1 | 2i
GL_LINE_LOOP | i | i+1, if i<n; 1, if i=n
GL_LINE_STRIP | i | i+1
GL_TRIANGLES | 3i-2 | 3i
GL_TRIANGLE_STRIP | i | i+2
GL_TRIANGLE_FAN | i+1 | i+2
GL_QUADS | 4i-3, if quads follow provoking vertex; 4i, if not (same) | 4i
GL_QUAD_STRIP | 2i-1, if quads follow provoking vertex; 2i+2, if not (same) | 2i+2
GL_POLYGON | i | i (same)
Geometry shader primitives:
GL_LINES_ADJACENCY | 4i-2 | 4i-1
GL_LINE_STRIP_ADJACENCY | i+1 | i+2
GL_TRIANGLES_ADJACENCY | 6i-5 | 6i-1
GL_TRIANGLE_STRIP_ADJACENCY | 2i-1 | 2i+3
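The index formulas on this slide are easy to get wrong by one, so a small lookup function can help when writing tests or a software fallback. The sketch below covers only the common line and triangle primitives; the enum and function names are hypothetical, and vertices and primitives are numbered from 1 as on the slide.

```c
/* Hypothetical names, used only for this illustration. */
typedef enum {
    PRIM_LINES,
    PRIM_LINE_STRIP,
    PRIM_TRIANGLES,
    PRIM_TRIANGLE_STRIP,
    PRIM_TRIANGLE_FAN
} prim_type;

typedef enum { FIRST_VERTEX, LAST_VERTEX } pv_convention;

/* Returns the 1-based number of the vertex that supplies flat-shaded
   attributes for primitive i (i >= 1), or -1 for an unknown type. */
int provoking_vertex(prim_type type, pv_convention conv, int i)
{
    int first = (conv == FIRST_VERTEX);
    switch (type) {
    case PRIM_LINES:          return first ? 2 * i - 1 : 2 * i;
    case PRIM_LINE_STRIP:     return first ? i         : i + 1;
    case PRIM_TRIANGLES:      return first ? 3 * i - 2 : 3 * i;
    case PRIM_TRIANGLE_STRIP: return first ? i         : i + 2;
    case PRIM_TRIANGLE_FAN:   return first ? i + 1     : i + 2;
    }
    return -1;
}
```

For example, the second independent triangle takes its flat color from vertex 4 under the Direct3D-style first-vertex convention and from vertex 6 under OpenGL's native last-vertex convention.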
Direct3D vs. OpenGL Coordinate System Conventions Window origin conventions Direct3D = upper-left origin OpenGL = lower-left origin Pixel center conventions Direct3D9 = pixel centers at integer locations OpenGL and Direct3D 10 = pixel centers at half-pixel locations Makes pixel centers for rasterization “match” texel centers for texturing Clip space conventions Direct3D = [-1,+1] for XY, [0,1] for Z OpenGL = [-1,+1] range for XYZ Affects How projection matrix is loaded Fragment shaders that access the window position Point sprites have upper-left texture coordinate origin OpenGL already lets application choose lower-left or upper-left
3 APIs, 3 Different Window Space Conventions Pixel center grids  coordinate systems OpenGL   Direct3D 9   Direct3D 10   Upper-left origin Lower-left origin = pixel sample   center
Direct3D 9 to OpenGL How to go from Direct3D 9’s [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1]^3, and from Direct3D 9’s integer-centered pixel centers to OpenGL’s half-pixel centers Simple state adjustment Projection matrix fudge glMatrixLoadIdentityEXT(GL_PROJECTION); glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2); glMatrixTranslatefEXT(GL_PROJECTION, 0.5/windowWidth, 0.5/windowHeight, -0.5); Reverse convention for what is front-facing glFrontFace(GL_CW); // OpenGL default is GL_CCW Compensates for y-flip that reverses coordinate system’s handedness No need for API additions to support Direct3D 9’s system
Direct3D 10 to OpenGL How to go from Direct3D 10’s [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1]^3 where both APIs have half-pixel centers Simple state adjustment Projection matrix fudge glMatrixLoadIdentityEXT(GL_PROJECTION); glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2); glMatrixTranslatefEXT(GL_PROJECTION, 0, 0, // no half-pixel shift for Direct3D 10 -0.5); Reverse convention for what is front-facing glFrontFace(GL_CW); // OpenGL default is GL_CCW Compensates for y-flip that reverses coordinate system’s handedness Again, no need for API additions to support Direct3D 10’s system
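To see why the scale-then-translate pair does the job, it helps to apply the same matrix product by hand. The sketch below is plain C with no OpenGL calls; the `vec4` type and function name are hypothetical. It applies Scale(1, -1, 2) * Translate(tx, ty, -0.5) to a clip-space position, which is the product the two `glMatrix*EXT` calls build (translate acts on the vertex first, then the scale).

```c
typedef struct { float x, y, z, w; } vec4;

/* Remaps a Direct3D-convention clip-space position to OpenGL's.
   tx and ty are the optional pixel-center shifts from the slides
   (pass 0 for Direct3D 10 content). */
vec4 d3d_clip_to_gl_clip(vec4 p, float tx, float ty)
{
    /* Translate(tx, ty, -0.5): a translation scales with w in
       homogeneous coordinates. */
    vec4 t = { p.x + tx * p.w, p.y + ty * p.w, p.z - 0.5f * p.w, p.w };

    /* Scale(1, -1, 2): flip y for the window origin and stretch the
       depth range from [0,1] to [-1,+1] after the divide by w. */
    vec4 r = { t.x, -t.y, 2.0f * t.z, t.w };
    return r;
}
```

After the divide by w, Direct3D's near plane (z/w = 0) lands at -1 and its far plane (z/w = 1) lands at +1. The y flip mirrors the image, which is why the slides also switch the front-face winding to GL_CW.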
Fragment Coordinate Convention Usage Typically used in  post-processing  shaders Examples: Motion blur Depth-of-field Shader assumes a particular convention for the fragment coordinate origin Attempting to “re-write” Direct3D shader tends to Compromise shader performance Introduces new “window height” uniform that must be always set correctly Hard to do automatically  and  robustly Robust approach:  Allow shader author (or automatic translator) to specify convention explicitly
Fragment Shader Coordinate Conventions Required GLSL introduction #extension GL_ARB_fragment_coord_conventions : require Pick  one  of the following GLSL declarations:  // “native” OpenGL convention in vec4 gl_FragCoord; // DirectX 9 convention layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord; // DirectX 10 convention layout(origin_upper_left) in vec4 gl_FragCoord; Also supported by NVIDIA assembly extensions OPTION ARB_fragment_coord_origin_upper_left; OPTION ARB_fragment_coord_pixel_center_integer;
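The three declarations differ only in how the same pixel is numbered. As a rough illustration, the plain C function below (a hypothetical helper, not part of any API) computes the gl_FragCoord x and y a fragment shader would see for a given pixel under each layout qualifier, assuming the pixel is identified by its integer column and row counted from the lower-left corner of the window.

```c
typedef struct { float x, y; } frag_xy;

/* px, py: integer pixel column and row, row 0 at the bottom.
   The two flags mirror the origin_upper_left and
   pixel_center_integer layout qualifiers. */
frag_xy frag_coord(int px, int py, int window_height,
                   int origin_upper_left, int pixel_center_integer)
{
    /* Native OpenGL: lower-left origin, centers at half-integers. */
    frag_xy f = { (float)px + 0.5f, (float)py + 0.5f };

    if (origin_upper_left)
        f.y = (float)window_height - f.y;   /* flip about the window */

    if (pixel_center_integer) {             /* Direct3D 9 style centers */
        f.x -= 0.5f;
        f.y -= 0.5f;
    }
    return f;
}
```

The flip needs the window height, which is exactly the extra uniform a hand-translated shader would have to carry; the layout qualifiers let the implementation supply it instead.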
Deprecation there’s “old” & there’s “still supported”
Deprecation – OpenGL ARB view OpenGL has never removed features. However, After 15+ years, defining new features to work with old features becomes increasingly difficult OpenGL 3.0 marks features as deprecated OpenGL 3.0 does not remove any features Redundant, legacy and obsolete features Parts of OpenGL unlikely to be accelerated Guidance to developers to prepare for future revisions
Deprecation – OpenGL ARB view OpenGL 3.1 removed these deprecated features Added support back with the ARB_compatibility extension OpenGL 3.2 formalized this in two profiles “Core” profile with deprecated features removed “Compatibility” profile with all features present Implementation of “Core” mandatory “Compatibility” optional
Deprecation – NVIDIA view Set of removed functionality is in use by applications today, helping our customer’s business Using just “Core” OpenGL 3.2 is a huge effort in rewriting existing code OpenGL 3.2 “Core” not offering enough incentive to re-write existing code Deprecation is NOT in the best interest of ISVs and therefore not in NVIDIA’s business interest
Deprecation – NVIDIA view We will not remove ANY feature from our drivers OpenGL on NVIDIA will be fully backwards compatible NVIDIA has and will ship the Compatibility profile NVIDIA will fully support, tune and bug fix  all  features See our public statement: http://developer.nvidia.com/object/opengl_3_driver.html
Deprecation – Myths Feature removal will result in a faster driver Feature removal will result in a higher quality driver Feature removal will result in a cleaner API Not removing features means OpenGL will die Only useless features were deprecated Far from true
So You can just ignore Deprecation NVIDIA values OpenGL API backward compatibility We don’t take API functionality away from you We aren’t going to force you to re-write apps Does deprecated functionality “stay fast”? Yes, of course—and stays fully tested Bottom-line: Old & new features run fast
Geometry Shaders per-primitive programmability
Geometry Shaders via OpenGL Programmability for geometric primitives one geometric primitive in, zero or more primitives out Supported by NVIDIA’s OpenGL driver since GeForce 8 launch NV_gpu_program4  for assembly Cg 2.x’s  gp4gp  “geometry” profile NV_geometry_shader4  /  EXT_geometry_shader4  for (GLSL) Standardized as an ARB extension in OpenGL 3.1 timeframe ARB_geometry_shader4 Now finally core functionality in OpenGL 3.2 Essentially unchanged from EXT and ARB versions
Geometry Shaders New programmable shader domain Operates on assembled primitives Triangles, lines, points, and new adjacency primitives Outputs zero or more primitives Must be points, line strips, or triangle strips Primitive restarts allowed Warning: Not well suited for unbounded tessellation application Vertex shader Primitive assembly Geometry shader Rasterizer Fragment shader Raster operations framebuffer application programmable
Geometry Shader Silhouette Edge Rendering silhouette edge detection geometry program Complete mesh Silhouette edges Useful for non-photorealistic rendering Looks like human sketching
More Geometry Shader Examples Shimmering point sprites Generate fins for lines Generate shells for fur rendering
Improved Interpolation Using geometry shader functionality Quadratic normal interpolation True quadrilateral rendering with mean value coordinate interpolation
“Fair” Quadrilateral Interpolation glBegin(GL_QUADS); glColor3fv(red); glVertex3fv(lowerLeft); glColor3fv(green); glVertex3fv(lowerRight); glColor3fv(red); glVertex3fv(upperRight); glColor3fv(blue); glVertex3fv(upperLeft); glEnd(); Geometry shader actually operates on 4-vertex GL_LINES_ADJACENCY primitives instead of quads Wrong, slash triangle split Wrong, backslash triangle split Better: Mean value coordinates
Geometry Shader-based Bump Map Setup Vertex shader does skinning Problem: how does texture-space basis for bump mapping respond to arbitrary skinning? Solution: geometry shader constructs per-triangle texture-basis using post-skinning vertex positions and normals So geometry shader: Computes object-to-texture space basis for triangle Can account for texture mirroring in normal map Transforms object-space vectors to texture space Outputs triangle Fragment shader uses texture-space normals for bump map shading
Cg Code Shader performs texture-basis setup Can compile to GLSL or HLSL 10 code Cg 2.2 feature See working example code in Cg 2.2
TRIANGLE void md2bump_geometry(AttribArray<float4> position    : POSITION,
                               AttribArray<float2> texCoord    : TEXCOORD0,
                               AttribArray<float3> objPosition : TEXCOORD1,
                               AttribArray<float3> objNormal   : TEXCOORD2,
                               AttribArray<float3> objView     : TEXCOORD3,
                               AttribArray<float3> objLight    : TEXCOORD4)
{
  // Object-space and texture-space deltas along two triangle edges
  float3 dXYZdU = objPosition[1] - objPosition[0];
  float2 dSTdU  = texCoord[1] - texCoord[0];
  float  dSdU   = texCoord[1].s - texCoord[0].s;
  float3 dXYZdV = objPosition[2] - objPosition[0];
  float2 dSTdV  = texCoord[2] - texCoord[0];
  float  dSdV   = texCoord[2].s - texCoord[0].s;
  float3 tangent = normalize(dSdV * dXYZdU - dSdU * dXYZdV);
  // Sign of the texture-space area detects mirrored texture mapping
  float  area = determinant(float2x2(dSTdV, dSTdU));
  float3 orientedTangent = area >= 0 ? tangent : -tangent;
  for (int i=0; i<3; i++) {
    float3 normal   = objNormal[i],
           binormal = cross(tangent, normal);
    float3x3 basis = float3x3(orientedTangent, binormal, normal);
    float3 surfaceLightVector : TEXCOORD1 = mul(basis, objLight[i]);
    float3 surfaceViewVector  : TEXCOORD2 = mul(basis, objView[i]);
    emitVertex(position[i], texCoord[i], surfaceLightVector, surfaceViewVector);
  }
}
Geometry Shader-based Shadow Volume Generation un-shadowed bump-mapped shading  via geometry shader texture-space basis setup shadow volume extrusion by geometry shader shadow region stencil multi-pass combination of shadowed and un-shadowed shading
Miscellaneous some other 3.2 goodness
Tripped Up By Near/Far Clipping Conventionally 3D APIs “clip” to near & far view frustum planes Results in classic artifacts Geometry is “cut open” by near clip plane Naïvely moving near plane closer poorly distributes depth buffer precision Alternatively, geometry is “lost” beyond the far clip plane no clipping problem closer to alien near clip plane cuts open alien head
Depth Clamping to the Rescue Depth clamping API Easy to enable/disable glEnable(GL_DEPTH_CLAMP); glDisable(GL_DEPTH_CLAMP); What it does Disables near & far clip planes But this allows depth values to interpolate beyond [0,1] representable range of the depth buffer So additionally clamps interpolated values to [0,1] range
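The effect of the enable can be modeled per fragment in a few lines of C. This is a deliberately simplified sketch: real implementations clip whole primitives against the near and far planes before rasterization, whereas the hypothetical function below just decides the fate of one window-space depth value.

```c
#include <stdbool.h>

/* z is a window-space depth where the view frustum maps to [0,1].
   Returns false if the fragment is discarded by near/far clipping;
   otherwise stores the depth that reaches the depth buffer. */
bool resolve_depth(float z, bool depth_clamp, float *z_out)
{
    if (!depth_clamp) {
        /* Conventional behavior: outside the range means clipped. */
        if (z < 0.0f || z > 1.0f)
            return false;
        *z_out = z;
        return true;
    }

    /* Depth clamp enabled: nothing is clipped by near/far, but the
       depth is clamped to the representable [0,1] range. */
    *z_out = z < 0.0f ? 0.0f : (z > 1.0f ? 1.0f : z);
    return true;
}
```

Geometry that pokes through the near plane therefore survives with depth 0 instead of being cut open.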
Depth Clamping Applications Avoid near plane “cut opens” via depth clamping Fragment shader replaces color of z=0 fragments with black In GLSL: if (gl_FragCoord.z == 0)   gl_FragColor = vec4(0,0,0,1); Alternatively, use Painter’s algorithm for objects at the near plane Last (or first) fragment at z=0 “wins” Infinite Z-fail Shadow volumes See [Everett & Kilgard 2002] Conserves depth buffer precision when eye-space infinity must be within depth range
Near Plane Depth Clamping Example without depth clamping  depth clamping enabled * * simple situation because depth   complexity at z=0 is a single layer
Seam-free Cube Map Edges Cube maps have edges along each face Traditionally texture mapping hardware simply clamps to these seam edges Results in “seam” artifacts Particularly when level-of-detail bias is large Meaning very blurry levels But seams appear sharply Use  glEnable( GL_TEXTURE_CUBE_MAP_SEAMLESS)  to mitigate these artifacts seam
Seamless Cube Maps: Before and After Before: with edge seams After: without seams
Remaining OpenGL 3.2 Features Async objects Synchronization of GPU completion Supports synchronization between multiple contexts Draw elements base index Provides a base added to all vertex indices Multisampled renderbuffers Also can query framebuffer’s sample locations
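The base index feature (exposed in OpenGL 3.2 through commands such as glDrawElementsBaseVertex) is simple enough to emulate on the CPU. The sketch below is a hypothetical helper, not driver code, that shows which vertex numbers end up being fetched when a base is added to every index.

```c
#include <stddef.h>

/* For each of the count indices, computes the vertex number actually
   fetched: the stored index plus the base vertex. */
void resolve_indices(const unsigned short *indices, size_t count,
                     int base_vertex, int *out)
{
    for (size_t k = 0; k < count; ++k)
        out[k] = (int)indices[k] + base_vertex;
}
```

This lets several meshes share one vertex buffer while each keeps index data that starts from zero, so the indices need no rewriting when a mesh is placed at a different offset.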
Beyond OpenGL 3.2 NVIDIA’s further contributions
All of OpenGL’s Texture Targets Conventional targets: 1D texture, 2D texture, 3D texture Special addressing: Cube map texture (cube face selection), Rectangle texture ([0..w]x[0..h] range), Texture buffer (1D unfiltered buffer objects) Texture arrays: 1D texture array, 2D texture array, Cube map texture array Multisample: 2D texture multisample, 2D texture array multisample
Bindless Graphics NVIDIA keeps building faster and faster GPUs But that x86 core feeding the GPU isn’t getting faster at anything near the same rate! Makes your application more & more likely to be CPU limited, instead of GPU limited Bundling OpenGL state in objects helps But time goes on… GPUs keep getting faster… Eventually even binding to objects becomes a bottleneck Hence the desire for “bindless” graphics Extensions: NV_vertex_buffer_unified_memory  (VBUM) for bindless vertex pulling NV_shader_buffer_load  (SBL) for bindless buffer loads from shaders
“Classic” OpenGL 1.0 Model Application Driver GPU command buffer GPU Video memory wide stream of commands wide interconnect OpenGL commands contains data directly Examples: immediate mode vertices, pixels to draw, downloaded texels Inefficient All data flows through the CPU GPU can’t access the data directly from video memory
Object Bind Model of OpenGL 2.x/3.x OpenGL commands “name” objects to use Objects allow GPU to access object data (texels, vertices, pixels, constants, etc.) via fast video memory directly Driver must lookup and access object’s vital information Tends to generate lots of cache misses Cache misses are the bane of modern, fast CPUs Application Driver GPU command buffer GPU Video memory narrow stream of commands wide interconnect System memory expensive stream of cache misses
Bindless Graphics Model of OpenGL OpenGL commands and shaders can use GPU addresses of buffers So driver doesn’t have to translate to addresses & doesn’t take cache misses GPU addresses for Vertex buffer offsets Constant loads from buffers within shaders Application Driver GPU command buffer GPU Video memory narrow stream of commands wide interconnect feedback GPU address at  creation time
Direct State Access Existing OpenGL model Bind-to-edit, bind-to-query, bind-to-use One bind operation for all three purposes To change a GL object, you must first “bind” to it Example glBindTexture(GL_TEXTURE_2D, obj); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); Bind-to-edit leads to unnecessary re-validations NEW additional Direct State Access (DSA) approach Edit-by-name To change a GL object, name the object to change Example glTextureParameteriEXT(obj, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); Extension: EXT_direct_state_access
What is the root of the problem? “ Selectors” OpenGL state that tells  which state   other  OpenGL commands should update Think of selectors as “sticky” phantom parameters to all your matrix, texture, program, buffers, etc. commands and queries Examples of selectors glMatrixMode glActiveTexture glBindTexture glBindProgramARB glUseProgram two  distinct  selectors for texture commands, extra confusing
Reasons to Avoid Selectors Direct3D has an “edit-by-name” model of operation Means Direct3D has no selectors Having to manage selectors when porting Direct3D or console code to OpenGL is awkward Requires deferring updates to minimize selector and object bind changes Layered libraries can’t count on selector state To be safe when updating state controlled by selectors, such libraries must use the idiom Save selector, Set selector, Update state, Restore selector Bad for performance, particularly bad for dual-core drivers since queries are expensive Cg 2.2 (October 2009) makes use of DSA automatically when available
Direct State Access Advantages Less error-prone Consider this code glRotatef(phi, x,y,z); Which matrix did you change? Depends on how the matrix mode selector was last left! Instead consider the DSA version glMatrixRotatefEXT(GL_MODELVIEW, phi, x,y,z); Another example Consider this code glActiveTexture(GL_TEXTURE3); some_function(); glBindTexture(GL_TEXTURE_2D, 89); But what if some_function calls glActiveTexture? It might not now, but could in the future! Instead use glBindMultiTextureEXT(GL_TEXTURE3, GL_TEXTURE_2D, 89); Problem solved!
Direct State Access Advantages More efficient layered libraries Consider a library that uses OpenGL commands to create a texture object from an image file Example: loadPNGtoGLtexture(GLuint texobj, …); Ideally, calling loadPNGtoGLtexture shouldn’t disturb the current bound texture Preserving the current bound texture requires a  save-selector/change-state/restore-selector  idiom GLint saved_current_binding; glGetIntegerv(GL_TEXTURE_BINDING_2D, &saved_current_binding); glBindTexture(GL_TEXTURE_2D, texobj); // now you can change texobj with bind-to-edit commands glBindTexture(GL_TEXTURE_2D, saved_current_binding); But save/change/restore undermines dual-core OpenGL operation Because GL queries of the selector sync the app and driver threads DSA routines avoid disturbing selectors Cg 2.2 October 2009 is an example of such a library
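The cost difference between the two idioms can be illustrated without a GL context. The sketch below is a toy model only: every type and function in it is hypothetical and merely imitates the shape of bind-to-edit versus edit-by-name. It counts selector queries, which in a real dual-threaded driver are the calls that force the application and driver threads to synchronize.

```c
/* Toy model, NOT OpenGL: a context with one texture selector. */
typedef struct { int min_filter; } toy_texture;

typedef struct {
    toy_texture textures[8];
    int bound;             /* the selector: currently bound texture */
    int selector_queries;  /* how many times the selector was read  */
} toy_context;

void toy_bind(toy_context *c, int name) { c->bound = name; }

int toy_get_binding(toy_context *c)
{
    c->selector_queries++;          /* stands in for glGetIntegerv */
    return c->bound;
}

/* Bind-to-edit style: edits whatever the selector points at. */
void toy_tex_parameter(toy_context *c, int value)
{
    c->textures[c->bound].min_filter = value;
}

/* Edit-by-name style: the object is an explicit argument. */
void toy_texture_parameter_dsa(toy_context *c, int name, int value)
{
    c->textures[name].min_filter = value;
}

/* A well-behaved library routine using save/change/restore. */
void library_edit_bind_to_edit(toy_context *c, int name, int value)
{
    int saved = toy_get_binding(c);  /* save selector (a query) */
    toy_bind(c, name);               /* set selector            */
    toy_tex_parameter(c, value);     /* update state            */
    toy_bind(c, saved);              /* restore selector        */
}

/* The same routine with direct state access: one call, no query. */
void library_edit_dsa(toy_context *c, int name, int value)
{
    toy_texture_parameter_dsa(c, name, value);
}
```

Both routines leave the caller's binding untouched, but only the bind-to-edit version has to read the selector back to do so.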
Latched State Direct State Access solves another problem Some OpenGL state is “latched” by subsequent commands Think of latched state as phantom parameters to commands that come from the OpenGL state Examples:  pixel store (pack/unpack) state, vertex array state Provides new commands glPushClientAttribDefaultEXT command Like glPushClientAttrib but also resets affected state to default Fast and efficient
Copy Image Fast copies of pixels between image objects 1D textures, 2D textures, 3D textures, cube maps, texture rectangles, 1D texture arrays, 2D texture arrays, cube map texture arrays, & render-buffers all work Pixel data can be 1D, 2D, or 3D Best part Image objects can belong to distinct OpenGL rendering contexts Even when contexts do not share objects! Even when contexts are on different physical GPUs in the system Extension: NV_copy_image
Basic Copy Image Command Basic prototype, for within a context void  glCopyImageSubDataNV(   GLuint  srcName,  GLenum  srcTarget,  GLint  srcLevel,   GLint  srcX,  GLint  srcY,  GLint  srcZ,   GLuint  dstName,  GLenum  dstTarget,  GLint  dstLevel,   GLint  dstX,  GLint  dstY,  GLint  dstZ,   GLsizei  width,  GLsizei  height,  GLsizei  depth ); Color key: source arguments destination arguments sub-image dimensions
Texture Barrier Background Framebuffer objects allow rendering into textures Nothing keeps you from sampling a texture you are also rendering to, though the behavior is specified to be undefined NV_texture_barrier provides a mechanism to avoid read-after-write hazards when rendering into a bound texture In limited circumstances Reads (including all filtered samples) and writes are to/from disjoint pixels There is only a single read and write of a pixel by a fragment shader “over” that pixel without an intervening glTextureBarrierNV() command Extension: NV_texture_barrier
Improved: Parameter Buffer Object Parameter buffer objects give shaders access to values stored in buffers Also called  constant  or  uniform  buffers Supported by Cg 2.2’s BUFFER semantics Originally just 32-bit scalars or 32-bit 4-component vectors Now 1, 2, 4, 8, or 16 byte accesses allowed Extension:  NV_parameter_buffer_object2
Separate Shader Objects Combining different GLSL shaders at once Needed linking Better to allow mixing and matching of shader objects Like Direct3D Like OpenGL assembly extensions Extension:   EXT_separate_shader_objects  (SSO) Specular brick bump mapping Red diffuse Wobbly torus Smooth torus Different GLSL vertex shaders Different GLSL fragment shaders
Separate Shader Object Binding Per-domain binding glUseShaderProgramEXT(GL_VERTEX_SHADER, vprog); glUseShaderProgramEXT(GL_GEOMETRY_SHADER, gprog); glUseShaderProgramEXT(GL_FRAGMENT_SHADER, fprog); Uses a linked program object, but only the portion of that linked program for the specified domain Introduces selector for glUniform calls glActiveProgramEXT(program_updated_by_glUniform); Better to use DSA’s selector-free  glProgramUniform*EXT  commands
glUseProgram Equivalence Question:   What does the existing glUseProgram call “mean” in the context of SSO? glUseProgram(glsl_prog); Answer : It is  exactly  equivalent to these calls: glUseShaderProgramEXT(GL_VERTEX_SHADER, glsl_prog); glUseShaderProgramEXT(GL_GEOMETRY_SHADER, glsl_prog); glUseShaderProgramEXT(GL_FRAGMENT_SHADER, glsl_prog); glActiveProgramEXT(glsl_prog);
Convenient 1-Step Single-domain Shader Loading GLSL requires elaborate multi-step API for compiling/linking a shader Over-kill for separate shader objects Desirable to have an API more like glProgramStringARB 1-Step command   glCreateShaderProgramEXT(   GLenum domain,   const char *shader_string); Just a convenience function You don’t have to use it for SSO You can still create separate shaders with multi-step API Sometimes necessary for binding attributes and fragment out locations
glCreateShaderProgramEXT
Equivalent to:
    const GLuint shader = glCreateShader(type);
    if (shader) {
        const GLint len = (GLint) strlen(string);
        glShaderSource(shader, 1, &string, &len);
        glCompileShader(shader);
        const GLuint program = glCreateProgram();
        if (program) {
            GLint compiled = GL_FALSE;
            glGetShaderiv(shader, GL_COMPILE_STATUS, &compiled);
            if (compiled) {
                glAttachShader(program, shader);
                glLinkProgram(program);
                glDetachShader(program, shader);
            }
            // Possibly...
            if (active-user-defined-varyings-in-linked-program) {
                append-error-to-info-log
                set-program-link-status-false
            }
            append-shader-info-log-to-program-info-log
        }
        glDeleteShader(shader);
        return program;
    } else {
        return 0;
    }
Passing Varyings Between Separate Shader Objects
Programs in separate domains should pass varyings through built-in varyings (NOT user-specified varyings).
So instead of
    varying vec4 my_varying;
use a built-in such as gl_TexCoord[0].
This guarantees that up-stream and down-stream domains rendezvous with the same value; use of user-declared varyings is undefined.
Compiling Cg code to GLSL profiles guarantees this is the case, because Cg has semantics to indicate how varyings correspond to API resources.
Example Cg declaration: float4 my_varying : TEXCOORD0;
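As an illustration of the convention above, the following C sketch embeds a vertex and fragment shader pair that communicate only through the built-in `gl_TexCoord[0]`, plus a deliberately naive source check for user-declared varyings. The shader strings and the `declares_user_varying` helper are assumptions made for this example; the check is a plain substring search, so it would also fire on the word inside a comment, and it is not part of any OpenGL API.

```c
#include <assert.h>
#include <string.h>

/* Vertex shader: writes the built-in varying instead of a user-declared one. */
static const char *const vertex_src =
    "void main() {\n"
    "    gl_TexCoord[0] = gl_MultiTexCoord0;\n"
    "    gl_Position = ftransform();\n"
    "}\n";

/* Fragment shader: reads the same built-in, so both domains agree on the
 * interface even though they are compiled and linked separately. */
static const char *const fragment_src =
    "void main() {\n"
    "    gl_FragColor = gl_TexCoord[0];\n"
    "}\n";

/* Counter-example: a user-declared varying, whose behavior across separate
 * shader objects the slide describes as undefined. */
static const char *const risky_src =
    "varying vec4 my_varying;\n"
    "void main() { gl_FragColor = my_varying; }\n";

/* Hypothetical helper: returns nonzero if the source text contains the
 * "varying " keyword anywhere. Naive by design; not a GLSL parser. */
static int declares_user_varying(const char *src)
{
    assert(src != NULL);
    return strstr(src, "varying ") != NULL;
}
```

A tool that builds separate shader objects could run a check like `declares_user_varying(risky_src)` before handing a string to the 1-step creation command and warn when it returns nonzero.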
Thoughts on OpenGL’s Future: what direction now?
Where Do OpenGL Extensions Come From?
44% of extensions are “core” or multi-vendor.
Lots of vendors have initiated extensions.
Extending OpenGL is an industry-wide collaboration.
[Chart: extensions by prefix: EXT, SGI, SGIS, SGIX, ARB, NV, ATI, APPLE, MESA, Others]
Source: http://www.opengl.org/registry (Dec 2008)
What’s Driving OpenGL Modernization?
Human desire for visual intuition and entertainment, particularly interactive video games.
Embarrassing parallelism of graphics, particularly the hardware-amenable, latency-tolerant nature of rasterization.
Increasing semiconductor density.
Conclusions
NVIDIA’s OpenGL driver leads the industry: functional, performance, and semantic parity with Direct3D.
NVIDIA provides OpenGL 3.2 now.
If past is prologue, NVIDIA OpenGL extensions are where to-be-core functionality shows up first; get a head start by using the functionality now.
All new GPU functionality is exposed for OpenGL in the first shipping NVIDIA driver.
More Information
NVIDIA OpenGL 3.2 driver (available now): http://developer.nvidia.com/object/opengl_3_driver.html
OpenGL 3.2 specification: http://www.opengl.org/registry/doc/glspec32.compatibility.20090803.pdf
NVIDIA’s OpenGL extension registry: http://developer.nvidia.com/object/nvidia_opengl_specs.html
Cg Toolkit 2.2 (October 2009), includes the geometry shader examples shown here: http://developer.nvidia.com/object/cg_toolkit.html
Links to Specific Extension Specifications
Provoking Vertex, Vertex Array BGRA, Depth Clamp, Texture Multisample, Seamless Cube Map, Fragment Coordinate Conventions, Synchronization Objects, Geometry Shaders, Bindless graphics, Shader Buffer Load, Vertex Buffer Unified Memory, Direct State Access, Separate Shader Objects, Copy Image, Texture Barrier, Draw Elements Base Vertex

OpenGL 3.2 and More

  • 1.
    San Jose| September 30, 2009 | Mark J. Kilgard, NVIDIA Corporation OpenGL 3.2 and More
  • 2.
    Mark J. KilgardPrincipal System Software Engineer OpenGL driver Cg shading language OpenGL Utility Toolkit (GLUT) implementer co-author of Cg Tutorial
  • 3.
    Overview OpenGL 3.2Available today What’s in it? NVIDIA’s additional functionality Above & beyond OpenGL 3.2
  • 4.
    A brief 2-slide review of OpenGL 3.0 & 3.1 Before we get really started… You are already familiar and using OpenGL 3.1 aren’t you??
  • 5.
    For review, OpenGL3.0 Texturing Integer & floating-point texture formats Compact floating-point formats sRGB color space texture formats 1- and 2-component compressed texture formats 1D and 2D texture array targets Miscellaneous Vertex array objects Conditional rendering Multisample-aware stretch blits Fine control over mapping & flushing buffer sub-ranges Framebuffer functionality Render-to-texture with framebuffer objects sRGB blending Packed depth/stencil formats for render-buffers (and texturing) Per-color-attachment blend enables and color write masks Shader improvements OpenGL Shading Language 1.30
  • 6.
    For review, OpenGL3.1 Texturing Guarantees 16 texture units Texture buffer objects Texture rectangle target: 2D image with [0..width, 0..height] coordinate space Signed normalized texture formats Miscellaneous Fast data copying between buffer objects Primitive restart index for vertex arrays Shader improvements OpenGL Shading Language 1.40 Shader can access uniform values from buffer objects Instanced rendering provides instance counter to vertex shader
  • 7.
    OpenGL 3.2 modernGPU functionality, platform portability, API maturity & completeness
  • 8.
    From the 1994OpenGL 1.1 Data Flow… vertex processing rasterization & fragment coloring texture raster operations framebuffer pixel unpack pixel pack vertex puller client memory pixel transfer glReadPixels / glCopyPixels / glCopyTex{Sub}Image glDrawPixels glBitmap glCopyPixels glTex{Sub}Image glCopyTex{Sub}Image glDrawElements glDrawArrays selection / feedback / transform feedback glVertex* glColor* glTexCoord* etc. blending depth testing stencil testing accumulation storage access operations
  • 9.
    … OpenGL 1.0in detail Vertex processing Pixel processing Texture mapping Image primitive processing Pixel unpacking Pixel packing Vertex assembly texture image specification image rectangles, bitmaps primitive topology, transformed vertex data stenciling, depth testing, blending, accumulation pixel image primitive batch type, vertex attributes primitive batch type, vertex data fragment texture fetches pixel image or texture image specification image and bitmap fragments point, line, and polygon fragments pixels to pack unpacked pixels pixels fragments filtered texels buffer data vertices Legend programmable operations fixed-function operations copy pixels, copy texture image Fragment processing Geometric primitive assembly & processing Raster operations Framebuffer Command parser
  • 10.
    … to the2009 OpenGL 3.2 Data Flow Vertex processing Pixel processing Texture mapping Geometric primitive assembly & processing Image primitive processing Transform feedback Pixel unpacking Pixel packing Vertex assembly pixels in framebuffer object textures texture buffer objects texture image specification image rectangles, bitmaps primitive topology, transformed vertex data vertex texture fetches pixel pack buffer objects pixel unpack buffer objects vertex buffer objects transform feedback buffer objects buffer data, unmap buffer geometry texture fetches primitive batch type, vertex indices, vertex attributes primitive batch type, vertex data fragment texture fetches pixel image or texture image specification map buffer, get buffer data transformed vertex attributes image and bitmap fragments point, line, and polygon fragments pixels to pack unpacked pixels pixels fragments filtered texels buffer data vertices Legend programmable operations fixed-function operations copy pixels, copy texture image Buffer store uniform/ parameters buffer objects Fragment processing stenciling, depth testing, blending, accumulation Raster operations Framebuffer Command parser
  • 11.
    Buffer Centric Viewof OpenGL Vertex Array Buffer Object (VaBO) Transform Feedback Buffer (XBO) Parameter Buffer (PaBO) Pixel Unpack Buffer (PuBO) Pixel Pack Buffer (PpBO) Bindable Uniform Buffer (BUB) Texture Buffer Object (TexBO) Vertex Puller Vertex Shading Geometry Shading Fragment Shading Texturing Array Element Buffer Object (VeBO) Pixel Pipeline vertex data texel data pixel data parameter data ( not ARB functionality yet ) glBegin, glDrawElements, etc. glDrawPixels, glTexImage2D, etc. glReadPixels, etc. Framebuffer
  • 12.
    OpenGL 3.2 FunctionalOverview Direct3D-isms BGRA vertex component ordering Provoking vertex convention Drawing commands allowing modification of the base vertex index Upper-left and lower-left fragment coordinate conventions Geometry shaders Per-primitive programmability Shader improvements OpenGL Shading Language 1.50 Miscellaneous Depth clamping, synchronization, seamless cube map filtering, multisample improvements
  • 13.
    Direct3Disms better OpenGL& Direct3D content portability
  • 14.
    Direct3Dism Motivation Aposteriori “3D content tied to API” scheme Without intending it, 3D application content gets tied to API’s conventions Your OpenGL application OpenGL driver same GPU Direct3D driver Your OpenGL application content Your Direct3D application Your Direct3D application content OpenGL conventions Direct3D conventions content authored to OpenGL conventions content authored to Direct3D conventions OpenGL API Direct3D API hardware interface 3D API interface
  • 15.
    NVIDIA Recognizes 3DAPI Reality You decide the 3D API best for your application Lots of reasons to pick your API choice Target systems, intended market, cross-platform requirements, software legacy, content creation vs. deployment, etc. Fundamentally, NVIDIA believes in Visual Computing (not APIs) So is essentially agnostic about your 3D API choice OpenGL, Direct3D 9/10/11, or OpenGL ES NVIDIA provides best implementations of all options; you pick NVIDIA’s belief in Visual Computing means Your 3D API choice shouldn’t tie down your 3D application or 3D content
  • 16.
    Direct3Dism Concept Allowsyour 3D content to be API agnostic OpenGL supports both OpenGL & Direct3D conventions, so support either style Your OpenGL application OpenGL driver GPU Direct3D driver Your OpenGL application content Your Direct3D application Your Direct3D application content OpenGL API Direct3D API content authored to OpenGL conventions content authored to Direct3D conventions OpenGL + Direct3D conventions Direct3D conventions hardware interface 3D API interface Direct3D conventions supported by OpenGL too
  • 17.
    OpenGL & Direct3DConventions OpenGL 3.2 First vertex of primitive Last vertex of primitive (mostly) Provoking vertex for flat-shading OpenGL 3.2 Upper-left Lower-left Fragment coordinate origin Cg HLSL 9, 10, and 11 GLSL Shading Language syntax Convention OpenGL Direct3D Addressed by Window origin Lower-left, pixels at half-integers Upper-left, pixels on integers (DX9) pixels on half-integers (DX 10) projection matrix & front-facing re-configuration Clip space [-1…+1] 3 [-1…+1] 2 [0…1] projection matrix re-configuration 4-byte vertex color RGBA BGRA OpenGL 3.2 Shader bind granularity Linked (for GLSL) Per-domain (for Cg & assembly) Per-domain EXT_separate shader_objects Object manipulation Bind-to-edit, Bind-to-query Edit-by-name, Query-by-name EXT_direct_ state_access
  • 18.
    Dealing with APIConvention Differences Innocuous differences API granularity OpenGL fine-grain state vs. Direct3D 10 state blocks OpenGL selectors versus Direct3D direct state access Easily dealt with by reconfiguring existing state Examples: window origin & clip space conventions Formidable differences Format differences Unsupported formats such as 4-byte BGRA vertex colors Inconsistent state management Per-domain shaders vs. monolithic GLSL shaders Shaders coded to a particular shading language syntax GLSL vs. HLSL, achieve commonality via Cg Conventions baked into shaders Fragment coordinate origin as visible from a fragment shader fairly easy to address in your application difficult to address without 3D API help
  • 19.
    Impetus for Direct3DismEffort Many software companies motivated this effort TransGaming Blizzard Destineer Aspyr CodeWeavers Direct result of feedback from 3D software engineers Yes, you can influence OpenGL’s direction & course
  • 20.
    Supporting Direct3DismsNot New to OpenGL OpenGL has always supported multiple formats well OpenGL’s plethora of pixel and vertex formats Very first OpenGL extension: EXT_bgra Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering Made core functionality by OpenGL 1.3 Many OpenGL extensions have embraced Direct3Disms Secondary color Fog coordinate Point sprites OpenGL 3.0’s fine-grain buffer mapping
  • 21.
    BGRA Vertex ArrayOrder Direct3D 9’s most common usage for per-vertex colors is 32-bit D3DCOLOR data type: Red in bits 16:23 Green in bits 8:15 Blue in bits 0:7 Alpha in bits 24:31 Laid in memory, looks like BGRA order OpenGL assumes RGBA order for all vertex arrays However Direct3D colors not stored in packed unsigned bytes have RGBA order Direct3Dism EXT_vertex_array_bgra extension allows: glColorPointer( GL_BGRA , GL_UNSIGNED_BYTE, stride, pointer); glSecondaryColorPointer( GL_BGRA , GL_UNSIGNED_BYTE, stride, pointer); glVertexAttribPointer( GL_BGRA , GL_UNSIGNED_BYTE, stride, pointer); 8-bit red 8-bit alpha 8-bit green 8-bit blue bit 31 bit 0
  • 22.
    Provoking Vertex OrderConventions Direct3D uses “first” vertex of a triangle or line to determine which color is used for flat shading OpenGL uses “last” vertex for lines, triangles, and quads Except for polygons ( GL_POLYGON ) mode that use the first vertex Direct3D 9 pDev->SetRenderState( D3DRS_SHADEMODE, D3DSHADE_FLAT); OpenGL glShadeModel(GL_FLAT); Input triangle strip with per-vertex colors
  • 23.
    Configurable Provoking VertexEasy-to-use API New command glProvokingVertex // “native” OpenGL convention glProvokingVertex(GL_LAST_VERTEX_CONVENTION); // Direct3D convention glProvokingVertex(GL_FIRST_VERTEX_CONVENTION); OpenGL 3.2 promotion of EXT_provoking_vertex extension Affects fixed-function glShadeModel flat shaded attributes for fragment shaders geometry shaders that emit flat shaded attributes
  • 24.
    Provoking Vertex DetailsProvoking vertex sounds really obscure Technically shade model is part of “deprecated” feature set of OpenGL However very common mode for real-time strategy games Many, many objects drawn this way Very difficult for application to “juggle” vertex data to match API’s native provoking vertex convention Particularly when using vertex buffer objects Quad behavior may vary Direct3D doesn’t support quadrilateral primitives So “first vertex” provoking vertex convention may or may not apply to quadrilateral primitives GeForce 8 say true for “quads follow the convention” GeForce 7 and earlier say false for “quads follow the convention” Check GL_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION boolean if you care
  • 25.
    Provoking Vertex Behaviorgeometry shader primitives Last vertex convention First vertex convention Primitive type of polygon i 2i+3 2i-1 GL_TRIANGLE_STRIP_ADJACENCY 6i-1 6i-5 GL_TRIANGLE_ADJACENCY i+2 i+1 GL_LINE_STRIP_ADJACENCY 4i-1 4i-2 GL_LINES_ADJACENCY i i GL_POLYGON 2i+2 , if quads follow provoking vertex 2i+2 , if not 2i-1 2i+2 GL_QUAD_STRIP 4i , if quads follow provoking vertex 4i , if not 4i-3 4i GL_QUADS i+2 i+1 GL_TRIANGLE_FAN i+2 i GL_TRIANGLE_STRIP 3i 3i-2 GL_TRIANGLES i+1 i GL_LINE_STRIP i+1 , if i<n 1, if i=n i GL_LINE_LOOP 2i 2i-1 GL_LINES i i GL_POINT same same same same
  • 26.
    Direct3D vs. OpenGLCoordinate System Conventions Window origin conventions Direct3D = upper-left origin OpenGL = lower-left origin Pixel center conventions Direct3D9 = pixel centers at integer locations OpenGL and Direct3D 10 = pixel centers at half-pixel locations Makes pixel centers for rasterization “match” texel centers for texturing Clip space conventions Direct3D = [-1,+1] for XY, [0,1] for Z OpenGL = [-1,+1] range for XYZ Affects How projection matrix is loaded Fragment shaders that access the window position Point sprites have upper-left texture coordinate origin OpenGL already lets application choose lower-left or upper-left
  • 27.
    3 APIs, 3Different Window Space Conventions Pixel center grids coordinate systems OpenGL Direct3D 9 Direct3D 10 Upper-left origin Lower-left origin = pixel sample center
  • 28.
    Direct3D 9 toOpenGL How to go from Direct3D ’s [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1] 3 integer-centered pixel centers to OpenGL’s half-pixel centers Simple state adjustment Projection matrix fudge glMatrixLoadIdentityEXT(GL_PROJECTION); glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2); glMatrixTranslatefEXT(GL_PROJECTION, 0.5/windowWidth, 0.5/windowHeight, -0.5); Reverse convention for what is front-facing glFrontFace(GL_CW); // OpenGL default is GL_CCW Compensates for y-flip that reverses coordinate system’s handedness No need for API additions to support Direct3D 9’s system
  • 29.
    Direct3D 10 toOpenGL How to go from Direct3D 10’s [-1,+1]x[-1,+1]x[0,1] clip space to OpenGL’s [-1,+1] 3 where both APIs have half-pixel centers Simple state adjustment Projection matrix fudge glMatrixLoadIdentityEXT(GL_PROJECTION); glMatrixScalefEXT(GL_PROJECTION, 1, -1, 2); glMatrixTranslatefEXT(GL_PROJECTION, 0, 0, // no half-pixel shift for Direct3 10 -0.5); Reverse convention for what is front-facing glFrontFace(GL_CW); // OpenGL default is GL_CCW Compensates for y-flip that reverses coordinate system’s handedness Again, no need for API additions to support Direct3D 10’s system
  • 30.
    Fragment Coordinate ConventionUsage Typically used in post-processing shaders Examples: Motion blur Depth-of-field Shader assumes a particular convention for the fragment coordinate origin Attempting to “re-write” Direct3D shader tends to Compromise shader performance Introduces new “window height” uniform that must be always set correctly Hard to do automatically and robustly Robust approach: Allow shader author (or automatic translator) to specify convention explicitly
  • 31.
    Fragment Shader CoordinateConventions Required GLSL introduction #extension GL_ARB_fragment_coord_conventions : require Pick one of the following GLSL declarations: // “native” OpenGL convention in vec4 gl_FragCoord; // DirectX 9 convention layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord; // DirectX 10 convention layout(origin_upper_left) in vec4 gl_FragCoord; Also supported by NVIDIA assembly extensions OPTION ARB_fragment_coord_origin_upper_left; OPTION ARB_fragment_coord_pixel_center_integer;
  • 32.
    Deprecation there’s “old”& there’s “still supported”
  • 33.
    Deprecation – OpenGLARB view OpenGL has never removed features. However, After 15+ years, defining new features to work with old features becomes increasingly difficult OpenGL 3.0 marks features as deprecated OpenGL 3.0 does not remove any features Redundant, legacy and obsolete features Parts of OpenGL unlikely to be accelerated Guidance to developers to prepare for future revisions
  • 34.
    Deprecation – OpenGLARB view OpenGL 3.1 removed these deprecated features Added support back with ARB_compatibility extension OpenGL 3.2 formalized this in two profiles “ Core” profile with features removes “ Compatibility” profile with all features present Implementation of “Core” mandatory “ Compatibility” optional
  • 35.
    Deprecation – NVIDIAview Set of removed functionality is in use by applications today, helping our customer’s business Using just “Core” OpenGL 3.2 is a huge effort in rewriting existing code OpenGL 3.2 “Core” not offering enough incentive to re-write existing code Deprecation is NOT in the best interest of ISVs and therefore not in NVIDIA’s business interest
  • 36.
    Deprecation – NVIDIAview We will not remove ANY feature from our drivers OpenGL on NVIDIA will be fully backwards compatible NVIDIA has and will ship the Compatibility profile NVIDIA will fully support, tune and bug fix all features See our public statement: http://developer.nvidia.com/object/opengl_3_driver.html
  • 37.
    Deprecation – MythsFeature removal will result in a faster driver Feature removal will result in a higher quality driver Feature removal will result in a cleaner API Not removing features means OpenGL will die Only useless features were deprecated Far from true
  • 38.
    So You canjust ignore Deprecation NVIDIA values OpenGL API backward compatibility We don’t take API functionality away from you We aren’t going to force you to re-write apps Does deprecated functionality “stay fast”? Yes, of course—and stays fully tested Bottom-line: Old & new features run fast
  • 39.
  • 40.
    Geometry Shaders via OpenGL: Programmability for geometric primitives: one geometric primitive in, zero or more primitives out. Supported by NVIDIA’s OpenGL driver since the GeForce 8 launch: NV_gpu_program4 for assembly, Cg 2.x’s gp4gp “geometry” profile, and NV_geometry_shader4 / EXT_geometry_shader4 for GLSL. Standardized as an ARB extension, ARB_geometry_shader4, in the OpenGL 3.1 timeframe. Now finally core functionality in OpenGL 3.2, essentially unchanged from the EXT and ARB versions.
  • 41.
    Geometry Shaders: New programmable shader domain. Operates on assembled primitives: triangles, lines, points, and new adjacency primitives. Outputs zero or more primitives, which must be points, line strips, or triangle strips; primitive restarts allowed. Warning: not well suited for unbounded tessellation. [Pipeline diagram: application, vertex shader, primitive assembly, geometry shader, rasterizer, fragment shader, raster operations, framebuffer; legend: application, programmable]
  • 42.
    Geometry Shader Silhouette Edge Rendering: A silhouette edge detection geometry program turns a complete mesh into its silhouette edges. Useful for non-photorealistic rendering; looks like human sketching.
  • 43.
    More Geometry Shader Examples: Shimmering point sprites. Generate fins for lines. Generate shells for fur rendering.
  • 44.
    Improved Interpolation Using Geometry Shader Functionality: Quadratic normal interpolation. True quadrilateral rendering with mean value coordinate interpolation.
  • 45.
    “Fair” Quadrilateral Interpolation:
        glBegin(GL_QUADS);
          glColor3fv(red);   glVertex3fv(lowerLeft);
          glColor3fv(green); glVertex3fv(lowerRight);
          glColor3fv(red);   glVertex3fv(upperRight);
          glColor3fv(blue);  glVertex3fv(upperLeft);
        glEnd();
    The geometry shader actually operates on 4-vertex GL_LINES_ADJACENCY primitives instead of quads. [Figure panels: Wrong, slash triangle split; Wrong, backslash triangle split; Better, mean value coordinates]
  • 46.
    Geometry Shader-based Bump Map Setup: Vertex shader does skinning. Problem: how does the texture-space basis for bump mapping respond to arbitrary skinning? Solution: the geometry shader constructs a per-triangle texture basis using post-skinning vertex positions and normals. So the geometry shader: computes the object-to-texture space basis for the triangle; can account for texture mirroring in the normal map; transforms object-space vectors to texture space; outputs the triangle. The fragment shader uses texture-space normals for bump map shading.
  • 47.
    Cg Code: Shader performs texture-basis setup. Can compile to GLSL or HLSL 10 code (a Cg 2.2 feature). See working example code in Cg 2.2.
        TRIANGLE void md2bump_geometry(AttribArray<float4> position    : POSITION,
                                       AttribArray<float2> texCoord    : TEXCOORD0,
                                       AttribArray<float3> objPosition : TEXCOORD1,
                                       AttribArray<float3> objNormal   : TEXCOORD2,
                                       AttribArray<float3> objView     : TEXCOORD3,
                                       AttribArray<float3> objLight    : TEXCOORD4)
        {
          // Object-space edge vectors and the matching texture-coordinate deltas
          float3 dXYZdU = objPosition[1] - objPosition[0];
          float2 dSTdU  = texCoord[1] - texCoord[0];
          float3 dXYZdV = objPosition[2] - objPosition[0];
          float2 dSTdV  = texCoord[2] - texCoord[0];
          float3 tangent = normalize(dSTdV.s * dXYZdU - dSTdU.s * dXYZdV);
          // The sign of the texture-space area detects mirrored texture mappings
          float area = determinant(float2x2(dSTdV, dSTdU));
          float3 orientedTangent = area >= 0 ? tangent : -tangent;
          for (int i=0; i<3; i++) {
            float3 normal   = objNormal[i],
                   binormal = cross(tangent, normal);
            float3x3 basis = float3x3(orientedTangent, binormal, normal);
            // Rotate the per-vertex light and view vectors into texture space
            float3 surfaceLightVector : TEXCOORD1 = mul(basis, objLight[i]);
            float3 surfaceViewVector  : TEXCOORD2 = mul(basis, objView[i]);
            emitVertex(position[i], texCoord[i], surfaceLightVector, surfaceViewVector);
          }
        }
  • 48.
    Geometry Shader-based Shadow Volume Generation: [Figure panels: un-shadowed bump-mapped shading via geometry shader texture-space basis setup; shadow volume extrusion by geometry shader; shadow region stencil; multi-pass combination of shadowed and un-shadowed shading]
  • 49.
  • 50.
    Tripped Up By Near/Far Clipping: Conventionally, 3D APIs “clip” to the near & far view frustum planes. This results in classic artifacts: geometry is “cut open” by the near clip plane, and naïvely moving the near plane closer poorly distributes depth buffer precision. Alternatively, geometry is “lost” beyond the far clip plane. [Figure captions: no clipping problem; closer to alien, near clip plane cuts open alien head]
  • 51.
    Depth Clamping to the Rescue: The depth clamping API is easy to enable/disable: glEnable(GL_DEPTH_CLAMP); glDisable(GL_DEPTH_CLAMP); What it does: disables clipping to the near & far clip planes. But this allows depth values to interpolate beyond the [0,1] representable range of the depth buffer, so it additionally clamps interpolated values to the [0,1] range.
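The clamping step described on this slide can be sketched on the CPU. This is only a model of the arithmetic, assuming the default [0,1] depth range; `model_depth_clamp` and `depth_clamp_self_check` are hypothetical names for illustration, not OpenGL entry points.

```c
#include <assert.h>

/* CPU-side model of the last step of depth clamping: a window-space depth
 * value that interpolated outside [0,1] is pulled back into that range
 * instead of the fragment being clipped away. Hypothetical helper. */
float model_depth_clamp(float z_window)
{
    if (z_window < 0.0f) return 0.0f;  /* was in front of the near plane */
    if (z_window > 1.0f) return 1.0f;  /* was beyond the far plane */
    return z_window;                   /* already representable */
}

/* Small usage example covering the three cases. */
void depth_clamp_self_check(void)
{
    assert(model_depth_clamp(-0.25f) == 0.0f);
    assert(model_depth_clamp(0.5f) == 0.5f);
    assert(model_depth_clamp(1.5f) == 1.0f);
}
```

Every fragment that would have been clipped by the near plane ends up at exactly z=0 in this model, which is why the z=0 test is a usable way to detect such fragments.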
  • 52.
    Depth Clamping Applications: Avoid near plane “cut opens” via depth clamping. The fragment shader replaces the color of z=0 fragments with black. In GLSL: if (gl_FragCoord.z == 0.0) gl_FragColor = vec4(0,0,0,1); Alternatively, use Painter’s algorithm for objects at the near plane: the last (or first) fragment at z=0 “wins”. Infinite Z-fail shadow volumes: see [Everett & Kilgard 2002]. Conserves depth buffer precision when eye-space infinity must be within the depth range.
  • 53.
    Near Plane Depth Clamping Example: [Figure panels: without depth clamping; depth clamping enabled*] * A simple situation because depth complexity at z=0 is a single layer.
  • 54.
    Seam-free Cube Map Edges: Cube maps have edges along each face. Traditionally, texture mapping hardware simply clamps to these face edges, which results in “seam” artifacts. These are most visible when level-of-detail bias is large: the sampled levels are very blurry, but the seams appear sharp. Use glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS) to mitigate these artifacts.
  • 55.
    Seamless Cube Maps: Before and After. Before: with edge seams. After: without seams.
  • 56.
    Remaining OpenGL 3.2 Features: Sync objects: synchronization of GPU completion; supports synchronization between multiple contexts. Draw elements base vertex: provides a base added to all vertex indices. Multisampled renderbuffers: also can query the framebuffer’s sample locations.
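The "base added to all vertex indices" can be illustrated with a small CPU-side model of index resolution. In OpenGL 3.2 the corresponding draw call is glDrawElementsBaseVertex; `resolve_vertex` and `base_vertex_self_check` below are hypothetical helpers that only mimic the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Model of how a draw call with a base vertex resolves its i-th element:
 * the stored index is offset by basevertex before the vertex is fetched.
 * Hypothetical helper, not part of OpenGL. */
int resolve_vertex(const unsigned short *indices, size_t i, int basevertex)
{
    return (int)indices[i] + basevertex;
}

/* Usage example: one index list reused for two meshes that are packed
 * back to back in the same vertex buffer. */
void base_vertex_self_check(void)
{
    const unsigned short tri[3] = { 0, 1, 2 };
    assert(resolve_vertex(tri, 2, 0) == 2);      /* mesh stored at vertex 0 */
    assert(resolve_vertex(tri, 2, 100) == 102);  /* mesh stored at vertex 100 */
}
```

The practical benefit is that index data does not need to be rewritten when a mesh is placed at a different offset within a shared vertex buffer.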
  • 57.
    Beyond OpenGL 3.2: NVIDIA’s further contributions
  • 58.
    All of OpenGL’s Texture Targets. Conventional targets: 1D texture, 2D texture, 3D texture. Special addressing: cube map texture (cube face selection), rectangle texture ([0..w]x[0..h] range), texture buffer (1D unfiltered buffer objects). Texture arrays: 1D texture array, 2D texture array, cube map texture array. Multisample: 2D texture multisample, 2D texture array multisample.
  • 59.
    Bindless Graphics: NVIDIA keeps building faster and faster GPUs, but that x86 core feeding the GPU isn’t getting faster at anything near the same rate! This makes your application more & more likely to be CPU limited instead of GPU limited. Bundling OpenGL state in objects helps. But time goes on… GPUs keep getting faster… Eventually even binding to objects becomes a bottleneck. Hence the desire for “bindless” graphics. Extensions: NV_vertex_buffer_unified_memory (VBUM) for bindless vertex pulling; NV_shader_buffer_load (SBL) for bindless buffer loads from shaders.
  • 60.
    “Classic” OpenGL 1.0 Model: OpenGL commands contain data directly. Examples: immediate mode vertices, pixels to draw, downloaded texels. Inefficient: all data flows through the CPU, and the GPU can’t access the data directly from video memory. [Diagram: Application, Driver, GPU command buffer, GPU, Video memory; wide stream of commands; wide interconnect]
  • 61.
    Object Bind Model of OpenGL 2.x/3.x: OpenGL commands “name” objects to use. Objects allow the GPU to access object data (texels, vertices, pixels, constants, etc.) directly via fast video memory. The driver must look up and access each object’s vital information, which tends to generate lots of cache misses. Cache misses are the bane of modern, fast CPUs. [Diagram: Application, Driver, GPU command buffer, GPU, Video memory, System memory; narrow stream of commands; wide interconnect; expensive stream of cache misses]
  • 62.
    Bindless Graphics Model of OpenGL: OpenGL commands and shaders can use GPU addresses of buffers, so the driver doesn’t have to translate to addresses & doesn’t take cache misses. GPU addresses are used for vertex buffer offsets and for constant loads from buffers within shaders. [Diagram: Application, Driver, GPU command buffer, GPU, Video memory; narrow stream of commands; wide interconnect; feedback of GPU address at creation time]
  • 63.
    Direct State Access: The existing OpenGL model is bind-to-edit, bind-to-query, bind-to-use: one bind operation for all three purposes. To change a GL object, you must first “bind” to it. Example:
        glBindTexture(GL_TEXTURE_2D, obj);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    Bind-to-edit leads to unnecessary re-validations. NEW additional Direct State Access (DSA) approach: edit-by-name. To change a GL object, name the object to change. Example:
        glTextureParameteriEXT(obj, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    Extension: EXT_direct_state_access
  • 64.
    What is the root of the problem? “Selectors”: OpenGL state that tells which state other OpenGL commands should update. Think of selectors as “sticky” phantom parameters to all your matrix, texture, program, buffer, etc. commands and queries. Examples of selectors: glMatrixMode, glActiveTexture, glBindTexture, glBindProgramARB, glUseProgram. glActiveTexture and glBindTexture are two distinct selectors for texture commands, which is extra confusing.
  • 65.
    Reasons to Avoid Selectors: Direct3D has an “edit-by-name” model of operation, which means Direct3D has no selectors. Having to manage selectors when porting Direct3D or console code to OpenGL is awkward and requires deferring updates to minimize selector and object bind changes. Layered libraries can’t count on selector state. To be safe when updating state controlled by selectors, such libraries must use this idiom: save selector, set selector, update state, restore selector. That is bad for performance, and particularly bad for dual-core drivers since queries are expensive. Cg 2.2 October 2009 makes use of DSA automatically when available.
  • 66.
    Direct State Access Advantages: Less error-prone. Consider this code:
        glRotatef(phi, x,y,z);
    Which matrix did you change? Depends on how the matrix mode selector was last left! Instead consider the DSA version:
        glMatrixRotatefEXT(GL_MODELVIEW, phi, x,y,z);
    Another example. Consider this code:
        glActiveTexture(GL_TEXTURE3);
        some_function();
        glBindTexture(GL_TEXTURE_2D, 89);
    But what if some_function calls glActiveTexture? It might not now, but could in the future! Instead use:
        glBindMultiTextureEXT(GL_TEXTURE3, GL_TEXTURE_2D, 89);
    Problem solved!
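The active-texture hazard on this slide can be reproduced with a tiny mock of the selector model. Everything below is a hypothetical stand-in (the `mock_` functions and `MockContext` are not OpenGL calls); it only models one selector and one binding per texture unit.

```c
#include <assert.h>

enum { MOCK_UNITS = 4 };

/* Hypothetical stand-in for context state; not the real OpenGL API. */
typedef struct {
    int active_unit;             /* the selector */
    unsigned bound[MOCK_UNITS];  /* texture bound to each unit */
} MockContext;

/* Sets the selector, like glActiveTexture does. */
void mock_active_texture(MockContext *c, int unit)
{
    assert(unit >= 0 && unit < MOCK_UNITS);
    c->active_unit = unit;
}

/* Selector style: the unit that gets updated depends on hidden state. */
void mock_bind_texture(MockContext *c, unsigned tex)
{
    c->bound[c->active_unit] = tex;
}

/* DSA style: the unit is an explicit parameter; the selector is untouched. */
void mock_bind_multi_texture(MockContext *c, int unit, unsigned tex)
{
    assert(unit >= 0 && unit < MOCK_UNITS);
    c->bound[unit] = tex;
}

/* A helper that changes the selector as a side effect. */
void mock_library_call(MockContext *c)
{
    mock_active_texture(c, 0);
}

void selector_self_check(void)
{
    MockContext c = { 0, { 0, 0, 0, 0 } };

    mock_active_texture(&c, 3);
    mock_library_call(&c);        /* silently moves the selector to unit 0 */
    mock_bind_texture(&c, 89);
    assert(c.bound[3] == 0);      /* intended unit was never updated */
    assert(c.bound[0] == 89);     /* texture landed on the wrong unit */

    mock_bind_multi_texture(&c, 3, 89);
    assert(c.bound[3] == 89);     /* explicit unit: no dependence on selector */
    assert(c.active_unit == 0);   /* selector left as the helper set it */
}
```

The edit-by-name call gives the same result no matter what the intervening helper did, which is the property the slide is arguing for.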
  • 67.
    Direct State Access Advantages: More efficient layered libraries. Consider a library that uses OpenGL commands to create a texture object from an image file. Example: loadPNGtoGLtexture(GLuint texobj, …); Ideally, calling loadPNGtoGLtexture shouldn’t disturb the currently bound texture. Preserving the currently bound texture requires a save-selector/change-state/restore-selector idiom:
        GLint saved_current_binding;
        glGetIntegerv(GL_TEXTURE_BINDING_2D, &saved_current_binding);
        glBindTexture(GL_TEXTURE_2D, texobj);
        // now you can change texobj with bind-to-edit commands
        glBindTexture(GL_TEXTURE_2D, saved_current_binding);
    But save/change/restore undermines dual-core OpenGL operation, because GL queries of the selector sync the app and driver threads. DSA routines avoid disturbing selectors. Cg 2.2 October 2009 is an example of such a library.
  • 68.
    Latched State: Direct State Access solves another problem. Some OpenGL state is “latched” by subsequent commands. Think of latched state as phantom parameters to commands that come from the OpenGL state. Examples: pixel store (pack/unpack) state, vertex array state. DSA provides the new glPushClientAttribDefaultEXT command: like glPushClientAttrib, but it also resets the affected state to default. Fast and efficient.
  • 69.
    Copy Image: Fast copies of pixels between image objects. 1D textures, 2D textures, 3D textures, cube maps, texture rectangles, 1D texture arrays, 2D texture arrays, cube map texture arrays, & render-buffers all work. Pixel data can be 1D, 2D, or 3D. Best part: image objects can belong to distinct OpenGL rendering contexts, even when the contexts do not share objects, and even when the contexts are on different physical GPUs in the system. Extension: NV_copy_image
  • 70.
    Basic Copy Image Command: Basic prototype, for within a context:
        void glCopyImageSubDataNV(
          GLuint srcName, GLenum srcTarget, GLint srcLevel,
          GLint srcX, GLint srcY, GLint srcZ,
          GLuint dstName, GLenum dstTarget, GLint dstLevel,
          GLint dstX, GLint dstY, GLint dstZ,
          GLsizei width, GLsizei height, GLsizei depth);
    [Color key on the slide: source arguments, destination arguments, sub-image dimensions]
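To make the roles of the source offset, destination offset, and sub-image dimensions concrete, here is a CPU-side model of a sub-image copy. It is a sketch under simplifying assumptions (tightly packed 8-bit texels, a single mip level, distinct source and destination images); `ModelImage` and `model_copy_image_sub_data` are hypothetical and say nothing about how a driver implements the real command.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical CPU-side image: tightly packed 8-bit texels, x varies fastest. */
typedef struct {
    int width, height, depth;
    unsigned char *texels;
} ModelImage;

/* Copies a width x height x depth box from src at (srcX,srcY,srcZ) to dst at
 * (dstX,dstY,dstZ). Returns 1 on success, 0 if the box does not fit. */
int model_copy_image_sub_data(const ModelImage *src, int srcX, int srcY, int srcZ,
                              ModelImage *dst, int dstX, int dstY, int dstZ,
                              int width, int height, int depth)
{
    if (srcX < 0 || srcY < 0 || srcZ < 0 || dstX < 0 || dstY < 0 || dstZ < 0 ||
        width < 0 || height < 0 || depth < 0)
        return 0;
    if (srcX + width > src->width || srcY + height > src->height ||
        srcZ + depth > src->depth)
        return 0;
    if (dstX + width > dst->width || dstY + height > dst->height ||
        dstZ + depth > dst->depth)
        return 0;

    for (int z = 0; z < depth; z++) {
        for (int y = 0; y < height; y++) {
            /* Address of the first texel of this row in each image. */
            const unsigned char *s = src->texels +
                ((size_t)(srcZ + z) * src->height + (srcY + y)) * src->width + srcX;
            unsigned char *d = dst->texels +
                ((size_t)(dstZ + z) * dst->height + (dstY + y)) * dst->width + dstX;
            memmove(d, s, (size_t)width);
        }
    }
    return 1;
}

/* Usage example: copy the 2x2 block at (1,1) of a 4x4 image to (0,0). */
void copy_image_self_check(void)
{
    unsigned char src_texels[16], dst_texels[16] = { 0 };
    for (int i = 0; i < 16; i++) src_texels[i] = (unsigned char)i;
    ModelImage src = { 4, 4, 1, src_texels };
    ModelImage dst = { 4, 4, 1, dst_texels };

    assert(model_copy_image_sub_data(&src, 1, 1, 0, &dst, 0, 0, 0, 2, 2, 1) == 1);
    assert(dst_texels[0] == 5 && dst_texels[1] == 6);   /* row y=0 of the box */
    assert(dst_texels[4] == 9 && dst_texels[5] == 10);  /* row y=1 of the box */
    assert(dst_texels[2] == 0);                         /* outside the box */

    /* A box that runs past the right edge of the source is rejected. */
    assert(model_copy_image_sub_data(&src, 1, 0, 0, &dst, 0, 0, 0, 4, 1, 1) == 0);
}
```

A 2D copy is the depth = 1 case and a 1D copy is the height = depth = 1 case, which is how a single prototype can cover 1D, 2D, and 3D pixel data.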
  • 71.
    Texture Barrier Background: Framebuffer objects allow rendering into textures. Nothing keeps you from sampling a texture you are also rendering into, though the behavior is specified to be undefined. NV_texture_barrier provides a mechanism to avoid read-after-write hazards when rendering into a bound texture, in limited circumstances: reads (including all filtered samples) and writes are to/from disjoint pixels; or there is only a single read and write of a pixel by a fragment shader “over” that pixel without an intervening glTextureBarrierNV() command. Extension: NV_texture_barrier
  • 72.
    Improved: Parameter Buffer Object. Parameter buffer objects give shaders access to values stored in buffers; also called constant or uniform buffers. Supported by Cg 2.2’s BUFFER semantics. Originally just 32-bit scalars or 32-bit 4-component vectors; now 1, 2, 4, 8, or 16 byte accesses are allowed. Extension: NV_parameter_buffer_object2
  • 73.
    Separate Shader Objects: Combining different GLSL shaders at once needed linking. Better to allow mixing and matching of shader objects, like Direct3D and like the OpenGL assembly extensions. Extension: EXT_separate_shader_objects (SSO). [Figure: renderings that combine different GLSL vertex shaders (wobbly torus, smooth torus) with different GLSL fragment shaders (specular brick bump mapping, red diffuse)]
  • 74.
    Separate Shader Object Binding: Per-domain binding:
        glUseShaderProgramEXT(GL_VERTEX_SHADER, vprog);
        glUseShaderProgramEXT(GL_GEOMETRY_SHADER, gprog);
        glUseShaderProgramEXT(GL_FRAGMENT_SHADER, fprog);
    Each call uses a linked program object, but only the portion of that linked program for the specified domain. Introduces a selector for glUniform calls:
        glActiveProgramEXT(program_updated_by_glUniform);
    Better to use DSA’s selector-free glProgramUniform*EXT commands.
  • 75.
    glUseProgram Equivalence: Question: What does the existing glUseProgram call “mean” in the context of SSO?
        glUseProgram(glsl_prog);
    Answer: It is exactly equivalent to these calls:
        glUseShaderProgramEXT(GL_VERTEX_SHADER, glsl_prog);
        glUseShaderProgramEXT(GL_GEOMETRY_SHADER, glsl_prog);
        glUseShaderProgramEXT(GL_FRAGMENT_SHADER, glsl_prog);
        glActiveProgramEXT(glsl_prog);
  • 76.
    Convenient 1-Step Single-domain Shader Loading: GLSL requires an elaborate multi-step API for compiling/linking a shader, which is overkill for separate shader objects. Desirable to have an API more like glProgramStringARB. 1-step command:
        glCreateShaderProgramEXT(GLenum domain, const char *shader_string);
    Just a convenience function; you don’t have to use it for SSO. You can still create separate shaders with the multi-step API, which is sometimes necessary for binding attributes and fragment output locations.
  • 77.
    glCreateShaderProgramEXT: Equivalent to:
        const GLuint shader = glCreateShader(type);
        if (shader) {
          const GLint len = (GLint) strlen(string);
          glShaderSource(shader, 1, &string, &len);
          glCompileShader(shader);
          const GLuint program = glCreateProgram();
          if (program) {
            GLint compiled = GL_FALSE;
            glGetShaderiv(shader, GL_COMPILE_STATUS, &compiled);
            if (compiled) {
              glAttachShader(program, shader);
              glLinkProgram(program);
              glDetachShader(program, shader);
            }
            // Possibly...
            if (active-user-defined-varyings-in-linked-program) {
              append-error-to-info-log
              set-program-link-status-false
            }
            append-shader-info-log-to-program-info-log
          }
          glDeleteShader(shader);
          return program;
        } else {
          return 0;
        }
  • 78.
    Passing Varyings Between Separate Shader Objects: Programs in separate domains should pass varyings through built-in varyings (NOT user-specified varyings). So instead of varying vec4 my_varying; use a built-in such as gl_TexCoord[0]. This guarantees that up-stream and down-stream domains rendezvous with the same value. Use of user-declared varyings is undefined. Compiling Cg code to GLSL profiles guarantees this is the case: Cg has semantics to indicate how varyings correspond to API resources. Example Cg declaration: float4 my_varying : TEXCOORD0;
  • 79.
    Thoughts on OpenGL’s Future: what direction now?
  • 80.
    Where Do OpenGL Extensions Come From? 44% of extensions are “core” or multi-vendor. Lots of vendors have initiated extensions. Extending OpenGL is an industry-wide collaboration. [Pie chart segments: EXT, SGI, SGIS, SGIX, ARB, NV, ATI, APPLE, MESA, Others] Source: http://www.opengl.org/registry (Dec 2008)
  • 81.
    What’s Driving OpenGL Modernization? Human desire for visual intuition and entertainment, particularly interactive video games. Embarrassing parallelism of graphics, particularly the hardware-amenable, latency-tolerant nature of rasterization. Increasing semiconductor density.
  • 82.
    Conclusions: NVIDIA’s OpenGL driver leads the industry. Functional, performance, & semantic parity with Direct3D. NVIDIA provides OpenGL 3.2 now. If past is prologue… NVIDIA OpenGL extensions are where to-be-core functionality shows up first, so get a head start by using the functionality now. All new GPU functionality is exposed for OpenGL in the first shipping NVIDIA driver.
  • 83.
    More Information: NVIDIA OpenGL 3.2 driver, available now! http://developer.nvidia.com/object/opengl_3_driver.html OpenGL 3.2 specification: http://www.opengl.org/registry/doc/glspec32.compatibility.20090803.pdf NVIDIA’s OpenGL extension registry: http://developer.nvidia.com/object/nvidia_opengl_specs.html Cg Toolkit 2.2 October 2009, which includes the geometry shader examples shown here: http://developer.nvidia.com/object/cg_toolkit.html
  • 84.
    Links to Specific Extension Specifications: Provoking Vertex; Vertex Array BGRA; Depth Clamp; Texture Multisample; Seamless Cube Map; Fragment Coordinate Conventions; Synchronization Objects; Geometry Shaders; Bindless graphics (Shader Buffer Load, Vertex Buffer Unified Memory); Direct State Access; Separate Shader Objects; Copy Image; Texture Barrier; Draw Elements Base Vertex
