Uploaded by Mark Kilgard

SIGGRAPH Asia 2008 Modern OpenGL

The document discusses the design and evolution of Modern OpenGL, detailing its history from Iris GL to OpenGL 3.0 and beyond. It features contributions from key figures like Kurt Akeley and Mark Kilgard, who played significant roles in its development at companies such as Silicon Graphics and NVIDIA. The document highlights OpenGL's competitive strengths, design philosophy, and the introduction of new features across various versions.

2. Modern OpenGL: Its Design and Evolution
Mark J. Kilgard, NVIDIA
Kurt Akeley, Microsoft Research
13 December 2008, Singapore

3. Introductions
4. Kurt Akeley
• Led development of OpenGL at Silicon Graphics (SGI)
• Co-founded SGI
• Led development of SGI’s high-end graphics hardware
• Co-author of OpenGL specification
• Returned to Stanford University to complete Ph.D.
• Co-developed Cg “C for graphics” language at NVIDIA
• Principal Researcher, Microsoft Research Silicon Valley
• Spent time at Microsoft Research Asia in Beijing
• Member of US National Academy of Engineering

5. Mark Kilgard
• Principal System Software Engineer, NVIDIA, Austin, Texas
• Developed original OpenGL driver for first GeForce GPU
• Specified many key OpenGL extensions
• Works on Cg for portable programmable shading
• NVIDIA Distinguished Inventor
• Before NVIDIA, worked at Silicon Graphics
• Worked on X Window System integration for OpenGL
• Developed popular OpenGL Utility Toolkit (GLUT)
• Wrote book on OpenGL and X, co-authored Cg Tutorial

6. Marc Levoy
• Moderator for our facilitated discussion
• Professor of Computer Science and Electrical Engineering
• Stanford University
• SIGGRAPH Computer Graphics Achievement Award
• ACM Fellow

7. Course Schedule
• Modern OpenGL (Kilgard)
• OpenGL’s evolution: a personal retrospective (Akeley)
• Writing better OpenGL (Kilgard)
• Implementing OpenGL (Kilgard)
• OpenGL’s future evolution (Kilgard)
• OpenGL in Context (Akeley, Kilgard, Levoy)
  – Facilitated conversation
• Mid-session break
8. Check Out the Course Notes (1)
• Look to www.opengl.org web site for our final slides
• New Material
  – “An Incomplete History of OpenGL” (Kilgard)
    How the OpenGL graphics system developed
  – “Using Vertex Buffer Objects Well” (Kilgard)
    Learn how to use vertex buffer objects for high vertex processing rates

9. Check Out the Course Notes (2)
• Paper Reprints
  – OpenGL design rationale from its specification co-authors (Segal, Akeley)
  – Realizing OpenGL: two implementations of one architecture (Kilgard)
  – Graphics hardware: GTX, RealityEngine, InfiniteReality, GeForce 6800
    Key developments in graphics hardware design over last 20 years
  – GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers
  – “How GPUs Work” (Luebke, Humphreys)
10. Modern OpenGL
Mark Kilgard
Principal System Software Engineer
NVIDIA

11. Modern OpenGL
• History
  – How did OpenGL get where it is now?
• Present
  – Version 3.0
  – Functionality beyond 3.0
12. An Overview History of OpenGL
• Pre-history 1991
  – IRIS GL, a proprietary Graphics Library by SGI
• OpenGL, an open standard for 3D
  – Focus: procedural hardware-accelerated 3D graphics
  – Governed by Architectural Review Board (ARB)
  – Extensibility planned into design
• Competition
  – Proprietary APIs (1991-1995)
  – PHIGS & PEX for X Window System (1992-1997)
  – Microsoft’s Direct3D (1998-)
13. OpenGL’s Pre-history
[Figure: timeline of shipping commercial SGI implementations, 1983-2008 = 25 years. Year markers: 1983, 1986, 1988, 1989, 1991, 1993.]
• IRIS GL 1. Window system: MEX
• IRIS GL 2. Window system: MEX. Operating system: UNIX
• IRIS GL 3. Window system: NeWS/X11. Operating system: IRIX 3.x
• IRIS GL 4. Window system: Native X11. Operating system: IRIX 4.3
• OpenGL 1.0. Window system: Native X11 with GLX. Operating system: IRIX 5.1
• 1989: first work on GL 5.0 proposal
14. OpenGL’s Design Philosophy
• High-performance
  – Assumes hardware acceleration
• Defined by a specification
  – Rather than a de-facto implementation
• Rendering state machine
  – Procedural
  – Not a window system, not a scene graph
• No initial sub-setting
• Extensible
• Data type rich
• Cross-platform
• Window system-independent core
  – X Window System, Microsoft Windows, OS/2, OS X, etc.
• Multi-language bindings
  – C, FORTRAN, etc.
• Not merely an API, rather a system
15. Timeline of OpenGL’s Development
[Figure: timeline, 1992 to 2008]
• Versions: OpenGL 1.0 approved, 1.1, 1.2, multitexture added (1.2.1), 1.3, 1.4, 1.5, 2.0, 2.1, 3.0
• Milestones: first commercial OpenGL implementation (DEC); NT 3.51 brings OpenGL to PCs; OpenGL Utility Toolkit (GLUT) released; Mesa3D open source; SGI InfiniteReality; first GPU for PCs with single-chip transform & lighting for OpenGL (GeForce); OpenGL ES for embedded devices; Khronos controls OpenGL
16. Competitive 3D APIs
• OpenGL has always existed in competition with other APIs
• Strengthened OpenGL by driving feature parity
• OpenGL’s competitive strengths:
  1. Cross platform, open process
  2. API stability, extensibility
  3. Clean initial design & specification
[Figure: timeline, 1992 to 2008]
• Proprietary Unix workstation 3D APIs: XGL, Doré, Starbase, IRIS GL
• X Consortium 3D standard: PEX
• Microsoft Direct3D: DirectX 3, 5, 6, 7, 8, 9, 10
17. OpenGL 1.0
• Immediate mode
• Vertex transformation and lighting
• Points, lines, polygons
• Stippling, wide points and lines
• Bitmaps, image rectangles, and pixel reads
• Pixel store and transfer
• 1D and 2D textures, fog, and scissor
• Display lists and evaluators
• RGBA and color index color models
• Color, depth, stencil, and accumulation buffers
• Selection and feedback modes
• Queries

18. OpenGL State Machine
• From OpenGL 3.0 specification, unchanged since 1.0
19. SGI “Classic” Hardware View of OpenGL (1992)
• Entirely fixed-function, no programmability
• High-end SGI hardware manifested functionality in distinct chips
[Figure: 3D application or game → OpenGL API → graphics hardware boundary → Front End → Vertex Assembly → Vertex Transform & Lighting → Primitive Assembly, Clipping, Setup, and Rasterization → Texture & Fog (with Texture Fetch) → Raster Operations (with Framebuffer Access) → Memory Interface. Legend: graphics data flow, memory operations, fixed-function unit, programmable unit.]
20. OpenGL 1.1
• Vertex arrays
• Texture objects
• Texture internal formats
• Texture sub-image updates
• Texture proxies
• Copy framebuffer-to-texture
• Polygon offset
• RGBA logical operations

21. The Look of OpenGL 1.1
[Images: SGI skyfly demo; stenciled shadow volumes; Ideas in Motion]
22. OpenGL 1.2
• 3D textures
• Texture edge clamp wrap mode
• Texture level-of-detail clamping
• BGRA component order
• Packed pixel formats
• Imaging subset (optional)
• Normal rescaling
• Separate specular
• Vertex array draw elements range
23. Akeley’s (Modernized) OpenGL Data Flow
[Figure: data-flow diagram]
• Stages: client memory, vertex puller, vertex shading, rasterization & fragment shading, texture, raster operations, framebuffer, pixel unpack, pixel transfer, pixel pack
• Raster operations: blending, depth testing, stencil testing, accumulation, storage operations
• Vertex commands: glVertex*, glColor*, glTexCoord*, etc.; glDrawElements, glDrawArrays
• Pixel commands: glDrawPixels, glBitmap, glCopyPixels, glTex{Sub}Image, glCopyTex{Sub}Image, glReadPixels
• Outputs: selection / feedback / transform feedback
24. OpenGL 1.2 Imaging Subset
[Figure: pixel transfer pipeline feeding Pixel Rectangle Rasterization, distinguishing core functionality from the ARB_imaging subset]
• Inputs: index pixels, RGBA pixels
• Stages shown: Shift & Add, Scale & Bias, Look-up Table (Index-to-RGBA), Look-up Table (RGBA-to-RGBA), Color Table, Convolution (separable or general), Post-convolve Scale & Bias, Post-convolve Color Table, Color Matrix, Post-color matrix Scale & Bias, Post-color matrix Color Table, Histogram (discard), Min-max (discard)
25. OpenGL 1.2.1
• Multi-texture (optional)

26. Multi-texture Poster Child: Quake 2 Light Maps
[Images: lightmaps only × (modulate) decal only = combined scene]
27. GeForce 256 (NV10) View of OpenGL (1999)
• Vertex pulling (vertex buffer objects) via DMA
• Dual-texture, cube maps, and register combiners
[Figure: 3D application or game → OpenGL API → CPU – GPU boundary → GPU Front End → Vertex Assembly (with Attribute Fetch) → Vertex Transform & Lighting → Primitive Assembly, Clipping, Setup, and Rasterization → Texture & Fog (with Texture Fetch) → Raster Operations (with Framebuffer Access) → Memory Interface]

28. Hardware Cube Maps
[Images: rendered scene; dynamically created cube map image. Image credit: “Guts” GeForce 2 GTS demo, Thant Thessman]
29. OpenGL 1.3
• Multi-texture (required now)
• Cube map texturing
• Compressed texture formats
• Texture border clamp
• Texture environment functions
  – Add, combine, dot product
• Multisample anti-aliasing
• Transpose matrix
30. GeForce 3 & 4 Ti (NV2x) View of OpenGL (2001)
• Programmable vertex processing
• Highly configurable fragment processing
[Figure: 3D application or game → OpenGL API → CPU – GPU boundary → GPU Front End → Vertex Assembly (with Attribute Fetch) → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Multi-texture shaders & Combiners (with Texture Fetch) → Raster Operations (with Framebuffer Access) → Memory Interface]

31. Vertex Programmability
[Images: paletted matrix skinning; twister vertex program; per-vertex cartoon shading]

32. Configurable Fragment Processing
[Images: bumpy shiny environment mapping; chromatic aberration; offset 2D bump mapping; depth sprites]
33. OpenGL 1.4
• Automatic mipmap generation
• Shadow-mapping
  – Depth textures and shadow comparisons
• Texture level-of-detail bias
• Texture mirrored repeat wrap mode
• Multi-texture combination
• Fog coordinate
• Secondary color
• Configurable point size attenuation
• Color blending improvements
• Stencil wrap operations
• Window-space raster position specification

34. Hardware Shadow Mapping
• Projective Texturing (1.0) & Polygon Offset (1.1) are key enablers
[Images: scene without shadow mapping; scene with shadow mapping; depth map from light source’s view (darker is closer); light position]

35. Shadow Mapping Explained
[Images: planar distance from light; depth map projected onto scene; comparison of the two (less than or equals) with the true “un-shadowed” region shown green]
36. OpenGL 1.5
• Vertex buffer objects (VBOs)
• Occlusion queries
• Generalized shadow mapping functions

37. GeForce FX (NV3x) View of OpenGL (2003)
• Programmable fragment processing
• 16 texture units, IEEE 754 32-bit floating-point
• Vertex program branching
[Figure: 3D application or game → OpenGL API → CPU – GPU boundary → GPU Front End → Vertex Assembly (with Attribute Fetch) → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program (with Texture Fetch) → Raster Operations (with Framebuffer Access) → Memory Interface]

38. Floating-point Fragment Programmability
39. OpenGL Fragment Program Flowchart
[Figure: flowchart of the fragment program instruction loop]
• Begin Fragment
• Initialize Parameters
  – Temporary registers initialized to 0,0,0,0
  – Output depth & color registers initialized to 0,0,0,1
• Fragment program instruction loop
  – Fetch & Decode Next Instruction (from Fragment Program Instruction Memory)
  – Read Interpolants and/or Registers (from Primitive Interpolants)
  – Map Input values: Swizzle, Negate, etc.
  – Texture Fetch Instruction? If yes: Compute Texture Address & Level-of-detail, Fetch Texels (from Texture Images), Filter Texels. If no: Perform Instruction Math / Operation
  – Write Output Register with Masking
  – More Instructions? If yes, repeat the loop
• Emit Output Registers as Transformed Vertex
• End Fragment
40. Key Trend: Configurability becomes Programmability
[Figure: progression from Fixed-function, to Simple Configurability, to Complex Configurability, to Programmable]
41. Core OpenGL fragment texturing & coloring
[Figure: Point, Line, Polygon, Pixel Rectangle (DrawPixels), and Bitmap Rasterization, fed from Primitive Assembly → Conventional Texture Fetching (Texture Unit 0, Texture Unit 1) → Texture Environment Application (Texture Unit 0, Texture Unit 1) → Color Sum → Fog → Coverage Application → to raster operations]

42. NV1x OpenGL fragment texturing & coloring
[Figure: as slide 41, with an alternative Register Combiners path (General Stage 0, General Stage 1, Final Stage) selected by the GL_REGISTER_COMBINERS_NV enable]

43. NV2x OpenGL fragment texturing & coloring
[Figure: as slide 42, extended to Texture Units 0 through 3, with Texture Shaders 0 through 3 selected by the GL_TEXTURE_SHADER_NV enable, and Register Combiners with General Stages 0 through 7 plus a Final Combiner selected by the GL_REGISTER_COMBINERS_NV enable]

44. NV3x OpenGL fragment texturing & coloring
[Figure: as slide 43, with an additional Fragment Program path (Instruction 0 through Instruction 1023) selected by the GL_FRAGMENT_PROGRAM_NV enable, running !!FP1.0 or !!ARBfp1.0 programs]
45. OpenGL 2.0
• Programmable shading
  – OpenGL Shading Language (GLSL)
• Multiple color buffer rendering targets
• Non-power-of-two texture dimensions
• Point sprites
• Separate blend equation
• Two-sided stencil testing
46. GeForce 6 & 7 (NV4x/G7x) View of OpenGL (2004)
• Limited vertex texturing
• Fragment branching
• Multiple render targets & floating-point blending
[Figure: 3D application or game → OpenGL API → CPU – GPU boundary → GPU Front End → Vertex Assembly (with Attribute Fetch) → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program (with Texture Fetch) → Raster Operations (with Framebuffer Access) → Memory Interface]

47. GeForce 8 & 9 (G8x/G9x) View of OpenGL (2006)
• Primitive (geometry) programs
• Parameter reads from buffer objects
• Transform feedback (stream out)
[Figure: 3D application or game → OpenGL API → CPU – GPU boundary → GPU Front End → Vertex Assembly (with Attribute Fetch) → Vertex Program → Primitive Assembly → Primitive Program → Clipping, Setup, and Rasterization → Fragment Program (with Texture Fetch and Parameter Buffer Read) → Raster Operations (with Framebuffer Access) → Memory Interface]

48. OpenGL Pipeline Fixed-function Steps
• Much of functional pipeline remains fixed-function
• Vital to maintaining performance and data flow
• Hard to compete with hard-wired rasterization, Zcull, and pixel compression
[Figure: the 2006 pipeline of slide 47 with the fixed-function stages highlighted]

49. OpenGL Pipeline Programmable Domains
• New geometry shader domain for per-primitive programmable processing
• Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains
[Figure: the 2006 pipeline of slide 47 with the Vertex, Primitive, and Fragment Program stages highlighted: “Can be unified hardware!”]
50. OpenGL 2.1
• OpenGL Shading Language (GLSL) improvements
  – Non-square matrices
• Pixel buffer objects (PBOs)
• sRGB color space texture formats

51. OpenGL 3.0
• OpenGL Shading Language (GLSL) improvements
  – New texture fetches
  – True integer data types and operators
  – switch/case/default flow control statements
• Conditional rendering based on occlusion query results
• Transform feedback
• Vertex array objects
• Floating-point textures, color buffers, and depth buffers
• Half-precision vertex arrays
• Texture arrays
• Integer textures
• Red and red-green texture formats
• Compressed red and red-green formats
• Framebuffer objects (FBOs)
• Packed depth-stencil pixel formats
• Per-color buffer clearing, blending, and masking
• sRGB color space color buffers
• Fine-grain buffer mapping and flushing
52. Areas of 3.0 Functionality Improvement
• Programmability
  – Shader Model 4.0 features
  – OpenGL Shading Language (GLSL) 1.30
• Texturing
  – New texture representations and formats
• Framebuffer operations
  – Framebuffer objects
  – New formats
  – New copy (blit), clear, blend, and masking operations
• Buffer management
  – Non-blocking and fine-grain update of buffer object data stores
• Vertex processing
  – Vertex array configuration objects
  – Conditional rendering for occlusion culling
  – New half-precision vertex attribute formats
• Pixel processing
  – New half-precision external pixel formats
All brand new core features
53. OpenGL 3.0 Programmability
• Shader Model 4.0 additions
  – True signed & unsigned integer values
  – True integer operators: ^, &, |, <<, >>, %, ~
  – Texture additions
    Texture arrays
    Base texture size queries
    Texel offsets to fetches
    Explicit LOD and derivative control
    Integer samplers
  – Interpolation modifiers: centroid, noperspective, and flat
  – Vertex array element number: gl_VertexID
• OpenGL Shading Language (GLSL) improvements
  – ## concatenation in pre-processor for macros
  – switch/case/default statements
54. OpenGL 3.0 Texturing Functionality
• Texture representation
  – Texture arrays: indexed access to a set of 1D or 2D texture images
• Texture formats
  – Floating-point texture formats
    Single-precision (32-bit, IEEE s23e8)
    Half-precision (16-bit, s10e5)
  – Red & red/green texture formats
    Intended as FBO framebuffer formats too
  – Compressed red & red/green texture formats
  – Shared exponent texture formats
  – Packed floating-point texture formats
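To make the half-precision (16-bit, s10e5) layout concrete, here is a minimal sketch of decoding such a value on the CPU. It assumes the usual IEEE-style interpretation (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits); the function name `half_to_float` is hypothetical and not part of OpenGL.

```c
#include <math.h>
#include <stdint.h>

/* Decode a 16-bit s10e5 value: 1 sign bit, 5 exponent bits, 10 mantissa bits.
   Hypothetical helper for illustration; not an OpenGL entry point. */
float half_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa = h & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or denormal: no implied leading 1, scale is 2^-24. */
        value = ldexpf((float)mantissa, -24);
    } else if (exponent == 31) {
        /* All-ones exponent encodes infinity or NaN. */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal value: implied leading 1 (the +1024), exponent bias of 15. */
        value = ldexpf((float)(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}
```

For example, the bit pattern 0x3C00 decodes to 1.0 and 0x7BFF decodes to 65504, the largest finite half-precision value.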
55. Texture Arrays
• Conventional texture = One logical pre-filtered image
• Texture array = index-able plurality of pre-filtered images
  – Rationale is fewer texture object binds when drawing different objects
  – No filtering between mipmap sets in a texture array
  – All mipmap sets in array share same format/border & base dimensions
  – Both 1D and 2D texture arrays
  – Require shaders, no fixed-function support
• Texture image specification
  – Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays
  – No new OpenGL commands for texture arrays
  – 3rd dimension specifies integer array index
  – No halving in 3rd dimension for mipmaps
  – So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
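The mipmap rule for texture arrays (width and height halve per level, the array dimension does not) can be sketched as plain arithmetic. The type and function names below are hypothetical helpers, not OpenGL API; the sizes they return are the ones an application would pass as width, height, and depth to glTexImage3D for each level.

```c
/* Size of one mipmap level of a 2D texture array (hypothetical helper type). */
typedef struct {
    int width;
    int height;
    int layers;
} ArrayLevelSize;

/* Halve a base dimension `level` times, clamping at 1.
   Assumes 0 <= level < 31 so the shift is well defined. */
static int halve(int base, int level)
{
    int v = base >> level;
    return v > 0 ? v : 1;
}

/* Width and height shrink per level; the layer count never changes. */
ArrayLevelSize array_level_size(int width, int height, int layers, int level)
{
    ArrayLevelSize s;
    s.width = halve(width, level);
    s.height = halve(height, level);
    s.layers = layers;
    return s;
}

/* Number of levels in a complete mipmap chain down to 1x1. */
int array_level_count(int width, int height)
{
    int largest = width > height ? width : height;
    int levels = 1;
    while (largest > 1) {
        largest >>= 1;
        levels++;
    }
    return levels;
}
```

For the 64×128×17 array on the slide, this gives 8 levels, with level 1 at 32×64×17 and the last level at 1×1×17.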
56. Texture Arrays Example
• Multiple skins packed in texture array
  – Motivation: binding to one multi-skin texture array avoids texture bind per object
[Figure: grid of images with texture array index 0 to 4 on one axis and mipmap level index 0 to 4 on the other]

57. Compact Floating-point Textures
• Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications

58. Compact Floating-point Texture Formats
• Packed float format
  – No sign bit, independent exponents
  – [Figure: 32-bit word, bit 31 to bit 0, holding three components: one with a 5-bit exponent and 5-bit mantissa, two with a 5-bit exponent and 6-bit mantissa]
• Shared exponent format
  – No sign bit, shared exponent, no implied leading 1
  – [Figure: 32-bit word, bit 31 to bit 0, holding a 5-bit shared exponent and three 9-bit mantissas]
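As a rough illustration of the shared exponent format, the sketch below unpacks one 32-bit texel into three floats. The slide only gives the field widths; the bit order used here (exponent in bits 31 to 27, then blue, green, red) and the exponent bias of 15 are assumptions based on the EXT_texture_shared_exponent extension, and `decode_rgb9e5` is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>

typedef struct {
    float r, g, b;
} RGBf;

/* Unpack a shared-exponent texel: three 9-bit mantissas, one 5-bit exponent.
   Assumed layout: exponent 31..27, blue 26..18, green 17..9, red 8..0. */
RGBf decode_rgb9e5(uint32_t packed)
{
    int exponent = (int)(packed >> 27);
    int blue = (int)((packed >> 18) & 0x1FF);
    int green = (int)((packed >> 9) & 0x1FF);
    int red = (int)(packed & 0x1FF);
    /* No implied leading 1: value = mantissa * 2^(exponent - bias - 9),
       with an assumed bias of 15 and 9 mantissa bits. */
    int shift = exponent - 15 - 9;
    RGBf out;

    out.r = ldexpf((float)red, shift);
    out.g = ldexpf((float)green, shift);
    out.b = ldexpf((float)blue, shift);
    return out;
}
```

Because all three channels share one exponent and there is no sign bit, the format suits HDR color data where the channels have similar magnitude.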
59. 1- and 2-component Block Compression Scheme
• Basic 1-component block compression format
• Borrowed from alpha compression scheme of S3TC 5
[Figure: encoded block and 4x4 pixel decoded block]
• Encoded block: 2 min/max values (8-bit A and 8-bit B, 16 bits) plus per-pixel selector bits, 64 bits total per block
• Decoded block: 16 pixels x 8-bit/component = 128 bits decoded, so effectively 2:1 compression
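A minimal decoder sketch for one such 64-bit single-component block follows. It assumes the block follows the S3TC alpha scheme the slide refers to: two 8-bit endpoints followed by sixteen 3-bit selectors packed least-significant-bit first, with the selector palette depending on which endpoint is larger. The function name `decode_red_block` is hypothetical.

```c
#include <stdint.h>

/* Decode one 8-byte block into 16 values in [0, 1].
   Assumed layout: byte 0 and byte 1 are the endpoints, bytes 2..7 hold
   sixteen 3-bit selectors, least significant bits first. */
void decode_red_block(const uint8_t block[8], float out[16])
{
    float palette[8];
    uint64_t bits = 0;
    int r0 = block[0];
    int r1 = block[1];
    int i;

    palette[0] = r0 / 255.0f;
    palette[1] = r1 / 255.0f;
    if (r0 > r1) {
        /* Six interpolated values between the two endpoints. */
        for (i = 1; i <= 6; i++)
            palette[i + 1] = ((7 - i) * r0 + i * r1) / (7.0f * 255.0f);
    } else {
        /* Four interpolated values plus explicit 0 and 1. */
        for (i = 1; i <= 4; i++)
            palette[i + 1] = ((5 - i) * r0 + i * r1) / (5.0f * 255.0f);
        palette[6] = 0.0f;
        palette[7] = 1.0f;
    }

    /* Gather the 48 selector bits, then look up each pixel. */
    for (i = 0; i < 6; i++)
        bits |= (uint64_t)block[2 + i] << (8 * i);
    for (i = 0; i < 16; i++)
        out[i] = palette[(bits >> (3 * i)) & 0x7];
}
```

The 2:1 ratio on the slide follows directly: 8 bytes in, 16 one-byte values out. A two-component (red/green) block would simply be two of these blocks side by side.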
60. Framebuffer Operations
• Framebuffer objects
  – Standardized framebuffer objects (FBOs) for rendering to textures and renderbuffers
  – Render-to-texture
  – Multisample renderbuffers for FBOs
• Framebuffer operations
  – Copies from one FBO to another, including multisample data
  – Per-color attachment color clears, blending, and write masking
• Framebuffer formats
  – Floating-point color buffers
  – Floating-point depth buffers
  – Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value
  – Rendering into sRGB color space framebuffers

61. Framebuffer Object Example
• Depth peeling for correctly ordered transparency
• Great render-to-texture application for FBOs

62. Depth Peeling Behind the Scenes
• Depth buffer has closest fragment at all pixels
  – Save depth buffer
  – Render again, but use depth buffer as shadow map
  – Discard fragment in front of shadow map’s depth value
  – Effectively peels one layer of depth!
• Resulting color buffer is 2nd closest fragment
  – And depth buffer for 2nd closest fragments’ depth
• Now repeat peeling more layers
  – Use ping-pong depth buffer scheme
  – Use occlusion query to detect when no more fragments to peel
• Composite color layers front-to-back (or back-to-front)
  – Front-to-back peeling can be done during the peeling process
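The peeling loop can be illustrated without a GPU by simulating it for a single pixel. This is a CPU sketch of the idea only, not the FBO-based implementation the slides describe: `Fragment`, `peel_layer`, and `peel_and_composite` are hypothetical names, the previous layer's depth plays the role of the saved depth buffer, and a return value of -1 stands in for an occlusion query that reports no surviving fragments.

```c
#include <stddef.h>

typedef struct {
    float depth;      /* window-space depth, smaller is closer */
    float r, g, b, a; /* non-premultiplied color and opacity */
} Fragment;

/* One peel pass for one pixel: return the index of the nearest fragment
   strictly behind prev_depth, or -1 when nothing is left to peel. */
int peel_layer(const Fragment *frags, size_t count, float prev_depth)
{
    int best = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (frags[i].depth <= prev_depth)
            continue; /* discarded: at or in front of the previous layer */
        if (best < 0 || frags[i].depth < frags[best].depth)
            best = (int)i; /* ordinary depth test keeps the closest survivor */
    }
    return best;
}

/* Peel layers until none remain, compositing front-to-back as we go.
   Returns the number of layers peeled. */
int peel_and_composite(const Fragment *frags, size_t count, float out_rgb[3])
{
    float prev_depth = -1.0f;   /* in front of every valid depth */
    float transmittance = 1.0f; /* how much light still passes through */
    int layers = 0;
    int idx;

    out_rgb[0] = out_rgb[1] = out_rgb[2] = 0.0f;
    while ((idx = peel_layer(frags, count, prev_depth)) >= 0) {
        const Fragment *f = &frags[idx];
        out_rgb[0] += transmittance * f->a * f->r;
        out_rgb[1] += transmittance * f->a * f->g;
        out_rgb[2] += transmittance * f->a * f->b;
        transmittance *= 1.0f - f->a;
        prev_depth = f->depth;
        layers++;
    }
    return layers;
}
```

Note that the fragments can be submitted in any order, which is the point of the technique; fragments at exactly equal depth are peeled only once in this simplified version.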
63. Delicate Color Fidelity with sRGB
• Problem: PC display devices have non-linear (sRGB) display gamut, so delicate color shading looks wrong
[Images: NVIDIA’s Adriana GeForce 8 Launch Demo. Conventional rendering (uncorrected color) shows unnaturally deep facial shadows; gamma correct (sRGB rendered) looks softer and more natural]

64. What is sRGB?
• A standard color space
  – Intended for monitors, printers, and the Internet
  – Created cooperatively by HP and Microsoft
  – Non-linear, roughly gamma of 2.2
  – Intuitively “encodes more dark values”
• OpenGL 2.1 already added sRGB texture formats
  – Texture fetch converts sRGB to linear RGB, then filters
  – Result takes more than 8-bit fixed-point to represent in shader
• 3.0 adds complementary sRGB framebuffer support
  – “sRGB correct blending” converts framebuffer sRGB to linear, blends with linear color from shader, then converts back to sRGB
  – Works with FrameBuffer Objects (FBOs)
[Figure: sRGB chromaticity diagram]
65. So why sRGB? Standard Windows Display is Not Gamma Corrected
• 25+ years of PC graphics, icons, and images depend on not gamma correcting displays
• sRGB textures and color buffers compensate for this
[Images: Gamma 2.2: “expected” appearance of Windows desktop & icons, but 3D lighting too dark. Gamma 1.0 (linear color response): washed-out desktop appearance, but 3D lighting is correct]

66. Vertex Processing
• Vertex array configuration
  – Objects to manage vertex array configuration client state
  – Half-precision floating-point vertex array formats
• Vertex output streaming
  – Stream transformed vertex results into buffer object data stores
• Occlusion culling
  – Skip rendering based on occlusion query result

67. Miscellaneous
• Pixel Processing
  – Half-precision floating-point pixel external formats
• Buffer Management
  – Non-blocking and fine-grain update of buffer object data stores

68. ARB Extensions to OpenGL 3.0
• OpenGL 3.0 standard provides new ARB extensions
  – Extensions go beyond OpenGL 3.0
  – Standardized at same time as OpenGL 3.0
  – Support features in hardware today
• Specifically
  – ARB_geometry_shader4: provides per-primitive programmable processing
  – ARB_draw_instanced: gives shader access to instance ID
  – ARB_texture_buffer_object: allows buffer object to be sampled as a huge 1D unfiltered texture
• Shipping today
  – NVIDIA driver provides all three
69. Transform Feedback for Terrain Generation by Recursive Subdivision
• Geometry shaders + transform feedback
  1. Render quads (use 4-vertex line adjacency primitive) from vertex buffer object
  2. Fetch height field
  3. Stream subdivided positions and normals to transform feedback “other” buffer object
  4. Use buffer object as vertex buffer
  5. Repeat, ping-pong buffer objects
• Computation and data all stays on the GPU!

70. Skin Deformation
• Capture & re-use geometric deformations
• Transform feedback allows the GPU to calculate the interactive, deforming elastic skin of the frog

71. Silhouette Edge Rendering
• Uses geometry shader
• Useful for non-photorealistic rendering
• Looks like human sketching
[Images: complete mesh → silhouette edge detection geometry shader → silhouette edges]

72. More Geometry Shader Examples
[Images: shimmering point sprites; generate fins for lines; generate shells for fur rendering]

73. Improved Interpolation Techniques
• Using geometry shader functionality
[Images: quadratic normal interpolation; true quadrilateral rendering with mean value coordinate interpolation]
74. “Fair” Quadrilateral Interpolation
glBegin(GL_QUADS);
  glColor3fv(red);   glVertex3fv(lowerLeft);
  glColor3fv(green); glVertex3fv(lowerRight);
  glColor3fv(red);   glVertex3fv(upperRight);
  glColor3fv(blue);  glVertex3fv(upperLeft);
glEnd();
• Geometry shader actually operates on 4-vertex GL_LINE_ADJACENCY primitives instead of quads
[Images: wrong, slash triangle split; wrong, backslash triangle split; better: mean value coordinates]
75. OpenGL 2.x ARB Extensions
• Many OpenGL 3.0 features have corresponding ARB extensions for OpenGL 2.1 implementations to advertise
  – Helps get 3.0 functionality out sooner, rather than later
• New ARB extensions for 3.0 functionality
  – ARB_framebuffer_object: framebuffer objects (FBOs) for render-to-texture
  – ARB_texture_rg: red and red/green texture formats
  – ARB_map_buffer_range: non-blocking and fine-grain update of buffer object data stores
  – ARB_instanced_arrays: instance ID available to shaders
  – ARB_half_float_vertex: half-precision floating-point vertex array formats
  – ARB_framebuffer_sRGB: rendering into sRGB color space framebuffers
  – ARB_texture_compression_rgtc: compressed red and red/green texture formats
  – ARB_depth_buffer_float: floating-point depth buffers
  – ARB_vertex_array_object: objects to manage vertex array configuration client state
76. Beyond OpenGL 3.0
• OpenGL 3.0
  – EXT_gpu_shader4
  – NV_conditional_render
  – ARB_color_buffer_float
  – NV_depth_buffer_float
  – ARB_texture_float
  – EXT_packed_float
  – EXT_texture_shared_exponent
  – NV_half_float
  – ARB_half_float_pixel
  – EXT_framebuffer_object
  – EXT_framebuffer_multisample
  – EXT_framebuffer_blit
  – EXT_texture_integer
  – EXT_texture_array
  – EXT_packed_depth_stencil
  – EXT_draw_buffers2
  – EXT_texture_compression_rgtc
  – EXT_transform_feedback
  – APPLE_vertex_array_object
  – EXT_framebuffer_sRGB
  – APPLE_flush_buffer_range (modified)
• In GeForce 8, 9, & 2xx Series but not yet core
  – EXT_geometry_shader4 (now ARB)
  – EXT_bindable_uniform
  – NV_gpu_program4
  – NV_parameter_buffer_object
  – EXT_texture_compression_latc
  – EXT_texture_buffer_object (now ARB)
  – NV_framebuffer_multisample_coverage
  – NV_transform_feedback2
  – NV_explicit_multisample
  – NV_multisample_coverage
  – EXT_draw_instanced (now ARB)
  – EXT_direct_state_access
  – EXT_vertex_array_bgra
  – EXT_texture_swizzle
• Plenty of proven OpenGL extensions for OpenGL Working Group to draw upon for OpenGL 3.1
77. OpenGL Version Evolution
• Now OpenGL is part of Khronos Group
  – Previously OpenGL’s evolution was governed by the OpenGL Architectural Review Board (ARB)
  – Now officially a Khronos working group
  – Khronos also standardizes OpenCL, OpenVG, etc.
• How OpenGL version updates happen
  – OpenGL participants propose extensions
  – Successful extensions are polished and incorporated into core
• OpenGL 3.0 is great example of this process
  – Roughly 20 extensions folded into “core”
  – Just 3 of those previously unimplemented
78. OpenGL Extensions by Source
• 44% of extensions are “core” or multi-vendor
• Lots of vendors have initiated extensions
• Extending OpenGL is industry-wide collaboration
[Figure: pie chart of extensions by source. Sources: Multi-vendor (EXT), Silicon Graphics (SGI, SGIS, SGIX), Architectural Review Board (ARB), NVIDIA (NV), ATI, Apple, Mesa3D, Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, Other. Slice values as extracted: 29%, 17%, 15%, 15%, 4%, 2%, 2%, 2%, 2%, 2%, 2%, 2%, 1%, 1%, 4%, 15%. Source: http://www.opengl.org/registry (Dec 2008)]

79. What’s Driving OpenGL Modernization?
• Human desire for Visual Intuition and Entertainment
  – Particularly interactive video games
• Embarrassing Parallelism of Graphics
  – Particularly the hardware-amenable, latency tolerant nature of rasterization
• Increasing Semiconductor Density
80Kurt AkeleyPrincipal ResearcherMicrosoft Research Silicon ValleyOpenGL’s Evolution:A Personal Retrospective
81AA personalpersonal retrospectiveretrospective• My background:• Silicon Graphics, 1982-2001• OpenGL, 1990-2004• Today’s topics:• Computer architecture• Culture and process• For a more complete coverage see:• https://graphics.stanford.edu/wikis/cs448-07-spring/• Mark Kilgard’s excellent course notes
82Jim Clark and the Geometry EngineJim Clark and the Geometry Engine• This text is 24 points– Sub bullets look like thisThe Geometry Engine: A VLSI Geometry System for GraphicsComputer Graphics, Volume 16, Number 3(Proceedings of SIGGRAPH 1982) p127-133, 1982
83Jim’s helpers: the Stanford gang [Diagram labels: IRIS GL, Geometry Engine, hardware front-end, hardware back-end]
84Success! (in 1995)
85Computer Architecture
86What is computer architecture?• Architecture: “the minimal set of properties that determine what programs will run and what results they will produce”• Implementation: “the logical organization of the [computer’s] dataflow and controls”• Realization: “the physical structure embodying the implementation”
87Example: the analog clock• Architecture• Circular dial divided into twelfths• Hour hand (short) and minute hand (long)• Implementation• A weight, driving a pendulum, or• A spring, driving a balance wheel, or• A battery, driving an oscillator, or ….• Realization• Gear ratios, pendulum lengths, battery sizes, ... Example from Computer Architecture: Concepts and Evolution, Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997
88A useful distinction• NVIDIA 8800• SIMD, or• SPMD?• Architecture:• SPMD• Implementation:• SIMD• Realization:• ASIC. SIMD = Single Instruction, Multiple Data; SPMD = Single Program, Multiple Data; ASIC = Application Specific Integrated Circuit [Block diagram of the NVIDIA 8800: Application, Data Assembler, Vertex / Primitive / Fragment Thread Issue, Setup / Rasterization / ZCull, Thread Processor, arrays of SP units with L1 and TF, and L2 / FB partitions]
89The mainstream view• Table of Contents:• Fundamentals• Instruction Sets• Pipelining• Advanced Pipelining and ILP• Memory-Hierarchy Design• Storage Systems• Interconnection Networks• Multiprocessors
90OpenGL is an architecture (Blaauw/Brooks compared with OpenGL)
• Different implementations. Blaauw/Brooks: IBM 360 30/40/50/65/75, Amdahl. OpenGL: SGI Indy/Indigo/InfiniteReality, NVIDIA GeForce, ATI Radeon, …
• Compatibility. Blaauw/Brooks: Code runs equivalently on all implementations. OpenGL: Top-level goal; conformance tests, …
• Intentional design. Blaauw/Brooks: It’s an architecture, whether it was planned or not. OpenGL: Carefully planned, though mistakes were made
• Configuration. Blaauw/Brooks: Can vary amount of resource (e.g., memory). OpenGL: No feature sub-setting; configuration attributes (e.g., framebuffer)
• Speed. Blaauw/Brooks: Not a formal aspect of architecture. OpenGL: No performance queries
• Validity of inputs. Blaauw/Brooks: No undefined operation. OpenGL: All errors specified; no side effects; little undefined operation
• Enforcement. Blaauw/Brooks: When implementation errors are found, they are fixed. OpenGL: Specification rules!
91But OpenGL is an API (Application Programming Interface)• Yes, Blaauw and Brooks talk about (computer) architecture as though it is always expressed as ISA (Instruction-Set Architecture)• But …• API is just a higher-level programming interface• “Instruction-Set” Architecture implies other types of computer architectures (such as “API” Architecture)• OpenGL has evolved to include ISA-like interfaces (e.g., the interface below GLSL)
92We didn’t know …• No mention in spec (even 3.0)• “We view OpenGL as a state …”• First use in “ARB”• Architecture Review Board• Coined by Bill Glazier from “Palo Alto Architecture Review Board”• First formal usage (I know of)• Mark J. Kilgard, Realizing OpenGL: two implementations of one architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, p.45-55, August 03-04, 1997, Los Angeles, California, United States.
93Fred is magnanimous
94What is implied by “programmable”?• What does it mean to teach programming?• Does running a microwave oven count?• Does defining the geometry of a game “level” count?• Does specifying OpenGL modes count?• This seems to be a somewhat open question• Butler Lampson couldn’t tell me.• Microsoft developers of teaching tools couldn’t tell me.• An online search wasn’t very helpful.• Do we just “know it when we see it”?• Justice Potter Stewart’s definition of pornography
95My try at some formalization• Key ideas:• Composition: choice of placement, sequence• Non-obvious: semantics are interesting and novel• Imperative: maybe there are other kinds of programming• “Composition, the organization of elemental operations into a non-obvious whole, is the essence of imperative programming.” -- Kurt Akeley (Foreword to GPU Gems 3)
96OpenGL has always been programmable• Follows directly from being an “architecture”• OpenGL commands are instructions (API as an ISA)• They can be “composed” to create programs• Multi-pass rendering is the prototypical example• But Peercy et al. implemented a RenderMan shader compiler• Invariance was specified from the start (e.g., same fragments)• We set out to enable “usage that we didn’t anticipate”• Obvious for a traditional ISA (e.g., IA32)• Not so obvious for a graphics API• Example: texture applies to all primitives, not just triangles
97Example multi-pass OpenGL “program”: hidden-line rendering
glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // pass 1 writes depth only
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
// draw solid objects
glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3fv(linecolor);  // linecolor: array of three floats
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
// draw solid objects again
glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);
98Example multi-pass OpenGL “program”: silhouette rendering
• Additions to the hidden-line algorithm (previous slide) were highlighted in red on the slide; they are marked "addition" here
glEnable(GL_DEPTH_TEST);
glDisable(GL_LIGHTING);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_POLYGON_OFFSET_FILL);
glPolygonOffset(maxwidth/2, 1);
// draw solid objects
glDepthMask(GL_FALSE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glColor3f(1, 1, 1);
glDisable(GL_POLYGON_OFFSET_FILL);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glEnable(GL_CULL_FACE);  // addition
glCullFace(GL_FRONT);  // addition
// draw solid objects again
// draw true edges (addition, for a complete hidden-line drawing)
glDisable(GL_DEPTH_TEST);
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
glDepthMask(GL_TRUE);
glDisable(GL_CULL_FACE);  // addition
99Invariance• Corollary 1: Fragment generation is invariant with respect to the state values marked with in Rule 2.
100• Intended to capture complete sequence of operations• Also inspired design changes
101[Diagram: the OpenGL abstract pipeline. Vertex pipeline stages: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Framebuffer, Display. Pixel pipeline stages: Application, Pixel assembly (unpack), Pixel operations, Pixel pack. Shared resource: Texture memory]• All vertexes are treated equally (e.g., lighted)• All primitives (including pixels) are rasterized• All fragments are treated equally (e.g., texture mapped and depth-buffered)• Not a required implementation, but “abstraction distance” matters
102Culture and Process
103Suppose … (http://www.opengl.org/registry/)
Name: ARB_texture_cube_map
Name Strings: GL_ARB_texture_cube_map
Notice: Copyright OpenGL Architectural Review Board, 1999.
Contact: Michael Gold, NVIDIA (gold 'at' nvidia.com)
Status: Complete. Approved by ARB on 12/8/1999
Version: Last Modified Date: December 14, 1999
Number: ARB Extension #7
Dependencies: None. Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
Overview: This extension provides a new texture generation scheme for cube map textures. Instead of the current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
104Complete specification. Sections: Name; Name Strings; Notice; Contact; Status; Version; Number; Dependencies; Overview; Issues; New Procedures and Functions; New Tokens; Additions to Chapter 2 of the OpenGL Specification; Additions to Chapter 3 of the OpenGL Specification; Additions to Chapter 4 of the OpenGL Specification; Additions to Chapter 5 of the OpenGL Specification; Additions to Chapter 6 of the OpenGL Specification; Additions to the GLX Specification; Errors; New State (type, query mechanism, initial value, attribute set, specification section); Usage Examples
105 19 issues• The spec just linearly interpolates the reflection vectors computed per-vertex across polygons. Is there a problem interpolating reflection vectors in this way?• Probably. The better approach would be to interpolate the eye vector and normal vector over the polygon and perform the reflection vector computation on a per-fragment basis. Not doing so is likely to lead to artifacts because angular changes in the normal vector result in twice as large a change in the reflection vector as normal vector changes. The effect is likely to be reflections that become glancing reflections too fast over the surface of the polygon. Note that this is an issue for REFLECTION_MAP_ARB, but not NORMAL_MAP_ARB.
106 19 issues …• What happens if an (s,t,q) is passed to cube map generation that is close to (0,0,0), i.e. a degenerate direction vector?• RESOLUTION: Leave undefined what happens in this case (but may not lead to GL interruption or termination). Note that a vector close to (0,0,0) may be generated as a result of the per-fragment interpolation of (s,t,r) between vertices.
107Trust and integrity• Lots of collaboration during the initial design• But final decisions made by a small group• SGI played fair• OpenGL 1.0 didn’t favor SGI equipment (our ports were late)• SGI obeyed all conformance rules• SGI didn’t adjust the spec to match our equipment• The ARB avoided marketing tasks such as benchmarks• We stuck with technical design issues• We documented rigorously• Specification, man pages, …
108Five Kinkos in Austin Texas• The OpenGL Graphics System: A Specification (Version 1.1). Mark Segal, Kurt Akeley. Editor: Chris Frazier. Copyright © 1992-1997 Silicon Graphics, Inc. This document contains unpublished information of Silicon Graphics, Inc.
109Extension facts• 442 Vendor and “EXT” extension specifications• Vendor: specific to a single vendor• EXT: shared by two or more vendors• 56 “ARB” extensions• Standardized, likely to be in the next spec revision• Lots of text … Source: OpenGL extension registry, December 2008
110“Specification” sizes
                      Lines      Words        Chars
56 ARB Extensions     48,674     263,908      2,221,347
All 442 Extensions    209,426    1,076,008    9,079,063
King James Bible      114,535    823,647      5,214,085
New Testament         27,319     188,430      1,197,812
Old Testament         86,783     632,515      3,998,303
111Beyond the specification• The ARB (now replaced with Khronos)• Rules of order, secretary, IP, …• The extension process• Categories, token syntax, spec templates, enums, registry, …• Licensing• Conformance• …
112Summary• Many mistakes made (see other presentations for lists)• Created a sustainable culture that values quality and rigorous documentation• Defined and evolved the architecture for interactive 3-D computer graphics
113Writing better OpenGL. Mark Kilgard, Principal System Software Engineer, NVIDIA
114Motivation• Complex APIs and systems have pitfalls• After 17 years of designed evolution, OpenGL certainly has its share• Normal documentation focus:• What can you do?• Rather than: What should you do?
115Communicating Vertex Data• The way you learn OpenGL:• Immediate mode• glBegin, glColor3f, glVertex3f, glEnd• Straightforward: no ambiguity about what the vertex data is• All vertex components are function parameters• The problem: too function-call intensive• And all vertex data must flow through the CPU
116Example Scenario• An OpenGL application has to render a set of rectangles• Rectangle with its parameters• x, y, height, width, left color, right color, depth [Figure: a rectangle anchored at (x,y) with labeled width and height, a left side color and a right side color, and a depth order axis running from 0.0 to 1.0]
117Scene Representation
• Each rectangle specified by following RectInfo structure:
typedef struct {
  GLfloat x, y, width, height;
  GLfloat depth_order;
  GLfloat left_side_color[3];   // red, green, then blue
  GLfloat right_side_color[3];  // red, green, then blue
} RectInfo;
• Array of RectInfo structures describes “scene”
• Simplistic scene for sake of teaching
118Example Scene and Rendering Result
• Scene of 4 rectangles:
RectInfo rect_list[4] = {
  { 10, 20, 180, 140, 0.5, { 1, 1, 1 }, { 1, 0, 1 } },
  { 30, 40, 100, 60, 0.5, { 1, 0, 0 }, { 0, 0, 1 } },
  { 140, 60, 100, 80, 0.5, { 0, 0, 1 }, { 0, 1, 0 } },
  { 70, 120, 80, 60, 0.7, { 1, 1, 0 }, { 0, 1, 1 } },
};
• OpenGL-rendered result [image not reproduced]
119Immediate Mode Rectangle Rendering
• Given sized RectInfo array, render vertices of quads
void drawRectangles(int count, const RectInfo *list)
{
  glBegin(GL_QUADS);
  for (int i=0; i<count; i++) {  // for each rectangle
    const RectInfo *r = &list[i];
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y, r->depth_order);  // 1st vertex
    glColor3fv(r->right_side_color);
    glVertex3f(r->x+r->width, r->y, r->depth_order);  // 2nd vertex
    // right_side_color “sticks”
    glVertex3f(r->x+r->width, r->y+r->height, r->depth_order);  // 3rd vertex
    glColor3fv(r->left_side_color);
    glVertex3f(r->x, r->y+r->height, r->depth_order);  // 4th vertex
  }
  glEnd();
}
120Critique of Immediate Mode• Advantages• Straightforward to code and debug• Easy-to-understand conceptual model• Building stream of vertices with OpenGL commands• Avoids driver & application copies of vertex data• Flexible, allowing totally dynamic vertex generation• Disadvantages• Rendering continuously streams attributes through CPU• Pollutes CPU cache with vertex data• Function call intensive• Unable to saturate fast graphics hardware• CPUs just too slow• Contrast with vertex array approach…
121Vertex Array Approach• Step 1: Copy vertex attributes into vertex arrays• From: RectInfo array (CPU memory)• To: interleaved arrays of vertex attributes (CPU memory)• Step 2: To render• Configure OpenGL vertex array client state• Use glEnableClientState, glVertexPointer, glColorPointer• Render quads based on indices into vertex arrays• Use glDrawArrays
122Vertex Array Format• Interleave vertex attributes in color & position arrays• Each vertex stores color (red, green, blue) followed by position (x, y, z)• float = 4 bytes, so 24 bytes per vertex [Figure: memory layout of vertex 0 and vertex 1, each as red, green, blue, x, y, z]
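The 24-byte figure can be checked with a small sketch. This is an illustration only: the struct name InterleavedVertex and the two helper functions are hypothetical, and plain float stands in for GLfloat (assumed to be 4 bytes, as the slide states).

```c
#include <stddef.h>

/* Hypothetical type: one interleaved vertex, color first, then position,
   matching the layout described on the slide. */
typedef struct {
    float color[3];     /* red, green, blue */
    float position[3];  /* x, y, z */
} InterleavedVertex;

/* Byte distance between consecutive vertices: the "stride" that would be
   passed to glColorPointer and glVertexPointer. */
size_t vertexStride(void)
{
    return sizeof(InterleavedVertex);
}

/* Byte offset of the position attribute inside one vertex. */
size_t positionOffset(void)
{
    return offsetof(InterleavedVertex, position);
}
```

With 4-byte floats the stride is 24 bytes and the position starts 12 bytes in, which corresponds to the p+3 pointer arithmetic used in the rendering code on the following slides.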
123Step 1: Copy Rectangle Attributes to Vertex Arrays
void *initVarrayRectangles(int count, const RectInfo *list)
{
  // 4 vertices per rectangle, 6 floats (RGB then XYZ) per vertex
  GLfloat *varray = (GLfloat*) malloc(sizeof(GLfloat)*6*4*count);
  GLfloat *p = varray;
  for (int i=0; i<count; i++, p+=24) {
    const RectInfo *r = &list[i];
    // quad vertex #1
    memcpy(&p[0], r->left_side_color, sizeof(GLfloat)*3);
    p[3] = r->x; p[4] = r->y; p[5] = r->depth_order;
    // quad vertex #2
    memcpy(&p[6], r->right_side_color, sizeof(GLfloat)*3);
    p[9] = r->x+r->width; p[10] = r->y; p[11] = r->depth_order;
    // quad vertex #3
    memcpy(&p[12], r->right_side_color, sizeof(GLfloat)*3);
    p[15] = r->x+r->width; p[16] = r->y+r->height; p[17] = r->depth_order;
    // quad vertex #4
    memcpy(&p[18], r->left_side_color, sizeof(GLfloat)*3);
    p[21] = r->x; p[22] = r->y+r->height; p[23] = r->depth_order;
  }
  return varray;
}
124Step 2: Configure & Render from Vertex Arrays
void drawVarrayRectangles(int count, const RectInfo *list)
{
  void *varray = initVarrayRectangles(count, list);
  const GLfloat *p = (const GLfloat*) varray;
  const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
  glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
  glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
  glEnableClientState(GL_COLOR_ARRAY);
  glEnableClientState(GL_VERTEX_ARRAY);
  glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
  free(varray);
}
125Critique of Simplistic Vertex Array Rendering• Advantages• Far fewer OpenGL commands issued• Disadvantages• Every render with drawVarrayRectangles calls initVarrayRectangles• Allocates, initializes, & frees vertex array memory every render• Improve by separating vertex array construction from rendering
126Initialize Once, Render Many Approach
• This routine expects base pointer returned by initVarrayRectangles
void drawInitializedVarrayRectangles(int count, const void *varray)
{
  const GLfloat *p = (const GLfloat*) varray;
  const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
  glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
  glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
  // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
  glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
127Client Memory Vertex Attribute Transfer [Diagram: the CPU reads the vertex array from application (client) memory and writes command + vertex data into the command queue; the GPU fetches command + vertex data by DMA into its command processor, vertex puller, and hardware rendering pipeline]• Vertex data travels through the CPU
128Vertex Buffer Object Vertex Attribute Pulling [Diagram: the CPU writes command + vertex indices into the command queue, which the GPU fetches by DMA into its command processor; the vertex puller reads the vertex array from the OpenGL (vertex) buffer object by GPU DMA and feeds the hardware rendering pipeline]• CPU never reads the vertex data
129Initializing Vertex Buffer Objects (VBOs)
• Once using vertex arrays, easy to switch to VBOs
• Make the vertex array as before
• Then bind to buffer object and copy data to the buffer
void initVarrayRectanglesInVBO(GLuint bufferName, int count, const RectInfo *list)
{
  void *varray = initVarrayRectangles(count, list);
  const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
  const GLint numVertices = 4*count;
  const GLsizeiptr bufferSize = stride*numVertices;
  glBindBuffer(GL_ARRAY_BUFFER, bufferName);
  glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);
  free(varray);  // glBufferData copied the data, so the client copy can go
}
130Rendering from Vertex Buffer Objects
• Once initialized, glBindBuffer to bind to buffer ahead of vertex array configuration
• Send offsets instead of pointers
void drawVarrayRectanglesFromVBO(GLuint bufferName, int count)
{
  const char *base = NULL;
  const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
  glBindBuffer(GL_ARRAY_BUFFER, bufferName);
  glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));
  glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));
  // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
  glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
131Understanding glBindBuffer• Buffer object bindings are a frequent point of confusion for programmers• What does glBindBuffer really do?• Lots of buffer binding targets:• GL_ARRAY_BUFFER target: for vertex attribute arrays• Query with GL_ARRAY_BUFFER_BINDING• GL_ELEMENT_ARRAY_BUFFER target: for vertex indices, effectively topology• Query with GL_ELEMENT_ARRAY_BUFFER_BINDING• Each vertex array has its own buffer binding, query with• GL_VERTEX_ARRAY_BUFFER_BINDING• GL_COLOR_ARRAY_BUFFER_BINDING• GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING, etc.
132Bind and Query Buffer Targets• Buffer bind tokens (target tokens for glBindBuffer)• GL_ARRAY_BUFFER• GL_ELEMENT_ARRAY_BUFFER• Buffer query tokens (query tokens to glGetIntegerv)• GL_ARRAY_BUFFER_BINDING• GL_ELEMENT_ARRAY_BUFFER_BINDING• GL_COLOR_ARRAY_BUFFER_BINDING• GL_VERTEX_ARRAY_BUFFER_BINDING• GL_FOG_COORD_ARRAY_BUFFER_BINDING• GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING• Buffer query token (query token to glGetVertexAttribiv)• GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING
133Latched Vertex Array Buffer Bindings
• Here’s the confusing part:
glBindBuffer(GL_ARRAY_BUFFER, 34);
glColorPointer(3, GL_FLOAT, color_stride, (void*)color_offset);
• The glBindBuffer doesn’t change any vertex array binding
• The GL_ARRAY_BUFFER_BINDING state that glBindBuffer sets does not itself affect rendering
• It is the glColorPointer call that latches the array buffer binding to change the color array’s buffer binding!
• Same with all vertex array buffer bindings
134Binding Buffer Zero is Special
• By default, vertex arrays don’t access buffer objects
• Instead client memory is accessed
• This is because
• The initial buffer binding for a context is zero
• And zero is special
• Zero means access client memory (this ensures compatibility with pre-VBO OpenGL)
• You can always resume client memory vertex array access for a given array like this
glBindBuffer(GL_ARRAY_BUFFER, 0);  // use client memory
glColorPointer(3, GL_FLOAT, color_stride, color_pointer);
• Different treatment of the “pointer” parameter to vertex array specification commands
• When the current array buffer binding is zero, the pointer value is a client memory pointer
• When the current array buffer binding is non-zero (meaning it names a buffer object), the pointer value is “recast” as an offset from the beginning of the buffer
• Once again
• The glBindBuffer(GL_ARRAY_BUFFER,0) call alone doesn’t change any vertex array buffer bindings
• It takes a vertex array specification command such as glColorPointer to latch the zero
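The latching rule is easier to see in a toy model. The sketch below is not OpenGL; ToyContext and the toy* functions are hypothetical names that only mimic the two pieces of state the slides describe: the current array buffer binding, and the per-array binding that is copied ("latched") when the pointer command is called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state, loosely mirroring GL_ARRAY_BUFFER_BINDING and
   GL_COLOR_ARRAY_BUFFER_BINDING. */
typedef struct {
    unsigned arrayBufferBinding;  /* set by the bind call only */
    unsigned colorArrayBuffer;    /* latched by the pointer call */
    const void *colorPointer;     /* client pointer, or offset if a buffer is latched */
} ToyContext;

/* Like glBindBuffer(GL_ARRAY_BUFFER, buffer): touches only the current binding. */
void toyBindArrayBuffer(ToyContext *c, unsigned buffer)
{
    assert(c != NULL);
    c->arrayBufferBinding = buffer;
}

/* Like glColorPointer: latches the current binding into the color array. */
void toyColorPointer(ToyContext *c, const void *pointer)
{
    assert(c != NULL);
    c->colorArrayBuffer = c->arrayBufferBinding;  /* the latch */
    c->colorPointer = pointer;
}

/* Zero means the color array reads client memory. */
int toyColorArrayUsesClientMemory(const ToyContext *c)
{
    assert(c != NULL);
    return c->colorArrayBuffer == 0;
}
```

In this model, binding buffer 34 leaves the color array untouched until toyColorPointer runs, and binding zero afterwards does not switch the array back to client memory until toyColorPointer is called again.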
135Texture Coordinate Set Selector
• A selector in OpenGL is
• A state variable that controls what state a subsequent command updates
• Examples of commands that modify selectors
• glMatrixMode, glActiveTexture, glClientActiveTexture
• A selector is different from latched state
• Latched state is a specified value that is set (or “latched”) when a subsequent command is called
• Pitfall warning: glTexCoordPointer both
• Relies on the glClientActiveTexture command’s selector
• And latches the current array buffer binding for the selected texture coordinate vertex array
• Example
glBindBuffer(GL_ARRAY_BUFFER, 34);  // buffer value glTexCoordPointer latches
glClientActiveTexture(GL_TEXTURE3);  // selector glTexCoordPointer uses
glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);
136OpenGL’s Modern Buffer-centric Processing Model [Diagram: buffer objects feeding pipeline stages. Buffers: Vertex Array Buffer Object (VaBO), Array Element Buffer Object (VeBO), Transform Feedback Buffer (XBO), Parameter Buffer (PaBO), Bindable Uniform Buffer (BUB), Texture Buffer Object (TexBO), Pixel Unpack Buffer (PuBO), Pixel Pack Buffer (PpBO). Stages: Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer. Data paths: vertex data, texel data, pixel data, parameter data (not ARB functionality yet). Entry points: glBegin, glDrawElements, etc.; glDrawPixels, glTexImage2D, etc.; glReadPixels, etc.]
137Usages of OpenGL Buffer Objects• Vertex uses (VBOs)• Input to GL: Vertex attribute buffer objects• Color, position, texture coordinate sets, etc.• Input to GL: Vertex element buffer objects• Indices• Output from GL: Transform feedback• Streaming vertex attributes out• Texture uses (TexBOs)• Texturing from: Texture buffer objects• Pixel uses (PBOs)• Output from GL: Pixel pack buffer objects• glReadPixels• Input to GL: Pixel unpack buffer objects• glDrawPixels, glBitmap, glTexImage2D, etc.• Shader uses (PaBOs, UBOs)• Input to assembly program: Parameter buffer objects• Input to GLSL program: Bind-able uniform buffer objects• Key point: OpenGL buffers are containers for bytes; a buffer is not tied to any particular usage
138Continuum of OpenGL Usage: Tweak-able Performance• Immediate mode• Client vertex arrays• Vertex buffer objects (VBOs)• Display lists
139Mid-session break: 15 minutes
140Implementing OpenGL. Mark Kilgard, Principal System Software Engineer, NVIDIA
141Topics in OpenGL Implementation• Dual-core OpenGL driver operation• What goes into a texture fetch?• You give me some texture coordinates• I give you back a color• Could it be any simpler?
142OpenGL Drivers for Multi-core CPUs• Today dual-core processors in PCs are nearly ubiquitous• 4, 6, 8, and more cores are clearly coming• How does an OpenGL implementation exploit this trend?• Answer: develop a dual-core OpenGL driver
143Dual-core OpenGL Driver Architecture [Diagram: within the application, each OpenGL context (Context 1 for application thread A, Context 2 for application thread B) has an application rendering thread that runs the ICD’s app thread (tokenize thread), which feeds a circular command FIFO consumed by a worker thread (server thread) in the ICD: worker thread 1 for Context 1, worker thread 2 for Context 2. Other application threads (C, D, …), such as an application audio thread, use no OpenGL]
144Dual-core Performance Results• A well-behaved OpenGL application benefiting from a dual-core mode of OpenGL driver operations [Bar chart: frames per second (axis 0 to 250) by mode of OpenGL driver operation: Single core, Dual core, Null driver]
145Good Dual-core Driver Practices• General advice• Display lists execute on the driver’s worker thread!• You want to avoid situations where the application thread must “sync” with the driver thread• Specific advice• Avoid OpenGL state queries• More on this later• Avoid querying OpenGL errors in production code• Bad behavior is detected automatically and leads to exit from the dual-core mode• Back to the standard single-core driver mode of operation• “Do no harm”
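The slides name a circular command FIFO between the tokenize thread and the worker thread but do not show it. The following is a single-threaded sketch of such a ring buffer, with every name (CommandFifo, fifoPush, fifoPop, fifoDrain) hypothetical and no claim that the real driver works this way. It is only meant to illustrate why a query is costly: presumably the application thread cannot get an answer until everything it queued earlier has been consumed.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAPACITY 8u  /* power of two, so indices wrap with a mask */

/* Hypothetical command FIFO: integer tokens stand in for encoded commands. */
typedef struct {
    int tokens[FIFO_CAPACITY];
    unsigned head;  /* next slot the worker reads */
    unsigned tail;  /* next slot the application thread writes */
} CommandFifo;

/* Number of queued tokens; unsigned subtraction handles wraparound. */
unsigned fifoCount(const CommandFifo *f)
{
    assert(f != NULL);
    return f->tail - f->head;
}

/* Producer side: returns 0 if the FIFO is full. */
int fifoPush(CommandFifo *f, int token)
{
    if (fifoCount(f) == FIFO_CAPACITY) return 0;
    f->tokens[f->tail & (FIFO_CAPACITY - 1u)] = token;
    f->tail++;
    return 1;
}

/* Consumer side: returns 0 if the FIFO is empty. */
int fifoPop(CommandFifo *f, int *token)
{
    if (fifoCount(f) == 0) return 0;
    *token = f->tokens[f->head & (FIFO_CAPACITY - 1u)];
    f->head++;
    return 1;
}

/* A "sync": consume everything queued so far and report how many tokens
   had to be processed before the caller could continue. */
unsigned fifoDrain(CommandFifo *f)
{
    unsigned drained = 0;
    int token;
    while (fifoPop(f, &token)) drained++;
    return drained;
}
```

A real driver would need atomic or locked index updates between two threads; this sketch leaves that out on purpose to keep the data structure visible.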
146Consider an OpenGL texture fetch
• Seems very simple
• Input: texture coordinates (s,t,r,q)
• Output: some color (r,g,b,a)
• Just a simple function, written in Cg/HLSL:
uniform sampler2D decal : TEXUNIT2;
float4 texcoord : TEXCOORD3;
float4 rgba = tex2D(decal, texcoord.st);
• Compiles to single instruction:
TEX o[COLR], f[TEX3], TEX2, 2D;
• Implementation is much more involved!
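147Anatomy of a Texture Fetch [Diagram: a texture coordinate vector and texture parameters enter Texel Selection, which produces texel offsets (used to read texel data from the texture images) and combination parameters; Texel Combination turns the texel data and combination parameters into a filtered texel vector]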
148Texture Fetch Functionality (1)• Texture coordinate processing• Projective texturing (OpenGL 1.0)• Cube map face selection (OpenGL 1.3)• Texture array indexing (OpenGL 3.0)• Coordinate scale: normalization (ARB_texture_rectangle)• Level-of-detail (LOD) computation• Log of maximum texture coordinate partial derivative (OpenGL 1.0)• LOD clamping (OpenGL 1.2)• LOD bias (OpenGL 1.3)• Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)• Wrap modes• Repeat, clamp (OpenGL 1.0)• Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3)• Mirrored repeat (OpenGL 1.4)• Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)• Wrap to adjacent cube map face• Region clamp & mirror (PlayStation 2)
149Texture Fetch Functionality (2)• Filter modes• Minification / magnification transition (OpenGL 1.0)• Nearest, linear, mipmap (OpenGL 1.0)• 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)• Anisotropic (EXT_texture_filter_anisotropic)• Fixed-weights: Quincunx, 3x3 Gaussian• Used for multi-sample resolves• Detail texture magnification (SGIS_detail_texture)• Sharpen texture magnification (SGIS_sharpen_texture)• 4x4 filter (SGIS_texture_filter4)• Sharp-edge texture magnification (E&S Harmony)• Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
150Texture Fetch Functionality (3)• Texture formats• Uncompressed• Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)• Type: unsigned, signed (NV_texture_shader)• Normalized: fixed-point vs. integer (OpenGL 3.0)• Compressed• DXT compression formats (EXT_texture_compression_s3tc)• 4:2:2 video compression (various extensions)• 1- and 2-component compression (EXT_texture_compression_latc,OpenGL 3.0)• Other approaches: IDCT, VQ, differential encoding, normal maps,separable decompositions• Alternate encodings• RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)• Spherical harmonics• Sum of product decompositions
151Texture Fetch Functionality (4)• Pre-filtering operations• Gamma correction (OpenGL 2.1)• Table: sRGB / arbitrary• Shadow map comparison (OpenGL 1.4)• Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5)• Needs “R” depth value per texel• Palette lookup (EXT_paletted_texture)• Thresholding• Color key• Generalized thresholding
152Texture Fetch Functionality (5)• Optimizations• Level-of-detail weighting adjustments• Mid-maps (extra pre-filtered levels in-between existing levels)• Unconventional uses• Bitmap textures for fonts with large filters (Direct3D 10)• Rip-mapping• Non-uniform texture border color• Clip-mapping (SGIX_clipmap)• Multi-texel borders• Silhouette maps (Pradeep Sen’s work)• Shadow mapping• Sharp piecewise linear magnification
153Phased Data Flow• Must hide long memory read latency between Selection and Combination phases [Diagram: Texel Selection takes the texture coordinate vector and texture parameters and emits texel offsets and combination parameters; memory reads for samples fetch texel data from the texture images while the combination parameters wait in a FIFO; Texel Combination consumes both]
154What really happens?• Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch• Logically just one instruction: TXP o[COLR], f[TEX3], TEX2, 2D;• Logically• Texel selection• Texel combination• How many operations are involved?
155Medium-Level Dissection of a Texture Fetch [Diagram: the interpolated texture coords vector and texture parameters enter “convert texture coords to texel coords” (floating-point scaling and combination); a floor / frac step yields integer coords & fractional weights; “convert texel coords to texel offsets” (integer / fixed-point) produces the texel offsets used to read texel data from the texture images; texel combination (integer / fixed-point texel intermediates) applies the combination parameters to produce the filtered texel vector]
156Interpolation• First we need to interpolate (s,t,r,q)• This is the f[TEX3] part of the TXP instruction• Projective texturing means we want (s/q, t/q)• And possibly r/q if shadow mapping• In order to correct for perspective, hardware actually interpolates• (s/w, t/w, r/w, q/w)• If not projective texturing, could linearly interpolate inverse w (or 1/w)• Then compute its reciprocal to get w• Since 1/(1/w) equals w• Then multiply (s/w,t/w,r/w,q/w) times w• To get (s,t,r,q)• If projective texturing, we can instead• Compute reciprocal of q/w to get w/q• Then multiply (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)• Observe projective texturing is same cost as perspective correction
157Interpolation Operations• Ax + By + C per scalar linear interpolation• 2 MADs• One reciprocal to invert q/w for projective texturing• Or one reciprocal to invert 1/w for perspective texturing• Then 1 MUL per component for s/w * w/q• Or s/w * w• For (s,t) means• 4 MADs, 2 MULs, & 1 RCP• (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP• All floating-point operations
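The operation counts above can be read off a short sketch. The function and type names here (evalPlane, InterpolatedTexCoord, projectTexCoord) are hypothetical, and the plane coefficients A, B, C are assumed to come from triangle setup, which the slides do not cover.

```c
#include <assert.h>
#include <stddef.h>

/* One scalar attribute interpolated as A*x + B*y + C: two multiply-adds. */
float evalPlane(float A, float B, float C, float x, float y)
{
    float t = B * y + C;  /* MAD 1 */
    return A * x + t;     /* MAD 2 */
}

/* Hypothetical container for the values the hardware interpolates linearly
   in window space: each coordinate already divided by w. */
typedef struct {
    float sOverW, tOverW, qOverW;
} InterpolatedTexCoord;

/* Projective texturing: one reciprocal and one multiply per component
   turn (s/w, t/w, q/w) into (s/q, t/q). */
void projectTexCoord(const InterpolatedTexCoord *in, float *sOverQ, float *tOverQ)
{
    assert(in != NULL && sOverQ != NULL && tOverQ != NULL);
    float wOverQ = 1.0f / in->qOverW;  /* 1 RCP */
    *sOverQ = in->sOverW * wOverQ;     /* 1 MUL */
    *tOverQ = in->tOverW * wOverQ;     /* 1 MUL */
}
```

For (s,t) this costs 1 RCP and 2 MULs on top of the plane evaluations for s/w and t/w (4 MADs), matching the slide's tally; the sketch also evaluates q/w, which the slide's count for (s,t) does not appear to include.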
158Texture Space Mapping• Have interpolated & projected coordinates• Now need to determine what texels to fetch• Multiply (s,t) by (width,height) of texture base level• Could convert (s,t) to fixed-point first• Or do math in floating-point• Say base texture is 256x256• So compute (s*256, t*256)=(u,v)
159Mipmap Level-of-detail Selection• Tri-linear mip-mapping means compute appropriate mipmap level• Hardware rasterizes in 2x2 pixel entities• Typically called quad-pixels or just quad• Finite difference with neighbors (one-pixel separation) to get change in u and v with respect to window space• Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y• Means 4 subtractions per quad (1 per pixel)• Now compute approximation to gradient length• p = max(sqrt((∂u/∂x)^2+(∂u/∂y)^2), sqrt((∂v/∂x)^2+(∂v/∂y)^2))
160Level-of-detail Bias and Clamping• Convert p length to power-of-two level-of-detail and apply LOD bias• λ = log2(p) + lodBias• Now clamp λ to valid LOD range• λ’ = max(minLOD, min(maxLOD, λ))
161Determine Mipmap Levels and Level Filtering Weight• Determine lower and upper mipmap levels• b = floor(λ’) is bottom mipmap level• t = floor(λ’+1) is top mipmap level• Determine filter weight between levels• w = frac(λ’) is filter weight
162Determine Texture Sample Point• Get (u,v) for selected top and bottom mipmap levels• Consider a level l which could be either level t or b• With (u,v) locations (u_l,v_l)• Perform GL_CLAMP_TO_EDGE wrap modes (keeps samples half a texel inside the edge, away from the border)• uw = max(1/(2*widthOfLevel(l)), min(1-1/(2*widthOfLevel(l)), u))• vw = max(1/(2*heightOfLevel(l)), min(1-1/(2*heightOfLevel(l)), v))• Get integer location (i,j) within each level• (i,j) = ( floor(uw*widthOfLevel(l)), floor(vw*heightOfLevel(l)) )
163Determine Texel Locations• Bilinear sample needs 4 texel locations• (i0,j0), (i0,j1), (i1,j0), (i1,j1)• With integer texel coordinates• i0 = floor(i-1/2)• i1 = floor(i+1/2)• j0 = floor(j-1/2)• j1 = floor(j+1/2)• Also compute fractional weights for bilinear filtering• a = frac(i-1/2)• b = frac(j-1/2)
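A one-axis sketch of the clamp and texel-location steps follows; the same function would be applied to t with the level's height. The names (clampToEdge, BilinearAxis, bilinearAxis) are hypothetical, and two assumptions are made that the slides leave open: the clamp bounds are half a texel in normalized coordinates, that is 1/(2*size), and the floor and frac are taken on the unfloored texel-space coordinate minus one half.

```c
#include <assert.h>

/* GL_CLAMP_TO_EDGE on a normalized coordinate: stay half a texel
   inside the level so the border is never sampled. */
float clampToEdge(float s, int size)
{
    assert(size > 0);
    float halfTexel = 1.0f / (2.0f * (float)size);
    float lo = halfTexel;
    float hi = 1.0f - halfTexel;
    return s < lo ? lo : (s > hi ? hi : s);
}

/* Hypothetical result type for one axis of a bilinear footprint. */
typedef struct {
    int i0, i1;  /* the two neighboring texel indices */
    float a;     /* fractional weight toward i1 */
} BilinearAxis;

BilinearAxis bilinearAxis(float s, int size)
{
    float u = clampToEdge(s, size) * (float)size;  /* texel space */
    float shifted = u - 0.5f;
    if (shifted < 0.0f) shifted = 0.0f;  /* guard against rounding below zero */
    BilinearAxis r;
    r.i0 = (int)shifted;                 /* floor, since shifted >= 0 */
    r.i1 = r.i0 + 1;                     /* equals floor(u + 1/2) */
    if (r.i1 > size - 1) r.i1 = size - 1;  /* stay inside the level */
    r.a = shifted - (float)r.i0;         /* frac(u - 1/2) */
    return r;
}
```

With a 4-texel level, s = 0.5 lands midway between texels 1 and 2 with weight 0.5, while any s outside [0,1] collapses onto the first or last texel with weight 0.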
164Determine Texel Addresses• Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch• Assume bytesPerTexel = 4 bytes for RGBA8 texture• Example• addr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l))• addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l))• addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l))• addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l))• More complicated address schemes are needed for good texture locality!
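The linear (row-major) addressing above is two integer multiply-adds per texel, as the sketch below shows. The function name texelAddress is hypothetical, and addresses are modeled as plain byte offsets rather than real pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major texel address: base + bytesPerTexel * (i + j * width).
   Inner MAD computes the texel index, outer MAD scales and adds the base. */
size_t texelAddress(size_t baseOfLevel, size_t bytesPerTexel,
                    int i, int j, int widthOfLevel)
{
    assert(i >= 0 && j >= 0 && widthOfLevel > 0);
    size_t texelIndex = (size_t)i + (size_t)j * (size_t)widthOfLevel;  /* MAD 1 */
    return baseOfLevel + bytesPerTexel * texelIndex;                   /* MAD 2 */
}
```

Eight texels (four per level, two levels) at two MADs each gives the 16 integer MADs counted later in the deck. Horizontally adjacent texels are 4 bytes apart here but vertically adjacent ones are a whole row apart, which is the locality problem the slide's last bullet alludes to.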
165Initiate Texture Reads• Initiate texture memory reads at the 8 texel addresses• addr00, addr01, addr10, addr11 for the upper level• addr00, addr01, addr10, addr11 for the lower level• Queue the weights a, b, and w• Latency FIFO in hardware makes these weights available when texture reads complete
166Phased Data Flow• Must hide long memory read latency between Selectionand Combination phasesTexelSelectionTexelCombinationTexeloffsetsTexeldataTexture imagesCombinationparametersTexturecoordinatevectorTexture parametersMemoryreads forsamplesFIFOing ofcombinationparameters
167Texel Combination• When texels reads are returned, begin filtering• Assume results are• Top texels: t00, t01, t10, t11• Bottom texels: b00, b01, b10, b11• Per-component filtering math is tri-linear filter• RGBA8 is four components• result = (1-a)*(1-b)*(1-w)*b00 +(1-a)*b*(1-w)*b*b01 +a*(1-b)*(1-w)*b10 +a*b*(1-w)*b11 +(1-a)*(1-b)*w*t00 +(1-a)*b*w*t01 +a*(1-b)*w*t10 +a*b*w*t11;• 24 MADs per component, or 96 for RGBA• Lerp-tree could do 14 MADs per component, or 56 for RGBA
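To make the two operation counts concrete, here is a hedged C sketch of both forms of the tri-linear filter for a single component. The `Texels8` struct and function names are hypothetical, and floating point stands in for the fixed-point math hardware would use. The direct form evaluates the eight weighted terms; the lerp tree uses seven linear interpolations, which the slide counts as 14 MADs.

```c
/* Eight texel values for one component. The first digit follows i
 * (weight a), the second follows j (weight b). */
typedef struct {
    float b00, b01, b10, b11;   /* bottom mipmap level */
    float t00, t01, t10, t11;   /* top mipmap level */
} Texels8;

/* Direct expansion: eight terms of three multiplies each. */
float trilinear_direct(Texels8 x, float a, float b, float w)
{
    return (1-a)*(1-b)*(1-w)*x.b00 + (1-a)*b*(1-w)*x.b01 +
           a*(1-b)*(1-w)*x.b10     + a*b*(1-w)*x.b11 +
           (1-a)*(1-b)*w*x.t00     + (1-a)*b*w*x.t01 +
           a*(1-b)*w*x.t10         + a*b*w*x.t11;
}

static float lerpf(float x, float y, float t)
{
    return x + t * (y - x);
}

/* Lerp tree: 4 lerps along j, 2 along i, 1 between levels. */
float trilinear_lerp_tree(Texels8 x, float a, float b, float w)
{
    float bottom = lerpf(lerpf(x.b00, x.b01, b), lerpf(x.b10, x.b11, b), a);
    float top    = lerpf(lerpf(x.t00, x.t01, b), lerpf(x.t10, x.t11, b), a);
    return lerpf(bottom, top, w);
}
```

Both functions compute the same weighted average up to rounding; the tree simply factors the shared weights out of the sum.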
168 Total Texture Fetch Operations
• Interpolation
  • 6 MADs, 3 MULs, & 1 RCP (floating-point)
• Texel selection
  • Texture space mapping
    • 2 MULs (fixed-point)
  • LOD determination (floating-point)
    • 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2
  • LOD bias and clamping (fixed-point)
    • 1 ADD, 1 MIN, 1 MAX
  • Level determination and level weighting (fixed-point)
    • 1 FLOOR, 1 ADD, 1 FRAC
  • Texture sample point
    • 4 MAXs, 4 MINs, 2 FLOORs (fixed-point)
  • Texel locations and bi-linear weights
    • 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point)
  • Addressing
    • 16 integer MADs (integer)
• Texel combination
  • 56 fixed-point MADs (fixed-point)
169 Observations about the Texture Fetch
• Lots of ways to implement the math
• Lots of clever ways to be efficient
• Lots more texture operations not considered in this analysis
  • Compression
  • Anisotropic filtering
  • sRGB
  • Shadow mapping
• Arguably TEX instructions are “world’s most CISC instructions”
  • Texture fetches are incredibly complex instructions
• Good deal of GPU’s superiority at graphics operations over CPUs is attributable to TEX instruction efficiency
  • Good for compute too
170 OpenGL’s Future Evolution
Mark Kilgard, Principal System Software Engineer, NVIDIA
171 What drives OpenGL’s future?
• GPU graphics functionality
  • Tessellation & geometry amplification
• Ratio of GPU to single-core CPU performance
• Compatibility
  • Direct3Disms
  • OpenGLisms
  • Deprecation
• Compute support
  • OpenCL, CUDA, Stream processing
• Unconventional graphics devices
172 Better Graphics Functionality
• Expect more graphics performance
  • Easy prediction
  • Rasterization nowhere near peaked
  • Ray tracing fans: GPUs make rays and triangles faster
    – Market still values triangles more than rays
• Expect more generalized graphics functionality
  • Trend for texture enhancements likely to continue
173 Geometry Amplification
• Tessellation
  • Programmable hardware support coming
• True market demand probably not tessellation per se
  • Games want visual richness
  • Texture and shading have created much richness
    – Often “pixel richness” as substitute for geometry richness
  • Increasingly “visual richness” means geometric complexity
• Geometry Amplification may be better term
  • Tessellation is one way to amplify geometry
    – Recognize the limits of bi-variate patches for representing geometry
174 Programmable Tessellation
• Stunning real-time geometric detail + animation possible
• Programmable tessellation + vertex textured displacements
175 Continuous Level-of-detail for Tessellation
(Figure: three scenes with increasing tessellation level-of-detail)
• Same patch mesh for all 3 scenes
176 Adaptive Programmable Tessellation
Programmable level-of-detail determination allows more tessellation along silhouette edges
177 Limits of Patch Tessellation
• What games tend to want
  • Here’s 8 vertices (bounding box), go draw a fire truck
  • Here’s a few vertices, go draw a tree
178 Tessellation Not New to OpenGL
• At least three different bi-variate patch tessellation schemes have been added to OpenGL
  • Evaluators (OpenGL 1.0)
  • NV_evaluators (GeForce 3)
    • water-tight
    • adaptive level-of-detail
    • forward differencing approach
  • ATI_pn_triangles Curved PN Triangles (Radeon)
    • tessellated triangle based on positions+normals
• None succeeded
  • Hard to integrate into art pipelines
  • Didn’t offer enough performance advantage
(Figure: GLUT’s wire-frame teapot)
[Moreton 2001] [Vlachos 2001]
179 Ratio of CPU core-to-GPU Performance
• Well known computer architecture trends now
  • Single-threaded CPU performance trends are stalled
  • Multi-core is CPU designer response
  • GPU performance continues on-trend
• What does this mean for graphics API design?
  • CPUs must generate more visually rich API command streams to saturate GPUs
  • Can’t just send more commands faster
    – Single-threaded CPUs can only do so much
  • So must send more powerful commands
180 Déjà vu
• We’ve been here before
• Early 1980s: Graphics terminals used to be connected to minicomputers by slow speed interconnects
  • CPUs themselves far too slow for real-time rendering
• Resulting rendering model
  • Download scene database to graphics terminal
  • Adjust viewing and modeling parameters
  • Send “redraw scene” command
181 What Happened
• Such “scene processor” hardware not very flexible
  • Difficult to animate anything beyond rigid dynamics
• Eventually SGI and others matched CPUs and interconnects to graphics performance
  • Result was IRIS GL’s immediate mode
  • CPU fast enough to send geometry every frame
• OpenGL took this model
  • Over time added vertex arrays, vertex buffers, texturing, programmable shading, and more performance
• CPU performance became limiter still
  • Better graphics driver tuning helped
  • Dual-core drivers help some more
182 OpenGL’s Most Powerful Command
• Available since OpenGL 1.0
• Can render essentially anything OpenGL can render!
• Takes just one parameter
• The command
  glCallList(GLuint displayListName);
• Power of display lists comes from
  • Playing back arbitrary compiled commands
  • Allowing for hierarchical calling of display lists
    – A display list can contain glCallList or glCallLists
  • Ability of application to re-define display lists
    – No editing, but can be re-defined
183 Enhanced Display Lists
• OpenGL 1.0 display lists are too inflexible
  • Pixel & vertex data “compiled into” display lists
  • Binding objects always “by name”
    – Rather than “by reference”
• These problems can be fixed
  • Modern OpenGL supports buffers for transferring vertices and pixels
  • Compile commands into display lists that defer vertex and pixel transfers until execute-time
    – Rather than compile-time
  • Allow objects (textures, buffers, programs) to be bound “by reference” or “by name”
184 Other Display List Enhancements
• Conditional display list execution
• Relaxed vertex index and command order
• Parallel construction of display lists by multiple threads
General insight: Easier for driver to optimize application’s graphics command stream if it gets to
1) see the repetition in the command stream clearly
2) take time to analyze and optimize usage
185 Conditional Display List Execution
• Today’s occlusion query
  • Application must “query” to learn occlusion result
  • Latency too great to respond
• Application can use OpenGL 3.0’s conditional render capability
  • But just skips vertex pulling, not state changes
• Conditional display list execution
  • Allow a glCallList to depend on the occlusion result from an occlusion query object
  • Allows in-band occlusion querying
  • Skip both vertex pulling and state changes
186 Relaxed Vertex Index and Command Order
• OpenGL today always executes commands “in order”
  • Sequential requirement
• Provide compile-time specification of re-ordering allowances
• Allows GL implementation to re-order
  • Vertex indices within display list’s vertex batch
  • Commands within display list
• Key rule: the state vector a rendering command executes in must match the state it would have seen if the commands were executed sequentially
• Allow static or dynamic re-ordering
  • Static re-ordering needed for multi-pass invariances
• Past practice
  • IRIS Performer would sort rendering by state changes for performance
  • [Sander 2007] shows substantial benefit for vertex ordering
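As a toy illustration of why command re-ordering can pay off, the sketch below sorts draw commands by a state key and counts how many state changes remain. Everything here is hypothetical: `DrawCmd`, its `state_id` key (standing in for the full state vector), and the assumption that the application has declared these draws order-independent. Each draw still executes with its own state, which is the key rule stated on the slide.

```c
#include <stdlib.h>

typedef struct {
    int state_id;   /* hypothetical key identifying the full state vector */
    int order;      /* original submission index, used as a tie-breaker */
} DrawCmd;

/* Number of state switches needed to execute cmds[0..n-1] in sequence. */
int count_state_changes(const DrawCmd *cmds, int n)
{
    int changes = 0;
    for (int k = 0; k < n; ++k)
        if (k == 0 || cmds[k].state_id != cmds[k - 1].state_id)
            ++changes;
    return changes;
}

static int by_state_then_order(const void *pa, const void *pb)
{
    const DrawCmd *a = (const DrawCmd *)pa;
    const DrawCmd *b = (const DrawCmd *)pb;
    if (a->state_id != b->state_id)
        return a->state_id < b->state_id ? -1 : 1;
    return (a->order > b->order) - (a->order < b->order);
}

/* Group draws that share a state, keeping submission order within a group. */
void sort_by_state(DrawCmd *cmds, int n)
{
    qsort(cmds, (size_t)n, sizeof cmds[0], by_state_then_order);
}
```

For an alternating stream of two states across five draws, sorting reduces five state changes to two. A driver that sees the same display list every frame could do this analysis once rather than per frame.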
187 Parallel Display List Construction
• Today’s model
  • Single thread makes all OpenGL rendering calls
  • Minimizes GPU context switch overhead
  • Ties command generation rate to single core’s CPU performance
• Enhanced display list model
  • Multiple threads can build display lists in parallel
  • Single thread still executes display lists
  • Countable semaphore objects used to synchronize hand-off of display lists built by other threads with main rendering thread
188 Rethinking Display Lists
• Display lists have been proposed for deprecation
  • Right as we really need them!
• Much more interesting to enhance display lists
• Dual-core driver already off-loads display list traversal to driver’s thread
• Multi-core driver could scan frequently executed display lists to optimize their order and error processing
  • Includes adding pre-fetching to avoid stalling CPU on cache misses for object accesses
189 Direct3Disms
• Developing a shader-rich game title costs $$$
  • For top titles, often US$ 5,000,000+
  • Investment typically amortized over multiple platforms
  • Consoles are primary target, then PCs
• PC version typically developed for Direct3D
  • Reality: OpenGL is often 3rd or worse priority
• API differences = porting & performance pitfalls
  • Stops or slows Direct3D-developed 3D content from working easily on OpenGL platforms
190 Supporting Direct3D: Not New
• OpenGL has always supported multiple formats well
  • OpenGL’s plethora of pixel and vertex formats
• Early OpenGL extension: EXT_bgra
  • Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering
  • Made core functionality in OpenGL 1.2
• Many OpenGL extensions have embraced Direct3Disms
  • Secondary color
  • Fog coordinate
  • Point sprites
191 Direct3D vs. OpenGL Coordinate System Conventions
• Window origin conventions
  • Direct3D = upper-left origin
  • OpenGL = lower-left origin
• Pixel center conventions
  • Direct3D 9 = pixel centers at integer locations
  • OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
• Clip space conventions
  • Direct3D = [-1,+1] for XY, [0,1] for Z
  • OpenGL = [-1,+1] range for XYZ
• Affects
  • How projection matrix is loaded
  • Fragment shaders that access the window position
• Point sprites have upper-left texture coordinate origin
  • OpenGL already lets application choose lower-left or upper-left
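The clip-space and window-origin differences reduce to small remappings. The C sketch below is one way to express them, assuming OpenGL-style column-major matrices; the function names are hypothetical and it does not cover the Direct3D 9 pixel-center offset. Since OpenGL clips to -w ≤ z ≤ w and Direct3D to 0 ≤ z ≤ w, a Direct3D-convention projection can be derived from an OpenGL one by replacing the z row with the average of the z and w rows.

```c
/* Column-major 4x4 matrices, OpenGL style: element (row, col) is
 * stored at m[col * 4 + row]. */

/* Derive a projection that emits Direct3D-style clip z in [0, w]
 * from one that emits OpenGL-style clip z in [-w, w]. */
void gl_to_d3d_clip_z(const float gl_proj[16], float d3d_proj[16])
{
    for (int col = 0; col < 4; ++col) {
        float z = gl_proj[col * 4 + 2];
        float w = gl_proj[col * 4 + 3];
        d3d_proj[col * 4 + 0] = gl_proj[col * 4 + 0];
        d3d_proj[col * 4 + 1] = gl_proj[col * 4 + 1];
        d3d_proj[col * 4 + 2] = 0.5f * z + 0.5f * w;   /* z' = (z + w) / 2 */
        d3d_proj[col * 4 + 3] = w;
    }
}

/* out = m * v, for checking where a point lands in clip space. */
void transform4(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * v[col];
    }
}

/* Convert a window y between lower-left and upper-left origins. */
float flip_window_y(float y, float window_height)
{
    return window_height - y;
}
```

Applied to an identity projection, a point on the OpenGL near plane (z = -1) maps to z = 0 and a point on the far plane (z = +1) maps to z = 1.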
192 Direct3D vs. OpenGL Provoking Vertex Conventions
• Direct3D uses “first” vertex of a triangle or line to determine which color is used for flat shading
• OpenGL uses “last” vertex for lines, triangles, and quads
  • Except for polygon (GL_POLYGON) mode, which uses the first vertex
Direct3D 9: pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT);
OpenGL: glShadeModel(GL_FLAT);
(Figure: input triangle strip with per-vertex colors)
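The effect on a triangle strip can be sketched without any graphics API. In this hypothetical C model, triangle k of a strip is built from vertices k, k+1, and k+2; the "first" convention takes the flat color from vertex k and the "last" convention from vertex k+2. The enum and function names are illustrative only.

```c
typedef enum { FIRST_VERTEX, LAST_VERTEX } ProvokingVertex;

/* Triangle k (0-based) of a strip is built from vertices k, k+1, k+2. */
int strip_provoking_index(int k, ProvokingVertex convention)
{
    return convention == FIRST_VERTEX ? k : k + 2;
}

/* Write one flat color per triangle; returns the triangle count. */
int flat_shade_strip(const unsigned *vertex_colors, int vertex_count,
                     ProvokingVertex convention, unsigned *triangle_colors)
{
    int triangles = vertex_count >= 3 ? vertex_count - 2 : 0;
    for (int k = 0; k < triangles; ++k)
        triangle_colors[k] =
            vertex_colors[strip_provoking_index(k, convention)];
    return triangles;
}
```

The same five-vertex strip therefore shows the colors of vertices 0, 1, 2 under the Direct3D convention and of vertices 2, 3, 4 under the OpenGL convention, which is why flat-shaded content ported between the APIs can change appearance.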
193 BGRA Vertex Array Order
• Direct3D 9’s most common usage for sending per-vertex colors is 32-bit D3DCOLOR data type:
  • Red in bits 16:23
  • Green in bits 8:15
  • Blue in bits 0:7
  • Alpha in bits 24:31
• Laid out in memory (little-endian), looks like BGRA order
• OpenGL assumes RGBA order for all vertex arrays
• Direct3Dism EXT_vertex_array_bgra extension allows:
  glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
  glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
  glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);
(Figure: 32-bit D3DCOLOR word from bit 31 down to bit 0: 8-bit alpha, 8-bit red, 8-bit green, 8-bit blue)
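A short C sketch shows why the packed word reads as BGRA in memory. The function names are hypothetical; the byte extraction is written with shifts so it models a little-endian memory layout regardless of the machine running it.

```c
#include <stdint.h>

/* Pack 8-bit channels with the bit positions listed on the slide:
 * alpha 31:24, red 23:16, green 15:8, blue 7:0. */
uint32_t pack_d3dcolor(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Bytes in the order a little-endian machine stores the 32-bit word:
 * out[0] is the byte at the lowest address. */
void d3dcolor_memory_bytes(uint32_t color, uint8_t out[4])
{
    for (int k = 0; k < 4; ++k)
        out[k] = (uint8_t)(color >> (8 * k));
}
```

Reading the four bytes in address order gives blue, green, red, alpha, so a vertex array of such words needs the GL_BGRA size token rather than a per-vertex swizzle on the CPU.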
194 OpenGLisms
• Things about OpenGL’s operation that make it hard for non-OpenGL applications to port to OpenGL
• Examples
  • Selectors
  • Linked GLSL program objects
195 Eliminating Selectors from OpenGL
• OpenGL has lots of selectors
  • Selectors set state that indicates what state subsequent commands will update
  • Already mentioned selectors: glClientActiveTexture
  • Other examples: glActiveTexture, glMatrixMode, glBindTexture, glBindBuffer, glUseProgram, glBindProgramARB
  • OpenGL is full of selectors
    – Partly OpenGL’s extensibility strategy
    – Partly because objects are bound into context
      » Bind-to-edit objects
      » Rather than edit-by-name
• Direct State Access extension: EXT_direct_state_access
  • Provides complete selector-free additional API for OpenGL
  • Shipping in NVIDIA’s 180.43 drivers
196 Reasons to Eliminate Selectors
• Direct3D has an “edit-by-name” model of operation
  • Means Direct3D has no selectors
• Having to manage selectors when porting Direct3D or console code to OpenGL is awkward
  • Requires deferring updates to minimize selector and object bind changes
• Layered libraries can’t count on selector state
  • To be safe when updating state controlled by selectors, such libraries must use idiom
    – Save selector, Set selector, Update state, Restore selector
  • Bad for performance, particularly bad for dual-core drivers since queries are expensive
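The save/set/update/restore idiom can be illustrated with a toy model in plain C. This is not OpenGL: `ToyContext` and every function below are hypothetical stand-ins, loosely modeled on the relationship between an active-texture selector and per-unit bindings, so the cost of the idiom is visible without a GL context.

```c
enum { NUM_UNITS = 4 };

/* Toy stand-in for a rendering context with one selector. */
typedef struct {
    int active_unit;                /* the selector */
    int bound_texture[NUM_UNITS];   /* state the selector routes updates to */
    int selector_queries;           /* how often the selector was read back */
} ToyContext;

/* Selector-style entry points (hypothetical). */
void toy_set_active_unit(ToyContext *c, int unit) { c->active_unit = unit; }
int  toy_get_active_unit(ToyContext *c)
{
    c->selector_queries++;          /* queries are the expensive part */
    return c->active_unit;
}
void toy_bind(ToyContext *c, int texture)
{
    c->bound_texture[c->active_unit] = texture;
}

/* What a layered library must do to leave the caller's selector intact. */
void library_bind_with_selector(ToyContext *c, int unit, int texture)
{
    int saved = toy_get_active_unit(c);   /* save selector */
    toy_set_active_unit(c, unit);         /* set selector */
    toy_bind(c, texture);                 /* update state */
    toy_set_active_unit(c, saved);        /* restore selector */
}

/* Selector-free style: the target is named in the call itself. */
void toy_bind_direct(ToyContext *c, int unit, int texture)
{
    c->bound_texture[unit] = texture;
}
```

The selector version needs four calls, one of them a query, to make a single state change; the direct version needs one call and never reads or disturbs the selector.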
197 GLSL Program Object Linking
• GLSL requires shader objects from different domains (vertex, geometry, fragment) to be linked into single GLSL program object
  • Means you can’t mix-and-match shaders easily
• Other APIs don’t have this limitation
  • Direct3D
  • Prior OpenGL assembly language extensions
  • Consoles
• A “separate shader objects” extension could fix this problem
198 Separate Shader Objects Example
• Combining different GLSL shaders at once
(Figure: rendered tori combining different GLSL vertex shaders (wobbly torus, smooth torus) with different GLSL fragment shaders (specular brick bump mapping, red diffuse))
199 Deprecation
• Part of OpenGL 3.0 is a marking of features for deprecation
  • LOTS of functionality is marked for deprecation
  • I contend no real application today uses the non-deprecated subset of OpenGL; all apps would have to change due to deprecation
• Some vendors believe getting rid of features will make OpenGL better in some way
  • NVIDIA does not believe in abandoning API compatibility this way
• OpenGL is part of a large ecosystem so removing features this way undermines the substantial investment partners have made in OpenGL over years
  • API compatibility and stability is one of OpenGL’s great strengths
200 Synergy between OpenGL and OpenCL
• Complementary capabilities
  • OpenGL 3.0 = state-of-the-art, cross-platform graphics
  • OpenCL 1.0 = state-of-the-art, cross-platform compute
• Computation & Graphics should work together
  • Most natural way to intuit compute results is with graphics
  • When Compute is done on a GPU, there’s no need to “copy” the data to see it visualized
• Appendix B of OpenCL specification
  • Details the sharing of objects between OpenGL and OpenCL
  • Called “GL” and “CL” from here on…
201 Four Kinds of Shared Objects
(Diagram: OpenGL objects on one side, the OpenCL objects created from them on the other)
• OpenGL buffer object (GLuint bufferobj) → clCreateFromGLBuffer → OpenCL buffer object (cl_mem)
• OpenGL texture 2D object (GLenum target, GLuint texture, GLint miplevel) → clCreateFromGLTexture2D → OpenCL 2D image object (cl_mem)
• OpenGL texture 3D object (GLenum target, GLuint texture, GLint miplevel) → clCreateFromGLTexture3D → OpenCL 3D image object (cl_mem)
• OpenGL renderbuffer object (GLuint renderbuffer) → clCreateFromGLRenderbuffer → OpenCL 2D image object (cl_mem)
202 OpenGL / OpenCL Sharing
• Requirements for GL object sharing with CL
  • CL context must be created with an OpenGL context
  • Each platform-specific API will provide its appropriate way to create an OpenGL-compatible CL context
    – For WGL (Windows), CGL (OS X), GLX (X11/Linux), EGL (OpenGL ES), etc.
• Creating cl_mem for GL Objects does two things
  1. Ensures CL has a reference to the GL objects
  2. Provides cl_mem handle to acquire GL object for CL’s use
• clRetainMemObject & clReleaseMemObject can create counted references to cl_mem objects
203 Acquiring GL Objects for Compute Access
• Still must “enqueue acquire” GL objects for compute kernels to use them
  • Otherwise reading or writing GL objects with CL is undefined
• Enqueue acquire and release provide sequential consistency with GL command processing
• Enqueue commands for GL objects
  • clEnqueueAcquireGLObjects
    – Takes list of cl_mem objects for GL objects & list of cl_events that must complete before acquire
    – Returns a cl_event for this acquire operation
  • clEnqueueReleaseGLObjects
    – Takes list of cl_mem objects for GL objects & list of cl_events that must complete before release
    – Returns a cl_event for this release operation
204 Unconventional OpenGL Deployments
• Workstation PCs: Quadro
• Consumer PCs: GeForce
• High-end Visualization: QuadroPlex Visual Computing Solution (VCS)
• Embedded Applications
• Handheld Devices
• Game Consoles
(Diagram labels: Conventional PC OpenGL Products; Unconventional)
205 OpenGL in Context
A facilitated conversation with Dr. Marc Levoy, Stanford University
206 Questions?

SIGGRAPH Asia 2008 Modern OpenGL

  • 1.
  • 2.
    2Mark J. Kilgard,NVIDIAKurt Akeley, Microsoft Research13 December 2008SingaporeModern OpenGL:Its Design and Evolution
  • 3.
  • 4.
    4Kurt Akeley• Leddevelopment of OpenGL at Silicon Graphics (SGI)• Co-founded SGI• Lead development of SGI’s high-end graphics hardware• Co-author of OpenGL specification• Returned to Stanford University to complete Ph.D.• Co-developed Cg “C for graphics” language at NVIDIA• Principal Researcher, Microsoft Research Silicon Valley• Spent time at Microsoft Research Asia in Beijing• Member of US National Academy of Engineering
  • 5.
    5Mark Kilgard• PrincipalSystem Software Engineer, NVIDIA, Austin, Texas• Developed original OpenGL driver for 1stGeForce GPU• Specified many key OpenGL extensions• Works on Cg for portable programmable shading• NVIDIA Distinguished Inventor• Before NVIDIA, worked at Silicon Graphics• Worked on X Window System integration for OpenGL• Developed popular OpenGL Utility Toolkit (GLUT)• Wrote book on OpenGL and X, co-authored Cg Tutorial
  • 6.
    6Marc Levoy• Moderatorfor our facilitated discussion• Professor of Computer Science and ElectricalEngineering• Stanford University• SIGGRAPH Computer Graphics Achievement Award• ACM Fellow
  • 7.
    7Course Schedule• ModernOpenGL (Kilgard)• OpenGL’s evolution: a personal retrospective (Akeley)• Writing better OpenGL (Kilgard)• Implementing OpenGL (Kilgard)• OpenGL’s future evolution (Kilgard)• OpenGL in Context (Akeley, Kilgard, Levoy)• Facilitated conversation– Mid-session break –
  • 8.
    8Check Out theCourse Notes (1)• Look to www.opengl.org web site for our final slides• New Material• “An Incomplete History of OpenGL” (Kilgard)• How the OpenGL graphics system developed• “Using Vertex Buffer Objects Well” (Kilgard)• Learn how to use Vertex Buffers objects for highvertex processing rates
  • 9.
    9Check Out theCourse Notes (2)• Paper Reprints• OpenGL design rationale from its specification co-authors (Segal, Akeley)• Realizing OpenGL: two implementations of onearchitecture (Kilgard)• Graphics hardware: GTX, RealityEngine,InfiniteReality, GeForce 6800• Key developments in graphics hardware designover last 20 years• GPU Programmability: “User-Programmable VertexEngine” and “Cg” SIGGAPH papers• “How GPUs Work” (Luebke, Humpherys)
  • 10.
    10Modern OpenGLMark KilgardPrincipalSystem Software EngineerNVIDIA
  • 11.
    11Modern OpenGL• History•How did OpenGL get where it is now?• Present• Version 3.0• Functionality beyond 3.0
  • 12.
    12An Overview Historyof OpenGL• Pre-history 1991• IRIS GL, a proprietary Graphics Library by SGI• OpenGL, an open standard for 3D• Focus: procedural hardware-accelerated 3D graphics• Governed by Architectural Review Board (ARB)• Extensibility planned into design• Competition• Proprietary APIs (1991-1995)• PHIGS & PEX for X Window System (1992-1997)• Microsoft’s Direct3D (1998-)
  • 13.
    13OpenGL’s Pre-historyIRIS GL1Window system: MEXIRIS GL 2Window system: MEXOperating system: UNIXIRIS GL 3Window system: NeWS/X11Operating system: IRIX 3.xIRIS GL 4Window system: Native X11Operating system: IRIX 4.3OpenGL 1.0Window system: Native X11 with GLXOperating system: IRIX 5.119911993198819861983First work onGL 5.0 proposal1989Dates are forshipping commercialSGI implementation1983-2008 = 25 years
  • 14.
    14OpenGL’s Design Philosophy•High-performance• Assumes hardwareacceleration• Defined by a specification• Rather than a de-factoimplementation• Rendering state machine• Procedural• Not a window system,not a scene graph• No initial sub-setting• Extensible• Data type rich• Cross-platform• Window system-independent core• X Window System,Microsoft Windows,OS/2, OS X, etc.• Multi-language bindings• C, FORTRAN, etc.• Not merely an API,rather a system
  • 15.
    15Timeline of OpenGL’sDevelopment1992 1994 1996 1998 2000 2002 2004 2006 2008OpenGL 1.0 approvedOpenGL 1.1OpenGL 1.2Multitexture added (1.2.1)OpenGL 1.3OpenGL 1.4OpenGL 1.5OpenGL 2.0OpenGL 2.1OpenGL 3.0SGIInfinite-RealityOpenGL UtilityToolkit (GLUT)releasedMesa3DopensourceKhronoscontrolsOpenGL1stGPU for PCswith single-chiptransform &lighting forOpenGL(GeForce)NT 3.51bringOpenGLto PCsOpenGL ES for embedded devices1stcommercialOpenGLimplementation(DEC)
  • 16.
    16 Competitive 3D APIs • OpenGL has always existed in competition with other APIs • Strengthened OpenGL by driving feature parity • OpenGL’s competitive strengths: 1. Cross platform, open process 2. API stability, extensibility 3. Clean initial design & specification • [Timeline 1992-2008: proprietary Unix workstation 3D APIs (XGL, Doré, Starbase, IRIS GL); X Consortium 3D standard (PEX); Microsoft Direct3D (DirectX 3, 5, 6, 7, 8, 9, 10)]
  • 17.
    17 OpenGL 1.0 • Immediate mode • Vertex transformation and lighting • Points, lines, polygons • Stippling, wide points and lines • Bitmaps, image rectangles, and pixel reads • Pixel store and transfer • 1D and 2D textures, fog, and scissor • Display lists and evaluators • RGBA and color index color models • Color, depth, stencil, and accumulation buffers • Selection and feedback modes • Queries
  • 18.
    18OpenGL State Machine•From OpenGL 3.0 specification, unchanged since 1.0
  • 19.
    19 SGI “Classic” Hardware View of OpenGL (1992) • Entirely fixed-function, no programmability • High-end SGI hardware manifested functionality in distinct chips • [Pipeline diagram: 3D Application or Game, OpenGL API, graphics hardware boundary, Front End, Vertex Assembly, Vertex Transform & Lighting, Primitive Assembly/Clipping/Setup/Rasterization, Texture & Fog, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface. Legend: graphics data flow, memory operations, fixed-function unit, programmable unit]
  • 20.
    20 OpenGL 1.1 • Vertex arrays • Texture objects • Texture internal formats • Texture sub-image updates • Texture proxies • Copy framebuffer-to-texture • Polygon offset • RGBA logical operations
  • 21.
    21 The Look of OpenGL 1.1 • SGI skyfly demo • Stenciled shadow volumes • Ideas in Motion
  • 22.
    22 OpenGL 1.2 • 3D textures • Texture edge clamp wrap mode • Texture level-of-detail clamping • BGRA component order • Packed pixel formats • Imaging subset (optional) • Normal rescaling • Separate specular • Vertex array draw elements range
  • 23.
    23Akeley’s (Modernized) OpenGLData Flowvertexshadingrasterization& fragmentshadingtexturerasteroperationsframebufferpixelunpackpixelpackvertexpullerclientmemorypixeltransfer glReadPixels / glCopyPixels / glCopyTex{Sub}ImageglDrawPixelsglBitmapglCopyPixelsglTex{Sub}ImageglCopyTex{Sub}ImageglDrawElementsglDrawArraysselection / feedback / transform feedbackglVertex*glColor*glTexCoord*etc.blendingdepth testingstencil testingaccumulationstorageoperations
  • 24.
    24OpenGL 1.2 ImagingSubsetColor TableConvolution(separable or general)Post-convolveScale & BiasPost-convolveColor TableColor MatrixPost-color matrixScale & BiasPost-color matrixColor TableHistogramMin-maxLook-up Table(RGBA-to-RGBA)Look-up Table(Index-to-RGBA)Scale & Bias Shift & AddIndex pixels RGBA pixelsPixel RectangleRasterizationcorefunctionalityARB_imagingsubsetdiscarddiscard
  • 25.
    25 OpenGL 1.2.1 • Multi-texture (optional)
  • 26.
    26 Multi-texture Poster Child: Quake2 Light Maps • lightmaps only × decal only (modulate) = combined scene
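The modulate step on this slide is a per-channel multiply of the decal texel by the light map texel. A minimal CPU-side sketch in C, assuming 8-bit RGB texels; the `Rgb8` type and both function names are hypothetical and are not part of OpenGL:

```c
#include <stdint.h>

/* Hypothetical 8-bit RGB texel, for illustration only. */
typedef struct { uint8_t r, g, b; } Rgb8;

/* Multiply two 8-bit channels as fractions of 255, rounding to nearest. */
uint8_t modulate_channel(uint8_t a, uint8_t b)
{
    return (uint8_t)((a * b + 127) / 255);
}

/* Combine a decal texel with a light map texel the way a modulate
 * texture environment does: channel-wise multiplication. */
Rgb8 modulate(Rgb8 decal, Rgb8 lightmap)
{
    Rgb8 out;
    out.r = modulate_channel(decal.r, lightmap.r);
    out.g = modulate_channel(decal.g, lightmap.g);
    out.b = modulate_channel(decal.b, lightmap.b);
    return out;
}
```

With multi-texture the hardware performs this multiply in a single pass; without it, the same result needs two passes and framebuffer blending.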
  • 27.
    27 GeForce 256 (NV10) View of OpenGL (1999) • Vertex pulling (vertex buffer objects) via DMA • Dual-texture, cube maps, and register combiners • [Pipeline diagram: 3D Application or Game, OpenGL API, CPU – GPU boundary, GPU Front End, Vertex Assembly, Attribute Fetch, Vertex Transform & Lighting, Primitive Assembly/Clipping/Setup/Rasterization, Texture & Fog, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface]
  • 28.
    28 Hardware Cube Maps • Rendered scene • Dynamically created cube map image • Image credit: “Guts” GeForce 2 GTS demo, Thant Thessman
  • 29.
    29 OpenGL 1.3 • Multi-texture (required now) • Cube map texturing • Compressed texture formats • Texture border clamp • Texture environment functions: add, combine, dot product • Multisample anti-aliasing • Transpose matrix
  • 30.
    30 GeForce 3 & 4 Ti (NV2x) View of OpenGL (2001) • Programmable vertex processing • Highly configurable fragment processing • [Pipeline diagram: 3D Application or Game, OpenGL API, CPU – GPU boundary, GPU Front End, Vertex Assembly, Attribute Fetch, Vertex Program, Primitive Assembly/Clipping/Setup/Rasterization, Multi-texture Shaders & Combiners, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface]
  • 31.
    31 Vertex Programmability • Paletted matrix skinning • Twister vertex program • Per-vertex cartoon shading
  • 32.
    32 Configurable Fragment Processing • Bumpy shiny environment mapping • Chromatic aberration • Offset 2D bump mapping • Depth sprites
  • 33.
    33 OpenGL 1.4 • Automatic mipmap generation • Shadow-mapping: depth textures and shadow comparisons • Texture level-of-detail bias • Texture mirrored repeat wrap mode • Multi-texture combination • Fog coordinate • Secondary color • Configurable point size attenuation • Color blending improvements • Stencil wrap operations • Window-space raster position specification
  • 34.
    34 Hardware Shadow Mapping • Without shadow mapping • With shadow mapping • Depth map from light source’s view (darker is closer) • Light position • Projective Texturing (1.0) & Polygon Offset (1.1) key enablers
  • 35.
    35 Shadow Mapping Explained • Planar distance from light ≤ (“less than”) depth map projected onto scene = (“equals”) true “un-shadowed” region (shown green)
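The comparison on this slide can be written as a single per-fragment test. A minimal sketch in C, assuming both depths are expressed in the light's depth range; the function name and the `bias` parameter are hypothetical, with `bias` standing in for the effect of polygon offset, which the previous slide names as a key enabler:

```c
/* Returns 1 when the fragment is lit and 0 when it is in shadow.
 *
 * frag_depth_from_light: the fragment's planar distance from the light.
 * depth_map_value:       the depth stored in the light's depth map at the
 *                        fragment's projected position.
 * bias:                  small offset (assumption) that keeps a surface
 *                        from shadowing itself due to limited precision. */
int shadow_lit(float frag_depth_from_light, float depth_map_value, float bias)
{
    return (frag_depth_from_light - bias) <= depth_map_value;
}
```

A fragment passes when nothing closer to the light was recorded in the depth map at its projected position.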
  • 36.
    36 OpenGL 1.5 • Vertex buffer objects (VBOs) • Occlusion queries • Generalized shadow mapping functions
  • 37.
    37 GeForce FX (NV3x) View of OpenGL (2003) • Programmable fragment processing • 16 texture units, IEEE 754 32-bit floating-point • Vertex program branching • [Pipeline diagram: 3D Application or Game, OpenGL API, CPU – GPU boundary, GPU Front End, Vertex Assembly, Attribute Fetch, Vertex Program, Primitive Assembly/Clipping/Setup/Rasterization, Fragment Program, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface]
  • 38.
  • 39.
    39OpenGL FragmentProgram FlowchartMoreInstructions?ReadInterpolantsand/or RegistersMap Input values:Swizzle, Negate, etc.Perform InstructionMath / OperationWrite OutputRegister withMaskingBeginFragmentFetch & DecodeNext InstructionTemporaryRegistersinitialized to0,0,0,0OutputDepth & ColorRegistersinitialized to 0,0,0,1InitializeParametersEmit OutputRegisters asTransformedVertexEndFragmentFragmentProgramInstructionLoopFragmentProgramInstructionMemoryTextureFetchInstruction?yesnonoCompute TextureAddress & Level-of-detail & FetchTexelsFilterTexelsyesTextureImagesPrimitiveInterpolants
  • 40.
    40Key Trend:Configurability becomesProgrammabilityFixed-function ProgrammableSimpleConfigurabilityComplexConfigurability
  • 41.
    41Core OpenGL fragmenttexturing & coloringPointRasterizationLineRasterizationPolygonRasterizationPixel RectangleRasterizationBitmapRasterizationFromPrimitiveAssemblyDrawPixelsBitmapConventionalTexture FetchingTextureEnvironmentApplicationColor SumFogTo rasteroperationsCoverageApplicationTexture Unit 0Texture Unit 1Texture Unit 0Texture Unit 1
  • 42.
    42 NV1x OpenGLfragment texturing & coloringPointRasterizationLineRasterizationPolygonRasterizationPixel RectangleRasterizationBitmapRasterizationFromPrimitiveAssemblyDrawPixelsBitmapConventionalTexture FetchingTextureEnvironmentApplicationColor SumFogTo rasteroperationsCoverageApplicationRegisterCombinersTexture Unit 0General Stage 1Final StageTexture Unit 1General Stage 0Texture Unit 0Texture Unit 1GL_REGISTER_COMBINERS_NVenable
  • 43.
    43Texture Shader 3…TextureShader 1Texture Shader 0RegisterCombinersNV2x OpenGL fragment texturing & colorinPointRasterizationLineRasterizationPolygonRasterizationPixel RectangleRasterizationBitmapRasterizationFromPrimitiveAssemblyDrawPixelsBitmapConventionalTexture FetchingTextureEnvironmentApplicationColor SumFogTo rasteroperationsCoverageApplicationTexture ShadersGeneral Stage 1Final CombinerGeneral Stage 0General Stage 7…Texture Unit 3…Texture Unit 1Texture Unit 0Texture Unit 3…Texture Unit 1Texture Unit 0GLTEXTURE_SHADER_NVenableGL_REGISTER_COMBINERS_NVenable
  • 44.
    44Fragment ProgramInstruction 0TextureShader 3…Texture Shader 1Texture Shader 0NV3x OpenGL fragment texturing & coloringPointRasterizationLineRasterizationPolygonRasterizationPixel RectangleRasterizationBitmapRasterizationFromPrimitiveAssemblyDrawPixelsBitmapConventionalTexture FetchingTextureEnvironmentApplicationColor SumFogTo rasteroperationsCoverageApplicationTexture ShadersGeneral Stage 1Final CombinerGeneral Stage 0General Stage 7…Texture Unit 3…Texture Unit 1Texture Unit 0Texture Unit 3…Texture Unit 1Texture Unit 0…Fragment ProgramFragment ProgramInstruction 1023GL_REGISTER_COMBINERS_NVenableGLTEXTURE_SHADER_NVenableGL_FRAGMENT_PROGRAM_NVenable!!FP1.0 or!!ARBfp1.0programs
  • 45.
    45 OpenGL 2.0 • Programmable shading • OpenGL Shading Language (GLSL) • Multiple color buffer rendering targets • Non-power-of-two texture dimensions • Point sprites • Separate blend equation • Two-sided stencil testing
  • 46.
    46 GeForce 6 & 7 (NV4x/G7x) View of OpenGL (2004) • Limited vertex texturing • Fragment branching • Multiple render targets & floating-point blending • [Pipeline diagram: 3D Application or Game, OpenGL API, CPU – GPU boundary, GPU Front End, Vertex Assembly, Attribute Fetch, Vertex Program, Primitive Assembly/Clipping/Setup/Rasterization, Fragment Program, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface]
  • 47.
    47 GeForce 8 & 9 (G8x/G9x) View of OpenGL (2006) • Primitive (geometry) programs • Parameter reads from buffer objects • Transform feedback (stream out) • [Pipeline diagram: 3D Application or Game, OpenGL API, CPU – GPU boundary, GPU Front End, Vertex Assembly, Attribute Fetch, Vertex Program, Primitive Assembly, Primitive Program, Clipping/Setup/Rasterization, Fragment Program, Parameter Buffer Read, Texture Fetch, Raster Operations, Framebuffer Access, Memory Interface]
  • 48.
    48PrimitiveProgramOpenGL Pipeline Fixed-functionSteps• Much of functional pipeline remains fixed-function• Vital to maintaining performance and data flow• Hard to compete with hard-wired rasterization, Zcull, and pixel compressionGPUFront EndVertexAssemblyVertexProgram,Clipping, Setup,and RasterizationFragmentProgramTexture FetchRasterOperationsFramebuffer AccessMemory Interface 2006Attribute FetchPrimitiveAssemblyParameter Buffer Read
  • 49.
    49PrimitiveProgramOpenGL Pipeline ProgrammableDomains• New geometry shader domain for per-primitive programmable processing• Unified Streaming Processor Array (SPA) architecture means same capabilitiesfor all domainsGPUFront EndVertexAssemblyVertexProgram,Clipping, Setup,and RasterizationFragmentProgramTexture FetchRasterOperationsFramebuffer AccessMemory Interface 2006Attribute FetchPrimitiveAssemblyParameter Buffer ReadCan beunifiedhardware!
  • 50.
    50 OpenGL 2.1 • OpenGL Shading Language (GLSL) improvements • Non-square matrices • Pixel buffer objects (PBOs) • sRGB color space texture formats
  • 51.
    51 OpenGL 3.0 • OpenGL Shading Language (GLSL) improvements: new texture fetches; true integer data types and operators; switch/case/default flow control statements • Conditional rendering based on occlusion query results • Transform feedback • Vertex array objects • Floating-point textures, color buffers, and depth buffers • Half-precision vertex arrays • Texture arrays • Integer textures • Red and red-green texture formats • Compressed red and red-green formats • Framebuffer objects (FBOs) • Packed depth-stencil pixel formats • Per-color buffer clearing, blending, and masking • sRGB color space color buffers • Fine-grain buffer mapping and flushing
  • 52.
    52Areas of 3.0Functionality Improvement• Programmability• Shader Model 4.0 features• OpenGL Shading Language (GLSL) 1.30• Texturing• New texture representations and formats• Framebuffer operations• Framebuffer objects• New formats• New copy (blit), clear, blend, and masking operations• Buffer management• Non-blocking and fine-grain update of buffer object data stores• Vertex processing• Vertex array configuration objects• Conditional rendering for occlusion culling• New half-precision vertex attribute formats• Pixel processing• New half-precision external pixel formatsAll BrandNewCoreFeatures
  • 53.
    53 OpenGL 3.0 Programmability • Shader Model 4.0 additions • True signed & unsigned integer values • True integer operators: ^, &, |, <<, >>, %, ~ • Texture additions • Texture arrays • Base texture size queries • Texel offsets to fetches • Explicit LOD and derivative control • Integer samplers • Interpolation modifiers: centroid, noperspective, and flat • Vertex array element number: gl_VertexID • OpenGL Shading Language (GLSL) improvements • ## concatenation in pre-processor for macros • switch/case/default statements
  • 54.
    54 OpenGL 3.0 Texturing Functionality • Texture representation • Texture arrays: indexed access to a set of 1D or 2D texture images • Texture formats • Floating-point texture formats • Single-precision (32-bit, IEEE s23e8) • Half-precision (16-bit, s10e5) • Red & red/green texture formats • Intended as FBO framebuffer formats too • Compressed red & red/green texture formats • Shared exponent texture formats • Packed floating-point texture formats
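To make the half-precision (s10e5) layout concrete, here is a minimal sketch in C that decodes a 16-bit half float (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits) to a 32-bit float. The function names are hypothetical; a driver or GPU does this conversion internally. Only the `NAN` and `INFINITY` macros are used from `<math.h>`, so no math library calls are needed.

```c
#include <stdint.h>
#include <math.h>   /* only for the NAN and INFINITY macros */

/* Exact power of two for small integer exponents (hypothetical helper). */
float pow2i(int e)
{
    float v = 1.0f;
    while (e > 0) { v *= 2.0f; --e; }
    while (e < 0) { v *= 0.5f; ++e; }
    return v;
}

/* Decode an s10e5 half-precision value to float. */
float half_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa =  h        & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or denormal: (mantissa / 1024) * 2^-14 */
        value = (float)mantissa * pow2i(-24);
    } else if (exponent == 31) {
        /* All-ones exponent encodes infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1 + mantissa / 1024) * 2^(exponent - 15) */
        value = (float)(mantissa + 1024) * pow2i(exponent - 25);
    }
    return sign ? -value : value;
}
```

For example, `0x3C00` decodes to 1.0 and `0x7BFF` to 65504, the largest finite half-precision value.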
  • 55.
    55 Texture Arrays • Conventional texture = one logical pre-filtered image • Texture array = index-able plurality of pre-filtered images • Rationale is fewer texture object binds when drawing different objects • No filtering between mipmap sets in a texture array • All mipmap sets in array share same format/border & base dimensions • Both 1D and 2D texture arrays • Require shaders, no fixed-function support • Texture image specification • Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays • No new OpenGL commands for texture arrays • 3rd dimension specifies integer array index • No halving in 3rd dimension for mipmaps • So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
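The mipmap rule for 2D texture arrays (width and height halve, the layer count does not) can be checked with a few lines of C. This is an illustrative sketch; the struct and function names are hypothetical, and the sizes it produces are the values an application would pass to glTexImage3D for each level.

```c
/* Dimensions of one mipmap level of a 2D texture array (hypothetical type). */
typedef struct { int width, height, layers; } ArrayLevelSize;

/* Halve a dimension, clamping at 1. */
int half_dim(int d)
{
    return d > 1 ? d / 2 : 1;
}

/* Size of mipmap level `level` for a base level of base_w x base_h x layers.
 * Width and height halve per level; the layer count never changes. */
ArrayLevelSize array_level_size(int base_w, int base_h, int layers, int level)
{
    ArrayLevelSize s;
    int i;
    s.width = base_w;
    s.height = base_h;
    s.layers = layers;
    for (i = 0; i < level; ++i) {
        s.width  = half_dim(s.width);
        s.height = half_dim(s.height);
    }
    return s;
}

/* Number of levels in a complete mipmap chain. */
int array_level_count(int base_w, int base_h)
{
    int count = 1;
    while (base_w > 1 || base_h > 1) {
        base_w = half_dim(base_w);
        base_h = half_dim(base_h);
        ++count;
    }
    return count;
}
```

For the slide's 64×128×17 example this gives 8 levels, with level 1 at 32×64×17 and level 7 at 1×1×17.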
  • 56.
    56Texture Arrays Example•Multiple skins packed in texture array• Motivation: binding to one multi-skin texture array avoids texturebind per objectTexture array index0 1 2 3 401234Mipmaplevelindex
  • 57.
    57 Compact Floating-point Textures • Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications
  • 58.
    58 Compact Floating-point Texture Formats • Packed float format: no sign bit, independent exponents • [Bit layout diagram, bit 31 to bit 0: three components labeled 5-bit mantissa + 5-bit exponent, 6-bit mantissa + 5-bit exponent, 6-bit mantissa + 5-bit exponent] • Shared exponent format: no sign bit, shared exponent, no implied leading 1 • [Bit layout diagram, bit 31 to bit 0: 5-bit shared exponent and three 9-bit mantissas]
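Both compact formats can be unpacked with integer shifts and a power-of-two scale. The sketch below is in C and assumes the bit assignments of the EXT_texture_shared_exponent and EXT_packed_float extensions as I understand them (red in the low bits, exponents biased by 15); treat the exact bit positions as an assumption to verify against those specifications. All function names are hypothetical, and the Inf/NaN encodings of the packed float format (exponent 31) are deliberately not handled.

```c
#include <stdint.h>

/* Exact power of two for small integer exponents (hypothetical helper). */
float pow2i(int e)
{
    float v = 1.0f;
    while (e > 0) { v *= 2.0f; --e; }
    while (e < 0) { v *= 0.5f; ++e; }
    return v;
}

/* Shared exponent format: three 9-bit mantissas with no implied leading 1,
 * plus one 5-bit exponent (bias 15) shared by all components.
 * Assumed layout: red bits 0-8, green 9-17, blue 18-26, exponent 27-31. */
void decode_rgb9e5(uint32_t packed, float rgb[3])
{
    int exponent = (int)(packed >> 27);
    float scale = pow2i(exponent - 15 - 9);   /* bias 15, 9 mantissa bits */
    rgb[0] = (float)( packed        & 0x1FF) * scale;
    rgb[1] = (float)((packed >> 9)  & 0x1FF) * scale;
    rgb[2] = (float)((packed >> 18) & 0x1FF) * scale;
}

/* Unsigned float with a 5-bit exponent (bias 15) and `mantissa_bits` of
 * mantissa. Finite values only: exponent 31 (Inf/NaN) is not handled. */
float decode_unsigned_float(uint32_t bits, int mantissa_bits)
{
    uint32_t mantissa = bits & ((1u << mantissa_bits) - 1u);
    int exponent = (int)((bits >> mantissa_bits) & 0x1F);
    if (exponent == 0)   /* zero or denormal */
        return (float)mantissa * pow2i(-14 - mantissa_bits);
    return (float)(mantissa + (1u << mantissa_bits))
           * pow2i(exponent - 15 - mantissa_bits);
}

/* Packed float format: independent exponents, no sign bits.
 * Assumed layout: red 11 bits (0-10), green 11 bits (11-21),
 * blue 10 bits (22-31). */
void decode_r11g11b10f(uint32_t packed, float rgb[3])
{
    rgb[0] = decode_unsigned_float( packed        & 0x7FF, 6);
    rgb[1] = decode_unsigned_float((packed >> 11) & 0x7FF, 6);
    rgb[2] = decode_unsigned_float((packed >> 22) & 0x3FF, 5);
}
```

The shared exponent trades per-component range for mantissa precision: 9 bits per component against 6 or 5 in the packed float format.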
  • 59.
    59 1- and 2-component Block Compression Scheme • Basic 1-component block compression format • Borrowed from alpha compression scheme of S3TC 5 • Encoded block: 2 min/max values (8-bit A, 8-bit B; 16 bits) plus per-pixel data, 64 bits total per block • Decoded block: 4×4 pixels, 16 pixels × 8-bit/component = 128 bits decoded, so effectively 2:1 compression
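A minimal decoder sketch in C for one unsigned 1-component block, assuming the layout of the EXT_texture_compression_rgtc specification as I understand it: two 8-bit endpoint values followed by sixteen 3-bit palette indices (48 bits, little-endian, texels in row-major order). The function name is hypothetical and the bit ordering should be verified against the extension text before relying on it.

```c
#include <stdint.h>

/* Decode one 64-bit unsigned single-component block into 16 values in [0,1].
 * block[0], block[1]: the two endpoint values.
 * block[2..7]:        sixteen 3-bit palette indices. */
void decode_rgtc1_block(const uint8_t block[8], float out[16])
{
    float r0 = block[0] / 255.0f;
    float r1 = block[1] / 255.0f;
    float palette[8];
    uint64_t bits = 0;
    int i;

    palette[0] = r0;
    palette[1] = r1;
    if (block[0] > block[1]) {
        /* Six interpolated values between the endpoints */
        for (i = 1; i <= 6; ++i)
            palette[i + 1] = ((7 - i) * r0 + i * r1) / 7.0f;
    } else {
        /* Four interpolated values, plus explicit 0.0 and 1.0 */
        for (i = 1; i <= 4; ++i)
            palette[i + 1] = ((5 - i) * r0 + i * r1) / 5.0f;
        palette[6] = 0.0f;
        palette[7] = 1.0f;
    }

    /* Gather the 48 index bits, least significant byte first */
    for (i = 0; i < 6; ++i)
        bits |= (uint64_t)block[2 + i] << (8 * i);

    /* Each texel selects a palette entry with its 3-bit index */
    for (i = 0; i < 16; ++i)
        out[i] = palette[(bits >> (3 * i)) & 0x7];
}
```

The 2-component (red/green) format would apply the same decode to two such blocks, one per component.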
  • 60.
    60 Framebuffer Operations • Framebuffer objects • Standardized framebuffer objects (FBOs) for rendering to textures and renderbuffers • Render-to-texture • Multisample renderbuffers for FBOs • Framebuffer operations • Copies from one FBO to another, including multisample data • Per-color attachment color clears, blending, and write masking • Framebuffer formats • Floating-point color buffers • Floating-point depth buffers • Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value • Rendering into sRGB color space framebuffers
  • 61.
    61Framebuffer Object Example•Depth peeling for correctly ordered transparency• Great render-to-texture application for FBOs
  • 62.
    62 Depth Peeling Behind the Scenes • Depth buffer has closest fragment at all pixels • Save depth buffer • Render again, but use depth buffer as shadow map • Discard fragment in front of shadow map’s depth value • Effectively peels one layer of depth! • Resulting color buffer is 2nd closest fragment • And depth buffer for 2nd closest fragments’ depth • Now repeat peeling more layers • Use ping-pong depth buffer scheme • Use occlusion query to detect when no more fragments to peel • Composite color layers front-to-back (or back-to-front) • Front-to-back peeling can be done during the peeling process
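The peeling loop can be simulated on the CPU for a single pixel. The C sketch below is an illustration under stated assumptions, not the GPU implementation: fragment depths lie in [0, 1], the function name is hypothetical, and fragments at or in front of the previously peeled depth are discarded (the "at" case is my assumption, so that a peeled layer is not returned twice).

```c
#include <stddef.h>

/* One peeling pass for one pixel.
 *
 * frag_depths:      depths of all fragments covering this pixel, any order.
 * prev_layer_depth: depth peeled by the previous pass; pass a negative
 *                   value for the first pass.
 *
 * Returns the nearest depth strictly behind prev_layer_depth, or -1.0f when
 * nothing is left, which is the case an occlusion query would report as
 * zero fragments passing. */
float peel_next_layer(const float *frag_depths, size_t count,
                      float prev_layer_depth)
{
    float nearest = -1.0f;
    size_t i;
    for (i = 0; i < count; ++i) {
        float z = frag_depths[i];
        if (z <= prev_layer_depth)
            continue;                      /* "shadow map" test: already peeled */
        if (nearest < 0.0f || z < nearest)
            nearest = z;                   /* ordinary nearest-wins depth test */
    }
    return nearest;
}
```

Feeding each pass's result into the next pass is the ping-pong depth buffer scheme from the slide: one buffer is read as the shadow map while the other is written.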
  • 63.
    63 Delicate Color Fidelity with sRGB • Problem: PC display devices have non-linear (sRGB) display gamut, so delicate color shading looks wrong • Conventional rendering (uncorrected color): unnaturally deep facial shadows • Gamma correct (sRGB rendered): softer and more natural • NVIDIA’s Adriana GeForce 8 Launch Demo
  • 64.
    64What is sRGB?•A standard color space• Intended for monitors, printers, and the Internet• Created cooperatively by HP and Microsoft• Non-linear, roughly gamma of 2.2• Intuitively “encodes more dark values”• OpenGL 2.1 already added sRGB texture formats• Texture fetch converts sRGB to linear RGB, then filters• Result takes more than 8-bit fixed-point to represent in shader• 3.0 adds complementary sRGB framebuffer support• “sRGB correct blending” converts framebuffer sRGB to linear,blend with linear color from shader, then convert back to sRGB• Works with FrameBuffer Objects (FBOs)sRGB chromaticity
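For reference, the standard sRGB transfer functions behind "roughly gamma of 2.2" can be written per color channel as below, with values in [0, 1]. The blending expression at the end is a sketch of the sequence the slide describes; the symbol names and the generic blend function B are my own notation, not the specification's.

```latex
% Decode: sRGB-encoded value to linear (texture fetch, framebuffer read)
c_{\mathrm{linear}} =
\begin{cases}
  \dfrac{c_{\mathrm{sRGB}}}{12.92}, & c_{\mathrm{sRGB}} \le 0.04045 \\[2ex]
  \left( \dfrac{c_{\mathrm{sRGB}} + 0.055}{1.055} \right)^{2.4}, & \text{otherwise}
\end{cases}

% Encode: linear value to sRGB (framebuffer write)
c_{\mathrm{sRGB}} =
\begin{cases}
  12.92 \, c_{\mathrm{linear}}, & c_{\mathrm{linear}} < 0.0031308 \\[1ex]
  1.055 \, c_{\mathrm{linear}}^{1/2.4} - 0.055, & \text{otherwise}
\end{cases}

% sRGB-correct blending: decode the destination, blend in linear space
% with the shader's linear source color, then re-encode
d'_{\mathrm{sRGB}} = \mathrm{encode}\bigl( B\bigl( s_{\mathrm{linear}},\; \mathrm{decode}(d_{\mathrm{sRGB}}) \bigr) \bigr)
```

A mid-gray encoded value of 0.5 decodes to roughly 0.214 in linear terms, which is why the encoding "encodes more dark values".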
  • 65.
    65 So why sRGB? Standard Windows Display is Not Gamma Corrected • 25+ years of PC graphics, icons, and images depend on not gamma correcting displays • sRGB textures and color buffers compensate for this • Gamma 2.2: “expected” appearance of Windows desktop & icons, but 3D lighting too dark • Gamma 1.0 (linear color response): washed-out desktop appearance if color response was linear, but 3D lighting is correct
  • 66.
    66 Vertex Processing • Vertex array configuration • Objects to manage vertex array configuration client state • Half-precision floating-point vertex array formats • Vertex output streaming • Stream transformed vertex results into buffer object data stores • Occlusion culling • Skip rendering based on occlusion query result
  • 67.
    67Miscellaneous• Pixel Processing•Half-precision floating-point pixel external formats• Buffer Management• Non-blocking and fine-grain update of buffer object datastores
  • 68.
    68ARB Extensions toOpenGL 3.0• OpenGL 3.0 standard provides new ARB extensions• Extensions go beyond OpenGL 3.0• Standardized at same time as OpenGL 3.0• Support features in hardware today• Specifically• ARB_geometry_shader4—provides per-primitive programmableprocessing• ARB_draw_instanced—gives shader access to instance ID• ARB_texture_buffer_object—allows buffer object to be sampledas a huge 1D unfiltered texture• Shipping today• NVIDIA driver provides all three
  • 69.
    69 Transform Feedback for Terrain Generation by Recursive Subdivision • Geometry shaders + transform feedback 1. Render quads (use 4-vertex line adjacency primitive) from vertex buffer object 2. Fetch height field 3. Stream subdivided positions and normals to transform feedback “other” buffer object 4. Use buffer object as vertex buffer 5. Repeat, ping-pong buffer objects • Computation and data all stays on the GPU!
  • 70.
    70 Skin Deformation • Capture & re-use geometric deformations • Transform feedback allows the GPU to calculate the interactive, deforming elastic skin of the frog
  • 71.
    71 Silhouette Edge Rendering • Uses geometry shader • Complete mesh, passed through silhouette edge detection geometry shader, yields silhouette edges • Useful for non-photorealistic rendering • Looks like human sketching
  • 72.
    72 More Geometry Shader Examples • Shimmering point sprites • Generate fins for lines • Generate shells for fur rendering
  • 73.
    73 Improved Interpolation Techniques • Using geometry shader functionality • Quadratic normal interpolation • True quadrilateral rendering with mean value coordinate interpolation
  • 74.
    74 “Fair” Quadrilateral Interpolation • glBegin(GL_QUADS); glColor3fv(red); glVertex3fv(lowerLeft); glColor3fv(green); glVertex3fv(lowerRight); glColor3fv(red); glVertex3fv(upperRight); glColor3fv(blue); glVertex3fv(upperLeft); glEnd(); • Geometry shader actually operates on 4-vertex GL_LINES_ADJACENCY primitives instead of quads • Wrong: slash triangle split • Wrong: backslash triangle split • Better: mean value coordinates
  • 75.
    75 OpenGL 2.x ARB Extensions • Many OpenGL 3.0 features have corresponding ARB extensions for OpenGL 2.1 implementations to advertise • Helps get 3.0 functionality out sooner, rather than later • New ARB extensions for 3.0 functionality • ARB_framebuffer_object: framebuffer objects (FBOs) for render-to-texture • ARB_texture_rg: red and red/green texture formats • ARB_map_buffer_range: non-blocking and fine-grain update of buffer object data stores • ARB_instanced_arrays: instance ID available to shaders • ARB_half_float_vertex: half-precision floating-point vertex array formats • ARB_framebuffer_sRGB: rendering into sRGB color space framebuffers • ARB_texture_compression_rgtc: compressed red and red/green texture formats • ARB_depth_buffer_float: floating-point depth buffers • ARB_vertex_array_object: objects to manage vertex array configuration client state
  • 76.
    76 Beyond OpenGL 3.0 • OpenGL 3.0: • EXT_gpu_shader4 • NV_conditional_render • ARB_color_buffer_float • NV_depth_buffer_float • ARB_texture_float • EXT_packed_float • EXT_texture_shared_exponent • NV_half_float • ARB_half_float_pixel • EXT_framebuffer_object • EXT_framebuffer_multisample • EXT_framebuffer_blit • EXT_texture_integer • EXT_texture_array • EXT_packed_depth_stencil • EXT_draw_buffers2 • EXT_texture_compression_rgtc • EXT_transform_feedback • APPLE_vertex_array_object • EXT_framebuffer_sRGB • APPLE_flush_buffer_range (modified) • In GeForce 8, 9, & 2xx Series but not yet core: • EXT_geometry_shader4 (now ARB) • EXT_bindable_uniform • NV_gpu_program4 • NV_parameter_buffer_object • EXT_texture_compression_latc • EXT_texture_buffer_object (now ARB) • NV_framebuffer_multisample_coverage • NV_transform_feedback2 • NV_explicit_multisample • NV_multisample_coverage • EXT_draw_instanced (now ARB) • EXT_direct_state_access • EXT_vertex_array_bgra • EXT_texture_swizzle • Plenty of proven OpenGL extensions for OpenGL Working Group to draw upon for OpenGL 3.1
  • 77.
    77OpenGL Version Evolution•Now OpenGL is part of Khronos Group• Previously OpenGL’s evolution was governed by the OpenGLArchitectural Review Board (ARB)• Now officially a Khronos working group• Khronos also standardizes OpenCL, OpenVG, etc.• How OpenGL version updates happen• OpenGL participants proposing extensions• Successful extensions are polished and incorporated into core• OpenGL 3.0 is great example of this process• Roughly 20 extensions folded into “core”• Just 3 of those previously unimplemented
  • 78.
    78 OpenGL Extensions by Source • 44% of extensions are “core” or multi-vendor • Lots of vendors have initiated extensions • Extending OpenGL is industry-wide collaboration • [Pie chart of extensions by source. Categories: Multi-vendor (EXT), Silicon Graphics (SGI, SGIS, SGIX), Architectural Review Board (ARB), NVIDIA (NV), ATI, Apple, Mesa3D (MESA), Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, Other] • Source: http://www.opengl.org/registry (Dec 2008)
  • 79.
    79 What’s Driving OpenGL Modernization? • Human desire for visual intuition and entertainment (particularly interactive video games) • Embarrassing parallelism of graphics (particularly the hardware-amenable, latency tolerant nature of rasterization) • Increasing semiconductor density
  • 80.
    80 OpenGL’s Evolution: A Personal Retrospective • Kurt Akeley, Principal Researcher, Microsoft Research Silicon Valley
  • 81.
    81 A personal retrospective • My background: • Silicon Graphics, 1982-2001 • OpenGL, 1990-2004 • Today’s topics: • Computer architecture • Culture and process • For a more complete coverage see: • https://graphics.stanford.edu/wikis/cs448-07-spring/ • Mark Kilgard’s excellent course notes
  • 82.
    82 Jim Clark and the Geometry Engine • The Geometry Engine: A VLSI Geometry System for Graphics, Computer Graphics, Volume 16, Number 3 (Proceedings of SIGGRAPH 1982), p127-133, 1982
  • 83.
    83 Jim’s helpers: the Stanford gang • IRIS GL • Geometry Engine • Hardware front-end • Hardware back-end
  • 84.
  • 85.
  • 86.
    86 What is computer architecture? • Architecture: “the minimal set of properties that determine what programs will run and what results they will produce” • Implementation: “the logical organization of the [computer’s] dataflow and controls” • Realization: “the physical structure embodying the implementation”
  • 87.
    87 Example: the analog clock • Architecture • Circular dial divided into twelfths • Hour hand (short) and minute hand (long) • Implementation • A weight, driving a pendulum, or • A spring, driving a balance wheel, or • A battery, driving an oscillator, or …. • Realization • Gear ratios, pendulum lengths, battery sizes, ... • Example from Computer Architecture, Concepts and Evolution, Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997
  • 88.
    88 A useful distinction • NVIDIA 8800 • SIMD, or • SPMD? • Architecture: SPMD • Implementation: SIMD • Realization: ASIC • SIMD = Single Instruction, Multiple Data • SPMD = Single Program, Multiple Data • ASIC = Application Specific Integrated Circuit • [Block diagram of NVIDIA 8800: Application, Data Assembler, Vertex / Primitive / Fragment Thread Issue, Setup / Rasterization / ZCull, Thread Processor, arrays of SP units with L1 and TF blocks, L2 and FB partitions]
  • 89.
    89 The mainstream view • Table of Contents: • Fundamentals • Instruction Sets • Pipelining • Advanced Pipelining and ILP • Memory-Hierarchy Design • Storage Systems • Interconnection Networks • Multiprocessors
  • 90.
    90 OpenGL is an architecture • Comparison (Blaauw/Brooks | OpenGL) • Different implementations: IBM 360 30/40/50/65/75, Amdahl | SGI Indy/Indigo/InfiniteReality, NVIDIA GeForce, ATI Radeon, … • Compatibility: Code runs equivalently on all implementations | Top-level goal; conformance tests, … • Intentional design: It’s an architecture, whether it was planned or not. | Carefully planned, though mistakes were made • Configuration: Can vary amount of resource (e.g., memory) | No feature sub-setting; configuration attributes (e.g., framebuffer) • Speed: Not a formal aspect of architecture | No performance queries • Validity of inputs: No undefined operation | All errors specified; no side effects; little undefined operation • Enforcement: When implementation errors are found, they are fixed. | Specification rules!
  • 91.
    91 But OpenGL is an API (Application Programming Interface) • Yes, Blaauw and Brooks talk about (computer) architecture as though it is always expressed as ISA (Instruction-Set Architecture) • But … • API is just a higher-level programming interface • “Instruction-Set” Architecture implies other types of computer architectures (such as “API” Architecture) • OpenGL has evolved to include ISA-like interfaces (e.g., the interface below GLSL)
  • 92.
    92 We didn’t know … • No mention in spec (even 3.0) • “We view OpenGL as a state …” • First use in “ARB” • Architecture Review Board • Coined by Bill Glazier from “Palo Alto Architecture Review Board” • First formal usage (I know of) • Mark J. Kilgard, Realizing OpenGL: two implementations of one architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, p.45-55, August 03-04, 1997, Los Angeles, California, United States.
  • 93.
  • 94.
    94 What is implied by “programmable”? • What does it mean to teach programming? • Does running a microwave oven count? • Does defining the geometry of a game “level” count? • Does specifying OpenGL modes count? • This seems to be a somewhat open question • Butler Lampson couldn’t tell me. • Microsoft developers of teaching tools couldn’t tell me. • An online search wasn’t very helpful. • Do we just “know it when we see it”? • Justice Potter Stewart’s definition of pornography
  • 95.
    95 My try at some formalization
    • Key ideas:
      • Composition: choice of placement, sequence
      • Non-obvious: semantics are interesting and novel
      • Imperative: maybe there are other kinds of programming
    “Composition, the organization of elemental operations into a non-obvious whole, is the essence of imperative programming.”
    -- Kurt Akeley (Foreword to GPU Gems 3)
  • 96.
    96 OpenGL has always been programmable
    • Follows directly from being an “architecture”
      • OpenGL commands are instructions (API as an ISA)
      • They can be “composed” to create programs
    • Multi-pass rendering is the prototypical example
      • But Peercy et al. implemented a RenderMan shader compiler
    • Invariance was specified from the start (e.g., same fragments)
    • We set out to enable “usage that we didn’t anticipate”
      • Obvious for a traditional ISA (e.g., IA32)
      • Not so obvious for a graphics API
      • Example: texture applies to all primitives, not just triangles
  • 97.
    97 Example multi-pass OpenGL “program”: hidden-line rendering
        glEnable(GL_DEPTH_TEST);
        glDisable(GL_LIGHTING);
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glEnable(GL_POLYGON_OFFSET_FILL);
        glPolygonOffset(maxwidth/2, 1);
        draw solid objects              // pseudocode: first pass fills the depth buffer only
        glDepthMask(GL_FALSE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glColor3fv(linecolor);          // linecolor: array of 3 GLfloats
        glDisable(GL_POLYGON_OFFSET_FILL);
        glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
        draw solid objects again        // pseudocode: second pass draws the visible lines
        glDisable(GL_DEPTH_TEST);
        glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
        glDepthMask(GL_TRUE);
  • 98.
    98 Example multi-pass OpenGL “program”: silhouette rendering
    • Additions to the hidden-line algorithm (previous slide) are marked “added” below; they were highlighted in red on the original slide
        glEnable(GL_DEPTH_TEST);
        glDisable(GL_LIGHTING);
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glEnable(GL_POLYGON_OFFSET_FILL);
        glPolygonOffset(maxwidth/2, 1);
        draw solid objects              // pseudocode
        glDepthMask(GL_FALSE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glColor3f(1, 1, 1);             // added (replaces linecolor)
        glDisable(GL_POLYGON_OFFSET_FILL);
        glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
        glEnable(GL_CULL_FACE);         // added
        glCullFace(GL_FRONT);           // added
        draw solid objects again        // pseudocode
        draw true edges                 // added pseudocode: for a complete hidden-line drawing
        glDisable(GL_DEPTH_TEST);
        glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
        glDepthMask(GL_TRUE);
        glDisable(GL_CULL_FACE);        // added
  • 99.
    99 Invariance
    Corollary 1: Fragment generation is invariant with respect to the state values marked with • in Rule 2.
  • 100.
    100
    • Intended to capture complete sequence of operations
    • Also inspired design changes
  • 101.
    101 [Block diagram: the OpenGL pipeline]
    • Blocks: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Framebuffer, Display, Texture memory, Pixel assembly (unpack), Pixel operations, Pixel pack
    • Paths labeled: Vertex pipeline, Pixel pipeline
    • All primitives (including pixels) are rasterized
    • All vertexes are treated equally (e.g., lighted)
    • All fragments are treated equally (e.g., texture mapped and depth-buffered)
    • Not a required implementation, but “abstraction distance” matters
  • 102.
  • 103.
    103 Suppose …
    http://www.opengl.org/registry/
    Name
        ARB_texture_cube_map
    Name Strings
        GL_ARB_texture_cube_map
    Notice
        Copyright OpenGL Architectural Review Board, 1999.
    Contact
        Michael Gold, NVIDIA (gold 'at' nvidia.com)
    Status
        Complete. Approved by ARB on 12/8/1999
    Version
        Last Modified Date: December 14, 1999
    Number
        ARB Extension #7
    Dependencies
        None.
        Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
    Overview
        This extension provides a new texture generation scheme for cube map textures. Instead of the current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
  • 104.
    104 Complete specification
    Sections of the extension specification:
    • Name
    • Name Strings
    • Notice
    • Contact
    • Status
    • Version
    • Number
    • Dependencies
    • Overview
    • Issues
    • New Procedures and Functions
    • New Tokens
    • Additions to Chapter 2 of the OpenGL Specification
    • Additions to Chapter 3 of the OpenGL Specification
    • Additions to Chapter 4 of the OpenGL Specification
    • Additions to Chapter 5 of the OpenGL Specification
    • Additions to Chapter 6 of the OpenGL Specification
    • Additions to the GLX Specification
    • Errors
    • New State (type, query mechanism, initial value, attribute set, specification section)
    • Usage Examples
  • 105.
    105 19 issues
    The spec just linearly interpolates the reflection vectors computed per-vertex across polygons. Is there a problem interpolating reflection vectors in this way?
        Probably. The better approach would be to interpolate the eye vector and normal vector over the polygon and perform the reflection vector computation on a per-fragment basis. Not doing so is likely to lead to artifacts because angular changes in the normal vector result in twice as large a change in the reflection vector as normal vector changes. The effect is likely to be reflections that become glancing reflections too fast over the surface of the polygon.
        Note that this is an issue for REFLECTION_MAP_ARB, but not NORMAL_MAP_ARB.
  • 106.
    106 19 issues …
    What happens if an (s,t,q) is passed to cube map generation that is close to (0,0,0), ie. a degenerate direction vector?
        RESOLUTION: Leave undefined what happens in this case (but may not lead to GL interruption or termination).
        Note that a vector close to (0,0,0) may be generated as a result of the per-fragment interpolation of (s,t,r) between vertices.
  • 107.
    107 Trust and integrity
    • Lots of collaboration during the initial design
      • But final decisions made by a small group
    • SGI played fair
      • OpenGL 1.0 didn’t favor SGI equipment (our ports were late)
      • SGI obeyed all conformance rules
      • SGI didn’t adjust the spec to match our equipment
    • The ARB avoided marketing tasks such as benchmarks
      • We stuck with technical design issues
    • We documented rigorously
      • Specification, man pages, …
  • 108.
    108 Five Kinkos in Austin Texas
    The OpenGL Graphics System: A Specification (Version 1.1)
    Mark Segal
    Kurt Akeley
    Editor: Chris Frazier
    Copyright © 1992-1997 Silicon Graphics, Inc.
    This document contains unpublished information of Silicon Graphics, Inc.
  • 109.
    109 Extension facts
    • 442 vendor and “EXT” extension specifications
      • Vendor: specific to a single vendor
      • EXT: shared by two or more vendors
    • 56 “ARB” extensions
      • Standardized, likely to be in the next spec revision
    • Lots of text …
    Source: OpenGL extension registry, December 2008
  • 110.
    110 “Specification” sizes

                          Lines      Words        Chars
    56 ARB Extensions     48,674     263,908      2,221,347
    All 442 Extensions    209,426    1,076,008    9,079,063
    King James Bible      114,535    823,647      5,214,085
    New Testament         27,319     188,430      1,197,812
    Old Testament         86,783     632,515      3,998,303
  • 111.
    111 Beyond the specification
    • The ARB (now replaced with Khronos)
      • Rules of order, secretary, IP, …
    • The extension process
      • Categories, token syntax, spec templates, enums, registry, …
    • Licensing
    • Conformance
    • …
  • 112.
    112 Summary
    • Many mistakes made (see other presentations for lists)
    • Created a sustainable culture that values quality and rigorous documentation
    • Defined and evolved the architecture for interactive 3-D computer graphics
  • 113.
    113 Writing better OpenGL
    Mark Kilgard
    Principal System Software Engineer
    NVIDIA
  • 114.
    114 Motivation
    • Complex APIs and systems have pitfalls
      • After 17 years of designed evolution, OpenGL certainly has its share
    • Normal documentation focus:
      • What can you do?
      • Rather than: What should you do?
  • 115.
    115 Communicating Vertex Data
    • The way you learn OpenGL:
      • Immediate mode
      • glBegin, glColor3f, glVertex3f, glEnd
    • Straightforward: no ambiguity about what the vertex data is
      • All vertex components are function parameters
    • The problem: too function-call intensive
      • And all vertex data must flow through the CPU
  • 116.
    116 Example Scenario
    • An OpenGL application has to render a set of rectangles
    • Rectangle with its parameters
      • x, y, height, width, left color, right color, depth
    [Figure: a rectangle positioned at (x,y) with its width and height, left side color and right side color labeled, alongside a depth order axis running from 0.0 to 1.0]
  • 117.
    117 Scene Representation
    • Each rectangle is specified by the following RectInfo structure:
        typedef struct {
          GLfloat x, y, width, height;
          GLfloat depth_order;
          GLfloat left_side_color[3];   // red, green, then blue
          GLfloat right_side_color[3];  // red, green, then blue
        } RectInfo;
    • Array of RectInfo structures describes “scene”
    • Simplistic scene for sake of teaching
  • 118.
    118 Example Scene and Rendering Result
    • Scene of 4 rectangles:
        RectInfo rect_list[4] = {
          { 10, 20, 180, 140, 0.5, { 1, 1, 1 }, { 1, 0, 1 } },
          { 30, 40, 100, 60, 0.5, { 1, 0, 0 }, { 0, 0, 1 } },
          { 140, 60, 100, 80, 0.5, { 0, 0, 1 }, { 0, 1, 0 } },
          { 70, 120, 80, 60, 0.7, { 1, 1, 0 }, { 0, 1, 1 } },
        };
    • OpenGL-rendered result
    [Figure: rendered image of the four rectangles]
  • 119.
    119 Immediate Mode Rectangle Rendering
    • Given sized RectInfo array, render vertices of quads
        void drawRectangles(int count, const RectInfo *list)
        {
          glBegin(GL_QUADS);
          for (int i=0; i<count; i++) {  // for each rectangle
            const RectInfo *r = &list[i];
            glColor3fv(r->left_side_color);
            glVertex3f(r->x, r->y, r->depth_order);                     // 1st vertex
            glColor3fv(r->right_side_color);
            glVertex3f(r->x+r->width, r->y, r->depth_order);            // 2nd vertex
            // right_side_color “sticks”
            glVertex3f(r->x+r->width, r->y+r->height, r->depth_order);  // 3rd vertex
            glColor3fv(r->left_side_color);
            glVertex3f(r->x, r->y+r->height, r->depth_order);           // 4th vertex
          }
          glEnd();
        }
  • 120.
    120 Critique of Immediate Mode
    • Advantages
      • Straightforward to code and debug
      • Easy-to-understand conceptual model
        • Building stream of vertices with OpenGL commands
      • Avoids driver & application copies of vertex data
      • Flexible, allowing totally dynamic vertex generation
    • Disadvantages
      • Rendering continuously streams attributes through CPU
        • Pollutes CPU cache with vertex data
      • Function call intensive
      • Unable to saturate fast graphics hardware
        • CPUs just too slow
    • Contrast with vertex array approach…
  • 121.
    121 Vertex Array Approach
    • Step 1: Copy vertex attributes into vertex arrays
      • From: RectInfo array (CPU memory)
      • To: interleaved arrays of vertex attributes (CPU memory)
    • Step 2: To render
      • Configure OpenGL vertex array client state
        • Use glEnableClientState, glVertexPointer, glColorPointer
      • Render quads based on indices into vertex arrays
        • Use glDrawArrays
  • 122.
    122 Vertex Array Format
    • Interleave vertex attributes in color & position arrays
    [Figure: memory layout of the interleaved array. Each vertex (vertex 0, vertex 1, …) stores color (red, green, blue) followed by position (x, y, z). Each float is 4 bytes, giving 24 bytes per vertex.]
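    The 24-bytes-per-vertex layout can be written down as a C struct to make the stride and attribute offsets explicit. This is only a sketch: `ColorPositionVertex` and the helper functions are hypothetical names, not part of the slides, and `float` stands in for `GLfloat` on the assumption that both are 4 bytes.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical type: one interleaved vertex, color first, then position. */
    typedef struct {
        float rgb[3];  /* red, green, blue */
        float xyz[3];  /* x, y, z */
    } ColorPositionVertex;

    /* Byte distance between consecutive vertices in the array. */
    size_t vertex_stride(void) { return sizeof(ColorPositionVertex); }

    /* Byte offsets of each attribute from the start of a vertex. */
    size_t color_offset(void)    { return offsetof(ColorPositionVertex, rgb); }
    size_t position_offset(void) { return offsetof(ColorPositionVertex, xyz); }

    /* Sanity check for the "24 bytes per vertex" figure (assumes 4-byte float). */
    void check_vertex_layout(void) {
        assert(sizeof(float) == 4);
        assert(vertex_stride() == 24);
    }
    ```

    Under these assumptions the stride is what would be passed as the `stride` argument of glColorPointer and glVertexPointer, and the two offsets correspond to `p+0` and `p+3` in the slides' code.
    
    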
  • 123.
    123 Step 1: Copy Rectangle Attributes to Vertex Arrays
        #include <stdlib.h>  // malloc
        #include <string.h>  // memcpy

        // Returns a malloc'ed interleaved array (RGB then XYZ per vertex); caller frees it.
        void *initVarrayRectangles(int count, const RectInfo *list)
        {
          void *varray = malloc(sizeof(GLfloat)*6*4*count);  // 6 floats x 4 vertices per rectangle
          GLfloat *p = varray;
          for (int i=0; i<count; i++, p+=24) {
            const RectInfo *r = &list[i];
            // quad vertex #1
            memcpy(&p[0], r->left_side_color, sizeof(GLfloat)*3);
            p[3] = r->x; p[4] = r->y; p[5] = r->depth_order;
            // quad vertex #2
            memcpy(&p[6], r->right_side_color, sizeof(GLfloat)*3);
            p[9] = r->x+r->width; p[10] = r->y; p[11] = r->depth_order;
            // quad vertex #3
            memcpy(&p[12], r->right_side_color, sizeof(GLfloat)*3);
            p[15] = r->x+r->width; p[16] = r->y+r->height; p[17] = r->depth_order;
            // quad vertex #4
            memcpy(&p[18], r->left_side_color, sizeof(GLfloat)*3);
            p[21] = r->x; p[22] = r->y+r->height; p[23] = r->depth_order;
          }
          return varray;
        }
  • 124.
    124 Step 2: Configure & Render from Vertex Arrays
        void drawVarrayRectangles(int count, const RectInfo *list)
        {
          void *varray = initVarrayRectangles(count, list);
          const GLfloat *p = (const GLfloat*) varray;
          const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
          glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
          glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
          glEnableClientState(GL_COLOR_ARRAY);
          glEnableClientState(GL_VERTEX_ARRAY);
          glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
          free(varray);
        }
  • 125.
    125 Critique of Simplistic Vertex Array Rendering
    • Advantages
      • Far fewer OpenGL commands issued
    • Disadvantages
      • Every render with drawVarrayRectangles calls initVarrayRectangles
        • Allocates, initializes, & frees vertex array memory every render
    • Improve by separating vertex array construction from rendering
  • 126.
    126 Initialize Once, Render Many Approach
    • This routine expects base pointer returned by initVarrayRectangles
        void drawInitializedVarrayRectangles(int count, const void *varray)
        {
          const GLfloat *p = (const GLfloat*) varray;
          const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
          glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
          glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
          // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
          glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
        }
  • 127.
    127 Client Memory Vertex Attribute Transfer
    [Diagram: the CPU performs memory reads of the vertex array in application (client) memory and writes command + vertex data into the command queue; the GPU DMA-transfers command + vertex data to its command processor, vertex puller, and hardware rendering pipeline]
    • Vertex data travels through the CPU
  • 128.
    128 Vertex Buffer Object Vertex Attribute Pulling
    [Diagram: the vertex array lives in an OpenGL (vertex) buffer object rather than application (client) memory; the CPU writes only command + vertex indices into the command queue; the GPU DMA-transfers command data to its command processor, and the vertex puller feeds the hardware rendering pipeline]
    • GPU DMA transfer of vertex data: the CPU never reads the data
  • 129.
    129 Initializing Vertex Buffer Objects (VBOs)
    • Once using vertex arrays, easy to switch to VBOs
    • Make the vertex array as before
    • Then bind to buffer object and copy data to the buffer
        void initVarrayRectanglesInVBO(GLuint bufferName, int count, const RectInfo *list)
        {
          void *varray = initVarrayRectangles(count, list);
          const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
          const GLint numVertices = 4*count;
          const GLsizeiptr bufferSize = stride*numVertices;
          glBindBuffer(GL_ARRAY_BUFFER, bufferName);
          glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);
          free(varray);  // glBufferData copied the data into the buffer object
        }
  • 130.
    130 Rendering from Vertex Buffer Objects
    • Once initialized, glBindBuffer to bind to buffer ahead of vertex array configuration
    • Send offsets instead of pointers
        void drawVarrayRectanglesFromVBO(GLuint bufferName, int count)
        {
          const char *base = NULL;
          const GLsizei stride = sizeof(GLfloat)*6;  // 3 RGB floats, 3 XYZ floats
          glBindBuffer(GL_ARRAY_BUFFER, bufferName);
          glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));
          glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));
          // Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
          glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
        }
  • 131.
    131 Understanding glBindBuffer
    • Buffer object bindings are a frequent point of confusion for programmers
    • What does glBindBuffer do really?
    • Lots of buffer binding targets:
      • GL_ARRAY_BUFFER target: for vertex attribute arrays
        • Query with GL_ARRAY_BUFFER_BINDING
      • GL_ELEMENT_ARRAY_BUFFER target: for vertex indices, effectively topology
        • Query with GL_ELEMENT_ARRAY_BUFFER_BINDING
    • Each vertex array has its own buffer, query with
      • GL_VERTEX_ARRAY_BUFFER_BINDING
      • GL_COLOR_ARRAY_BUFFER_BINDING
      • GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING, etc.
  • 132.
    132 Bind and Query Buffer Targets
    • Buffer Bind Tokens (target tokens for glBindBuffer)
      • GL_ARRAY_BUFFER
      • GL_ELEMENT_ARRAY_BUFFER
    • Buffer Query Tokens (query tokens to glGetIntegerv)
      • GL_ARRAY_BUFFER_BINDING
      • GL_ELEMENT_ARRAY_BUFFER_BINDING
      • GL_COLOR_ARRAY_BUFFER_BINDING
      • GL_VERTEX_ARRAY_BUFFER_BINDING
      • GL_FOG_COORD_ARRAY_BUFFER_BINDING
      • GL_TEXTURE_COORD_ARRAY_BUFFER_BINDING
    • Buffer Query Tokens (query tokens to glGetVertexAttribiv)
      • GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING
  • 133.
    133 Latched Vertex Array Buffer Bindings
    • Here’s the confusing part:
        glBindBuffer(GL_ARRAY_BUFFER, 34);
        glColorPointer(3, GL_FLOAT, color_stride, (void*)color_offset);
    • The glBindBuffer doesn’t change any vertex array binding
      • The GL_ARRAY_BUFFER_BINDING state that glBindBuffer sets does not itself affect rendering
    • It is the glColorPointer call that latches the array buffer binding to change the color array’s buffer binding!
    • Same with all vertex array buffer bindings
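    The latching behavior can be mimicked with a toy state machine. The sketch below is not the OpenGL API and not how any driver is actually written: `ToyContext`, `toy_bind_array_buffer`, and `toy_color_pointer` are hypothetical names that only model the two pieces of state involved.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Toy model of the relevant context state; NOT the OpenGL API. */
    typedef struct {
        unsigned array_buffer_binding;        /* models GL_ARRAY_BUFFER_BINDING */
        unsigned color_array_buffer_binding;  /* models GL_COLOR_ARRAY_BUFFER_BINDING */
        const void *color_array_pointer;      /* client pointer or buffer offset */
    } ToyContext;

    /* Models glBindBuffer(GL_ARRAY_BUFFER, buffer): only the array buffer
       binding changes; no vertex array is touched. */
    void toy_bind_array_buffer(ToyContext *ctx, unsigned buffer) {
        assert(ctx != NULL);
        ctx->array_buffer_binding = buffer;
    }

    /* Models glColorPointer: latches the current array buffer binding
       into the color array's own buffer binding. */
    void toy_color_pointer(ToyContext *ctx, const void *pointer) {
        assert(ctx != NULL);
        ctx->color_array_buffer_binding = ctx->array_buffer_binding;
        ctx->color_array_pointer = pointer;
    }
    ```

    In this model, binding buffer 34 leaves the color array's binding untouched until `toy_color_pointer` runs, and a later bind of a different buffer does not disturb the value that was latched.
    
    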
  • 134.
    134 Binding Buffer Zero is Special
    • By default, vertex arrays don’t access buffer objects
      • Instead client memory is accessed
    • This is because
      • The initial buffer binding for a context is zero
      • And zero is special
        • Zero means access client memory
        • This ensures compatibility with pre-VBO OpenGL
    • You can always resume client memory vertex array access for a given array like this
        glBindBuffer(GL_ARRAY_BUFFER, 0);  // use client memory
        glColorPointer(3, GL_FLOAT, color_stride, color_pointer);
    • Different treatment of the “pointer” parameter to vertex array specification commands
      • When the current array buffer binding is zero, the pointer value is a client memory pointer
      • When the current array buffer binding is non-zero (meaning it names a buffer object), the pointer value is “recast” as an offset from the beginning of the buffer
    • Once again
      • The glBindBuffer(GL_ARRAY_BUFFER, 0) call alone doesn’t change any vertex array buffer bindings
      • It takes a vertex array specification command such as glColorPointer to latch the zero
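    The “recast” of the pointer parameter as an offset is usually expressed with a small conversion helper. The sketch below shows the idea in plain C; `buffer_offset` and `offset_from_pointer` are hypothetical helper names, and the round trip through `uintptr_t` is assumed to preserve the value, which holds on common platforms.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: encode a byte offset into the bound buffer as the
       "pointer" argument of a vertex array specification command. */
    const void *buffer_offset(size_t bytes) {
        assert(bytes <= UINTPTR_MAX);
        return (const void *)(uintptr_t)bytes;
    }

    /* Inverse view: how the offset can conceptually be recovered when the
       array buffer binding is non-zero. */
    size_t offset_from_pointer(const void *pointer) {
        return (size_t)(uintptr_t)pointer;
    }
    ```

    With such a helper, the position attribute of the interleaved layout used earlier would be specified with `buffer_offset(3 * sizeof(GLfloat))`, which avoids doing arithmetic on a NULL base pointer.
    
    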
  • 135.
    135 Texture Coordinate Set Selector
    • A selector in OpenGL is
      • A state variable that controls what state a subsequent command updates
      • Examples of commands that modify selectors
        • glMatrixMode, glActiveTexture, glClientActiveTexture
    • A selector is different from latched state
      • Latched state is a specified value that is set (or “latched”) when a subsequent command is called
    • Pitfall warning: glTexCoordPointer both
      • Relies on the glClientActiveTexture command’s selector
      • And latches the current array buffer binding for the selected texture coordinate vertex array
    • Example
        glBindBuffer(GL_ARRAY_BUFFER, 34);     // buffer value glTexCoordPointer latches
        glClientActiveTexture(GL_TEXTURE3);    // selector glTexCoordPointer uses
        glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);
  • 136.
    136 OpenGL’s Modern Buffer-centric Processing Model
    [Diagram: buffer objects feeding the stages of the pipeline]
    • Buffer objects: Vertex Array Buffer Object (VaBO), Array Element Buffer Object (VeBO), Transform Feedback Buffer (XBO), Parameter Buffer (PaBO), Bindable Uniform Buffer (BUB), Texture Buffer Object (TexBO), Pixel Unpack Buffer (PuBO), Pixel Pack Buffer (PpBO)
    • Pipeline stages: Vertex Puller, Vertex Shading, Geometry Shading, Fragment Shading, Texturing, Pixel Pipeline, Framebuffer
    • Data paths: vertex data, texel data, pixel data, parameter data (not ARB functionality yet)
    • Commands shown: glBegin, glDrawElements, etc.; glDrawPixels, glTexImage2D, etc.; glReadPixels, etc.
  • 137.
    137 Usages of OpenGL Buffer Objects
    • Vertex uses (VBOs)
      • Input to GL: Vertex attribute buffer objects
        • Color, position, texture coordinate sets, etc.
      • Input to GL: Vertex element buffer objects
        • Indices
      • Output from GL: Transform feedback
        • Streaming vertex attributes out
    • Texture uses (TexBOs)
      • Texturing from: Texture buffer objects
    • Pixel uses (PBOs)
      • Output from GL: Pixel pack buffer objects
        • glReadPixels
      • Input to GL: Pixel unpack buffer objects
        • glDrawPixels, glBitmap, glTexImage2D, etc.
    • Shader uses (PaBOs, UBOs)
      • Input to assembly program: Parameter buffer objects
      • Input to GLSL program: Bind-able uniform buffer objects
    Key point: OpenGL buffers are containers for bytes; a buffer is not tied to any particular usage
  • 138.
    138 Continuum of OpenGL Usage
    [Diagram: a continuum from “Tweak-able” to “Performance”, in order: Immediate mode, Client vertex arrays, Vertex buffer objects (VBOs), Display lists]
  • 139.
  • 140.
    140 Implementing OpenGL
    Mark Kilgard
    Principal System Software Engineer
    NVIDIA
  • 141.
    141 Topics in OpenGL Implementation
    • Dual-core OpenGL driver operation
    • What goes into a texture fetch?
      • You give me some texture coordinates
      • I give you back a color
      • Could it be any simpler?
  • 142.
    142 OpenGL Drivers for Multi-core CPUs
    • Today dual-core processors in PCs are nearly ubiquitous
      • 4, 6, 8, and more cores are clearly coming
    • How does an OpenGL implementation exploit this trend?
    • Answer: develop a dual-core OpenGL driver
  • 143.
    143 Dual-core OpenGL Driver Architecture
    [Diagram: threads in the application (App) and in the OpenGL driver (ICD)]
    • Context 1: Application thread A (application rendering thread), ICD’s app thread (tokenize thread), circular command FIFO, Worker thread 1 (server thread)
    • Context 2: Application thread B (application rendering thread), ICD’s app thread (tokenize thread), circular command FIFO, Worker thread 2 (server thread)
    • Application thread C: application audio thread (no OpenGL)
    • Application thread D, Application thread …
  • 144.
    144 Dual-core Performance Results
    • A well-behaved OpenGL application benefiting from a dual-core mode of OpenGL driver operations
    [Bar chart: frames per second (axis 0 to 250) by mode of OpenGL driver operation: Single core, Dual core, Null driver]
  • 145.
    145 Good Dual-core Driver Practices
    • General advice
      • Display lists execute on the driver’s worker thread!
      • You want to avoid situations where the application thread must “sync” with the driver thread
    • Specific advice
      • Avoid OpenGL state queries
        • More on this later
      • Avoid querying OpenGL errors in production code
    • Bad behavior is detected automatically and leads to exit from the dual-core mode
      • Back to the standard single-core driver mode of operation
      • “Do no harm”
  • 146.
    146 Consider an OpenGL texture fetch
    • Seems very simple
      • Input: texture coordinates (s,t,r,q)
      • Output: some color (r,g,b,a)
    • Just a simple function, written in Cg/HLSL:
        uniform sampler2D decal : TEXUNIT2;
        float4 texcoord : TEXCOORD3;
        float4 rgba = tex2D(decal, texcoord.xy);
    • Compiles to a single instruction:
        TEX o[COLR], f[TEX3], TEX2, 2D;
    • Implementation is much more involved!
  • 147.
    147 Anatomy of a Texture Fetch
    [Diagram: Texel Selection followed by Texel Combination]
    • Inputs: texture coordinate vector, texture parameters, texture images
    • Between the phases: texel offsets, texel data, combination parameters
    • Output: filtered texel vector
  • 148.
    148 Texture Fetch Functionality (1)
    • Texture coordinate processing
      • Projective texturing (OpenGL 1.0)
      • Cube map face selection (OpenGL 1.3)
      • Texture array indexing (EXT_texture_array, OpenGL 3.0)
      • Coordinate scale: normalization (ARB_texture_rectangle)
    • Level-of-detail (LOD) computation
      • Log of maximum texture coordinate partial derivative (OpenGL 1.0)
      • LOD clamping (OpenGL 1.2)
      • LOD bias (OpenGL 1.4)
      • Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)
    • Wrap modes
      • Repeat, clamp (OpenGL 1.0)
      • Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3)
      • Mirrored repeat (OpenGL 1.4)
      • Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)
      • Wrap to adjacent cube map face
      • Region clamp & mirror (PlayStation 2)
  • 149.
    149 Texture Fetch Functionality (2)
    • Filter modes
      • Minification / magnification transition (OpenGL 1.0)
      • Nearest, linear, mipmap (OpenGL 1.0)
      • 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)
      • Anisotropic (EXT_texture_filter_anisotropic)
      • Fixed-weights: Quincunx, 3x3 Gaussian
        • Used for multi-sample resolves
      • Detail texture magnification (SGIS_detail_texture)
      • Sharpen texture magnification (SGIS_sharpen_texture)
      • 4x4 filter (SGIS_texture_filter4)
      • Sharp-edge texture magnification (E&S Harmony)
      • Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
  • 150.
    150 Texture Fetch Functionality (3)
    • Texture formats
      • Uncompressed
        • Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)
        • Type: unsigned, signed (NV_texture_shader)
        • Normalized: fixed-point vs. integer (OpenGL 3.0)
      • Compressed
        • DXT compression formats (EXT_texture_compression_s3tc)
        • 4:2:2 video compression (various extensions)
        • 1- and 2-component compression (EXT_texture_compression_latc, OpenGL 3.0)
        • Other approaches: IDCT, VQ, differential encoding, normal maps, separable decompositions
      • Alternate encodings
        • RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)
        • Spherical harmonics
        • Sum of product decompositions
  • 151.
    151 Texture Fetch Functionality (4)
    • Pre-filtering operations
      • Gamma correction (OpenGL 2.1)
        • Table: sRGB / arbitrary
      • Shadow map comparison (OpenGL 1.4)
        • Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5)
        • Needs “R” depth value per texel
      • Palette lookup (EXT_paletted_texture)
      • Thresholding
        • Color key
        • Generalized thresholding
  • 152.
    152 Texture Fetch Functionality (5)
    • Optimizations
      • Level-of-detail weighting adjustments
      • Mid-maps (extra pre-filtered levels in-between existing levels)
    • Unconventional uses
      • Bitmap textures for fonts with large filters (Direct3D 10)
      • Rip-mapping
      • Non-uniform texture border color
      • Clip-mapping (SGIX_clipmap)
      • Multi-texel borders
      • Silhouette maps (Pradeep Sen’s work)
      • Shadow mapping
      • Sharp piecewise linear magnification
  • 153.
    153 Phased Data Flow
    • Must hide long memory read latency between Selection and Combination phases
    [Diagram: Texel Selection takes the texture coordinate vector and texture parameters and produces texel offsets; memory reads for samples fetch texel data from the texture images; combination parameters are FIFOed to Texel Combination]
  • 154.
    154 What really happens?
    • Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch
    • Logically just one instruction
        TXP o[COLR], f[TEX3], TEX2, 2D;
    • Logically
      • Texel selection
      • Texel combination
    • How many operations are involved?
  • 155.
    155 Medium-Level Dissection of a Texture Fetch
    [Diagram: data flow through the stages of a texture fetch]
    • Stages: convert texture coords to texel coords; convert texel coords to texel offsets; texel combination
    • Inputs: interpolated texture coords vector, texture parameters, texture images
    • Intermediate data: texel coords, integer coords & fractional weights (floor / frac), texel offsets, texel data, combination parameters
    • Output: filtered texel vector
    • Arithmetic labels: floating-point scaling and combination; integer / fixed-point texel intermediates
  • 156.
    156 Interpolation
    • First we need to interpolate (s,t,r,q)
      • This is the f[TEX3] part of the TXP instruction
    • Projective texturing means we want (s/q, t/q)
      • And possibly r/q if shadow mapping
    • In order to correct for perspective, hardware actually interpolates
      • (s/w, t/w, r/w, q/w)
    • If not projective texturing, could linearly interpolate inverse w (or 1/w)
      • Then compute its reciprocal to get w
        • Since 1/(1/w) equals w
      • Then multiply (s/w, t/w, r/w, q/w) times w
        • To get (s,t,r,q)
    • If projective texturing, we can instead
      • Compute reciprocal of q/w to get w/q
      • Then multiply (s/w, t/w, r/w) by w/q to get (s/q, t/q, r/q)
    • Observe projective texturing is same cost as perspective correction
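    The two recovery paths can be compared side by side in a few lines of C. This is a sketch of the arithmetic only, with hypothetical function names; it assumes the inputs have already been interpolated linearly in window space.

    ```c
    #include <assert.h>

    /* Perspective correction: inputs are s/w, t/w and 1/w; outputs are s and t.
       One reciprocal and one multiply per component. */
    void perspective_correct_st(float s_over_w, float t_over_w, float one_over_w,
                                float out_st[2]) {
        assert(one_over_w != 0.0f);
        float w = 1.0f / one_over_w;       /* since 1/(1/w) equals w */
        out_st[0] = s_over_w * w;
        out_st[1] = t_over_w * w;
    }

    /* Projective texturing: inputs are s/w, t/w and q/w; outputs are s/q and t/q.
       Same cost: one reciprocal and one multiply per component. */
    void projective_st(float s_over_w, float t_over_w, float q_over_w,
                       float out_st[2]) {
        assert(q_over_w != 0.0f);
        float w_over_q = 1.0f / q_over_w;  /* reciprocal of q/w is w/q */
        out_st[0] = s_over_w * w_over_q;
        out_st[1] = t_over_w * w_over_q;
    }
    ```

    Both functions perform one reciprocal and two multiplies for (s,t), which is the sense in which projective texturing costs the same as perspective correction.
    
    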
  • 157.
    157 Interpolation Operations
    • Ax + By + C per scalar linear interpolation
      • 2 MADs
    • One reciprocal to invert q/w for projective texturing
      • Or one reciprocal to invert 1/w for perspective texturing
    • Then 1 MUL per component for s/w * w/q
      • Or s/w * w
    • For (s,t) means
      • 4 MADs, 2 MULs, & 1 RCP
      • (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP
    • All floating-point operations
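    The “2 MADs per scalar” count comes from evaluating the plane equation as two chained multiply-adds. A minimal sketch, where `PlaneEquation` and `plane_eval` are hypothetical names and (x,y) is assumed to be the window-space position:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Coefficients of one attribute's plane: value = A*x + B*y + C. */
    typedef struct { float A, B, C; } PlaneEquation;

    /* Evaluate the plane as two multiply-adds. */
    float plane_eval(const PlaneEquation *p, float x, float y) {
        assert(p != NULL);
        float t = p->B * y + p->C;  /* MAD 1 */
        return p->A * x + t;        /* MAD 2 */
    }
    ```

    Interpolating s/w and t/w this way gives the 4 MADs quoted for (s,t); adding r/w gives 6.
    
    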
  • 158.
    158 Texture Space Mapping
    • Have interpolated & projected coordinates
    • Now need to determine what texels to fetch
    • Multiply (s,t) by (width,height) of texture base level
      • Could convert (s,t) to fixed-point first
      • Or do math in floating-point
    • Say base texture is 256x256
      • So compute (s*256, t*256) = (u,v)
  • 159.
    159 Mipmap Level-of-detail Selection
    • Tri-linear mip-mapping means compute appropriate mipmap level
    • Hardware rasterizes in 2x2 pixel entities
      • Typically called quad-pixels or just quad
      • Finite difference with neighbors (one-pixel separation) to get change in u and v with respect to window space
        • Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y
        • Means 4 subtractions per quad (1 per pixel)
    • Now compute approximation to gradient length
      • p = max(sqrt((∂u/∂x)² + (∂u/∂y)²), sqrt((∂v/∂x)² + (∂v/∂y)²))
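    The finite-difference step can be sketched in C for one quad. The names `QuadCoords` and `quad_gradient_length` and the pixel ordering are assumptions made for illustration, and differencing against pixel 0 only is a simplification; the formula follows the grouping shown on the slide.

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stddef.h>

    /* Texel-space coordinates (u,v) of the four pixels of a 2x2 quad.
       Assumed index order: 0 = (x,y), 1 = (x+1,y), 2 = (x,y+1), 3 = (x+1,y+1). */
    typedef struct { float u[4]; float v[4]; } QuadCoords;

    /* Approximate gradient length p from one-pixel finite differences,
       taking the max of the u-gradient and v-gradient lengths. */
    float quad_gradient_length(const QuadCoords *q) {
        assert(q != NULL);
        float dudx = q->u[1] - q->u[0];
        float dudy = q->u[2] - q->u[0];
        float dvdx = q->v[1] - q->v[0];
        float dvdy = q->v[2] - q->v[0];
        float pu = sqrtf(dudx * dudx + dudy * dudy);
        float pv = sqrtf(dvdx * dvdx + dvdy * dvdy);
        return pu > pv ? pu : pv;
    }
    ```

    This uses 4 subtractions, 4 multiplies, and 2 square roots, in line with the per-fetch totals tallied later in the deck.
    
    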
  • 160.
    160 Level-of-detail Bias and Clamping
    • Convert p length to power-of-two level-of-detail and apply LOD bias
      • λ = log2(p) + lodBias
    • Now clamp λ to valid LOD range
      • λ’ = max(minLOD, min(maxLOD, λ))
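    The bias and clamp step translates directly into C. This sketch uses a hypothetical function name and assumes p is strictly positive so that the logarithm is defined.

    ```c
    #include <assert.h>
    #include <math.h>

    /* lambda' = max(minLOD, min(maxLOD, log2(p) + lodBias)) */
    float clamped_lod(float p, float lod_bias, float min_lod, float max_lod) {
        assert(p > 0.0f && min_lod <= max_lod);
        float lambda = log2f(p) + lod_bias;
        return fmaxf(min_lod, fminf(max_lod, lambda));
    }
    ```

    For example, a gradient length of 4 texels per pixel with a bias of 0.5 gives λ = 2.5 before clamping.
    
    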
  • 161.
    161 Determine Mipmap Levels and Level Filtering Weight
    • Determine lower and upper mipmap levels
      • b = floor(λ’) is bottom mipmap level
      • t = floor(λ’+1) is top mipmap level
    • Determine filter weight between levels
      • w = frac(λ’) is filter weight
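    Splitting the clamped LOD into two levels and a blend weight might look like the following sketch. `MipLevels` and `select_mip_levels` are hypothetical names, and the input is assumed to be non-negative because it has already been clamped.

    ```c
    #include <assert.h>
    #include <math.h>

    typedef struct {
        int bottom;    /* b = floor(lambda') */
        int top;       /* t = floor(lambda' + 1) */
        float weight;  /* w = frac(lambda') */
    } MipLevels;

    MipLevels select_mip_levels(float lambda_clamped) {
        assert(lambda_clamped >= 0.0f);
        MipLevels m;
        float base = floorf(lambda_clamped);
        m.bottom = (int)base;
        m.top = (int)base + 1;
        m.weight = lambda_clamped - base;
        return m;
    }
    ```

    A real implementation would also have to keep the top level within the range of levels the texture actually has; that detail is outside what the slide covers.
    
    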
  • 162.
    162 Determine Texture Sample Point
    • Get (u,v) for selected top and bottom mipmap levels
      • Consider a level l which could be either level t or b
      • With (u,v) locations (u_l, v_l)
    • Perform GL_CLAMP_TO_EDGE wrap modes
      • uw = max(1/(2*widthOfLevel(l)), min(1 - 1/(2*widthOfLevel(l)), u))
      • vw = max(1/(2*heightOfLevel(l)), min(1 - 1/(2*heightOfLevel(l)), v))
    • Get integer location (i,j) within each level
      • (i,j) = ( floor(uw * widthOfLevel(l)), floor(vw * heightOfLevel(l)) )
    [Figure: a texture in (s,t) space with its border and edge labeled]
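    The clamp-to-edge step for one axis can be sketched as below. The function names are hypothetical, and the coordinate is assumed to be normalized (0 to 1 across the level), which is the reading under which the half-texel bounds make sense.

    ```c
    #include <assert.h>
    #include <math.h>

    /* Clamp a normalized coordinate to [1/(2N), 1 - 1/(2N)] for a level of size N. */
    float clamp_to_edge(float coord, int size) {
        assert(size > 0);
        float half_texel = 1.0f / (2.0f * (float)size);
        return fmaxf(half_texel, fminf(1.0f - half_texel, coord));
    }

    /* Integer texel index along one axis after the clamp. */
    int texel_index(float coord, int size) {
        return (int)floorf(clamp_to_edge(coord, size) * (float)size);
    }
    ```

    For a level 4 texels wide, coordinates below 0.125 map to texel 0 and coordinates above 0.875 map to texel 3, so the sample point never leaves the texel centers of the edge texels.
    
    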
  • 163.
    163 Determine Texel Locations
    • Bilinear sample needs 4 texel locations
      • (i0,j0), (i0,j1), (i1,j0), (i1,j1)
    • With integer texel coordinates
      • i0 = floor(i-1/2)
      • i1 = floor(i+1/2)
      • j0 = floor(j-1/2)
      • j1 = floor(j+1/2)
    • Also compute fractional weights for bilinear filtering
      • a = frac(i-1/2)
      • b = frac(j-1/2)
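    A one-axis version of the location and weight computation is sketched below. It assumes the slide’s i and j stand for unrounded texel-space coordinates (for example u times the level width), since the fractional weights would otherwise be constant; `BilinearAxis` and `bilinear_axis` are hypothetical names.

    ```c
    #include <assert.h>
    #include <math.h>

    typedef struct {
        int i0;   /* floor(x - 1/2) */
        int i1;   /* floor(x + 1/2) */
        float a;  /* frac(x - 1/2), weight toward i1 */
    } BilinearAxis;

    /* x is an unrounded texel-space coordinate along one axis. */
    BilinearAxis bilinear_axis(float x) {
        assert(isfinite(x));
        BilinearAxis r;
        float lo = floorf(x - 0.5f);
        r.i0 = (int)lo;
        r.i1 = (int)floorf(x + 0.5f);
        r.a = (x - 0.5f) - lo;
        return r;
    }
    ```

    Calling it once for the horizontal axis and once for the vertical axis yields the four locations (i0,j0), (i0,j1), (i1,j0), (i1,j1) and the weights a and b. Note that i0 can be -1 near the edge when no clamping has been applied first.
    
    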
  • 164.
    164 Determine Texel Addresses
    • Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch
      • Assume bytesPerTexel = 4 bytes for RGBA8 texture
    • Example
      • addr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l))
      • addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l))
      • addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l))
      • addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l))
    • More complicated address schemes are needed for good texture locality!
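    The linear addressing formula is a single integer multiply-add chain per texel. A sketch with a hypothetical `texel_address` function, using plain integers for addresses and the simple row-major layout from the slide rather than any locality-friendly scheme:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* addr = baseOfLevel + bytesPerTexel * (i + j * widthOfLevel), row-major. */
    size_t texel_address(size_t base_of_level, size_t bytes_per_texel,
                         size_t width_of_level, size_t i, size_t j) {
        assert(i < width_of_level);
        return base_of_level + bytes_per_texel * (i + j * width_of_level);
    }
    ```

    For an RGBA8 level that is 256 texels wide and starts at address 1000, texel (3,2) lands at 1000 + 4*(3 + 2*256) = 3060.
    
    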
  • 165.
    165 Initiate Texture Reads
    • Initiate texture memory reads at the 8 texel addresses
      • addr00, addr01, addr10, addr11 for the upper level
      • addr00, addr01, addr10, addr11 for the lower level
    • Queue the weights a, b, and w
      • Latency FIFO in hardware makes these weights available when texture reads complete
  • 166.
    166 Phased Data Flow
    • Must hide long memory read latency between Selection and Combination phases
    [Diagram: Texel Selection takes the texture coordinate vector and texture parameters and produces texel offsets; memory reads for samples fetch texel data from the texture images; combination parameters are FIFOed to Texel Combination]
  • 167.
    167 Texel Combination
    • When texel reads are returned, begin filtering
      • Assume results are
        • Top texels: t00, t01, t10, t11
        • Bottom texels: b00, b01, b10, b11
    • Per-component filtering math is tri-linear filter
      • RGBA8 is four components
      • result = (1-a)*(1-b)*(1-w)*b00 +
                 (1-a)*b*(1-w)*b01 +
                 a*(1-b)*(1-w)*b10 +
                 a*b*(1-w)*b11 +
                 (1-a)*(1-b)*w*t00 +
                 (1-a)*b*w*t01 +
                 a*(1-b)*w*t10 +
                 a*b*w*t11;
      • 24 MADs per component, or 96 for RGBA
      • Lerp-tree could do 14 MADs per component, or 56 for RGBA
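    The lerp-tree alternative can be sketched for a single component as seven linear interpolations: three per level plus one between levels. All names below are hypothetical; the texel naming follows the slide, with the first digit indexing i (weight a) and the second indexing j (weight b).

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Linear interpolation between x and y by fraction f. */
    float lerp_f(float x, float y, float f) { return x + f * (y - x); }

    /* One component of the four texels fetched from one mipmap level. */
    typedef struct { float t00, t01, t10, t11; } Texels2x2;

    /* Bilinear filter within a level: two lerps along i, then one along j. */
    float bilinear(const Texels2x2 *t, float a, float b) {
        float row_j0 = lerp_f(t->t00, t->t10, a);
        float row_j1 = lerp_f(t->t01, t->t11, a);
        return lerp_f(row_j0, row_j1, b);
    }

    /* Tri-linear filter: bilinear in each level, then a lerp between levels. */
    float trilinear(const Texels2x2 *bottom, const Texels2x2 *top,
                    float a, float b, float w) {
        assert(bottom != NULL && top != NULL);
        return lerp_f(bilinear(bottom, a, b), bilinear(top, a, b), w);
    }
    ```

    The result matches the expanded eight-term sum up to rounding, while sharing the intermediate products that the flat formula recomputes for every term.
    
    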
  • 168.
    168 Total Texture Fetch Operations
    • Interpolation
      • 6 MADs, 3 MULs, & 1 RCP (floating-point)
    • Texel selection
      • Texture space mapping
        • 2 MULs (fixed-point)
      • LOD determination (floating-point)
        • 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2
      • LOD bias and clamping (fixed-point)
        • 1 ADD, 1 MIN, 1 MAX
      • Level determination and level weighting (fixed-point)
        • 1 FLOOR, 1 ADD, 1 FRAC
      • Texture sample point
        • 4 MAXs, 4 MINs, 2 FLOORs (fixed-point)
      • Texel locations and bi-linear weights
        • 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point)
      • Addressing
        • 16 integer MADs (integer)
    • Texel combination
      • 56 fixed-point MADs (fixed-point)
  • 169.
    169 Observations about the Texture Fetch
    • Lots of ways to implement the math
      • Lots of clever ways to be efficient
    • Lots more texture operations not considered in this analysis
      • Compression
      • Anisotropic filtering
      • sRGB
      • Shadow mapping
    • Arguably TEX instructions are the “world’s most CISC instructions”
      • Texture fetches are incredibly complex instructions
    • Good deal of GPU’s superiority at graphics operations over CPUs is attributable to TEX instruction efficiency
      • Good for compute too
  • 170.
    170 OpenGL’s Future Evolution
    Mark Kilgard
    Principal System Software Engineer
    NVIDIA
  • 171.
    171 What drives OpenGL’s future?
    • GPU graphics functionality
      • Tessellation & geometry amplification
    • Ratio of GPU to single-core CPU performance
    • Compatibility
      • Direct3Disms
      • OpenGLisms
      • Deprecation
    • Compute support
      • OpenCL, CUDA, Stream processing
    • Unconventional graphics devices
  • 172.
    172 Better Graphics Functionality
    • Expect more graphics performance
      • Easy prediction
      • Rasterization nowhere near peaked
      • Ray tracing fans: GPUs make rays and triangles faster
        • Market still values triangles more than rays
    • Expect more generalized graphics functionality
      • Trend for texture enhancements likely to continue
  • 173.
    173 Geometry Amplification
    • Tessellation
      • Programmable hardware support coming
    • True market demand probably not tessellation per se
      • Games want visual richness
      • Texture and shading have created much richness
        • Often “pixel richness” as substitute for geometry richness
      • Increasingly “visual richness” means geometric complexity
    • Geometry Amplification may be better term
      • Tessellation is one way to improve tessellation
        • Recognize the limits of bi-variate patches for representing geometry
  • 174.
    174 Programmable Tessellation
      • Stunning real-time geometric detail + animation possible
      • Programmable tessellation + vertex textured displacements
  • 175.
    175 Continuous Level-of-detail for Tessellation
      • Figure: three scenes with increasing tessellation level-of-detail
      • Same patch mesh for all 3 scenes
  • 176.
    176 Adaptive Programmable Tessellation
      • Programmable level-of-detail determination allows more tessellation along silhouette edges
  • 177.
    177 Limits of Patch Tessellation
      • What games tend to want
        – Here’s 8 vertices (bounding box), go draw a fire truck
        – Here’s a few vertices, go draw a tree
  • 178.
    178 Tessellation Not New to OpenGL
      • At least three different bi-variate patch tessellation schemes have been added to OpenGL
        – Evaluators (OpenGL 1.0)
        – NV_evaluators (GeForce 3): water-tight, adaptive level-of-detail, forward differencing approach
        – ATI_pn_triangles, Curved PN Triangles (Radeon): tessellated triangle based on positions + normals
      • None succeeded
        – Hard to integrate into art pipelines
        – Didn’t offer enough performance advantage
      • Figure: GLUT’s wire-frame teapot
      • [Moreton 2001] [Vlachos 2001]
  • 179.
    179 Ratio of CPU core-to-GPU Performance
      • Well known computer architecture trends now
        – Single-threaded CPU performance trends are stalled
        – Multi-core is CPU designer response
        – GPU performance continues on-trend
      • What does this mean for graphics API design?
        – CPUs must generate more visually rich API command streams to saturate GPUs
        – Can’t just send more commands faster
        – Single-threaded CPUs can only do so much
        – So must send more powerful commands
  • 180.
    180 Déjà vu
      • We’ve been here before
      • Early 1980s: graphics terminals used to be connected to minicomputers by slow speed interconnects
        – CPUs themselves far too slow for real-time rendering
      • Resulting rendering model
        – Download scene database to graphics terminal
        – Adjust viewing and modeling parameters
        – Send “redraw scene” command
  • 181.
    181 What Happened
      • Such “scene processor” hardware was not very flexible
        – Difficult to animate anything beyond rigid dynamics
      • Eventually SGI and others matched CPUs and interconnects to graphics performance
        – Result was IRIS GL’s immediate mode
        – CPU fast enough to send geometry every frame
      • OpenGL took this model
        – Over time added vertex arrays, vertex buffers, texturing, programmable shading, and more performance
      • CPU performance still became the limiter
        – Better graphics driver tuning helped
        – Dual-core drivers help some more
  • 182.
    182 OpenGL’s Most Powerful Command
      • Available since OpenGL 1.0
      • Can render essentially anything OpenGL can render!
      • Takes just one parameter
      • The command: glCallList(GLuint displayListName);
      • Power of display lists comes from
        – Playing back arbitrary compiled commands
        – Allowing for hierarchical calling of display lists (a display list can contain glCallList or glCallLists)
        – Ability of application to re-define display lists (no editing, but can be re-defined)
  • 183.
    183 Enhanced Display Lists
      • OpenGL 1.0 display lists are too inflexible
        – Pixel & vertex data “compiled into” display lists
        – Binding objects always “by name” rather than “by reference”
      • These problems can be fixed
        – Modern OpenGL supports buffers for transferring vertices and pixels
        – Compile commands into display lists that defer vertex and pixel transfers until execute-time rather than compile-time
        – Allow objects (textures, buffers, programs) to be bound “by reference” or “by name”
  • 184.
    184 Other Display List Enhancements
      • Conditional display list execution
      • Relaxed vertex index and command order
      • Parallel construction of display lists by multiple threads
      • General insight: it is easier for the driver to optimize the application’s graphics command stream if it gets to
        1) see the repetition in the command stream clearly
        2) take time to analyze and optimize usage
  • 185.
    185 Conditional Display List Execution
      • Today’s occlusion query
        – Application must “query” to learn occlusion result
        – Latency too great to respond
      • Application can use OpenGL 3.0’s conditional render capability
        – But just skips vertex pulling, not state changes
      • Conditional display list execution
        – Allow a glCallList to depend on the occlusion result from an occlusion query object
        – Allows in-band occlusion querying
        – Skip both vertex pulling and state changes
  • 186.
    186 Relaxed Vertex Index and Command Order
      • OpenGL today always executes commands “in order”
        – Sequential requirement
      • Provide compile-time specification of re-ordering allowances
      • Allows GL implementation to re-order
        – Vertex indices within display list’s vertex batch
        – Commands within display list
      • Key rule: the state vector a rendering command executes in must match the state it would have seen if executed sequentially
      • Allow static or dynamic re-ordering
        – Static re-ordering needed for multi-pass invariances
      • Past practice
        – IRIS Performer would sort rendering by state changes for performance
        – [Sander 2007] shows substantial benefit for vertex ordering
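
The state-sorting practice mentioned above can be sketched in a few lines of C. This is a toy model, not Performer's or any driver's algorithm: `DrawCmd`, `state_key`, and the function names are hypothetical, and the sketch assumes the commands are order-independent (for example, opaque geometry with depth testing), which is exactly the kind of allowance the slide proposes the application declare. Each command keeps its own state key, so the key rule (a command sees the same state it would have seen sequentially) still holds after sorting.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned state_key;    /* hypothetical id for a shader/texture/blend combination */
    unsigned submit_order; /* position in the original command stream */
} DrawCmd;

/* Order by state first; break ties by submission order because
   qsort is not guaranteed to be stable. */
int compare_by_state(const void *a, const void *b)
{
    const DrawCmd *x = a, *y = b;
    if (x->state_key != y->state_key)
        return x->state_key < y->state_key ? -1 : 1;
    if (x->submit_order != y->submit_order)
        return x->submit_order < y->submit_order ? -1 : 1;
    return 0;
}

void sort_by_state(DrawCmd *cmds, size_t n)
{
    qsort(cmds, n, sizeof *cmds, compare_by_state);
}

/* Number of times the state must be (re)bound while replaying the
   stream, counting the bind for the first command. */
size_t count_state_binds(const DrawCmd *cmds, size_t n)
{
    size_t binds = 0;
    for (size_t k = 0; k < n; ++k)
        if (k == 0 || cmds[k].state_key != cmds[k - 1].state_key)
            ++binds;
    return binds;
}
```

For a stream that alternates between two states, sorting reduces the bind count from one per command to two in total.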
  • 187.
    187 Parallel Display List Construction
      • Today’s model
        – Single thread makes all OpenGL rendering calls
        – Minimizes GPU context switch overhead
        – Ties command generation rate to single core’s CPU performance
      • Enhanced display list model
        – Multiple threads can build display lists in parallel
        – Single thread still executes display lists
        – Countable semaphore objects used to synchronize hand-off of display lists built by other threads with main rendering thread
  • 188.
    188 Rethinking Display Lists
      • Display lists have been proposed for deprecation
        – Right as we really need them!
      • Much more interesting to enhance display lists
        – Dual-core driver already off-loads display list traversal to driver’s thread
        – Multi-core driver could scan frequently executed display lists to optimize their order and error processing
        – Includes adding pre-fetching to avoid stalling CPU on cache misses for object accesses
  • 189.
    189 Direct3Disms
      • Developing a shader-rich game title costs $$$
        – For top titles, often US$ 5,000,000+
      • Investment typically amortized over multiple platforms
        – Consoles are primary target, then PCs
        – PC version typically developed for Direct3D
      • Reality: OpenGL is often 3rd or worse priority
      • API differences = porting & performance pitfalls
        – Stops or slows Direct3D-developed 3D content from working easily on OpenGL platforms
  • 190.
    190 Supporting Direct3D: Not New
      • OpenGL has always supported multiple formats well
        – OpenGL’s plethora of pixel and vertex formats
      • Very first OpenGL extension: EXT_bgra
        – Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering
        – Made core functionality by OpenGL 1.2
      • Many OpenGL extensions have embraced Direct3Disms
        – Secondary color
        – Fog coordinate
        – Point sprites
  • 191.
    191 Direct3D vs. OpenGL Coordinate System Conventions
      • Window origin conventions
        – Direct3D = upper-left origin
        – OpenGL = lower-left origin
      • Pixel center conventions
        – Direct3D 9 = pixel centers at integer locations
        – OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
      • Clip space conventions
        – Direct3D = [-1,+1] for XY, [0,1] for Z
        – OpenGL = [-1,+1] range for XYZ
      • Affects
        – How projection matrix is loaded
        – Fragment shaders that access the window position
        – Point sprites have upper-left texture coordinate origin (OpenGL already lets application choose lower-left or upper-left)
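
The conventions above translate into small fix-ups when porting. The C sketch below shows one possible set of conversions; it is illustrative only. The function names are invented for this example, and the matrix layout (16 floats, row-major, column-vector convention where clip = M * v, so row 2 produces z and row 3 produces w) is an assumption that must be adapted to the application's own math library.

```c
#include <assert.h>

/* Normalized device depth: Direct3D [0,1] -> OpenGL [-1,+1]. */
float d3d_ndc_z_to_gl(float z)
{
    return 2.0f * z - 1.0f;
}

/* Convert a projection matrix that produces Direct3D-style clip space
   (0 <= z <= w) into one that produces OpenGL-style clip space
   (-w <= z <= w). Assumes row-major storage, clip = M * v. */
void d3d_projection_to_gl(const float d3d[16], float gl[16])
{
    for (int k = 0; k < 16; ++k)
        gl[k] = d3d[k];
    for (int c = 0; c < 4; ++c)
        gl[2 * 4 + c] = 2.0f * d3d[2 * 4 + c] - d3d[3 * 4 + c];  /* z' = 2z - w */
}

/* Direct3D 9 window position (upper-left origin, pixel centers at
   integers) -> OpenGL window position (lower-left origin, pixel
   centers at half-integers) for a window `height` pixels tall. */
void d3d9_window_to_gl(float x, float y, int height, float *gl_x, float *gl_y)
{
    assert(height > 0);
    *gl_x = x + 0.5f;
    *gl_y = (float)height - y - 0.5f;
}
```

With these helpers, the center of the top-left pixel, (0, 0) in Direct3D 9, maps to (0.5, height - 0.5) in OpenGL window coordinates.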
  • 192.
    192 Direct3D vs. OpenGL Provoking Vertex Conventions
      • Direct3D uses the “first” vertex of a triangle or line to determine which color is used for flat shading
      • OpenGL uses the “last” vertex for lines, triangles, and quads
        – Except for polygon (GL_POLYGON) mode, which uses the first vertex
      • Direct3D 9: pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT);
      • OpenGL: glShadeModel(GL_FLAT);
      • Figure: input triangle strip with per-vertex colors
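
As a rough illustration of the two conventions, the C sketch below computes which vertex of a draw call supplies the flat-shaded color for a given primitive. The enum and function names are hypothetical, indices are 0-based, and only independent lines and triangles plus line and triangle strips are covered; fans, quads, and polygons follow their own rules and are left out.

```c
#include <assert.h>

/* Hypothetical primitive and convention tags for this example. */
typedef enum { PRIM_LINES, PRIM_LINE_STRIP,
               PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP } PrimType;
typedef enum { PROVOKE_FIRST,   /* Direct3D convention          */
               PROVOKE_LAST     /* OpenGL default convention    */
} ProvokeMode;

/* Index of the vertex whose color flat-shades primitive number `prim`
   (0-based) within the draw call's vertex sequence; -1 if unsupported. */
int provoking_vertex(PrimType type, int prim, ProvokeMode mode)
{
    assert(prim >= 0);
    int first, count;
    switch (type) {
    case PRIM_LINES:          first = 2 * prim; count = 2; break;
    case PRIM_LINE_STRIP:     first = prim;     count = 2; break;
    case PRIM_TRIANGLES:      first = 3 * prim; count = 3; break;
    case PRIM_TRIANGLE_STRIP: first = prim;     count = 3; break;
    default:                  return -1;
    }
    return mode == PROVOKE_FIRST ? first : first + count - 1;
}
```

For the fourth triangle of a strip (`prim` = 3), the Direct3D convention takes its color from vertex 3 and the OpenGL convention from vertex 5, which is why the same strip renders with shifted colors under the two APIs.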
  • 193.
    193 BGRA Vertex Array Order
      • Direct3D 9’s most common usage for sending per-vertex colors is the 32-bit D3DCOLOR data type:
        – Red in bits 16:23
        – Green in bits 8:15
        – Blue in bits 0:7
        – Alpha in bits 24:31
      • Laid out in memory, it looks like BGRA order
      • OpenGL assumes RGBA order for all vertex arrays
      • Direct3Dism EXT_vertex_array_bgra extension allows:
        glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
        glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
        glVertexAttribPointer(index, GL_BGRA, GL_UNSIGNED_BYTE, GL_TRUE, stride, pointer);
      • Figure: 32-bit word from bit 31 down to bit 0: 8-bit alpha, 8-bit red, 8-bit green, 8-bit blue
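
The reason a D3DCOLOR word “looks like BGRA” in memory can be checked with a short C sketch. The helper names below are made up for this example. The byte store is written out explicitly in little-endian order (the byte order of the PCs the slide has in mind) so the result does not depend on the host machine, and `bgra_to_rgba` shows the per-vertex swizzle an application would otherwise need without the extension.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit channels the way the slide describes D3DCOLOR:
   alpha in bits 24:31, red 16:23, green 8:15, blue 0:7. */
uint32_t pack_d3dcolor(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Store the word as a little-endian machine would: lowest byte first. */
void store_little_endian(uint32_t color, uint8_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint8_t)(color >> (8 * i));
}

/* Reorder BGRA bytes into the RGBA order OpenGL vertex arrays assume. */
void bgra_to_rgba(const uint8_t bgra[4], uint8_t rgba[4])
{
    rgba[0] = bgra[2];  /* red   */
    rgba[1] = bgra[1];  /* green */
    rgba[2] = bgra[0];  /* blue  */
    rgba[3] = bgra[3];  /* alpha */
}
```

Passing GL_BGRA as the size argument, as the extension allows, lets the GL consume the Direct3D-style bytes directly and makes the swizzle pass unnecessary.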
  • 194.
    194 OpenGLisms
      • Things about OpenGL’s operation that make it hard for non-OpenGL applications to port to OpenGL
      • Examples
        – Selectors
        – Linked GLSL program objects
  • 195.
    195 Eliminating Selectors from OpenGL
      • OpenGL has lots of selectors
        – Selectors set state that indicates what state subsequent commands will update
        – Already mentioned selectors: glClientActiveTexture
        – Other examples: glActiveTexture, glMatrixMode, glBindTexture, glBindBuffer, glUseProgram, glBindProgramARB
      • OpenGL is full of selectors
        – Partly OpenGL’s extensibility strategy
        – Partly because objects are bound into context
          » Bind-to-edit objects
          » Rather than edit-by-name
      • Direct State Access extension: EXT_direct_state_access
        – Provides complete selector-free additional API for OpenGL
        – Shipping in NVIDIA’s 180.43 drivers
  • 196.
    196 Reasons to Eliminate Selectors
      • Direct3D has an “edit-by-name” model of operation
        – Means Direct3D has no selectors
      • Having to manage selectors when porting Direct3D or console code to OpenGL is awkward
        – Requires deferring updates to minimize selector and object bind changes
      • Layered libraries can’t count on selector state
        – To be safe when updating state controlled by selectors, such libraries must use this idiom: save selector, set selector, update state, restore selector
        – Bad for performance, particularly bad for dual-core drivers since queries are expensive
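
The save/set/update/restore idiom can be modeled without a GL context. The C sketch below uses a mock context, so every `mock_` name and the `MockContext` struct are inventions for illustration and not OpenGL entry points. `mock_active_texture` plays the role of a selector such as glActiveTexture, and `mock_bind_multi_texture` is written in the spirit of selector-free calls such as glBindMultiTextureEXT from EXT_direct_state_access.

```c
#include <assert.h>

enum { MOCK_UNITS = 4 };

typedef struct {
    int active_unit;                /* the selector                    */
    int bound_texture[MOCK_UNITS];  /* state the selector indexes into */
    int calls;                      /* API calls issued so far         */
} MockContext;

/* Query of the selector; on a real dual-core driver this is the costly step. */
int mock_get_active_unit(MockContext *c)
{
    c->calls++;
    return c->active_unit;
}

void mock_active_texture(MockContext *c, int unit)
{
    assert(unit >= 0 && unit < MOCK_UNITS);
    c->calls++;
    c->active_unit = unit;
}

/* Selector-dependent update: affects whichever unit is active. */
void mock_bind_texture(MockContext *c, int tex)
{
    c->calls++;
    c->bound_texture[c->active_unit] = tex;
}

/* What a layered library must do to avoid disturbing the caller. */
void library_bind_with_selector(MockContext *c, int unit, int tex)
{
    int saved = mock_get_active_unit(c);  /* save selector    */
    mock_active_texture(c, unit);         /* set selector     */
    mock_bind_texture(c, tex);            /* update state     */
    mock_active_texture(c, saved);        /* restore selector */
}

/* Selector-free alternative: the unit is an explicit argument. */
void mock_bind_multi_texture(MockContext *c, int unit, int tex)
{
    assert(unit >= 0 && unit < MOCK_UNITS);
    c->calls++;
    c->bound_texture[unit] = tex;
}
```

In this model the selector-based path costs four calls, one of them a query, while the direct path costs one and never touches the caller's active unit.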
  • 197.
    197 GLSL Program Object Linking
      • GLSL requires shader objects from different domains (vertex, geometry, fragment) to be linked into a single GLSL program object
        – Means you can’t mix-and-match shaders easily
      • Other APIs don’t have this limitation
        – Direct3D
        – Prior OpenGL assembly language extensions
        – Consoles
      • Having a “separate shader objects” extension could fix this problem
  • 198.
    198 Separate Shader Objects Example
      • Combining different GLSL shaders at once
      • Figure labels: different GLSL vertex shaders (wobbly torus, smooth torus) combined with different GLSL fragment shaders (specular brick bump mapping, red diffuse)
  • 199.
    199 Deprecation
      • Part of OpenGL 3.0 is a marking of features for deprecation
        – LOTS of functionality is marked for deprecation
        – I contend no real application today restricts itself to the non-deprecated subset of OpenGL, so all apps would have to change due to deprecation
      • Some vendors believe getting rid of features will make OpenGL better in some way
        – NVIDIA does not believe in abandoning API compatibility this way
      • OpenGL is part of a large ecosystem, so removing features this way undermines the substantial investment partners have made in OpenGL over years
        – API compatibility and stability is one of OpenGL’s great strengths
  • 200.
    200 Synergy between OpenGL and OpenCL
      • Complementary capabilities
        – OpenGL 3.0 = state-of-the-art, cross-platform graphics
        – OpenCL 1.0 = state-of-the-art, cross-platform compute
      • Computation & graphics should work together
        – Most natural way to intuit compute results is with graphics
        – When compute is done on a GPU, there’s no need to “copy” the data to see it visualized
      • Appendix B of OpenCL specification
        – Details sharing of objects between OpenGL and OpenCL
        – Called “GL” and “CL” from here on…
  • 201.
    201 Four Kinds of Shared Objects
      • Figure: OpenGL objects (left) mapped to OpenCL objects (right)
        – OpenGL buffer object (GLuint bufferobj), via clCreateFromGLBuffer: OpenCL buffer object (cl_mem)
        – OpenGL texture 2D object (GLenum target, GLuint texture, GLint miplevel), via clCreateFromGLTexture2D: OpenCL 2D image object (cl_mem)
        – OpenGL texture 3D object (GLenum target, GLuint texture, GLint), via clCreateFromGLTexture3D: OpenCL 3D image object (cl_mem)
        – OpenGL renderbuffer object (GLuint renderbuffer), via clCreateFromGLRenderbuffer: OpenCL 2D image object (cl_mem)
  • 202.
    202 OpenGL / OpenCL Sharing
      • Requirements for GL object sharing with CL
        – CL context must be created with an OpenGL context
        – Each platform-specific API will provide its appropriate way to create an OpenGL-compatible CL context
        – For WGL (Windows), CGL (OS X), GLX (X11/Linux), EGL (OpenGL ES), etc.
      • Creating cl_mem for GL objects does two things
        1. Ensures CL has a reference to the GL objects
        2. Provides cl_mem handle to acquire GL object for CL’s use
      • clRetainMemObject & clReleaseMemObject can create counted references to cl_mem objects
  • 203.
    203 Acquiring GL Objects for Compute Access
      • Still must “enqueue acquire” GL objects for compute kernels to use them
        – Otherwise reading or writing GL objects with CL is undefined
      • Enqueue acquire and release provide sequential consistency with GL command processing
      • Enqueue commands for GL objects
        – clEnqueueAcquireGLObjects
          » Takes list of cl_mem objects for GL objects & list of cl_events that must complete before acquire
          » Returns a cl_event for this acquire operation
        – clEnqueueReleaseGLObjects
          » Takes list of cl_mem objects for GL objects & list of cl_events that must complete before release
          » Returns a cl_event for this release operation
  • 204.
    204 Unconventional OpenGL Deployments
      • Figure: OpenGL products ordered from conventional PC products to unconventional
        – Workstation PCs: Quadro
        – Consumer PCs: GeForce
        – High-end visualization: QuadroPlex Visual Computing Solution (VCS)
        – Embedded applications
        – Handheld devices
        – Game consoles
  • 205.
    205 OpenGL in Context
      A facilitated conversation with Dr. Marc Levoy, Stanford University
  • 206.

Editor's Notes

  • #81 An exciting SIGGRAPH for me
  • #85 Didn’t continue to succeed, though. One of my sorrows is that OpenGL didn’t seem to contribute to success for SGI
  • #101 Not a required “implementation”, just a concise way to specify the architecture (like ISA registers). Directly inspired changes to the specification (especially to pixel operations, e.g., depth buffer of)
  • #102 Not a required “implementation”, just a concise way to specify the architecture (like ISA registers). Directly inspired changes to the specification (especially to pixel operations, e.g., depth buffer of)
