Vertex Shader Tricks
New Ways to Use the Vertex Shader to Improve Performance
Bill Bilodeau
Developer Technology Engineer, AMD

Topics Covered
● Overview of the DX11 front-end pipeline
● Common bottlenecks
● Advanced vertex shader features
● Vertex shader techniques
● Samples and results

Graphics Hardware: DX11 Front-End Pipeline
● VS – vertex data
● HS – control points
● Tessellator
● DS – generated vertices
● GS – primitives
● Write to UAV at all stages
    ● Starting with DX11.1
[Figure: Input Assembler → Vertex Shader → Hull Shader → Tessellator → Domain Shader → Geometry Shader → Stream Out, with resources labeled "CB, SRV, or UAV", running on repeated shader units, each with vector GPRs (256 2048-bit registers), a vector ALU (one 64-way single-precision operation every 4 clocks), a scalar ALU (one operation every 4 clocks), scalar GPRs (256 64-bit registers), and a vector/scalar cross-communication bus]

Bottlenecks - VS
● VS attributes
    ● Limit outputs to 4 attributes (AMD)
    ● This applies to all shader stages except the PS
● VS texture fetches
    ● Too many texture fetches can add latency, especially dependent texture fetches
    ● Group fetches together for better performance
    ● Hide latency with ALU instructions

Bottlenecks - VS
● Use the caches wisely
    ● Avoid large vertex formats that waste pre-VS cache space
    ● DrawIndexed() allows reuse of processed vertices saved in the post-VS cache
    ● Vertices with the same index only need to be processed once
[Figure: Input Assembler, Pre-VS Cache (hides latency), Vertex Shader, Post-VS Cache (vertex reuse)]
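To see why index reuse matters, the sketch below counts vertex shader invocations for an index stream under a deliberately simplified post-VS cache model: a FIFO of fixed size. Both that model and the function name `CountVSInvocations` are assumptions for illustration; real post-VS caches differ in size and replacement policy from GPU to GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified post-VS cache model (assumption): a FIFO holding the indices of
// the last `cacheSize` transformed vertices. Returns how many times the vertex
// shader would run for the given index stream.
std::size_t CountVSInvocations(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    assert(cacheSize > 0);
    std::deque<std::uint32_t> cache;
    std::size_t invocations = 0;
    for (std::uint32_t index : indices) {
        bool hit = std::find(cache.begin(), cache.end(), index) != cache.end();
        if (hit)
            continue;                  // reuse the already transformed vertex
        ++invocations;                 // miss: the VS must process this vertex
        cache.push_back(index);
        if (cache.size() > cacheSize)  // evict the oldest entry
            cache.pop_front();
    }
    return invocations;
}
```

For one quad drawn with indices 0, 1, 2, 2, 1, 3, any cache of two or more entries yields 4 invocations instead of the 6 that six unique vertices would need.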

Bottlenecks - GS
● The GS can add or remove primitives
    ● Adding new primitives requires storing new vertices
    ● Going off chip to store data can be a bandwidth issue
● Using the GS means another shader stage
    ● This means more competition for shader resources
    ● Better if you can do everything in the VS

Advanced Vertex Shader Features
● SV_VertexID, SV_InstanceID
● UAV output (DX11.1)
● NULL vertex buffer
    ● VS can create its own vertex data

SV_VertexID
● Can use the vertex ID to decide what vertex data to fetch
● Fetch from an SRV, or procedurally create a vertex

VSOut VertexShader( uint id : SV_VertexID )
{
    float3 vertex = g_VertexBuffer[id];
    …
}

UAV Buffers
● Write to UAVs from a vertex shader
    ● New feature in DX11.1 (UAV at any stage)
● Can be used instead of stream-out for writing vertex data
    ● Triangle output is not limited to strips
    ● You can use whatever format you want
● Can output anything useful to a UAV

NULL Vertex Buffer
● DX11/DX10 allows this
    ● Just set the number of vertices in Draw()
    ● VS will execute without a vertex buffer bound
● Can be used for instancing
    ● Call Draw() with the total number of vertices
    ● Bind mesh and instance data as SRVs

Vertex Shader Techniques
● Full Screen Triangle
● Vertex Shader Instancing
● Merge Instancing
● Vertex Shader UAVs

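Full Screen Triangle
● For post-processing effects
● A triangle has better performance than a quad
    ● A quad's two triangles share a diagonal, so 2×2 pixel blocks along that edge get shaded by both triangles
● Fast and easy with VS-generated coordinates
    ● No IB or VB is necessary
● Something you should be using for full screen effects
[Figure: clip-space coordinates of the triangle: (-1, -1, 0), (-1, 3, 0), (3, -1, 0)]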

Full Screen Triangle: C++ Code
// Null VB, IB
pd3dImmediateContext->IASetVertexBuffers( 0, 0, NULL, NULL, NULL );
pd3dImmediateContext->IASetIndexBuffer( NULL, (DXGI_FORMAT)0, 0 );
pd3dImmediateContext->IASetInputLayout( NULL );
// Set Shaders
pd3dImmediateContext->VSSetShader( g_pFullScreenVS, NULL, 0 );
pd3dImmediateContext->PSSetShader( … );
pd3dImmediateContext->PSSetShaderResources( … );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// Render 3 vertices for the triangle
pd3dImmediateContext->Draw( 3, 0 );

Full Screen Triangle: HLSL Code
VSOutput VSFullScreenTest( uint id : SV_VertexID )
{
    VSOutput output;
    // generate clip space position
    output.pos.x = (float)(id / 2) * 4.0 - 1.0;
    output.pos.y = (float)(id % 2) * 4.0 - 1.0;
    output.pos.z = 0.0;
    output.pos.w = 1.0;
    // texture coordinates
    output.tex.x = (float)(id / 2) * 2.0;
    output.tex.y = 1.0 - (float)(id % 2) * 2.0;
    // color
    output.color = float4(1, 1, 1, 1);
    return output;
}
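The vertex-ID arithmetic can be checked on the CPU. The C++ function below is a hypothetical mirror of VSFullScreenTest (the name `FullScreenTriangleVertex` and the struct are illustrative, not part of the sample). It also shows why the texture coordinates work out: at all three vertices u = (x + 1) / 2 and v = (1 - y) / 2, and with w = 1 the interpolation is linear, so the visible [-1, 1] clip-space square maps exactly onto [0, 1] texture space while the rest of the triangle is clipped.

```cpp
#include <cassert>
#include <cstdint>

struct FullScreenVertex {
    float x, y, z, w;  // clip-space position
    float u, v;        // texture coordinate
};

// CPU mirror of the VSFullScreenTest math, for checking its output.
FullScreenVertex FullScreenTriangleVertex(std::uint32_t id)
{
    assert(id < 3);  // the triangle is drawn with Draw(3, 0)
    FullScreenVertex out{};
    out.x = static_cast<float>(id / 2) * 4.0f - 1.0f;
    out.y = static_cast<float>(id % 2) * 4.0f - 1.0f;
    out.z = 0.0f;
    out.w = 1.0f;
    out.u = static_cast<float>(id / 2) * 2.0f;
    out.v = 1.0f - static_cast<float>(id % 2) * 2.0f;
    return out;
}
```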

VS Instancing: Point Sprites
● Often done on the GS, but can be faster on the VS
● Create an SRV point buffer and bind it to the VS
● Call Draw() or DrawIndexed() to render the full triangle list
● Read the location from the point buffer and expand it to the vertex's corner of the quad
● Can be used for particles or bokeh DOF sprites
● Don't use DrawInstanced() for a small mesh

Point Sprites: C++ Code
pd3dImmediateContext->IASetIndexBuffer( g_pParticleIndexBuffer, DXGI_FORMAT_R32_UINT, 0 );
pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
// 6 indices (two triangles) per particle quad
pd3dImmediateContext->DrawIndexed( g_particleCount * 6, 0, 0 );

Point Sprites: HLSL Code
VSInstancedParticleDrawOut VSIndexBuffer( uint id : SV_VertexID )
{
    VSInstancedParticleDrawOut output;
    uint particleIndex = id / 4;
    uint vertexInQuad = id % 4;
    // calculate the position of the vertex
    float3 position;
    position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
    position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
    position.z = 0.0;
    position.xy *= PARTICLE_RADIUS;
    position = mul( position, (float3x3)g_mInvView ) + g_bufPosColor[particleIndex].pos.xyz;
    output.pos = mul( float4(position, 1.0), g_mWorldViewProj );
    output.color = g_bufPosColor[particleIndex].color;
    // texture coordinate
    output.tex.x = (vertexInQuad % 2) ? 1.0 : 0.0;
    output.tex.y = (vertexInQuad & 2) ? 1.0 : 0.0;
    return output;
}
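The indexed path depends on an index buffer whose values are the vertex IDs the shader decodes with id / 4 and id % 4. The slides do not show how g_pParticleIndexBuffer is filled; the following is a minimal sketch, assuming four IDs per particle and the corner order implied by the shader (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right). `BuildParticleIndices` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical contents for g_pParticleIndexBuffer (DXGI_FORMAT_R32_UINT).
// Particle p owns vertex IDs 4*p .. 4*p+3, which the shader decodes as
// particleIndex = id / 4 and vertexInQuad = id % 4.
std::vector<std::uint32_t> BuildParticleIndices(std::uint32_t particleCount)
{
    assert(particleCount <= UINT32_MAX / 4);  // vertex IDs must fit in 32 bits
    // Two triangles per quad: TL, TR, BL and BL, TR, BR (both clockwise in the
    // quad's y-up local space).
    const std::uint32_t quad[6] = { 0, 1, 2, 2, 1, 3 };
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(particleCount) * 6);
    for (std::uint32_t p = 0; p < particleCount; ++p) {
        for (std::uint32_t corner : quad)
            indices.push_back(p * 4 + corner);
    }
    return indices;
}
```

The array would be uploaded once; since only four distinct IDs appear per particle, the post-VS cache can typically skip two of the six shader invocations per sprite.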

Point Sprite Performance
Method          Sprites   AMD Radeon R9 290X (ms)   Nvidia Titan (ms)
Indexed         500K      0.52                      0.52
Non-Indexed     500K      0.77                      0.87
GS              500K      1.38                      0.83
DrawInstanced   500K      1.77                      5.1
Indexed         1M        1.02                      1.5
Non-Indexed     1M        1.53                      1.92
GS              1M        2.7                       1.6
DrawInstanced   1M        3.54                      10.3

Point Sprite Performance
● DrawIndexed() is the fastest method
● Draw() is slower but doesn't need an IB
● Don't use DrawInstanced() for creating sprites on either AMD or Nvidia hardware
    ● Not recommended for meshes with a small number of vertices
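For the non-indexed Draw() variant, nothing is shared between triangles, so each sprite consumes six vertex IDs and the shader must map id % 6 onto one of the four quad corners. The slides do not show that shader; the sketch below is an assumption that reuses the indexed path's triangle order, with a hypothetical `DecodeNonIndexedSpriteVertex` helper. One plausible reason for the measured gap is that this path always runs the vertex shader six times per sprite, while the indexed path can reuse post-VS cache entries and run it four times.

```cpp
#include <cassert>
#include <cstdint>

struct SpriteCorner {
    std::uint32_t particleIndex;
    std::uint32_t vertexInQuad;  // 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
};

// Hypothetical decode for a non-indexed Draw(particleCount * 6, 0):
// six consecutive vertex IDs per sprite, two triangles, no shared vertices.
SpriteCorner DecodeNonIndexedSpriteVertex(std::uint32_t id)
{
    // Same corner order as an indexed quad drawn with indices {0, 1, 2, 2, 1, 3}.
    static const std::uint32_t kCornerForVertex[6] = { 0, 1, 2, 2, 1, 3 };
    SpriteCorner out{ id / 6, kCornerForVertex[id % 6] };
    assert(out.vertexInQuad < 4);
    return out;
}
```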

Merge Instancing
● Combine multiple meshes that can be instanced many times
    ● Better than normal instancing, which renders only one mesh
    ● Instance nearby meshes for a smaller bounding box
● Each mesh is a page in the vertex data
    ● Fixed vertex count for each mesh
    ● Meshes smaller than the page size use degenerate triangles

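Merge Instancing
[Figure: data layout. Mesh vertex data holds fixed-length pages (Mesh Data 0, Mesh Data 1, Mesh Data 2, ...); mesh instance data maps each instance to a page (Instance 0 → Mesh Index 2, Instance 1 → Mesh Index 0, ...); within a page (Vertex 0, Vertex 1, Vertex 2, Vertex 3, ...), unused slots are filled with degenerate triangles]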

Merge Instancing Using the VS
● Use the vertex ID to look up the mesh to instance
● All meshes are the same size, so (id / SIZE) can be used as an offset to the mesh
● Faster than using DrawInstanced()
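A CPU-side sketch of that lookup, assuming the instance data is an array of hypothetical `MeshInstance` records (a mesh index plus a translation standing in for a full per-instance transform) and that every mesh occupies exactly `pageSize` vertices in one shared vertex array. In a shader, the same arithmetic would read both arrays from SRVs, and the draw would be Draw(instanceCount * pageSize, 0) with no vertex buffer bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Hypothetical per-instance record (would be fetched from an SRV).
struct MeshInstance {
    std::uint32_t meshIndex;    // which fixed-size page of vertex data to use
    Float3        translation;  // stand-in for a full per-instance transform
};

// CPU mirror of a merge-instancing vertex fetch. `pageSize` plays the role of
// the slide's SIZE; shorter meshes are assumed padded with degenerate triangles.
Float3 FetchMergeInstancedVertex(std::uint32_t id,
                                 std::uint32_t pageSize,
                                 const std::vector<Float3>& meshVertexData,
                                 const std::vector<MeshInstance>& instances)
{
    assert(pageSize > 0);
    std::uint32_t instanceIndex = id / pageSize;  // which instance this vertex belongs to
    std::uint32_t vertexInPage  = id % pageSize;  // position within the fixed-length page
    const MeshInstance& inst = instances.at(instanceIndex);
    const Float3& v = meshVertexData.at(inst.meshIndex * pageSize + vertexInPage);
    return Float3{ v.x + inst.translation.x, v.y + inst.translation.y, v.z + inst.translation.z };
}
```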

Merge Instancing Performance
[Chart: time in ms (axis 0 to 30) for DrawInstanced vs. Soft Instancing on AMD Radeon R9 290X and Nvidia GTX 780]
● Instancing performance test by Cloud Imperium Games for Star Citizen
● Renders 13.5M triangles (~40M verts)
● DrawInstanced version calls DrawInstanced() and uses instance data in a vertex buffer
● Soft Instancing version uses vertex instancing with Draw() calls and fetches instance data from an SRV

Vertex Shader UAVs
● Random access read/write in a VS
● Can be used to store transformed vertex data for use in multi-pass algorithms
● Can be used for passing constant attributes between any shader stages (not just from the VS)

Skinning to UAV
● Skin vertex data, then output it to a UAV
● Instance the skinned UAV data multiple times
● Can also be used for non-instanced data
    ● Multiple passes can reuse the transformed vertex data, e.g. shadow map rendering
● Performance is about the same as stream-out, but you can do more …
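As an illustration of what the skinning pass computes per vertex, here is a minimal linear blend skinning sketch in C++. The types (`Bone34`, `SkinnedVertexIn`) and the four-bones-per-vertex limit are assumptions, not the demo's code; the loop index plays the role of SV_VertexID, and `skinnedOut` stands in for the UAV, so each vertex writes exactly one slot that later passes can read without re-skinning.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine bone matrix: 3x3 rotation/scale plus translation in column 3.
struct Bone34 { float m[3][4]; };

struct SkinnedVertexIn {
    Vec3 position;
    std::array<std::uint32_t, 4> boneIndices;
    std::array<float, 4>         boneWeights;  // assumed to sum to 1
};

static Vec3 TransformPoint(const Bone34& b, const Vec3& p)
{
    return Vec3{ b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
                 b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
                 b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3] };
}

// CPU stand-in for a skinning VS that stores its result in a UAV.
void SkinToBuffer(const std::vector<SkinnedVertexIn>& in,
                  const std::vector<Bone34>& bones,
                  std::vector<Vec3>& skinnedOut)
{
    skinnedOut.resize(in.size());
    for (std::size_t id = 0; id < in.size(); ++id) {  // id plays the role of SV_VertexID
        const SkinnedVertexIn& v = in[id];
        Vec3 acc{ 0.0f, 0.0f, 0.0f };
        for (std::size_t i = 0; i < 4; ++i) {
            float w = v.boneWeights[i];
            if (w == 0.0f)
                continue;                             // unused influence slot
            assert(v.boneIndices[i] < bones.size());
            Vec3 p = TransformPoint(bones[v.boneIndices[i]], v.position);
            acc.x += w * p.x;
            acc.y += w * p.y;
            acc.z += w * p.z;
        }
        skinnedOut[id] = acc;                         // one write per vertex, like a UAV store
    }
}
```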

Bounding Box to UAV
● Can calculate and store a bounding box in the VS
● Use a UAV to store the min/max values (6 values: min and max of x, y, z)
● InterlockedMin/InterlockedMax determine the min and max of the bounding box
    ● Need to use integer values with atomics
● Use the stored bounding box in later passes
    ● GPU physics (collision)
    ● Tile-based processing

Bounding Box: HLSL Code
void UAVBBoxSkinVS( VSSkinnedIn input, uint id : SV_VertexID )
{
    // skin the vertex
    . . .
    // output the max and min for the bounding box
    int x = (int)(vSkinned.Pos.x * FLOAT_SCALE); // convert to integer
    int y = (int)(vSkinned.Pos.y * FLOAT_SCALE);
    int z = (int)(vSkinned.Pos.z * FLOAT_SCALE);
    InterlockedMin(g_BBoxUAV[0], x);
    InterlockedMin(g_BBoxUAV[1], y);
    InterlockedMin(g_BBoxUAV[2], z);
    InterlockedMax(g_BBoxUAV[3], x);
    InterlockedMax(g_BBoxUAV[4], y);
    InterlockedMax(g_BBoxUAV[5], z);
    . . .
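HLSL's InterlockedMin/InterlockedMax have a CPU analogue in compare-exchange loops, which makes the integer conversion easy to test. In this sketch, `kFloatScale` is a hypothetical value for FLOAT_SCALE and `IntBBox` mirrors the six-element g_BBoxUAV layout. Two details apply on the GPU as well: the minimum slots must be reset to INT_MAX and the maximum slots to INT_MIN before each pass, and FLOAT_SCALE trades range for precision, since positions beyond roughly INT_MAX / FLOAT_SCALE no longer fit in an int.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Hypothetical FLOAT_SCALE: 1000 integer steps per world unit.
constexpr float kFloatScale = 1000.0f;

// std::atomic<int> has no fetch_min/fetch_max in C++17/20, so emulate
// InterlockedMin/InterlockedMax with compare-exchange loops.
void AtomicMin(std::atomic<int>& target, int value)
{
    int current = target.load();
    // On failure, compare_exchange_weak reloads `current`; stop once value is no smaller.
    while (value < current && !target.compare_exchange_weak(current, value)) {
    }
}

void AtomicMax(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (value > current && !target.compare_exchange_weak(current, value)) {
    }
}

// Same layout as g_BBoxUAV: [0..2] = min x, y, z; [3..5] = max x, y, z.
struct IntBBox {
    std::atomic<int> v[6];
    IntBBox() { Reset(); }
    // Mins start at INT_MAX and maxes at INT_MIN so the first vertex always wins.
    void Reset()
    {
        for (int i = 0; i < 3; ++i) {
            v[i].store(INT_MAX);
            v[i + 3].store(INT_MIN);
        }
    }
};

void AccumulateVertex(IntBBox& box, float px, float py, float pz)
{
    // Convert to integer as the HLSL does; static_cast truncates toward zero.
    int x = static_cast<int>(px * kFloatScale);
    int y = static_cast<int>(py * kFloatScale);
    int z = static_cast<int>(pz * kFloatScale);
    AtomicMin(box.v[0], x); AtomicMin(box.v[1], y); AtomicMin(box.v[2], z);
    AtomicMax(box.v[3], x); AtomicMax(box.v[4], y); AtomicMax(box.v[5], z);
}
```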

Particle System UAV
● Single pass GPU-only particle system
● In the VS:
    ● Generate sprites for rendering
    ● Do Euler integration and update the particle system state to a UAV

Particle System: HLSL Code
uint particleIndex = id / 4;
uint vertexInQuad = id % 4;
// calculate the new position of the vertex
float3 oldPosition = g_bufPosColor[particleIndex].pos.xyz;
float3 oldVelocity = g_bufPosColor[particleIndex].velocity.xyz;
// Euler integration to find new position and velocity
float3 acceleration = normalize(oldVelocity) * ACCELLERATION;
float3 newVelocity = acceleration * g_deltaT + oldVelocity;
float3 newPosition = newVelocity * g_deltaT + oldPosition;
g_particleUAV[particleIndex].pos = float4(newPosition, 1.0);
g_particleUAV[particleIndex].velocity = float4(newVelocity, 0.0);
// Generate sprite vertices
. . .
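The per-particle update can be sketched on the CPU as below; `StepParticle` and its types are hypothetical, and `acceleration` stands in for the ACCELLERATION constant. Because the new velocity is used to advance the position, this is the semi-implicit (symplectic) Euler variant rather than explicit Euler. All four vertices of a sprite execute the same update and write identical values, which is redundant but harmless as long as g_bufPosColor and g_particleUAV are separate buffers.

```cpp
#include <cassert>
#include <cmath>

struct V3 { float x, y, z; };
struct ParticleState { V3 pos; V3 vel; };

// Hypothetical CPU mirror of the per-particle update in the HLSL.
ParticleState StepParticle(const ParticleState& s, float acceleration, float dt)
{
    assert(dt >= 0.0f);
    float speed = std::sqrt(s.vel.x * s.vel.x + s.vel.y * s.vel.y + s.vel.z * s.vel.z);
    // normalize() divides by the length, so a zero velocity would divide by zero;
    // this sketch leaves a particle at rest unaccelerated instead (assumption).
    V3 dir = speed > 0.0f ? V3{ s.vel.x / speed, s.vel.y / speed, s.vel.z / speed }
                          : V3{ 0.0f, 0.0f, 0.0f };
    V3 accel{ dir.x * acceleration, dir.y * acceleration, dir.z * acceleration };
    // newVelocity = acceleration * deltaT + oldVelocity
    V3 vNew{ accel.x * dt + s.vel.x, accel.y * dt + s.vel.y, accel.z * dt + s.vel.z };
    // newPosition = newVelocity * deltaT + oldPosition (uses the new velocity)
    V3 pNew{ vNew.x * dt + s.pos.x, vNew.y * dt + s.pos.y, vNew.z * dt + s.pos.z };
    return ParticleState{ pNew, vNew };
}
```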

Conclusion
● Vertex shader “tricks” can be more efficient than more commonly used methods
● Use SV_VertexID for smarter instancing
    ● Sprites
    ● Merge instancing
● UAVs add lots of freedom to vertex shaders
    ● Bounding box calculation
    ● Single pass VS particle system

Demos
● Particle System
● UAV Skinning
● Bbox

Acknowledgements
● Merge Instancing
    ● Emil Persson, “Graphics Gems for Games”, SIGGRAPH 2011
    ● Brendan Jackson, Cloud Imperium
● Thanks to
    ● Nick Thibieroz, AMD
    ● Raul Aguaviva (particle system UAV), AMD
    ● Alex Kharlamov, AMD

Questions
● bill.bilodeau@amd.com

Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
