Using compute shaders
This tutorial will walk you through the process of creating a minimal compute shader. But first, a bit of background on compute shaders and how they work with Godot.
Note
This tutorial assumes you are familiar with shaders generally. If you are new to shaders, please read Introduction to shaders and your first shader before proceeding with this tutorial.
A compute shader is a special type of shader program that is oriented towards general-purpose programming. In other words, compute shaders are more flexible than vertex shaders and fragment shaders as they don't have a fixed purpose (i.e. transforming vertices or writing colors to an image). Unlike fragment shaders and vertex shaders, compute shaders have very little going on behind the scenes. The code you write is what the GPU runs and very little else. This can make them a very useful tool to offload heavy calculations to the GPU.
Now let's get started by creating a short compute shader.
First, in the external text editor of your choice, create a new file called compute_example.glsl in your project folder. When you write compute shaders in Godot, you write them in GLSL directly. The Godot shader language is based on GLSL. If you are familiar with normal shaders in Godot, the syntax below will look somewhat familiar.
Note
Compute shaders can only be used from RenderingDevice-based renderers (the Forward+ or Mobile renderer). To follow along with this tutorial, ensure that you are using the Forward+ or Mobile renderer. This setting is located in the top right-hand corner of the editor.
Note that compute shader support is generally poor on mobile devices (due to driver bugs), even if they are technically supported.
Let's take a look at this compute shader code:
#[compute]
#version 450

// Invocations in the (x, y, z) dimension
layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

// A binding to the buffer we create in our script
layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
    float data[];
}
my_data_buffer;

// The code we want to execute in each invocation
void main() {
    // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
    my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
}
This code takes an array of floats, multiplies each element by 2 and stores the results back in the buffer array. Now let's look at it line by line.
#[compute]
#version 450
These two lines communicate two things:
The following code is a compute shader. This is a Godot-specific hint that is needed for the editor to properly import the shader file.
The code is using GLSL version 450.
You should never have to change these two lines for your custom compute shaders.
// Invocations in the (x, y, z) dimension
layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;
Next, we communicate the number of invocations to be used in each workgroup. Invocations are instances of the shader that are running within the same workgroup. When we launch a compute shader from the CPU, we tell it how many workgroups to run. Workgroups run in parallel to each other. While running one workgroup, you cannot access information in another workgroup. However, invocations in the same workgroup can have some limited access to other invocations.
Think about workgroups and invocations as a giant nested for loop.
for (int x = 0; x < workgroup_size_x; x++) {
    for (int y = 0; y < workgroup_size_y; y++) {
        for (int z = 0; z < workgroup_size_z; z++) {
            // Each workgroup runs independently and in parallel.
            for (int local_x = 0; local_x < invocation_size_x; local_x++) {
                for (int local_y = 0; local_y < invocation_size_y; local_y++) {
                    for (int local_z = 0; local_z < invocation_size_z; local_z++) {
                        // Compute shader runs here.
                    }
                }
            }
        }
    }
}
Workgroups and invocations are an advanced topic. For now, remember that we will be running two invocations per workgroup.
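The nested-loop picture can be modelled on the CPU to see which IDs a dispatch produces. The following is a minimal Python sketch, not Godot or GLSL code: the function name `global_invocation_ids` and its parameters are hypothetical, and the loops run sequentially here whereas a GPU runs workgroups in parallel. It relies only on the GLSL rule that `gl_GlobalInvocationID` equals `gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID`.

```python
from itertools import product


def global_invocation_ids(num_groups, local_size):
    """Enumerate (x, y, z) global invocation IDs for one dispatch.

    num_groups: workgroup counts per axis (hypothetical stand-in for the dispatch size).
    local_size: invocations per workgroup per axis (local_size_x/y/z in the shader).
    """
    ids = []
    # Outer loops: one iteration per workgroup.
    for group in product(*(range(n) for n in num_groups)):
        # Inner loops: one iteration per invocation inside that workgroup.
        for local in product(*(range(n) for n in local_size)):
            # global ID = workgroup ID * workgroup size + local ID, per axis
            ids.append(tuple(g * s + l for g, s, l in zip(group, local_size, local)))
    return ids


# Five workgroups of two invocations each along X.
ids = global_invocation_ids(num_groups=(5, 1, 1), local_size=(2, 1, 1))
print([x for x, _, _ in ids])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With two invocations per workgroup, five workgroups along X yield the ten distinct X indices 0 through 9, one per array element.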
// A binding to the buffer we create in our script
layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
    float data[];
}
my_data_buffer;
Here we provide information about the memory that the compute shader will have access to. The layout property allows us to tell the shader where to look for the buffer. We will need to match these set and binding positions from the CPU side later.
The restrict keyword tells the shader that this buffer is only going to be accessed from one place in this shader. In other words, we won't bind this buffer in another set or binding index. This is important as it allows the shader compiler to optimize the shader code. Always use restrict when you can.
This is an unsized buffer, which means it can be any size. So we need to be careful not to read from an index at or beyond the end of the buffer.
// The code we want to execute in each invocation
void main() {
    // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
    my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
}
Finally, we write the main function, which is where all the logic happens. We access a position in the storage buffer using the gl_GlobalInvocationID built-in variable. gl_GlobalInvocationID gives you the globally unique ID for the current invocation.
To continue, write the code above into your newly created compute_example.glsl file.
Create a local RenderingDevice
To interact with and execute a compute shader, we need a script. Create a new script in the language of your choice and attach it to any Node in your scene.
Now, to execute our shader, we need a local RenderingDevice, which can be created using the RenderingServer:
# Create a local rendering device.
var rd := RenderingServer.create_local_rendering_device()

// Create a local rendering device.
var rd = RenderingServer.CreateLocalRenderingDevice();
After that, we can load the newly created shader file compute_example.glsl and create a precompiled version of it using this:
# Load GLSL shader
var shader_file := load("res://compute_example.glsl")
var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
var shader := rd.shader_create_from_spirv(shader_spirv)

// Load GLSL shader
var shaderFile = GD.Load<RDShaderFile>("res://compute_example.glsl");
var shaderBytecode = shaderFile.GetSpirV();
var shader = rd.ShaderCreateFromSpirV(shaderBytecode);
Warning
Local RenderingDevices cannot be debugged using tools such as RenderDoc.
Provide input data
As you might remember, we want to pass an input array to our shader, multiply each element by 2 and get the results.
We need to create a buffer to pass values to a compute shader. We are dealing with an array of floats, so we will use a storage buffer for this example. A storage buffer takes an array of bytes and allows the CPU to transfer data to and from the GPU.
So let's initialize an array of floats and create a storage buffer:
# Prepare our data. We use floats in the shader, so we need 32 bit.
var input := PackedFloat32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
var input_bytes := input.to_byte_array()

# Create a storage buffer that can hold our float values.
# Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)

// Prepare our data. We use floats in the shader, so we need 32 bit.
float[] input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var inputBytes = new byte[input.Length * sizeof(float)];
Buffer.BlockCopy(input, 0, inputBytes, 0, inputBytes.Length);

// Create a storage buffer that can hold our float values.
// Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
var buffer = rd.StorageBufferCreate((uint)inputBytes.Length, inputBytes);
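To make the "10 x 4 = 40 bytes" arithmetic concrete, here is a small Python sketch of what converting ten 32-bit floats to raw bytes looks like. It is an illustration outside Godot: the helper name `pack_floats` is hypothetical, and the little-endian byte order is an assumption that matches common desktop and mobile hardware rather than something the tutorial states.

```python
import struct


def pack_floats(values):
    """Pack a sequence of numbers as consecutive 32-bit floats.

    "<" selects little-endian byte order (an assumption, see the lead-in);
    "f" is a 4-byte IEEE 754 single-precision float.
    """
    return struct.pack(f"<{len(values)}f", *values)


input_bytes = pack_floats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(len(input_bytes))       # 40
print(input_bytes[:4].hex())  # 0000803f, the four bytes of 1.0
```

The byte length is what gets passed as the buffer size, which is why the size argument in the snippets above is taken from the byte array rather than from the element count.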
With the buffer in place, we need to tell the rendering device to use this buffer. To do that, we will need to create a uniform (like in normal shaders) and assign it to a uniform set which we can pass to our shader later.
# Create a uniform to assign the buffer to the rendering device
var uniform := RDUniform.new()
uniform.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
uniform.binding = 0 # this needs to match the "binding" in our shader file
uniform.add_id(buffer)
var uniform_set := rd.uniform_set_create([uniform], shader, 0) # the last parameter (the 0) needs to match the "set" in our shader file

// Create a uniform to assign the buffer to the rendering device
var uniform = new RDUniform
{
    UniformType = RenderingDevice.UniformType.StorageBuffer,
    Binding = 0
};
uniform.AddId(buffer);
var uniformSet = rd.UniformSetCreate([uniform], shader, 0);
Defining a compute pipeline
The next step is to create a set of instructions our GPU can execute. We need a pipeline and a compute list for that.
The steps we need to do to compute our result are:
Create a new pipeline.
Begin a list of instructions for our GPU to execute.
Bind our pipeline to the compute list.
Bind our buffer uniform set to the compute list.
Specify how many workgroups to use.
End the list of instructions.
# Create a compute pipeline
var pipeline := rd.compute_pipeline_create(shader)
var compute_list := rd.compute_list_begin()
rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
rd.compute_list_dispatch(compute_list, 5, 1, 1)
rd.compute_list_end()

// Create a compute pipeline
var pipeline = rd.ComputePipelineCreate(shader);
var computeList = rd.ComputeListBegin();
rd.ComputeListBindComputePipeline(computeList, pipeline);
rd.ComputeListBindUniformSet(computeList, uniformSet, 0);
rd.ComputeListDispatch(computeList, xGroups: 5, yGroups: 1, zGroups: 1);
rd.ComputeListEnd();
Note that we are dispatching the compute shader with 5 workgroups in the X axis, and one in the others. Since we have 2 local invocations in the X axis (specified in our shader), 10 compute shader invocations will be launched in total. If you read or write to indices outside of the range of your buffer, you may access memory outside of your shader's control or parts of other variables, which may cause issues on some hardware.
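The relationship between element count, local size, and workgroup count can be checked with a few lines of arithmetic. The sketch below is plain Python rather than Godot API, and both helper names are hypothetical. It computes the smallest workgroup count that covers a buffer and lists the global IDs that would land past the end of it, which are the ones a shader would have to skip (for example with a bounds check, assuming the element count is made available to the shader).

```python
def workgroups_needed(element_count, local_size):
    """Smallest number of workgroups whose invocations cover every element."""
    return (element_count + local_size - 1) // local_size  # ceiling division


def out_of_range_ids(element_count, local_size):
    """Global IDs launched by that dispatch that fall past the end of the buffer."""
    launched = workgroups_needed(element_count, local_size) * local_size
    return list(range(element_count, launched))


print(workgroups_needed(10, 2))      # 5
print(out_of_range_ids(10, 2))       # []
print(workgroups_needed(10, 64))     # 1
print(len(out_of_range_ids(10, 64))) # 54
```

With 10 elements and a local size of 2, the count divides evenly, so every invocation maps to a valid index. With a larger local size such as 64, a single workgroup would launch 54 invocations that have no element to work on.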
Execute a compute shader
After all of this we are almost done, but we still need to execute our pipeline. So far we have only recorded what we would like the GPU to do; we have not actually run the shader program.
To execute our compute shader we need to submit the pipeline to the GPU and wait for the execution to finish:
# Submit to GPU and wait for sync
rd.submit()
rd.sync()

// Submit to GPU and wait for sync
rd.Submit();
rd.Sync();
Ideally, you would not call sync() to synchronize the RenderingDevice right away, as it will cause the CPU to wait for the GPU to finish working. In our example, we synchronize immediately because we want our data available for reading straight away. In general, you will want to wait at least 2 or 3 frames before synchronizing so that the GPU is able to run in parallel with the CPU.
Warning
Long computations can cause Windows graphics drivers to "crash" due to TDR (Timeout Detection and Recovery) being triggered by Windows. This is a mechanism that reinitializes the graphics driver after a certain amount of time has passed without any activity from the graphics driver (usually 5 to 10 seconds).
Depending on the duration your compute shader takes to execute, you may need to split it into multiple dispatches to reduce the time each dispatch takes and reduce the chances of triggering a TDR. Given that TDR is time-dependent, slower GPUs may be more prone to TDRs when running a given compute shader compared to a faster GPU.
Retrieving results
You may have noticed that, in the example shader, we modified the contents of the storage buffer. In other words, the shader read from our array and stored the data in the same array again, so our results are already there. Let's retrieve the data and print the results to our console.
# Read back the data from the buffer
var output_bytes := rd.buffer_get_data(buffer)
var output := output_bytes.to_float32_array()
print("Input: ", input)
print("Output: ", output)

// Read back the data from the buffers
var outputBytes = rd.BufferGetData(buffer);
var output = new float[input.Length];
Buffer.BlockCopy(outputBytes, 0, output, 0, outputBytes.Length);
GD.Print("Input: ", string.Join(", ", input));
GD.Print("Output: ", string.Join(", ", output));
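When checking the values read back from the GPU, it can help to have a CPU reference to compare against. The following Python sketch is an assumption-laden stand-in, not part of the Godot API: `run_doubling_shader_on_cpu` is a hypothetical function that mimics what the example shader does to the buffer bytes, and little-endian 32-bit floats are assumed.

```python
import struct


def run_doubling_shader_on_cpu(buffer_bytes):
    """CPU stand-in for the example shader: double every 32-bit float in the buffer."""
    count = len(buffer_bytes) // 4  # 4 bytes per float
    values = struct.unpack(f"<{count}f", buffer_bytes)
    return struct.pack(f"<{count}f", *(v * 2.0 for v in values))


input_values = [float(v) for v in range(1, 11)]
input_bytes = struct.pack("<10f", *input_values)

# Bytes in, bytes out, then decode back to floats, mirroring the readback step.
output_bytes = run_doubling_shader_on_cpu(input_bytes)
output_values = list(struct.unpack("<10f", output_bytes))

print("Input: ", input_values)
print("Output: ", output_values)  # each input value doubled: 2.0, 4.0, ..., 20.0
```

If the values printed by the Godot script differ from this reference, likely places to look are the dispatch size and the set and binding indices.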
Freeing memory
The buffer, pipeline, and uniform_set variables we've been using are each an RID. Because RenderingDevice is meant to be a lower-level API, RIDs aren't freed automatically. This means that once you're done using buffer or any other RID, you are responsible for freeing its memory manually using the RenderingDevice's free_rid() method.
With that, you have everything you need to get started working with compute shaders.
See also
The demo projects repository contains a Compute Shader Heightmap demo. This project performs heightmap image generation on the CPU and GPU separately, which lets you compare how a similar algorithm can be implemented in two different ways (with the GPU implementation being faster in most cases).