CPU optimization

Measuring performance

We have to know where the "bottlenecks" are to know how to speed up our program. Bottlenecks are the slowest parts of the program that limit the rate at which everything can progress. Focusing on bottlenecks allows us to concentrate our efforts on optimizing the areas which will give us the greatest speed improvement, instead of spending a lot of time optimizing functions that will lead to small performance improvements.

For the CPU, the easiest way to identify bottlenecks is to use a profiler.

CPU profilers

Profilers run alongside your program and take timing measurements to work out what proportion of time is spent in each function.

The Godot IDE conveniently has a built-in profiler. It does not run every time you start your project: it must be manually started and stopped. This is because, like most profilers, recording these timing measurements can slow down your project significantly.

After profiling, you can look back at the results for a frame.

Screenshot of the Godot profiler — Results of a profile of one of the demo projects.

Note

We can see the cost of built-in processes such as physics and audio, as well as the cost of our own scripting functions at the bottom.

Time spent waiting for various built-in servers may not be counted in the profilers. This is a known bug.

When a project is running slowly, you will often see an obvious function or process taking a lot more time than others. This is your primary bottleneck, and you can usually increase speed by optimizing this area.

For more info about using Godot's built-in profiler, see Debugger panel.

External profilers

Although the Godot IDE profiler is very convenient and useful, sometimes you need more power, and the ability to profile the Godot engine source code itself.

You can use a number of third-party C++ profilers to do this.

Screenshot of Callgrind — Example results from Callgrind, which is part of Valgrind.

From the left, Callgrind is listing the percentage of time within a function and its children (Inclusive), the percentage of time spent within the function itself, excluding child functions (Self), the number of times the function is called, the function name, and the file or module.

In this example, we can see nearly all time is spent under the Main::iteration() function. This is the master function in the Godot source code that is called repeatedly. It causes frames to be drawn, physics ticks to be simulated, and nodes and scripts to be updated. A large proportion of the time is spent in the functions to render a canvas (66%), because this example uses a 2D benchmark. Below this, we see that almost 50% of the time is spent outside Godot code in libglapi and i965_dri (the graphics driver). This tells us that a large proportion of CPU time is being spent in the graphics driver.

This is actually an excellent example because, in an ideal world, only a very small proportion of time would be spent in the graphics driver. This is an indication that there is a problem with too much communication and work being done in the graphics API. This specific profiling led to the development of 2D batching, which greatly speeds up 2D rendering by reducing bottlenecks in this area.

Manually timing functions

Another handy technique, especially once you have identified the bottleneck using a profiler, is to manually time the function or area under test. The specifics vary depending on the language, but in GDScript, you would do the following:

    var time_start = Time.get_ticks_usec()

    # Your function you want to time.
    update_enemies()

    var time_end = Time.get_ticks_usec()
    # The parentheses are required: % binds more tightly than -.
    print("update_enemies() took %d microseconds" % (time_end - time_start))

The C# equivalent:

    var timeStart = Time.GetTicksUsec();

    // Your function you want to time.
    UpdateEnemies();

    var timeEnd = Time.GetTicksUsec();
    GD.Print($"UpdateEnemies() took {timeEnd - timeStart} microseconds");

When manually timing functions, it is usually a good idea to run the function many times (1,000 or more times), instead of just once (unless it is a very slow function). The reason for doing this is that timers often have limited accuracy. Moreover, CPUs will schedule processes in a haphazard manner. Therefore, an average over a series of runs is more accurate than a single measurement.
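As a minimal sketch of the averaging approach described above, the script below times many calls in one go and divides by the number of runs. The helper `average_usec()`, the constant `RUNS`, and the body of `update_enemies()` are hypothetical names and stand-ins chosen for illustration; substitute your own function under test.

```gdscript
extends Node

const RUNS := 1000  # Number of repetitions; an arbitrary choice for this sketch.

func _ready() -> void:
    var average := average_usec(update_enemies, RUNS)
    print("update_enemies() took %.2f microseconds on average" % average)

# Calls `callable` `runs` times and returns the mean duration in microseconds.
func average_usec(callable: Callable, runs: int) -> float:
    var time_start := Time.get_ticks_usec()
    for _i in runs:
        callable.call()
    var time_end := Time.get_ticks_usec()
    return float(time_end - time_start) / runs

# Hypothetical stand-in for the function you want to time.
func update_enemies() -> void:
    var total := 0.0
    for i in 100:
        total += sqrt(i)
```

Timing the whole loop rather than each call separately keeps the overhead of reading the timer out of the individual measurements.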

As you attempt to optimize functions, be sure to either repeatedly profile or time them as you go. This will give you crucial feedback as to whether the optimization is working (or not).

Caches

CPU caches are something else to be particularly aware of, especially when comparing timing results of two different versions of a function. The results can be highly dependent on whether the data is in the CPU cache or not. CPUs don't load data directly from the system RAM, even though it's huge in comparison to the CPU cache (several gigabytes instead of a few megabytes). This is because system RAM is very slow to access. Instead, CPUs load data from a smaller, faster bank of memory called cache. Loading data from cache is very fast, but every time you try to load a memory address that is not stored in cache, the cache must make a trip to main memory and slowly load in some data. This delay can result in the CPU sitting around idle for a long time, and is referred to as a "cache miss".

This means that the first time you run a function, it may run slowly because the data is not in the CPU cache. The second and later times, it may run much faster because the data is in the cache. Due to this, always use averages when timing, and be aware of the effects of cache.

Understanding caching is also crucial to CPU optimization. If you have an algorithm (routine) that loads small bits of data from randomly spread out areas of main memory, this can result in a lot of cache misses. A lot of the time, the CPU will be waiting around for data instead of doing any work. If you can instead make your data accesses localised, or even better, access memory in a linear fashion (like a continuous list), then the cache will work optimally and the CPU will be able to work as fast as possible.
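The two access patterns can be compared with a rough sketch like the one below, which sums the same packed array once in linear order and once in shuffled order. `COUNT` and `time_sum()` are hypothetical names used only for this illustration. Treat the numbers with caution: in GDScript the interpreter overhead may hide much of the difference, and the effect is usually far easier to observe in compiled code such as a GDExtension.

```gdscript
extends Node

const COUNT := 1_000_000  # Arbitrary size for this sketch.

func _ready() -> void:
    var data := PackedInt32Array()
    data.resize(COUNT)
    data.fill(1)

    var linear: Array = range(COUNT)  # Indices 0, 1, 2, ...
    var scattered: Array = linear.duplicate()
    scattered.shuffle()  # The same indices in random order.

    print("linear:    %d usec" % time_sum(data, linear))
    print("scattered: %d usec" % time_sum(data, scattered))

# Sums `data` in the given index order and returns the elapsed microseconds.
func time_sum(data: PackedInt32Array, order: Array) -> int:
    var time_start := Time.get_ticks_usec()
    var total := 0
    for index in order:
        total += data[index]
    return Time.get_ticks_usec() - time_start
```

Both calls do the same amount of arithmetic, so any consistent gap between them comes from the order in which memory is touched.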

Godot usually takes care of such low-level details for you. For example, the Server APIs make sure data is optimized for caching already for things like rendering and physics. Still, you should be especially aware of caching when writing GDExtensions.

Languages

Godot supports a number of different languages, and it is worth bearing in mind that there are trade-offs involved. Some languages are designed for ease of use at the cost of speed, and others are faster but more difficult to work with.

Built-in engine functions run at the same speed regardless of the scripting language you choose. If your project is making a lot of calculations in its own code, consider moving those calculations to a faster language.

GDScript

GDScript is designed to be easy to use and iterate, and is ideal for making many types of games. However, in this language, ease of use is considered more important than performance. If you need to make heavy calculations, consider moving some of your project to one of the other languages.

C#

C# is popular and has first-class support in Godot. It offers a good compromise between speed and ease of use. Beware of possible garbage collection pauses and leaks that can occur during gameplay, though. A common approach to work around issues with garbage collection is to use object pooling, which is outside the scope of this guide.

Other languages

Third parties provide support for several other languages, including Rust.

C++

Godot is written in C++. Using C++ will usually result in the fastest code. However, on a practical level, it is the most difficult to deploy to end users' machines on different platforms. Options for using C++ include GDExtensions and custom modules.

Threads

Consider using threads when making a lot of calculations that can run in parallel to each other. Modern CPUs have multiple cores, each one capable of doing a limited amount of work. By spreading work over multiple threads, you can move further towards peak CPU efficiency.

The disadvantage of threads is that you have to be incredibly careful. As each CPU core operates independently, they can end up trying to access the same memory at the same time. One thread can be reading from a variable while another is writing to it: this is called a race condition. Before you use threads, make sure you understand the dangers and how to prevent these race conditions. Threads can make debugging considerably more difficult.
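One common way to prevent a race condition on shared data is to guard it with a Mutex. The sketch below assumes a simple shared counter; `counter`, `mutex`, and `add_to_counter()` are hypothetical names used for illustration, and real workloads would normally do much more work per lock.

```gdscript
extends Node

var counter := 0  # Shared state modified by both threads.
var mutex := Mutex.new()  # Guards every access to `counter`.

func _ready() -> void:
    var thread_a := Thread.new()
    var thread_b := Thread.new()
    thread_a.start(add_to_counter.bind(10_000))
    thread_b.start(add_to_counter.bind(10_000))

    # Every started thread must be joined before it is freed.
    thread_a.wait_to_finish()
    thread_b.wait_to_finish()
    print("counter = %d" % counter)

func add_to_counter(times: int) -> void:
    for _i in times:
        mutex.lock()
        counter += 1  # Read-modify-write: unsafe without the lock.
        mutex.unlock()
```

Locking on every increment is slow and is done here only to keep the example small. In practice, each thread would usually accumulate a local result and take the lock once to merge it.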

For more information on threads, see Using multiple threads.

SceneTree

Although Nodes are an incredibly powerful and versatile concept, be aware that every node has a cost. Built-in functions such as _process() and _physics_process() propagate through the tree. This housekeeping can reduce performance when you have a very large number of nodes. How many exactly depends on the target platform and can range from thousands to tens of thousands, so make sure to profile performance on all target platforms during development.

Each node is handled individually in the Godot renderer. Therefore, a smaller number of nodes with more in each can lead to better performance.

One quirk of the SceneTree is that you can sometimes get much better performance by removing nodes from the SceneTree, rather than by pausing or hiding them. You don't have to delete a detached node. You can, for example, keep a reference to a node, detach it from the scene tree using Node.remove_child(node), then reattach it later using Node.add_child(node). This can be very useful for adding and removing areas from a game.
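A minimal sketch of detaching and reattaching a node might look like this. The child name `ForestArea` and the functions `detach_area()` and `attach_area()` are hypothetical; the script is assumed to be attached to the parent of the area being toggled.

```gdscript
extends Node

# Hypothetical child node; replace with an area from your own scene.
@onready var area: Node = $ForestArea

func detach_area() -> void:
    if area.get_parent() == self:
        # The node leaves the tree but stays alive because we keep `area`.
        remove_child(area)

func attach_area() -> void:
    if area.get_parent() == null:
        add_child(area)
```

Nodes are not reference-counted, so a detached node is no longer freed along with its former parent. Free it yourself once it is no longer needed.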

You can avoid the SceneTree altogether by using Server APIs. For more information, see Optimization using Servers.

Physics

In some situations, physics can end up becoming a bottleneck. This is particularly the case with complex worlds and large numbers of physics objects.

Here are some techniques to speed up physics:

Try using simplified versions of your rendered geometry for collision shapes. Often, this won't be noticeable for end users, but can greatly increase performance.
Try removing objects from physics when they are out of view / outside the current area, or reusing physics objects (maybe you allow 8 monsters per area, for example, and reuse these).

Another crucial aspect of physics is the physics tick rate. In some games, you can greatly reduce the tick rate: instead of updating physics 60 times per second, for example, you may update it only 30 or even 20 times per second. This can greatly reduce the CPU load.
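As a short sketch, the tick rate can be changed at runtime through the Engine singleton. The value 30 is only an example; the same setting is also available in the Project Settings as physics/common/physics_ticks_per_second.

```gdscript
extends Node

func _ready() -> void:
    # The default is 60 ticks per second; 30 is an example value.
    Engine.physics_ticks_per_second = 30
```

Profile before and after the change to confirm that physics was actually the bottleneck.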

The downside of changing physics tick rate is you can get jerky movement or jitter when the physics update rate does not match the frames per second rendered. Also, decreasing the physics tick rate will increase input lag. It's recommended to stick to the default physics tick rate (60 Hz) in most games that feature real-time player movement.

The solution to jitter is to use fixed timestep interpolation, which involves smoothing the rendered positions and rotations over multiple frames to match the physics. You can either implement this yourself or use a third-party addon. Performance-wise, interpolation is a very cheap operation compared to running a physics tick. It's orders of magnitude faster, so this can be a significant performance win while also reducing jitter.
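If you implement interpolation yourself, one possible approach for positions in 2D is sketched below. It assumes this script sits on a purely visual Node2D that is not a child of the physics body, and that `target` (a hypothetical exported property) is assigned to the physics-driven node in the inspector. Rotation is omitted for brevity.

```gdscript
extends Node2D

# Hypothetical property: the physics-driven node this visual should follow.
@export var target: Node2D

var _previous_position := Vector2.ZERO
var _current_position := Vector2.ZERO

func _ready() -> void:
    _previous_position = target.global_position
    _current_position = target.global_position

func _physics_process(_delta: float) -> void:
    # Remember where the target was on the last two physics ticks.
    _previous_position = _current_position
    _current_position = target.global_position

func _process(_delta: float) -> void:
    # The fraction goes from 0.0 to 1.0 between two physics ticks.
    var fraction := Engine.get_physics_interpolation_fraction()
    global_position = _previous_position.lerp(_current_position, fraction)
```

The visual node trails the physics body by up to one tick, which is the usual trade-off of this technique.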
