
Metal Game Performance Optimization
Realize the full potential of your Metal-based games by tackling common issues that cause frame rate slowdowns, stutters, and stalls. Discover how to clear up jitter and maintain a silky-smooth frame rate with simple changes in frame pacing. Get introduced to new tools for analyzing rendering passes and pinpoint expensive or unexpected work. Learn how to avoid thread stalls and get specific advice about handling thermal notifications.
Resources
- Analyzing resource dependencies
- Debugging the shaders within a draw command or compute dispatch
- Optimizing GPU performance
- Presentation Slides (PDF)
Good morning, and welcome to this talk.
My name is Guillem Vinals Gangollels, and I work on the GPU Software Performance Team here at Apple.
Good developers like you make iOS an excellent gaming platform. And we at Apple obviously want to help.
So this year we reviewed some of the top iOS games and found some common performance issues.
We analyzed a lot of data, and as a result of that investigation, we decided to put this talk together. So this is going to be the main topic today: Develop Awesome Games.
But I will only be providing technical directions here.
Before we begin, please let me thank our friends at Croteam. They are the developers behind The Talos Principle, which is a really awesome game. You will see it featured in these slides and in two of the demos. Notice that it has stunning visuals, but it really does not compromise on performance. And that's what this is all about.
So let's do a quick run-through of the agenda. I'll start with an introduction to the tools. This is a very good place to start. And then we'll talk about the actual performance issues: frame pacing, thread priorities, thermal states, and unnecessary GPU work.
Even though all these issues seem unrelated, they will compound and aggravate each other. So it's important to tackle them all. Let's start with the tools. This is the most important message: you should profile early and do it often.
Do not ship your game unless you've profiled it. And for that you will need to know your tools.
Today, I will focus on two of them.
First, we have Instruments, which is our main profiling tool. You will want to use it to understand performance, latency, and overall timing.
Second, we have the Metal Frame Debugger, which is also a very powerful tool, which you will want to use to debug your GPU workload. So where do we start? This is a question we often get.
Well, this year we are making it easier for you. We are introducing a new Instruments template, which will be a great starting point: the Game Performance Template.
It is a combination of already existing instruments such as System Trace, Time Profiler, and Metal System Trace.
We configured it for you so it records all the CPU and GPU data that is relevant for your game, so you can make it smooth. So how do we launch it? How do we get there? Well, just open Instruments and you will see it right there in the center.
After you choose it, you will be able to configure it the same as every other template.
Once you start recording, you will do so in windowed mode, which will allow you to play your game for as long as you like, and only the last few seconds of data will be recorded. And this is what those last few seconds of data will look like. There's a lot of information, so let's have a quick high-level overview. First, we have System Trace and Time Profiler, which will give you an overview of the system load as well as your application's CPU usage.
For example, the User Interactive Load track will record all the active threads at a given time. In this case, the orange color you can see means that there are more runnable threads available than CPU cores. So there is some contention.
These will offer a great view of the system.
There are a couple of great talks that cover this instrument in more depth. Please follow up on them. Next on our list is Metal System Trace, our GPU profiling tool. It offers a great view of the graphics stack,
all the way from the Metal framework down to the display.
In particular, we will want to pay close attention to the GPU, which is split into vertex, fragment, and compute, if your game uses it.
Notice as well that the display track will be the starting point of many of our investigations. We will identify a long frame or a stutter and we will work all the way up from there. So it's a very natural place to start. There is a lot of information about the tool because it really is a very powerful tool. And I encourage you all to catch up on it.
These are a couple of sessions that will provide you a great starting point. Okay. So next on our list we have a thread states view, which we introduced this year.
This view will show you the state of every thread in your game. In this case, each color represents a possible thread state, such as preempted, which is represented in orange, or blocked, which is represented in red.
We designed this view specifically with you, game developers, in mind, because we know the threading systems in modern games are very complex. And we hope this really will help you. Also, we have a track for each CPU core.
It will show the thread running on that core, as well as the priority of that thread, which is color coded.
By using this, you will be able to see at a glance how busy the system really is. That was a short but quite wide introduction to the tools. So it's about time we move to the actual performance issues. The first one will be around frame pacing. And let's visualize it first. For this we used a modified version of the Fox II demo that will help us illustrate the issue better.
Can you guess which game renders faster? Well, some of you may not have guessed it.
The game on the left is trying to render at 60 frames per second, but it can only achieve 40, so it's inconsistent, and it seems jittery. The game on the right, on the other hand, is targeting 30 frames per second, which can consistently be achieved. That's why it looks smoother. But that's a bit counterintuitive. How come the game that renders faster doesn't look smoother? Well, this issue is known as micro stuttering, or inconsistent frame pacing.
It occurs when the frame time is higher than the display refresh interval. For example, our game may take 25 milliseconds to render, or 40 frames per second, and the display may refresh every 16.6 milliseconds, or 60 frames per second, same as the video we've just seen. This will create some visual inconsistencies.
So how did we get there? What have we done to be in this situation? Well, we didn't do much really, and that's kind of the whole point of this.
After rendering the frame, we requested the next drawable from the display link. And as soon as we got the drawable, we finished the final pass and presented it right away. We explicitly told the system to present that drawable as soon as possible, at the next refresh interval. After all, we are targeting 60 frames per second, right? There's also another class of problems that will cause micro stuttering. Some games are already targeting a lower frame rate.
But we have also identified that many of those games are using usleep on their main or render thread.
This is a very bad practice in iOS, so please don't do that, and just hang in here for the next few minutes. I'll tell you the actual correct way of doing this in iOS. Now, let's have a deeper look into what happens in the system for micro stuttering to be visible.
In this case, we see here a timeline of all the components involved in rendering.
And we'll start rendering our game normally. Notice this is a triple-buffering case, which is quite common in iOS. In this case, every drawable is represented by a letter and a color.
And also notice the premise here. Rendering to drawable B takes longer than one display refresh interval, which is the time between vsyncs. In this case, it could be 25 milliseconds to render to B and 16.6 milliseconds between display refreshes.
So since that is the premise, this means that we will need to keep the previous drawable on the display for the next interval to give us time to finish.
And we will do so. And during that interval, B will actually finish.
And we will be ready to present it, but notice that we have just hit the issue here. During this interval, we have also finished rendering to C. And we are ready to present it right away.
So we will have inconsistent frame pacing from that moment onward. We are stuck in this pattern. Every other frame will be inconsistent. And the user will see micro stuttering.
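The timeline just described can be reproduced with a little arithmetic. The following is only a sketch under simplifying assumptions that are not part of the talk: frames render back to back at a fixed cost, each finished frame is shown at the first vsync at or after its completion, and drawable back-pressure is ignored. The function name `on_screen_intervals` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Apple API): for each of the first
 * `frame_count` frames, writes how many refresh intervals the frame
 * stays on screen when every frame is presented as soon as possible. */
void on_screen_intervals(long render_us, long refresh_us,
                         size_t frame_count, long *out)
{
    assert(render_us > 0 && refresh_us > 0 && out != NULL);
    long prev_vsync = 0;
    for (size_t i = 0; i <= frame_count; ++i) {
        /* Assumption: frame i finishes after (i + 1) render periods. */
        long finish_us = (long)(i + 1) * render_us;
        /* First vsync at or after completion (integer round-up). */
        long vsync = (finish_us + refresh_us - 1) / refresh_us;
        if (i > 0) {
            /* At most one new frame can be shown per vsync. */
            if (vsync <= prev_vsync)
                vsync = prev_vsync + 1;
            out[i - 1] = vsync - prev_vsync;
        }
        prev_vsync = vsync;
    }
}
```

Calling `on_screen_intervals(25000, 16667, 4, out)` for a 25 ms frame on a 60 Hz display fills `out` with 1, 2, 1, 2: frames alternate between one and two refresh intervals on screen, which is the every-other-frame inconsistency described above.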
Now this may appear in different shapes and forms in the real world. So what we'll do now is a quick demo, and I'll show you an Instruments trace of The Talos Principle. And we will use it to see if we can identify micro stuttering in a real-world case.
Okay. So what we see here is the same lot of information I've shown you before. This has been captured with the Game Performance Template by default.
Notice all the same instruments I talked about displayed here on the left, and all the game threads here in the middle.
In particular though, we are looking now at micro stuttering.
So this quite intuitively will bring us to look at the display track, because micro stuttering by definition is frames presented inconsistently. In this case, we have the display track here.
Notice as well that there are some hints in the display track. These are the hints here. They will show you when a surface has been displayed for longer than we would expect in normal rendering.
So maybe this is a great place to start looking. There are some clusters of them. So let's zoom into one.
To zoom, we will hold the Option key and just drag the pointer over the region of interest. And in this case, if we keep looking at the display track, it's kind of evident already that we are micro stuttering. We can see that every displayed surface has a different timing. So in this case, for example, we have 50, 33, 16, back to 50, and back to 33 milliseconds. So when we see this pattern in an Instruments capture, it means that we are micro stuttering and we should correct it.
So let's just do that. Back to the slides.
Okay.
We've just seen the problem and how it occurs in the real world. The pattern is basically the same. So how do we go about fixing it? The best practice here is to target a frame rate your game can achieve, so that the minimum frame duration is longer than the time it takes to render.
For that, there's a bunch of APIs that can help you. For example, MTLDrawable addPresentedHandler will give you a callback once that drawable is presented, so you can identify micro stuttering as it is happening.
The other two APIs will help you to actually fix the problem. They will allow you to explicitly control the frame pacing. In this case we have present afterMinimumDuration and present atTime.
What do we want to do here? We set the minimum duration for our frame to be longer than it takes to render.
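One way to pick that minimum duration is to round the measured render time up to a whole number of display refresh intervals. This is a minimal sketch of that arithmetic only; the helper name `minimum_frame_duration_us` is hypothetical, and the rounding rule is an assumption rather than something stated in the talk.

```c
#include <assert.h>

/* Hypothetical helper (not an Apple API): smallest whole number of
 * refresh intervals that covers the render time, in microseconds. */
long minimum_frame_duration_us(long render_us, long refresh_us)
{
    assert(render_us > 0 && refresh_us > 0);
    /* Integer round-up of render_us / refresh_us. */
    long intervals = (render_us + refresh_us - 1) / refresh_us;
    return intervals * refresh_us;
}
```

For a 25 ms frame on a 60 Hz display, `minimum_frame_duration_us(25000, 16667)` returns 33334, that is two refresh intervals or 30 frames per second. The value would then be converted to seconds and passed as the minimum duration when presenting the drawable. In practice a game would measure its worst sustained render time and leave some headroom.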
And we'll do just that. Let's see how that looks. Notice that when we start rendering, we are already consistent from the get-go. Our frame spends more time on display than it takes to render.
Every frame will be consistent.
What the user sees will also be consistent.
And that's great. Also notice that there's a side effect. The frame rate will be lowered. We went from 40 frames per second to 30 frames per second. So that also gave us some extra frame time to play with.
So how did we do this? How did we fix the frame pacing? Well, really it's just a couple of lines of code. We have the same pattern as before. We render the scene. We get the next drawable. We do the final pass. The only difference here is that we specify a minimum duration for our frame and present it with that minimum duration. That's all it takes.
That will allow us to set the minimum duration for our frames. And they will all be consistent.
And after doing so, you may be thinking: well, what about maximum duration? What about the concept of priority of our work? Or how long a thing could take? Well, that's actually the next issue on our list: thread priorities. Let's visualize it first, same as we did before, again with the modified version of the Fox II demo. You may be thinking, and you would be right, that there are many things that could cause stuttering such as this. Maybe you are doing some resource loading or compilation. Today we will focus on a more fundamental but also incredibly common type of stutter: that caused by thread stalling. If the work priority is not well communicated to the system, your game may have unexpected stalls.
iOS does plenty of stuff besides rendering your game.
Thread priorities are used to guarantee the quality of service in the whole system.
So if a thread does a lot of work, its priority will be lowered over time so other threads can run instead.
That's the concept known as priority decay.
Also, you see on the slide behind me priority inversion. This is another class of problems that manifests in a very similar way. In this case, priority inversion occurs when the render thread depends on a lower-priority worker thread from your same engine in order to complete the work.
Let's see what that looks like in the same timeline we've seen before. In this case, we start rendering at 30 frames per second, so we are cool.
But then there is some background work.
iOS does lots of stuff. Maybe now it's checking the email.
And the problem here is that the render thread is not well configured.
You may get preempted by that background work. You may not finish scheduling all the work onto the GPU.
And there is no such thing as maximum duration for a frame. So that could potentially go on for hundreds of milliseconds.
The user will see a stutter.
This is the theory behind it. In practice it shows in different ways that follow the same pattern. So let's do another demo. I'll show you another Instruments capture of The Talos Principle that will show you how to identify this problem.
So in this case, what you see here is again a capture taken with the Game Performance Template. But this time we have already zoomed into the frame we are interested in, which is this very long frame.
It has a duration of 233 milliseconds. So that's likely a very good stutter that we should investigate. By looking at it at a glance, we can already tell that the GPU does not seem to be doing much. It's idle during this time, so this means that we are not feeding it.
Now we can look at the CPUs, of course, and they seem to be fairly busy down here. Right? All of it seems quite solid.
But notice what you see here is the Time Profiler view of our application. And it does not seem to be running.
Why is our game not running, and how come that causes a stutter? Well, we can switch to the new view I talked to you about, the new thread states view.
To do so, you will go to the icon of your application and click on that button here, and that will pull out the track display.
And in this case, you can switch to thread states. And that will hopefully already help you to see there is something wrong here.
It is highlighted in orange, and it's already telling us that the thread has been preempted for 192 milliseconds. So that's the actual problem here. Our render thread is not running. Something preempted it.
If you want to know more, you can expand the information at the bottom, which will also contain the thread narrative.
And by clicking on the preempted thread, you will see here an explanation of what's going on. In this case, your render thread was preempted at priority 26, which is very low, below background priority, because the App Store was updating.
So that's something we do not want. We want to tell the system that, to our user, our game is more important than an App Store update at that particular moment. So let's go back to the slides and see how we can do that. So the best practice here is to configure your render thread.
We recommend the render thread priority to be fixed to 45.
Notice that iOS and macOS priorities have ascending values.
So priority 31 has higher priority than priority 4.
Also, we need to opt out of the scheduler's quality of service in order to prevent priority decay, which could lower our priority as well.
Let's see what a well-configured render thread looks like. In this case, we configure it just how I told you.
We start rendering normally.
We also have some background work going on. Otherwise it wouldn't be fair.
And this background work could be updating the App Store, just as we've seen in the demo.
But notice that vsync after vsync, our rendering occurs normally. We are preempting the background work off the CPUs so we can run instead.
The user does not see the stutter. Your game can run at a solid 30 frames per second, even though the system is under heavy load. That is technically awesome, and that's what this is all about. So let's see how we make this happen with a little bit of code. And it literally is a little bit of code. It is only a couple of lines.
In this case, it's just about configuring the pthread attributes before we create the pthread.
We need to opt out of quality of service and set the priority to 45.
And that's it. We can create the pthread with those attributes, and it will work just fine.
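The steps just listed can be sketched with the POSIX thread attribute calls. This is an illustration under assumptions, not the code from the slides: it assumes that choosing an explicit scheduling policy (round-robin here) is how the thread opts out of quality-of-service management, and the helper names `configure_render_thread_attr` and `start_render_thread` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: initializes `attr` for a render thread with a
 * fixed priority of 45. Returns 0 on success, an errno value otherwise. */
int configure_render_thread_attr(pthread_attr_t *attr)
{
    assert(attr != NULL);
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    /* Assumption: an explicit fixed-priority policy opts the thread
     * out of QoS-driven priority decay. */
    err = pthread_attr_setschedpolicy(attr, SCHED_RR);
    if (err == 0) {
        struct sched_param param = { .sched_priority = 45 };
        err = pthread_attr_setschedparam(attr, &param);
    }
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}

/* Hypothetical helper: creates the render thread with those attributes. */
int start_render_thread(pthread_t *thread,
                        void *(*render_loop)(void *), void *context)
{
    pthread_attr_t attr;
    int err = configure_render_thread_attr(&attr);
    if (err != 0)
        return err;
    err = pthread_create(thread, &attr, render_loop, context);
    pthread_attr_destroy(&attr);
    return err;
}
```

The attributes have to be set before `pthread_create` is called, which matches the order described above. Checking every return value is worthwhile here, because a silently rejected priority leaves the render thread at its default and the stutter comes back.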
It is simple and technically awesome. What's not so simple, though, is the next issue on our list:
dealing with multiple thermal states. The message is very clear.
Design for sustained performance and deal with the occasional thermal issues.
So let's see how we go about that. iOS devices give you access to an unprecedented amount of power, but in a very small form factor. So as apps use more resources on the device, the system may begin enacting measures in order to stay cool and responsive.
Also, the user may have enabled Low Power Mode, which will have a very similar effect. Okay, so the best practice really is just to adjust your workload to the system state.
You should monitor the system and tune the workload accordingly. iOS has many APIs to help you with that. For example, use NSProcessInfo thermalState to either query or register for a notification when the device thermal state changes.
You should also check for the Low Power Mode condition in a similar fashion.
Also consider querying the GPUStartTime and GPUEndTime from the MTLCommandBuffer in order to understand how system load may impact the GPU time. Let's see how we do that with a simple code example.
This comes straight from our best practices.
At its core it is a very simple switch statement where every case corresponds to a thermal state. We have nominal, fair, serious, and critical. And that is all very good. So now we know which thermal state we are in, and the system is telling us to do something about it.
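The shape of that switch can be sketched in plain C. Everything in this sketch is an assumption for illustration: the enum only mirrors the four state names mentioned above rather than the real NSProcessInfo constants, and the settings struct and its values are hypothetical placeholders that each game would choose for itself.

```c
#include <assert.h>

/* Hypothetical mirror of the four thermal states named in the talk. */
typedef enum {
    THERMAL_NOMINAL,
    THERMAL_FAIR,
    THERMAL_SERIOUS,
    THERMAL_CRITICAL
} thermal_state_t;

/* Hypothetical quality knobs; the values below are placeholders. */
typedef struct {
    int   target_fps;       /* frame rate the game tries to sustain */
    float render_scale;     /* scale of intermediate render targets */
    int   shadow_cascades;  /* number of shadow map cascades */
    int   post_processing;  /* 1 = enabled, 0 = disabled */
} quality_settings_t;

quality_settings_t settings_for_thermal_state(thermal_state_t state)
{
    switch (state) {
    case THERMAL_NOMINAL:
        return (quality_settings_t){ 30, 1.0f, 3, 1 };
    case THERMAL_FAIR:
        return (quality_settings_t){ 30, 1.0f, 2, 1 };
    case THERMAL_SERIOUS:
        return (quality_settings_t){ 30, 0.75f, 1, 1 };
    case THERMAL_CRITICAL:
    default:
        /* Most conservative settings, also used for unknown values. */
        return (quality_settings_t){ 30, 0.5f, 1, 0 };
    }
}
```

A game would call something like this from its thermal state notification handler and apply the returned settings on the next frame. Keeping the frame rate target constant while scaling GPU work is one possible policy among many.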
So how can we actually help the system stay cool? Well, I can give you some suggestions, but it's up to you game developers to decide what compromises to make in order to help the system. You know what's best for your game to keep being awesome under stress.
Some recommendations I'll give you, though, are to target a frame rate that can be sustained for the entire game session. For example, stay at 30 frames per second if you cannot sustain 60 for ten minutes or more.
Tuning the GPU work is also super helpful.
For example, consider lowering the resolution of intermediate render targets, simplifying the shadow maps, loading simpler assets, and even removing some of the post-processes altogether. Whatever fits your game the best.
You should decide that one. And this will bring us to the next issue on our list:
dealing with unnecessary GPU work. For that, please welcome my colleague Ohad on stage. He's going to tell you all about it.
Thank you, Guillem. Hey, everyone. My name is Ohad, and I'm a member of the Game Technologies Team here at Apple.
In the previous slides, Guillem showed you how important it is to adapt to the system.
Responding to states like Low Power Mode or the varying thermal states will require you to tune your GPU workload in order to maintain consistent frame rates throughout an entire game session. However, for many developers, the GPU is a bit of a black box hidden behind the curtains of a game engine. Today, we'll pull back those curtains.
Wasted GPU time is a very common problem, and it's one that often goes unnoticed. But I want you to remember this: technically awesome games don't only hit their GPU budget.
They're also good citizens of the system, helping it to stay cool and save power. All the popular game engines provide a great list of best practices to follow. We won't cover those. Instead we'll focus on how to tell if something is expensive to render.
And as we've done with the CPU several times today, the best practice here is to profile your GPU as well. The power of our GPUs can hide many inefficiencies in either content or algorithms. You will want to time your workload, but also understand each rendering technique that you enable, and only keep those that add noticeably to the visual quality of your games.
But how do you find these inefficiencies? How do you determine which parts of your pipeline are flat-out excessive? This of course brings us back to tools.
As always, your first stop should be Instruments. Here we're looking at Metal System Trace. It'll provide you accurate timings for the vertex, fragment, and compute work being done. But by measuring your GPU time, you're only halfway there. Next you want to really understand what each of your passes is doing.
And for this, we've added a new tool to the Metal Frame Debugger this year. It's the Dependency graph.
The Dependency graph is the story of a single frame.
It's made up of nodes and edges, and each one of these tells a different part of the story.
Edges represent dependencies between passes. As you follow them from top to bottom, you'll see where each pass fits into your rendering pipeline and how they work together to create your frame.
Nodes, on the other hand, are the story of a single pass. They're made up of three main components. First, the title element will give you the name of the pass. Now I really want to emphasize this: label everything. It'll help you not only in the Dependency viewer, but throughout our entire suite of tools.
Secondly, it'll allow you to quickly tell what type of pass you're looking at: render, blit, or compute. Here from the icon we can see that it's a render pass. Next, you have a list of statistics describing the work being done in this pass. And finally, at the bottom, a list of all the resources that are being written to during this pass. Each of these also comes with a label, a thumbnail allowing you to preview your work, and a list of information describing that resource specifically.
And all that together allows you to really understand each of your passes.
Okay, so now we know how to read the graph. Let's jump into a demo and see how it all fits together.
Okay. So I have the Fox II demo running on my machine here. It was built in SceneKit, which allowed me to add all sorts of great effects. As you can see, I have cascaded shadow maps, bloom, depth of field, and all of it comes together to create a beautifully rendered scene.
Let's use the Dependency viewer to see how it all works. First, we'll go to Xcode and capture a frame using the Capture GPU Frame button at the bottom. And we'll select the main pass on the left. And we'll also switch to automatic mode, which will give us our assistant on the right. Now notice that the same pass that I selected in the debug navigator is also the one that's selected and focused in the main view. And this is a two-way street.
So as we interact with the graph, selecting different passes or textures or even buffers, both the navigator on the left and the assistant on the right will update to show your selection.
So this is a really fantastic way to navigate your frame.
Now as I zoom out, the first thing you'll notice is that the statistics hide and the focus goes away from the individual passes onto the frame as a whole. And I can zoom out even more to see a great bird's-eye view of my entire frame. Now the really cool thing to notice here is that since dependencies drive the connectivity of the graph, each logical piece of work is grouped together in space.
So let's zoom in and see what I mean. Here I have a branch of work that's creating my shadow maps. On the left, I can see three passes that are rendering the shadows. So this is really fantastic because I'm not just getting the story of my entire frame. There's another story in between these two layers: one of how each rendering technique is built up. And this is something that isn't always entirely obvious when you're using a game engine to turn these on. For instance, with my shadow maps, I may not have known that each cascade would require its own pass. If I considered each one of these individually, they wouldn't really stand out. But now I see that I have to consider them as a group. And that gives me the insight that I need to make informed decisions on any compromises that I make while tuning my GPU workload. So that's the Dependency viewer. I'll switch back to the slides. And please help me welcome Guillem back onto the stage for his final thoughts. Thank you.
Thank you. That was an awesome demo.
Cool. So Ohad has just shown us what a frame looks like through the Dependency viewer.
And that is great for you to inspect your GPU workload. For example, oftentimes we may go from a very small and simple pipeline such as this one to a very complex one with post-processing, multiple shadow maps, and HDR. And all of these can be done by adding, you know, a couple of properties to the camera object of your favorite game engine.
You see that the code complexity of those changes is minimal. But the rendering complexity may have increased tenfold, which really brings us back to the beginning, right where we started.
Profile.
It is very important that you understand what your game does.
You spend tens of thousands of hours developing a game; you should consider spending some of that time profiling as well.
Everything we have seen today can be found within minutes.
The best part? You don't need to know what you're looking for. Just record the stutter, get the long frame, and work all the way up from there. It's that simple. The tool will give you all the information you need to identify the problems.
But you will need to use the tool. And that is really the takeaway. So we have seen a bunch of common pitfalls followed by some best practices.
All of these issues can be found through profiling. That's how we found them. We analyzed a ton of games, found the common issues, and decided to put a talk together.
Now, if you have access to the engine source code, make sure that both frame pacing and thread priorities are well configured. It's just a couple of lines of code really.
But regardless, your game should always adapt to thermals and not submit unnecessary GPU work.
By making sure to follow all these best practices, you too will be developing technically awesome games. And that's what this is all about.
For more information, there is an upcoming lab at 12 PM. We will be there, and we'll be more than happy to answer any questions you may have after this session. Or maybe you just want to sit down and let us profile your game.
Also, there were two great talks about Metal for game developers and our profiling tools. Thank you very much, and enjoy the rest of the day. And have a great one. [ Applause ]