Uploaded by St1X
Optimizing Games for Mobiles

The document discusses optimizing mobile game performance through the understanding of various mobile GPU architectures, including Immediate Mode Rendering (IMR), Tile-Based Rendering (TBR), and Tile-Based Deferred Rendering (TBDR). It offers technical recommendations for rendering techniques, geometry processing, and texture management to enhance efficiency and reduce bandwidth usage. The document further elaborates on best practices for CPU and GPU operations, emphasizing memory management and precision in shader programming.

Optimising games for mobiles
by Dmytro Vovk

Mobile GPUs architecture
• There are 3 major mobile GPU architectures on the market:
  – IMR (Immediate Mode Renderer)
  – TBR (Tile Based Renderer)
  – TBDR (Tile Based Deferred Renderer)

IMR
• Renders anything sent to the GPU immediately. It makes no assumptions about what is going to be submitted next.
• The application has to sort opaque geometry front to back.
• It's basically brute force.
• Nvidia, AMD.

TBR
• Improves on IMR, but is still an IMR.
• Bandwidth is a precious resource on mobile devices, and TBR tries to reduce data transfers as much as possible.
• Your geometry is split into tiles and then processed per tile. Each tile has a small amount of memory for the colour and depth/stencil buffers, so there is no need for transfers from/to system memory.
• Qualcomm Adreno, ARM Mali.

TBDR
• It is deferred, i.e. all the graphics are drawn somewhat later.
• And this is where all the magic happens!
• The GPU is aware of context: it knows what is going to be drawn in the future, and this allows it to employ some awesome optimisations and to reduce power consumption, bandwidth and fillrate.
• Imagination PowerVR.

GENERAL RECOMMENDATIONS

What you might know
• Batch, Batch, Batch!
  http://ce.u-sys.org/Veranstaltungen/Interaktive%20Computergraphik%20(Stamminger)/papers/BatchBatchBatch.pdf
• Render from one thread only
• Avoid synchronisations:
  1. glFlush/glFinish;
  2. Querying GL states;
  3. Accessing render targets.

VERTEX DATA RECOMMENDATIONS

What you might know
• PowerVR offers pixel-perfect HSR (Hidden Surface Removal); Adreno and ARM Mali do not feature this.
• But you still need to sort transparent geometry!
• Avoid alpha test. Use alpha blending instead.

What you might not know
• HSR still requires vertices to be processed!
• …thus don't forget to cull your geometry on the CPU!
• Prefer Stencil Test over Scissor.
  – Stencil test is performed in hardware on PowerVR GPUs.
  – Stencil mask is stored in fast on-chip memory.
  – Stencil can be of any shape, in contrast to the rectangular Scissor.

What you might not know
• Why no alpha test?!
  – Alpha test/discard requires the fragment shader to run before visibility of the current fragment can be determined. This removes the benefits of HSR.
  – Even more! If shader code contains discard, then any geometry rendered with this shader will suffer from the alpha test drawbacks. Even if this keyword is under a condition, USSE (PVR's shader engine) assumes that the condition may be hit.
  – Move discard into a separate shader.
  – Draw opaque geometry first, then alpha-tested geometry, and alpha-blended geometry last.

What you might know
• Bandwidth matters
  1. Use a constant colour per object instead of per vertex
  2. Simplify your models. Use smaller data types.
  3. Use indexed triangles or non-indexed triangle strips
  4. Use VBOs instead of client arrays
  5. Use VAOs

What you might not know
• VBO allocations are aligned to the 4 KB page size. That means a small buffer for just a couple of triangles will occupy 4 KB in memory; a large number of small VBOs can fragment and waste your memory.
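
To make the cost of page alignment concrete, here is a minimal sketch that rounds a buffer size up to whole pages and reports the padding lost. The 4 KB figure comes from the slide; the function names are hypothetical and the rounding rule is an assumption about how the driver behaves.

```c
#include <stddef.h>

/* Page granularity quoted on the slide; other drivers may differ. */
#define VBO_PAGE_SIZE 4096u

/* Bytes reserved for a VBO of `size` bytes, assuming whole-page rounding. */
size_t vbo_reserved_bytes(size_t size)
{
    if (size == 0) return 0;
    return (size + VBO_PAGE_SIZE - 1) / VBO_PAGE_SIZE * VBO_PAGE_SIZE;
}

/* Padding lost when `count` buffers of `size` bytes are allocated separately. */
size_t vbo_wasted_bytes(size_t size, size_t count)
{
    return (vbo_reserved_bytes(size) - size) * count;
}
```

Under this model, one hundred 72-byte buffers (two triangles of three-float positions each) would reserve 400 KB for about 7 KB of data, which is the argument for merging small meshes into one larger VBO.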

What you might not know
• Updating your VBO data each frame:
  1. glBufferSubData. If it is used to update a big part of the original data, it will harm performance. Try to avoid updating buffers that are currently in use.
  2. glBufferData. It's OK to completely overwrite the original data. The old data will be orphaned by the driver and new data storage will be allocated.
  3. glMapBuffer with a triple-buffered VBO is the preferred way to update your data.
• EXT_map_buffer_range (iOS 6+ only), when you need to update only a subset of a buffer object.

What you might not know
    int bufferID = 0; // initialization
    for (int i = 0; i < 3; ++i) // allocate storage for 3 VBOs only, do not upload any data
    {
        glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[i]);
        // VBO_SIZE: size of one buffer in bytes (placeholder name; the slide passes 0 here)
        glBufferData(GL_ARRAY_BUFFER, VBO_SIZE, NULL, GL_DYNAMIC_DRAW);
    }
    //...
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[bufferID]);
    void* ptr = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);
    // update data here
    glUnmapBufferOES(GL_ARRAY_BUFFER);
    ++bufferID;
    if (bufferID == 3) // cycling through 3 buffers
    {
        bufferID = 0;
    }

What you might not know
• This scheme will give you the best performance possible: no blocking of the CPU or GPU, no redundant memcpy operations and lower CPU load, at the cost of extra memory (note that you need no extra temporary buffer to store your data before sending it to the VBO). This is ideal for dynamic batching of sprites.
    update(1), draw(1), gpuworking(..............)
        update(2), draw(2), gpuworking(..............)
            update(3), draw(3), gpuworking(..............)
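
The buffer rotation in the scheme above can be isolated from the GL calls. The sketch below uses hypothetical helper names and only shows which of the three buffers the CPU writes on a given frame; a buffer is written again only three frames later, which is what gives the GPU time to finish reading it.

```c
#define VBO_RING_SIZE 3  /* triple buffering, as on the slide */

/* Index of the buffer to map on the frame after `current`. */
int next_vbo_index(int current)
{
    return (current + 1) % VBO_RING_SIZE;
}

/* Buffer that the CPU writes on frame `frame` (0-based). */
int vbo_index_for_frame(unsigned frame)
{
    return (int)(frame % VBO_RING_SIZE);
}
```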

What you might not know
• The float type is native to the GPU
• …that means any other type will be converted to float by USSE
• …resulting in a few additional cycles
• Thus it's your choice of tradeoff between bandwidth/storage and additional cycles

What you might know
• Use interleaved vertex data
  – Align each vertex attribute on a 4-byte boundary

What you might not know
• If you don't align your data, the driver will do it for you
• …resulting in slower performance.
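
One way to keep attributes on 4-byte boundaries is to describe the interleaved vertex as a struct and check the offsets. The layout below is a hypothetical example (the slides do not prescribe one); it pads the packed normal to 4 bytes so that every attribute, and the stride, stays a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex, 24 bytes per vertex. */
typedef struct {
    float    position[3];  /* offset 0,  12 bytes */
    int8_t   normal[4];    /* offset 12, 3 components + 1 padding byte */
    uint8_t  colour[4];    /* offset 16, RGBA */
    uint16_t uv[2];        /* offset 20, normalised texture coordinates */
} Vertex;

/* Non-zero when `offset` sits on a 4-byte boundary. */
int attribute_is_aligned(size_t offset)
{
    return (offset % 4) == 0;
}

/* Non-zero when every attribute and the stride are 4-byte aligned. */
int vertex_layout_is_aligned(void)
{
    return attribute_is_aligned(offsetof(Vertex, position))
        && attribute_is_aligned(offsetof(Vertex, normal))
        && attribute_is_aligned(offsetof(Vertex, colour))
        && attribute_is_aligned(offsetof(Vertex, uv))
        && attribute_is_aligned(sizeof(Vertex));
}
```

The same offsets and `sizeof(Vertex)` would then be passed as the pointer offset and stride arguments of glVertexAttribPointer.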

What you might not know
• PowerVR SGX 5XT series GPUs have a vertex cache for the last 12 vertex indices. Optimise your indexed geometry for this cache size.
• PowerVR Series 6 (XT) has 16k of vertex cache.
• Take a look at optimisers that use Tom Forsyth's algorithm:
  http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
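
To judge whether an index ordering suits a 12-entry cache, you can count misses offline. This is a rough sketch that models the cache as a FIFO of the last 12 distinct indices; the replacement policy is an assumption, since the slides only give the size, and the function name is hypothetical.

```c
#include <stddef.h>

#define CACHE_SIZE 12  /* entries, per the slide's figure for SGX 5XT */

/* Counts vertex-cache misses for an index list using a FIFO model. */
size_t count_cache_misses(const unsigned short *indices, size_t count)
{
    unsigned short cache[CACHE_SIZE];
    size_t used = 0, next = 0, misses = 0;

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            ++misses;                        /* vertex must be transformed again */
            cache[next] = indices[i];        /* overwrite the oldest entry */
            next = (next + 1) % CACHE_SIZE;
            if (used < CACHE_SIZE) ++used;
        }
    }
    return misses;
}
```

Dividing the miss count by the number of triangles gives the average cache miss ratio, a common figure for comparing index orderings before and after running an optimiser.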

What you might know
• Split your vertex data into two parts:
  1. Static VBO: the one that will never be changed
  2. Dynamic VBO: the one that needs to be updated frequently
• Split your vertex data into a few VBOs when a few meshes share the same set of attributes

TEXTURE DATA RECOMMENDATIONS

What you might know
• Bandwidth matters
  1. Use lower precision formats: RGBA4444, RGBA5551
  2. Use PVRTC compressed textures
  3. Use atlases
  4. Use mipmaps. They improve texture cache efficiency and quality.

What you might not know
• Avoid the RGB8 format: texture data has to be aligned, so the driver will pad RGB8 to RGBA8.
• Try to replace it with RGB565.
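
Converting RGB8 source data to RGB565 before upload is a small bit-packing step. The sketch below uses hypothetical function names and assumes tightly packed 3-byte pixels; the bit layout matches what GL_UNSIGNED_SHORT_5_6_5 expects (red in the top 5 bits, then 6 bits of green, then 5 bits of blue).

```c
#include <stddef.h>
#include <stdint.h>

/* Packs one 8-bit-per-channel colour into 16-bit RGB565. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Converts `pixel_count` tightly packed RGB8 pixels to RGB565. */
void convert_rgb8_to_rgb565(const uint8_t *src, uint16_t *dst, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[i] = pack_rgb565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
    }
}
```

The result takes 2 bytes per pixel instead of the 4 bytes that a padded RGB8 texture would occupy, at the cost of dropping the low bits of each channel.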

What you might not know
• Why PVRTC?
  1. PVRTC provides great compression, resulting in smaller texture size, improved cache usage, saved bandwidth and decreased power consumption.
  2. PVRTC stores pixel data in the GPU's native order, i.e. BGRA instead of RGBA, in blocks optimised for the data access pattern.

What you might not know
• It doesn't matter whether your textures are in RGBA or BGRA format: the driver will still do internal processing on the texture data to improve memory access locality and cache efficiency.

What you might not know
• On PVR 6 (XT) the driver will reserve memory for both the texture and its mip map chain, but it will commit memory only for mip level 0.
• If you decide to generate mip maps, the driver will commit the pages reserved for the mip chain.
• That is the expected behaviour.

What you might not know
• On PVR 5/5MP (tested on iOS versions 4 to 7.1.1) the driver will ALWAYS commit memory for mip maps, regardless of whether you requested them or not.
• That means roughly 33% extra memory per texture is wasted!
• In most cases you don't need mip maps for 2D games, but you are forced to pay this overhead.
• That's too bad for 2D games. However, there is one workaround: make your textures NPOT (non power of two).

What you might not know
• Luckily, there is one solution to this problem.
• Core OpenGL ES 2.0 doesn't support mip maps for NPOT (non power of two) textures, so if you make your textures NPOT, you will not pay this memory overhead.
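
The 33% figure quoted for the mip chain follows from each level holding a quarter of the texels of the one above it (1/4 + 1/16 + … approaches 1/3). A minimal sketch that sums the chain, with a hypothetical function name and uncompressed texels assumed:

```c
#include <stddef.h>

/* Total bytes for a full mip chain of a width x height texture. */
size_t mip_chain_bytes(unsigned width, unsigned height, unsigned bytes_per_pixel)
{
    size_t total = 0;
    if (width == 0 || height == 0) return 0;
    for (;;) {
        total += (size_t)width * height * bytes_per_pixel;
        if (width == 1 && height == 1) break;   /* last level reached */
        if (width > 1) width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

For a 1024x1024 RGBA8 texture this gives 5,592,404 bytes against 4,194,304 bytes for level 0 alone, so about 1.33 MB is committed for levels a 2D game may never sample.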

What you might not know
• Interesting notes:
  – The glTexImage2D driver implementation has a function CheckFastPath. When you upload a POT texture you'll hit this fast path. NPOT textures miss it.
  – When you upload a lot of textures your VRAM gets fragmented, so the driver will remap memory, i.e. it will create one big buffer for a few small textures and move them into that buffer.

What you might not know
• Let's take a look at the texture upload process.
• The usual way to do this:
  1. Load the texture into a temporary buffer in RAM
     1.1. Decode the texture if it is stored in a compressed file format (JPG/PNG)
  2. Feed this buffer to glTexImage2D
  3. Draw!
• Looks simple, but is it the fastest way?

What you might not know
• …NO!
    void* buf = malloc(TEXTURE_SIZE); // 4 MB for an RGBA8 1024x1024 texture
    LoadTexture(textureName, buf); // helper not shown on the slide; assumed to fill buf
    glBindTexture(GL_TEXTURE_2D, textureID);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    // buf is copied into an internal buffer created by the driver (that's obvious)
    free(buf); // because the buffer can be freed immediately after glTexImage2D
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
    // the driver will do some additional work to fully upload the texture the first time it is actually used!
• A lot of redundant work!

What you might not know
• Jedi way to upload textures:
    int fileHandle = open(filename, O_RDONLY);
    void* ptr = mmap(NULL, TEXTURE_SIZE, PROT_READ, MAP_PRIVATE, fileHandle, 0); // file mapping
    glBindTexture(GL_TEXTURE_2D, textureID);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
    // the driver will do some additional work to fully upload the texture the first time it is actually used!
    munmap(ptr, TEXTURE_SIZE);
    close(fileHandle);
• File mapping does not copy your file data into RAM! It loads the file data page by page, when it is accessed.
• Thus we eliminated one redundant copy, dramatically decreased texture upload time and decreased memory fragmentation.
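
The file-mapping step can be wrapped with the error handling that the slide omits. This is a sketch for POSIX systems; `map_file_readonly` is a hypothetical helper, and it assumes the file already holds raw texel data in the layout glTexImage2D expects.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only. Returns NULL on failure; on success stores
   the mapping length in *out_size. Release with munmap(ptr, *out_size). */
const void *map_file_readonly(const char *path, size_t *out_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED) return NULL;

    *out_size = (size_t)st.st_size;
    return ptr;
}
```

The returned pointer would be passed as the last argument of glTexImage2D and unmapped once the texture has been drawn for the first time.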

What you might not know
• Keep in mind that textures are finally wired only when they are used for the first time. So draw them off screen immediately after glTexImage2D, otherwise it will take too long to render the first frame and it will be nearly impossible to track down the cause.

What you might not know
• NPOT textures work only with the GL_CLAMP_TO_EDGE wrap mode.
• POT textures are preferable; they give you the best performance possible.
• Use NPOT textures with dimensions that are multiples of 32 pixels for best performance.
• The driver will pad the data of your NPOT texture to match the size of the closest POT values.
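
To estimate what the padding costs, you can compute the padded size of an NPOT texture. The sketch below assumes that "closest POT" means the next power of two at or above each dimension; the function names are hypothetical.

```c
#include <stddef.h>

/* Smallest power of two that is >= v (returns 0 if it does not fit). */
unsigned next_pot(unsigned v)
{
    unsigned p = 1;
    if (v > 0x80000000u) return 0;
    while (p < v) p *= 2;
    return p;
}

/* Texel count after padding both dimensions up to a power of two. */
size_t padded_texels(unsigned width, unsigned height)
{
    return (size_t)next_pot(width) * next_pot(height);
}

/* Non-zero when a dimension follows the multiple-of-32 advice. */
int is_multiple_of_32(unsigned v)
{
    return (v % 32u) == 0;
}
```

Under that assumption a 640x480 texture would be padded to 1024x512, about 1.7 times its original texel count.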

What you might not know
• Prefer OES_texture_half_float over OES_texture_float.
• Texture reads fetch only 32 bits per texel, thus an RGBA float texture will result in 4 texture reads.
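
Taking the slide's 32-bits-per-fetch figure at face value, the number of reads per texel is just the texel size divided by 32 bits, rounded up. This is a simplified model with a hypothetical function name, not a description of the actual hardware.

```c
/* Reads needed per texel if each fetch returns 32 bits (slide's figure). */
unsigned texture_reads_per_texel(unsigned channels, unsigned bits_per_channel)
{
    unsigned bits = channels * bits_per_channel;
    return (bits + 31u) / 32u;
}
```

In this model an RGBA float texel (128 bits) needs 4 reads, an RGBA half-float texel (64 bits) needs 2, and RGBA8 needs 1.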

What you might not know
• Always use glClear at the beginning of the frame…
• …and EXT_discard_framebuffer at the end.
• PVR GPUs have a fast on-chip depth/stencil buffer for each tile. If you forget to clear/discard the depth buffer, it will be uploaded from HW (on-chip memory) to SW (system memory).

What you might know
• Prefer multitexturing over multiple passes.
• Configure texture parameters before feeding image data to the driver.

SHADERS BEST PRACTICES

What you might know
• Be wise with precision hints
• Avoid branching
• Eliminate loops
• Do not use discard. If you have to, place the discard instruction as early as possible to avoid useless computations.

What you might not know
• Code inside a dynamic branch (the condition is a non-constant value) will be executed anyway, and then its result will be thrown away if the condition is false.

What you might not know
• highp: represents a 32-bit floating point value
• mediump: represents a 16-bit floating point value in the range [-65520, 65520]
• lowp: 10-bit fixed point values in the range [-2, 2] with a step of 1/256
• Try to give the same precision to all your operands, because conversion takes some time
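
To get a feel for what lowp does to a value, the sketch below emulates it on the CPU using the range and step quoted above (10 bits give 1024 steps of 1/256, spanning a width of 4). The function name is hypothetical and the rounding mode is an assumption; real hardware may truncate instead.

```c
/* Models lowp as described on the slide: step 1/256, clamped to [-2, 2]. */
float quantize_lowp(float x)
{
    if (x > 2.0f) x = 2.0f;
    if (x < -2.0f) x = -2.0f;
    float scaled = x * 256.0f;
    /* round to the nearest step, away from zero on ties */
    int steps = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    return (float)steps / 256.0f;
}
```

Colours in [0, 1] survive this well, but anything outside [-2, 2], such as texture coordinates for large atlases or world-space positions, is clamped and needs mediump or highp.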

What you might not know
• On USSE1 (that's PVR 5), highp values are calculated on a scalar processor only:
    highp vec4 v1, v2;
    highp float s1, s2;
    v2 = (v1 * s1) * s2;
    // the scalar processor executes v1 * s1 (4 operations), then this result is
    // multiplied by s2 on the scalar processor again (4 additional operations): 8 in total
    v2 = v1 * (s1 * s2);
    // s1 * s2 is 1 operation on the scalar processor; result * v1 is 4 operations: 5 in total

HARDWARE FEATURES

What you might know
• Typical CPU found in mobile devices:
  1. ARMv7/ARMv8 architecture
  2. Cortex AX / Krait / Swift or Cyclone
  3. Up to 2300 MHz
  4. Up to 8 cores
  5. Thumb-2 instruction set

What you might not know
• ARMv7 has no hardware support for integer division
• VFPv3, VFPv4 FPU
• NEON SIMD engine
• Unaligned access is done in software on Cortex A8. That means it is a hundred times slower.
• Cortex A8 is an in-order CPU. Cortex A9+ are out-of-order.

What you might not know
• A Cortex A9+ core has a full VFPv3 FPU, while Cortex A8 has a VFPLite. That means float operations take 1 cycle on A9 and 10 cycles on A8!

What you might not know
• NEON: 16 registers, 128 bits wide each. Supports operations on 8, 16, 32 and 64-bit integers and 32-bit float values.
• NEON can be used for:
  – Software geometry instancing;
  – Skinning;
  – As a general vertex processor;
  – Other typical applications for SIMD.

What you might not know
• There are 3 ways to use the NEON engine in your code:
  1. Intrinsics
     1.1. GLKMath
  2. Handwritten NEON assembly
  3. Autovectorization. Add -mllvm -vectorize -mllvm -bb-vectorize-aligned-only to Other C/C++ Flags in the project settings and you are ready to go.

What you might not know
• Intrinsics:

What you might not know
• Assembly:

What you might not know
• Summary:
                         Running time, ms    CPU usage, %
    Intrinsics           2764                19
    Assembly             3664                20
    FPU                  6209                25-28
    FPU autovectorized   5028                22-24
• Intrinsics gave me about a 25% shorter running time than assembly (2764 ms vs 3664 ms).
• Note that the speed of code generated from intrinsics will vary from compiler to compiler. Modern compilers are really good at this.

What you might not know
• Advantages of intrinsics over assembly:
  – Higher level code;
  – Much simpler;
  – No need to manage registers;
  – You can vectorize basic blocks and build a solution for every new problem from these blocks. With assembly, in contrast, you have to solve each new problem from scratch.

What you might not know
• Advantages of assembly over intrinsics:
  – Code generated from intrinsics varies from compiler to compiler and can show a really big difference in speed. Assembly code will always be the same.

What you might not know
    __attribute__((always_inline)) void Matrix4ByVec4(const float32x4x4_t* __restrict__ mat,
                                                      const float32x4_t* __restrict__ vec,
                                                      float32x4_t* __restrict__ result)
    {
        // result = column 0 * vec.x, then accumulate the remaining columns
        (*result) = vmulq_n_f32((*mat).val[0], (*vec)[0]);
        (*result) = vmlaq_n_f32((*result), (*mat).val[1], (*vec)[1]);
        (*result) = vmlaq_n_f32((*result), (*mat).val[2], (*vec)[2]);
        (*result) = vmlaq_n_f32((*result), (*mat).val[3], (*vec)[3]);
    }
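
A portable scalar version is handy for checking the NEON routine on a desktop machine. The sketch below is not from the slides: `Mat4` and `matrix4_by_vec4_ref` are hypothetical names, and it assumes the same column-major layout, where `col[c]` corresponds to `val[c]` of `float32x4x4_t`.

```c
/* Column-major 4x4 matrix: col[c] is column c. */
typedef struct {
    float col[4][4];
} Mat4;

/* out = m * v, accumulating one column at a time like the NEON version. */
void matrix4_by_vec4_ref(const Mat4 *m, const float v[4], float out[4])
{
    for (int r = 0; r < 4; ++r) {
        out[r] = m->col[0][r] * v[0];        /* mirrors vmulq_n_f32 */
        for (int c = 1; c < 4; ++c) {
            out[r] += m->col[c][r] * v[c];   /* mirrors vmlaq_n_f32 */
        }
    }
}
```

Comparing its output with the intrinsics version on a few matrices (identity, translation, scale) is a cheap way to catch a transposed layout.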

What you might not know
    __attribute__((always_inline)) void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1,
                                                         const float32x4x4_t* __restrict__ m2,
                                                         float32x4x4_t* __restrict__ r)
    {
    #ifdef INTRINSICS
        (*r).val[0] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[0], 0));
        (*r).val[1] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[1], 0));
        (*r).val[2] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[2], 0));
        (*r).val[3] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[3], 0));
        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[1], vgetq_lane_f32((*m2).val[0], 1));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[1], vgetq_lane_f32((*m2).val[1], 1));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[1], vgetq_lane_f32((*m2).val[2], 1));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[1], vgetq_lane_f32((*m2).val[3], 1));
        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[2], vgetq_lane_f32((*m2).val[0], 2));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[2], vgetq_lane_f32((*m2).val[1], 2));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[2], vgetq_lane_f32((*m2).val[2], 2));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[2], vgetq_lane_f32((*m2).val[3], 2));
        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[3], vgetq_lane_f32((*m2).val[0], 3));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[3], vgetq_lane_f32((*m2).val[1], 3));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[3], vgetq_lane_f32((*m2).val[2], 3));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[3], vgetq_lane_f32((*m2).val[3], 3));
    #endif // the slide shows only the intrinsics branch
    }

What you might not know
    __asm__ volatile
    (
        /* load modelView into q0-q3 and proj into q8-q11 */
        "vldmia %6, { q0-q3 } \n\t"
        "vldmia %0, { q8-q11 }\n\t"
        /* q12-q15 = proj * modelView */
        "vmul.f32 q12, q8, d0[0]\n\t"
        "vmul.f32 q13, q8, d2[0]\n\t"
        "vmul.f32 q14, q8, d4[0]\n\t"
        "vmul.f32 q15, q8, d6[0]\n\t"
        "vmla.f32 q12, q9, d0[1]\n\t"
        "vmla.f32 q13, q9, d2[1]\n\t"
        "vmla.f32 q14, q9, d4[1]\n\t"
        "vmla.f32 q15, q9, d6[1]\n\t"
        "vmla.f32 q12, q10, d1[0]\n\t"
        "vmla.f32 q13, q10, d3[0]\n\t"
        "vmla.f32 q14, q10, d5[0]\n\t"
        "vmla.f32 q15, q10, d7[0]\n\t"
        "vmla.f32 q12, q11, d1[1]\n\t"
        "vmla.f32 q13, q11, d3[1]\n\t"
        "vmla.f32 q14, q11, d5[1]\n\t"
        "vmla.f32 q15, q11, d7[1]\n\t"
        /* load the four vertices and transform each one */
        "vldmia %1, { q0-q3 } \n\t"
        "vmul.f32 q8, q12, d0[0]\n\t"
        "vmul.f32 q9, q12, d2[0]\n\t"
        "vmul.f32 q10, q12, d4[0]\n\t"
        "vmul.f32 q11, q12, d6[0]\n\t"
        "vmla.f32 q8, q13, d0[1]\n\t"
        "vmla.f32 q8, q14, d1[0]\n\t"
        "vmla.f32 q8, q15, d1[1]\n\t"
        "vmla.f32 q9, q13, d2[1]\n\t"
        "vmla.f32 q9, q14, d3[0]\n\t"
        "vmla.f32 q9, q15, d3[1]\n\t"
        "vmla.f32 q10, q13, d4[1]\n\t"
        "vmla.f32 q10, q14, d5[0]\n\t"
        "vmla.f32 q10, q15, d5[1]\n\t"
        "vmla.f32 q11, q13, d6[1]\n\t"
        "vmla.f32 q11, q14, d7[0]\n\t"
        "vmla.f32 q11, q15, d7[1]\n\t"
        /* store the transformed vertices */
        "vstmia %2, { q8 }\n\t"
        "vstmia %3, { q9 }\n\t"
        "vstmia %4, { q10 }\n\t"
        "vstmia %5, { q11 }"
        :
        : "r" (proj), "r" (squareVertices), "r" (v1), "r" (v2), "r" (v3), "r" (v4), "r" (modelView)
        : "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15"
    );

What you might not know
• For a detailed explanation of intrinsics/assembly see:
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491e/CIHJBEFE.html

Optimizing Games for Mobiles

  • 1.
    Optimising games formobilesby Dmytro Vovk
  • 2.
    Mobile GPUs architecture•There are 3 major mobile GPU architectureson a market:• IMR (Immediate Mode Renderer)• TBR (Tile Based Renderer)• TBDR (Tile Based Deferred Renderer)2
  • 3.
    IMR• Renders anythingsent to the GPUimmediately. It makes no assumption aboutwhat is going to be submitted next.• Application has to sort opaque geometry frontto back.• It’s basically a brute force.• Nvidia, AMD.3
  • 4.
    TBR• Improves onIMR, but still is an IMR.• Bandwidth is a precious resource on mobilesand TBR tries to reduce data transfers as muchas possible.• Your geometry is split in to tiles and then it isprocessed per tile. Tiles have small amount ofmemory for colour, depthstencil buffers, sothey have no need to do transfers fromtosystem memory.• Qualcomm Adreno, ARM Mali 4
  • 5.
    TBDR• It isdeferred i.e. all the graphics is drawnsomewhere later.• And this is where all the magic happens!• The GPU is aware of context - it know’s what isgoing to be drawn in future and this allows itto employ some awesome optimisations,reduce power consumption, bandwidth and afillrate.• Imagination PowerVR.5
  • 6.
  • 7.
    What you mightknow• Batch, Batch, Batch!http://ce.u-sys.org/Veranstaltungen/Interaktive%20Computergraphik%20(Stamminger)/papers/BatchBatchBatch.pdf• Render from one thread only• Avoid synchronisations:1. glFlush/glFinish;2. Querying GL states;3. Accessing render targets;
  • 8.
  • 9.
    What you mightknow• Pixel perfect HSR (Hidden Surface Removal),Adreno and ARM does not feature this.• But still needs to sort transparent geometry!• Avoid doing alpha test. Use alpha blendinstead
  • 10.
    What you mightnot know• HSR still requires vertices to be processed!• …thus don’t forget to cull your geometry onCPU!• Prefer Stencil Test before Scissor.– Stencil test is performed in hardware on PowerVRGPUs.– Stencil mask is stored in fast on-chip memory– Stencil can be of any form in contrast to therectangular Scissor
  • 11.
    What you mightnot know• Why no alpha test?!o Alpha testdiscard requires fragment shader to run, before visibility forcurrent fragment can be determined. This will remove benefits of HSRo Even more! If shader code contains discard, than any geometry renderedwith this shader will suffer from alpha test drawbacks. Even if this key-wordis under condition, USSE (PVR’s shader engine) does assumes, that thiscondition may be hit.o Move discard into separate shadero Draw opaque geometry, than alpha tested one and alpha blended in the end
  • 12.
    What you mightknow• Bandwidth matters1. Use constant colour per object, instead of pervertex2. Simplify your models. Use smaller data types.3. Use indexed triangles or non-indexed trianglestrips4. Use VBO instead of client arrays5. Use VAO
  • 13.
    What you mightnot know• VBOs allocations are aligned by 4KB page size.That means, your small buffer for just acouple of triangles will occupy 4KB inmemory, - large amount of small VBOs candefragment and waste you memory.
  • 14.
    What you mightnot know• Updating your VBO data each frame:1. glBufferSubData. If it is used to update big part of theoriginal data it will harm performance. Try to avoidupdates to buffers, that are in use now2. glBufferData. It’s OK to completely overwrite originaldata. Old data will be orphaned by driver and a newdata storage will be allocated3. glMapBuffer with triple buffered VBO is preferred wayto update your data• EXT_map_buffer_range (iOS 6+ only), when you need toupdate only a subset of a buffer object.
  • 15.
    What you mightnot knowint bufferID = 0; //initializationfor (int i = 0; i < 3; ++i) // allocate data for 3 vbo only, do not upload it{glBindBuffer(vertexBuffer[i]);glBufferData(GL_ARRAY_BUFFER, 0, 0, GL_DYNAMIC_DRAW);}//...glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[bufferID]);void* ptr = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);//update data hereglUnmapBufferOES(GL_ARRAY_BUFFER);++bufferID;if (bufferID == 3) //cycling through 3 buffers{bufferID = 0;}
  • 16.
    What you mightnot know• This scheme will give you the best performancepossible – without blocking CPU or GPU, noredundant memcpy operations, lower CPU load, butextra memory is used (note, that you will need noextra temporal buffer to store your data beforesending it to VBO). This is ideal for dynamicbatching of sprites.update(1), draw(1), gpuworking(..............)update(2), draw(2), gpuworking(..............)update(3), draw(3), gpuworking(..............)
  • 17.
    What you mightnot know• Float type is native to GPU• …that means any other type will be convertedto float by USSE• …resulting in few additional cycles• Thus it’s your choice of tradeoff betweenbandwidthstorage and additional cycles
  • 18.
    What you mightknow• Use interleaved vertex data– Align each vertex attribute by 4 bytes boundaries
  • 19.
    What you mightnot know• If you don’t align your data, driver will do thisinstead.• …resulting in slower performance.
  • 20.
    What you might not know
    • PowerVR SGX 5XT series GPUs have a vertex cache for the last 12 vertex indices. Optimise your indexed geometry for this cache size.
    • PowerVR Series 6 (XT) has 16k of vertex cache
    • Take a look at optimisers that use Tom Forsyth's algorithm
      http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
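To judge whether an index buffer suits a 12-entry cache, a rough offline estimate can be made by replaying the indices through a simulated cache. The sketch below assumes a simple FIFO replacement policy, which the slide does not specify, so treat the numbers as an approximation; `count_vertex_cache_misses` is a hypothetical helper.

```c
#include <stddef.h>

/* Cache size quoted on the slide for PowerVR SGX 5XT. */
#define VERTEX_CACHE_SIZE 12

/* Replays an index list through a FIFO cache and counts the misses,
   i.e. how many times a vertex would have to be transformed. */
static size_t count_vertex_cache_misses(const unsigned short *indices, size_t count)
{
    unsigned short cache[VERTEX_CACHE_SIZE];
    size_t used = 0;   /* number of valid cache entries          */
    size_t next = 0;   /* slot that the next miss will overwrite */
    size_t misses = 0;

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            ++misses;
            cache[next] = indices[i];
            next = (next + 1) % VERTEX_CACHE_SIZE;
            if (used < VERTEX_CACHE_SIZE) {
                ++used;
            }
        }
    }
    return misses;
}
```

Dividing the miss count by the triangle count gives the average number of vertex transforms per triangle, a figure that index optimisers such as the one linked above try to minimise.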
  • 21.
    What you might know
    • Split your vertex data into two parts:
      1. Static VBO: the one that will never be changed
      2. Dynamic VBO: the one that needs to be updated frequently
    • Split your vertex data into a few VBOs when a few meshes share the same set of attributes
  • 22.
  • 23.
    What you might know
    • Bandwidth matters
      1. Use lower precision formats: RGBA4444, RGBA5551
      2. Use PVRTC compressed textures
      3. Use atlases
      4. Use mipmaps. They improve texture cache efficiency and quality.
  • 24.
    What you might not know
    • Avoid the RGB8 format: texture data has to be aligned, so the driver will pad RGB8 to RGBA8.
    • Try to replace it with RGB565
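The memory cost of each uncompressed format is easy to tabulate. The following sketch encodes the slide's claim that RGB8 ends up padded to 32 bits per texel; the enum and function names are hypothetical and are not OpenGL ES identifiers.

```c
#include <stddef.h>

typedef enum {
    TEX_RGBA8,    /* 32 bits per texel                                       */
    TEX_RGB8,     /* 24 bits in the file, padded to 32 bits (per the slide)  */
    TEX_RGB565,   /* 16 bits per texel                                       */
    TEX_RGBA4444, /* 16 bits per texel                                       */
    TEX_RGBA5551  /* 16 bits per texel                                       */
} TexFormat;

/* Bytes one texel occupies in GPU memory under the slide's assumptions. */
static size_t texel_bytes_in_memory(TexFormat format)
{
    switch (format) {
    case TEX_RGBA8:
    case TEX_RGB8:
        return 4;
    case TEX_RGB565:
    case TEX_RGBA4444:
    case TEX_RGBA5551:
        return 2;
    }
    return 0;
}

/* Bytes of mip level 0 for a texture of the given size. */
static size_t texture_bytes(TexFormat format, size_t width, size_t height)
{
    return texel_bytes_in_memory(format) * width * height;
}
```

Under these assumptions a 1024x1024 RGB8 texture costs the same 4 MB as RGBA8, while RGB565 halves it to 2 MB.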
  • 25.
    What you might not know
    • Why PVRTC?
      1. PVRTC provides great compression, resulting in smaller texture size, improved cache efficiency, saved bandwidth and decreased power consumption
      2. PVRTC stores pixel data in the GPU's native order, i.e. BGRA instead of RGBA, in blocks optimised for the data access pattern.
  • 26.
    What you might not know
    • It doesn't matter whether your textures are in RGBA or BGRA format: the driver will still do internal processing on the texture data to improve memory access locality and cache efficiency.
  • 27.
    What you might not know
    • On PVR 6 (XT) the driver will reserve memory for both the texture and its mip map chain, but it will commit memory only for mip level 0.
    • If you decide to generate mip maps, the driver will commit the pages reserved for the mip chain.
    • That's to be expected.
  • 28.
    What you might not know
    • On PVR 5/5MP (tested on iOS versions 4 to 7.1.1) the driver will ALWAYS commit memory for mip maps, regardless of whether you requested them or not.
    • That means you'll waste 33% of memory!
    • In most cases you don't need mip maps for 2D games, but you are forced to pay this overhead.
    • That's too bad for 2D games. However, there is one workaround: make your textures NPOT (non-power of two).
  • 29.
    What you might not know
    • Luckily, there is one solution to this problem.
    • Core OpenGL ES 2.0 doesn't support mip maps for NPoT (non power of two) textures, so if you make your textures NPoT, you will not pay this memory overhead.
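The 33% figure follows from the geometric series of mip level sizes: each level holds a quarter of the texels of the previous one, so the chain below level 0 adds roughly one third. A small sketch that sums the chain explicitly (the function name `mip_chain_bytes` is hypothetical):

```c
#include <stddef.h>

/* Total bytes of a full mip chain, from level 0 down to 1x1,
   for an uncompressed texture with `bytes_per_texel` bytes per texel. */
static size_t mip_chain_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    size_t total = 0;
    for (;;) {
        total += width * height * bytes_per_texel;
        if (width == 1 && height == 1) {
            break;
        }
        if (width > 1) {
            width /= 2;
        }
        if (height > 1) {
            height /= 2;
        }
    }
    return total;
}
```

For a 1024x1024 RGBA8 texture, level 0 takes 4,194,304 bytes and the full chain 5,592,404 bytes, so the extra levels cost 1,398,100 bytes, about 33% on top of level 0.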
  • 30.
    What you might not know
    • Interesting notes:
      • The driver's glTexImage2D implementation has a function CheckFastPath. When you upload a PoT texture you'll hit this fast path. NPoT textures skip it.
      • When you upload a lot of textures your VRAM gets fragmented, so the driver will remap memory, i.e. it will create one big buffer for a few small textures and move them to that buffer
  • 31.
    What you might not know
    • Let's take a look at the texture upload process.
    • The usual way to do this:
      1. Load the texture into a temporary buffer in RAM
         1. Decode the texture if it is stored in a compressed file format (JPG/PNG)
      2. Feed this buffer to glTexImage2D
      3. Draw!
    • Looks simple, but is it the fastest way?
  • 32.
    What you might not know
    • …NO!
        void* buf = malloc(TEXTURE_SIZE); // 4 MB for an RGBA8 1024x1024 texture
        LoadTexture(textureName, buf);    // app-side loader (signature assumed) that fills buf
        glBindTexture(GL_TEXTURE_2D, textureID);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);
        // buf is copied into an internal buffer created by the driver (that's obvious)
        free(buf); // because the buffer can be freed immediately after glTexImage2D
        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
        // the driver will do some additional work to fully upload the texture the first time it is actually used!
    • A lot of redundant work!
  • 33.
    What you might not know
    • Jedi way to upload textures:
        int fileHandle = open(filename, O_RDONLY);
        void* ptr = mmap(NULL, TEXTURE_SIZE, PROT_READ, MAP_PRIVATE, fileHandle, 0); // file mapping
        glBindTexture(GL_TEXTURE_2D, textureID);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr);
        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
        // the driver will do some additional work to fully upload the texture the first time it is actually used!
        munmap(ptr, TEXTURE_SIZE);
        close(fileHandle);
    • File mapping does not copy your file data into RAM! It loads file data page by page, as it is accessed.
    • Thus we eliminated one redundant copy, dramatically decreased texture upload time and decreased memory fragmentation
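The slide code omits error handling for brevity. A slightly more defensive version of the mapping step, using only POSIX calls, might look like the sketch below. The helper names `map_file_readonly` and `unmap_file` are hypothetical, and the file is assumed to contain raw texel data that can be handed to `glTexImage2D` unchanged.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only. Returns NULL on failure;
   on success *size_out receives the file size in bytes. */
static const void *map_file_readonly(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED) {
        return NULL;
    }

    *size_out = (size_t)st.st_size;
    return ptr;
}

/* Releases a mapping obtained from map_file_readonly. */
static void unmap_file(const void *ptr, size_t size)
{
    munmap((void *)ptr, size);
}
```

The returned pointer takes the place of `ptr` in the upload code above, and `unmap_file` is called once the texture has been uploaded and drawn for the first time.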
  • 34.
    What you might not know
    • Keep in mind that textures are finally wired only when they are used for the first time. So draw them off screen immediately after glTexImage2D, otherwise it will take too long to render the first frame and it will be nearly impossible to track down the cause.
  • 35.
    What you might not know
    • NPOT textures work only with the GL_CLAMP_TO_EDGE wrap mode
    • POT textures are preferable, they give you the best performance possible
    • Use NPOT textures with dimensions that are multiples of 32 pixels for best performance
    • The driver will pad the data of your NPOT texture to match the size of the closest POT values.
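These rules are simple to check in an asset pipeline. The helpers below are a minimal sketch with hypothetical names; `npot_padded_texels` assumes, as the slide states, that each dimension is padded up to the next power of two.

```c
#include <stdint.h>

/* Returns 1 when n is a power of two (1, 2, 4, ...). */
static int is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n; returns 0 if it does not fit in 32 bits. */
static uint32_t next_power_of_two(uint32_t n)
{
    if (n > 0x80000000u) {
        return 0;
    }
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Returns 1 when a dimension follows the "multiple of 32 pixels" advice. */
static int is_multiple_of_32(uint32_t n)
{
    return n % 32 == 0;
}

/* Texel count after padding both dimensions up to powers of two. */
static uint64_t npot_padded_texels(uint32_t width, uint32_t height)
{
    return (uint64_t)next_power_of_two(width) * next_power_of_two(height);
}
```

A 640x480 texture, for instance, has both dimensions divisible by 32 but would be padded to 1024x512 texels under this assumption.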
  • 36.
    What you might not know
    • Prefer OES_texture_half_float over OES_texture_float
    • Texture reads fetch only 32 bits per texel, thus an RGBA float texture will result in 4 texture reads
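Taking the slide's figure of 32 bits fetched per read at face value, the number of reads per texel for any format is a ceiling division. This is a back-of-the-envelope sketch with a hypothetical helper name, not a description of the actual hardware pipeline.

```c
/* Number of 32-bit fetches needed for one texel,
   assuming each texture read returns 32 bits as the slide states. */
static unsigned texture_reads_per_texel(unsigned channels, unsigned bits_per_channel)
{
    unsigned bits = channels * bits_per_channel;
    return (bits + 31u) / 32u;
}
```

By this estimate an RGBA float texel (4 x 32 bits) needs four reads, an RGBA half-float texel (4 x 16 bits) two, and an RGBA8 texel one.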
  • 37.
    What you might not know
    • Always use glClear at the beginning of the frame…
    • … and EXT_discard_framebuffer at the end.
    • PVR series GPUs have a fast on-chip depth/stencil buffer for each tile. If you forget to clear/discard the depth buffer, it will be uploaded from HW to SW
  • 38.
    What you might know
    • Prefer multi texturing over multiple passes
    • Configure texture parameters before feeding image data to the driver
  • 39.
  • 40.
    What you might know
    • Be wise with precision hints
    • Avoid branching
    • Eliminate loops
    • Do not use discard. If you have to, place the discard instruction as early as possible to avoid useless computations
  • 41.
    What you might not know
    • Code inside a dynamic branch (the condition is a non-constant value) will be executed anyway, and then its result will be thrown away if the condition is false
  • 42.
    What you might not know
    • highp – represents a 32-bit floating point value
    • mediump – represents a 16-bit floating point value in the range [-65520, 65520]
    • lowp – 10-bit fixed point values in the range [-2, 2] with a step of 1/256
    • Try to give the same precision to all your operands, because conversion takes some time
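To get a feel for what lowp can represent, the quantisation can be emulated on the CPU. The sketch below clamps to [-2, 2] and snaps to the 1/256 grid described on the slide; round-to-nearest is an assumption here, since the hardware rounding mode is not stated, and `quantize_lowp` is a hypothetical name.

```c
/* Emulates lowp storage: clamp to [-2, 2], then snap to multiples of 1/256.
   Round-to-nearest is assumed; real hardware may truncate instead. */
static float quantize_lowp(float x)
{
    if (x < -2.0f) {
        x = -2.0f;
    }
    if (x > 2.0f) {
        x = 2.0f;
    }
    float scaled = x * 256.0f;
    int steps = (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
    return (float)steps / 256.0f;
}
```

Colour values in [0, 1] survive this well, whereas something like a texture coordinate of 3.7 or a small offset of 0.001 does not, which is why lowp is usually reserved for colours and normalised vectors.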
  • 43.
    What you might not know
    • highp values are calculated on a scalar processor only on USSE1 (that's PVR 5):
        highp vec4 v1, v2;
        highp float s1, s2;
        v2 = (v1 * s1) * s2;
        // the scalar processor executes v1 * s1 (4 operations), and then this result is multiplied by s2
        // on the scalar processor again (4 additional operations)
        v2 = v1 * (s1 * s2);
        // s1 * s2 is 1 operation on the scalar processor; result * v1 is 4 operations on the scalar processor
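The operation counts can be reproduced on the CPU by routing every scalar multiplication through a counting helper. This is a C model of the shader arithmetic, not shader code; `Vec4`, `g_scalar_mults` and the function names are hypothetical.

```c
typedef struct {
    float x, y, z, w;
} Vec4;

/* Number of scalar multiplications performed so far. */
static int g_scalar_mults = 0;

/* One scalar multiplication, counted. */
static float counted_mul(float a, float b)
{
    ++g_scalar_mults;
    return a * b;
}

/* Vector times scalar: 4 scalar multiplications. */
static Vec4 vec4_scale(Vec4 v, float s)
{
    Vec4 r;
    r.x = counted_mul(v.x, s);
    r.y = counted_mul(v.y, s);
    r.z = counted_mul(v.z, s);
    r.w = counted_mul(v.w, s);
    return r;
}

/* (v * s1) * s2: 8 scalar multiplications. */
static Vec4 scale_twice_naive(Vec4 v, float s1, float s2)
{
    return vec4_scale(vec4_scale(v, s1), s2);
}

/* v * (s1 * s2): 5 scalar multiplications. */
static Vec4 scale_twice_grouped(Vec4 v, float s1, float s2)
{
    return vec4_scale(v, counted_mul(s1, s2));
}
```

Both functions compute the same product up to floating-point rounding; only the grouping, and therefore the amount of work on a scalar unit, differs.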
  • 44.
  • 45.
    What you might know
    • Typical CPU found in mobile devices:
      1. ARMv7/ARMv8 architecture
      2. Cortex AX/Krait/Swift or Cyclone
      3. Up to 2300 MHz
      4. Up to 8 cores
      5. Thumb-2 instruction set
  • 46.
    What you might not know
    • ARMv7 has no hardware support for integer division
    • VFPv3, VFPv4 FPU
    • NEON SIMD engine
    • Unaligned access is done in software on Cortex A8. That means it is a hundred times slower
    • Cortex A8 is an in-order CPU. Cortex A9+ are out of order
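Where integer division falls back to a software routine, one common mitigation is to keep divisors at powers of two so that division and modulo reduce to a shift and a mask. Compilers already do this for compile-time constants; the sketch below, with hypothetical helper names, covers the case where the divisor is only known at run time as a shift amount (which must be below 32).

```c
#include <stdint.h>

/* value / (1 << shift) without a division instruction. */
static uint32_t div_pow2(uint32_t value, unsigned shift)
{
    return value >> shift;
}

/* value % (1 << shift) without a division instruction. */
static uint32_t mod_pow2(uint32_t value, unsigned shift)
{
    return value & ((1u << shift) - 1u);
}
```

For example, locating sprite number 100 in an atlas that is 16 cells wide gives row `div_pow2(100, 4)` and column `mod_pow2(100, 4)`.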
  • 47.
    What you might not know
    • Cortex A9+ cores have a full VFPv3 FPU, while Cortex A8 has VFPLite. That means float operations take 1 cycle on A9 and 10 cycles on A8!
  • 48.
    What you might not know
    • NEON – 16 registers, each 128 bits wide. Supports operations on 8, 16, 32 and 64-bit integers and 32-bit float values
    • NEON can be used for:
      – Software geometry instancing;
      – Skinning;
      – As a general vertex processor;
      – Other typical applications for SIMD.
  • 49.
    What you might not know
    • There are 3 ways to use the NEON engine in your code:
      1. Intrinsics
         1.1 GLKMath
      2. Handwritten NEON assembly
      3. Autovectorization. Add -mllvm -vectorize -mllvm -bb-vectorize-aligned-only to Other C/C++ Flags in the project settings and you are ready to go.
  • 51.
    What you might not know
    • Intrinsics:
  • 52.
    What you might not know
    • Assembly:
  • 53.
    What you might not know
    • Summary:
                              Running time, ms    CPU usage, %
        Intrinsics            2764                19
        Assembly              3664                20
        FPU                   6209                25-28
        FPU autovectorized    5028                22-24
    • Intrinsics got me a 25% speedup over assembly.
    • Note that the speed of code generated from intrinsics will vary from compiler to compiler. Modern compilers are really good at this.
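The 25% figure is the share of running time saved relative to the assembly version, which can be checked directly from the table. The helper name below is hypothetical.

```c
/* Fraction of the baseline running time that the optimized version saves. */
static double relative_time_saved(double baseline_ms, double optimized_ms)
{
    return (baseline_ms - optimized_ms) / baseline_ms;
}
```

With the numbers above, (3664 - 2764) / 3664 is roughly 0.246, i.e. about 25% less time than assembly, and (6209 - 2764) / 6209 is roughly 0.555, i.e. about 55% less time than the plain FPU version.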
  • 54.
    What you might not know
    • Intrinsics advantages over assembly:
      – Higher level code;
      – Much simpler;
      – No need to manage registers;
      – You can vectorize basic blocks and build a solution for every new problem from these blocks. With assembly, in contrast, you have to solve each new problem from scratch;
  • 55.
    What you might not know
    • Assembly advantages over intrinsics:
      – Code generated from intrinsics varies from compiler to compiler and can give you a really big difference in speed. Assembly code will always be the same.
  • 56.
    What you might not know
        __attribute__((always_inline)) void Matrix4ByVec4(const float32x4x4_t* __restrict__ mat,
                                                          const float32x4_t* __restrict__ vec,
                                                          float32x4_t* __restrict__ result)
        {
            (*result) = vmulq_n_f32((*mat).val[0], (*vec)[0]);
            (*result) = vmlaq_n_f32((*result), (*mat).val[1], (*vec)[1]);
            (*result) = vmlaq_n_f32((*result), (*mat).val[2], (*vec)[2]);
            (*result) = vmlaq_n_f32((*result), (*mat).val[3], (*vec)[3]);
        }
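When porting a routine like this to intrinsics, it helps to keep a scalar version around as a reference. The sketch below is not the author's code: it assumes a column-major matrix stored as 16 consecutive floats, uses the NEON path only when the compiler defines `__ARM_NEON`, and falls back to plain C elsewhere. The name `mat4_mul_vec4` is hypothetical.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* out = M * v for a column-major 4x4 matrix, where m[c * 4 + r] is row r of column c. */
static void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Same multiply-accumulate pattern as the slide: one column per step. */
    float32x4_t r = vmulq_n_f32(vld1q_f32(m + 0), v[0]);
    r = vmlaq_n_f32(r, vld1q_f32(m + 4), v[1]);
    r = vmlaq_n_f32(r, vld1q_f32(m + 8), v[2]);
    r = vmlaq_n_f32(r, vld1q_f32(m + 12), v[3]);
    vst1q_f32(out, r);
#else
    /* Scalar reference: each output row is a dot product of a matrix row with v. */
    for (int row = 0; row < 4; ++row) {
        out[row] = m[0 + row] * v[0]
                 + m[4 + row] * v[1]
                 + m[8 + row] * v[2]
                 + m[12 + row] * v[3];
    }
#endif
}
```

Running the same inputs through both paths, on a device and on a desktop build, is a cheap way to catch lane or column mix-ups before profiling.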
  • 57.
    What you might not know
        __attribute__((always_inline)) void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1,
                                                             const float32x4x4_t* __restrict__ m2,
                                                             float32x4x4_t* __restrict__ r)
        {
        #ifdef INTRINSICS
            (*r).val[0] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[0], 0));
            (*r).val[1] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[1], 0));
            (*r).val[2] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[2], 0));
            (*r).val[3] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[3], 0));
            (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[1], vgetq_lane_f32((*m2).val[0], 1));
            (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[1], vgetq_lane_f32((*m2).val[1], 1));
            (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[1], vgetq_lane_f32((*m2).val[2], 1));
            (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[1], vgetq_lane_f32((*m2).val[3], 1));
            (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[2], vgetq_lane_f32((*m2).val[0], 2));
            (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[2], vgetq_lane_f32((*m2).val[1], 2));
            (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[2], vgetq_lane_f32((*m2).val[2], 2));
            (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[2], vgetq_lane_f32((*m2).val[3], 2));
            (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[3], vgetq_lane_f32((*m2).val[0], 3));
            (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[3], vgetq_lane_f32((*m2).val[1], 3));
            (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[3], vgetq_lane_f32((*m2).val[2], 3));
            (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[3], vgetq_lane_f32((*m2).val[3], 3));
        #endif // the non-intrinsics branch is not shown on the slide
        }
  • 58.
    What you might not know
        __asm__ volatile
        (
            "vldmia %6, { q0-q3 } \n\t"
            "vldmia %0, { q8-q11 }\n\t"
            "vmul.f32 q12, q8, d0[0]\n\t"
            "vmul.f32 q13, q8, d2[0]\n\t"
            "vmul.f32 q14, q8, d4[0]\n\t"
            "vmul.f32 q15, q8, d6[0]\n\t"
            "vmla.f32 q12, q9, d0[1]\n\t"
            "vmla.f32 q13, q9, d2[1]\n\t"
            "vmla.f32 q14, q9, d4[1]\n\t"
            "vmla.f32 q15, q9, d6[1]\n\t"
            "vmla.f32 q12, q10, d1[0]\n\t"
            "vmla.f32 q13, q10, d3[0]\n\t"
            "vmla.f32 q14, q10, d5[0]\n\t"
            "vmla.f32 q15, q10, d7[0]\n\t"
            "vmla.f32 q12, q11, d1[1]\n\t"
            "vmla.f32 q13, q11, d3[1]\n\t"
            "vmla.f32 q14, q11, d5[1]\n\t"
            "vmla.f32 q15, q11, d7[1]\n\t"
            "vldmia %1, { q0-q3 } \n\t"
            "vmul.f32 q8, q12, d0[0]\n\t"
            "vmul.f32 q9, q12, d2[0]\n\t"
            "vmul.f32 q10, q12, d4[0]\n\t"
            "vmul.f32 q11, q12, d6[0]\n\t"
            "vmla.f32 q8, q13, d0[1]\n\t"
            "vmla.f32 q8, q14, d1[0]\n\t"
            "vmla.f32 q8, q15, d1[1]\n\t"
            "vmla.f32 q9, q13, d2[1]\n\t"
            "vmla.f32 q9, q14, d3[0]\n\t"
            "vmla.f32 q9, q15, d3[1]\n\t"
            "vmla.f32 q10, q13, d4[1]\n\t"
            "vmla.f32 q10, q14, d5[0]\n\t"
            "vmla.f32 q10, q15, d5[1]\n\t"
            "vmla.f32 q11, q13, d6[1]\n\t"
            "vmla.f32 q11, q14, d7[0]\n\t"
            "vmla.f32 q11, q15, d7[1]\n\t"
            "vstmia %2, { q8 }\n\t"
            "vstmia %3, { q9 }\n\t"
            "vstmia %4, { q10 }\n\t"
            "vstmia %5, { q11 }"
            :: "r" (proj), "r" (squareVertices), "r" (v1), "r" (v2), "r" (v3), "r" (v4), "r" (modelView)
            : "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15"
        );
  • 59.
    What you might not know
    • For a detailed explanation of intrinsics/assembly see:
      http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491e/CIHJBEFE.html

Editor's Notes

  • #2 In this presentation I am going to talk mostly about Imagination Technologies GPUs. This is at least 50% of the market. I ran all tests on iOS, but I assume you'll get the same behaviour on Android. This presentation will consist of a few parts, each dedicated to optimisation problems in one area.
  • #7 I'll start with the most common recommendations.
