Uploaded by St1X

Optimizing Games for Mobiles

The document discusses optimizing mobile game performance through the understanding of various mobile GPU architectures, including Immediate Mode Rendering (IMR), Tile-Based Rendering (TBR), and Tile-Based Deferred Rendering (TBDR). It offers technical recommendations for rendering techniques, geometry processing, and texture management to enhance efficiency and reduce bandwidth usage. The document further elaborates on best practices for CPU and GPU operations, emphasizing memory management and precision in shader programming.

Optimising games for mobiles
by Dmytro Vovk
Mobile GPU architectures
• There are 3 major mobile GPU architectures on the market:
  - IMR (Immediate Mode Renderer)
  - TBR (Tile Based Renderer)
  - TBDR (Tile Based Deferred Renderer)
IMR
• Renders anything sent to the GPU immediately. It makes no assumptions about what is going to be submitted next.
• The application has to sort opaque geometry front to back.
• It's basically brute force.
• Nvidia, AMD.
TBR
• Improves on IMR, but is still an IMR.
• Bandwidth is a precious resource on mobiles, and TBR tries to reduce data transfers as much as possible.
• Your geometry is split into tiles and then processed per tile. Each tile has a small amount of on-chip memory for its colour and depth/stencil buffers, so there is no need for transfers from/to system memory while the tile is being rendered.
• Qualcomm Adreno, ARM Mali.
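To make the idea of splitting geometry into tiles concrete, here is a minimal sketch of the binning step a tile-based GPU performs before shading. It is an illustration only: the 32x32 tile size, the `Vec2`/`TileRange` types and `bin_triangle` are hypothetical, and it uses a conservative bounding-box test, whereas real hardware binning is more precise and vendor-specific.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; real GPUs use vendor-specific sizes. */
#define TILE_SIZE 32

typedef struct { float x, y; } Vec2;

/* Inclusive range of tiles touched by a triangle. */
typedef struct { int x0, y0, x1, y1; } TileRange;

static float min3(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }
static float max3(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }

/* Conservative binning: marks every tile overlapped by the triangle's
   screen-space bounding box, clamped to a width x height framebuffer.
   Returns 0 if the triangle is entirely off-screen, 1 otherwise. */
int bin_triangle(Vec2 a, Vec2 b, Vec2 c, int width, int height, TileRange *out)
{
    assert(width > 0 && height > 0 && out != NULL);
    float minx = min3(a.x, b.x, c.x), maxx = max3(a.x, b.x, c.x);
    float miny = min3(a.y, b.y, c.y), maxy = max3(a.y, b.y, c.y);

    if (maxx < 0.0f || maxy < 0.0f || minx >= (float)width || miny >= (float)height)
        return 0;

    /* Clamp to the framebuffer before converting to tile coordinates. */
    if (minx < 0.0f) minx = 0.0f;
    if (miny < 0.0f) miny = 0.0f;
    if (maxx > (float)(width - 1)) maxx = (float)(width - 1);
    if (maxy > (float)(height - 1)) maxy = (float)(height - 1);

    out->x0 = (int)minx / TILE_SIZE;
    out->y0 = (int)miny / TILE_SIZE;
    out->x1 = (int)maxx / TILE_SIZE;
    out->y1 = (int)maxy / TILE_SIZE;
    return 1;
}
```

Each tile then only processes the triangles binned into it, which is what lets its colour and depth/stencil data stay on-chip.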
TBDR
• It is deferred, i.e. all the drawing happens later, once the GPU has seen what is going to be drawn.
• And this is where all the magic happens!
• The GPU is aware of context: it knows what is going to be drawn in the future, and this allows it to employ some awesome optimisations, reducing power consumption, bandwidth and fillrate.
• Imagination PowerVR.
GENERAL RECOMMENDATIONS
What you might know
• Batch, Batch, Batch! http://ce.u-sys.org/Veranstaltungen/Interaktive%20Computergraphik%20(Stamminger)/papers/BatchBatchBatch.pdf
• Render from one thread only.
• Avoid synchronisations:
  1. glFlush/glFinish;
  2. Querying GL states;
  3. Accessing render targets.
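Batching usually means merging many small draws into one. A minimal sketch for sprites, assuming they all share one texture atlas and one shader: quads are appended to a single vertex/index array so that one glDrawElements call can draw the whole batch. The `Sprite` and `Vertex` layouts and `batch_sprites` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, w, h; float u0, v0, u1, v1; } Sprite;  /* hypothetical */
typedef struct { float x, y, u, v; } Vertex;                         /* 16 bytes */

/* Appends `count` sprites as quads (4 vertices and 6 indices each) into
   caller-provided arrays sized for count*4 vertices and count*6 indices.
   Returns the number of indices written. 16-bit indices limit one batch
   to 16384 quads (65536 vertices). */
size_t batch_sprites(const Sprite *sprites, size_t count, Vertex *verts, uint16_t *indices)
{
    assert(count <= 16384);
    for (size_t i = 0; i < count; ++i) {
        const Sprite *s = &sprites[i];
        Vertex *v = &verts[i * 4];
        v[0] = (Vertex){ s->x,        s->y,        s->u0, s->v0 };
        v[1] = (Vertex){ s->x + s->w, s->y,        s->u1, s->v0 };
        v[2] = (Vertex){ s->x + s->w, s->y + s->h, s->u1, s->v1 };
        v[3] = (Vertex){ s->x,        s->y + s->h, s->u0, s->v1 };

        uint16_t base = (uint16_t)(i * 4);
        uint16_t *idx = &indices[i * 6];
        idx[0] = base; idx[1] = (uint16_t)(base + 1); idx[2] = (uint16_t)(base + 2); /* first triangle  */
        idx[3] = base; idx[4] = (uint16_t)(base + 2); idx[5] = (uint16_t)(base + 3); /* second triangle */
    }
    return count * 6;
}
```

After uploading both arrays, the returned value is the `count` argument for a single glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0) call.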
VERTEX DATA RECOMMENDATIONS
What you might know
• PowerVR features pixel-perfect HSR (Hidden Surface Removal); Adreno and ARM Mali do not.
• You still need to sort transparent geometry, though!
• Avoid alpha testing. Use alpha blending instead.
What you might not know
• HSR still requires vertices to be processed!
• …thus don't forget to cull your geometry on the CPU!
• Prefer the stencil test over the scissor test:
  - The stencil test is performed in hardware on PowerVR GPUs.
  - The stencil mask is stored in fast on-chip memory.
  - A stencil mask can be of any shape, in contrast to the rectangular scissor.
What you might not know
• Why no alpha test?!
  - Alpha test/discard requires the fragment shader to run before the visibility of the current fragment can be determined. This removes the benefits of HSR.
  - Even more: if shader code contains discard, then any geometry rendered with this shader will suffer from the alpha test drawbacks. Even if this keyword is under a condition, USSE (PVR's shader engine) assumes that the condition may be hit.
  - Move discard into a separate shader.
  - Draw opaque geometry first, then alpha-tested geometry, and alpha-blended geometry at the end.
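The draw order from this slide (opaque, then alpha-tested, then alpha-blended) is commonly implemented by sorting draw calls on a key. A sketch with a hypothetical `DrawItem` type: within each group, opaque and alpha-tested items go front to back (the order IMR GPUs rely on for early depth rejection), and blended items go back to front so that blending composites correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { PASS_OPAQUE = 0, PASS_ALPHA_TESTED = 1, PASS_BLENDED = 2 } Pass;

typedef struct {
    Pass  pass;
    float depth;   /* view-space distance from the camera */
    int   id;      /* hypothetical handle to the actual draw call */
} DrawItem;

static int compare_items(const void *pa, const void *pb)
{
    const DrawItem *a = pa, *b = pb;
    if (a->pass != b->pass)
        return (int)a->pass - (int)b->pass;                  /* group by pass first */
    if (a->pass == PASS_BLENDED)
        return (a->depth < b->depth) - (a->depth > b->depth); /* back to front */
    return (a->depth > b->depth) - (a->depth < b->depth);     /* front to back */
}

void sort_draw_items(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1)
        qsort(items, count, sizeof *items, compare_items);
}
```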
What you might know
• Bandwidth matters:
  1. Use a constant colour per object instead of per vertex.
  2. Simplify your models. Use smaller data types.
  3. Use indexed triangles or non-indexed triangle strips.
  4. Use VBOs instead of client arrays.
  5. Use VAOs.
What you might not know
• VBO allocations are aligned to the 4KB page size. That means your small buffer for just a couple of triangles will occupy 4KB of memory, so a large number of small VBOs can fragment and waste your memory.
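The cost of this page granularity is easy to estimate. A sketch, assuming (as the slide states) that every VBO allocation is rounded up to a whole 4096-byte page; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VBO_PAGE_SIZE 4096u  /* allocation granularity stated on the slide */

/* Bytes actually committed for a buffer of `size` bytes. */
size_t committed_bytes(size_t size)
{
    return (size + VBO_PAGE_SIZE - 1) / VBO_PAGE_SIZE * VBO_PAGE_SIZE;
}

/* Extra bytes committed by `count` separate VBOs of `size` bytes each,
   compared with packing the same data into a single VBO. */
size_t wasted_by_separate_vbos(size_t size, size_t count)
{
    assert(count > 0);
    return committed_bytes(size) * count - committed_bytes(size * count);
}
```

For example, 100 sprite quads of 4 vertices x 16 bytes (64 bytes each) commit 400KB as separate VBOs, versus 8KB when packed into one buffer.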
What you might not know
• Updating your VBO data each frame:
  1. glBufferSubData: if it is used to update a big part of the original data, it will harm performance. Try to avoid updating buffers that are currently in use.
  2. glBufferData: it's OK to completely overwrite the original data. The old data will be orphaned by the driver and new data storage will be allocated.
  3. glMapBuffer with triple-buffered VBOs is the preferred way to update your data.
• Use EXT_map_buffer_range (iOS 6+ only) when you need to update only a subset of a buffer object.
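What you might not know

    // vertexBuffer[] holds 3 buffer objects; BUFFER_SIZE (hypothetical) is the
    // per-frame vertex data size in bytes.
    GLuint vertexBuffer[3];
    int bufferID = 0;

    // Initialization: allocate storage for 3 VBOs, but do not upload any data yet.
    glGenBuffers(3, vertexBuffer);
    for (int i = 0; i < 3; ++i)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[i]);
        glBufferData(GL_ARRAY_BUFFER, BUFFER_SIZE, NULL, GL_DYNAMIC_DRAW);
    }

    //...

    // Per frame: map the buffer last used three frames ago.
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[bufferID]);
    void* ptr = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);
    if (ptr != NULL)
    {
        // update data here (write at most BUFFER_SIZE bytes through ptr)
        glUnmapBufferOES(GL_ARRAY_BUFFER);
    }
    ++bufferID;
    if (bufferID == 3) // cycling through 3 buffers
    {
        bufferID = 0;
    }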
What you might not know
• This scheme gives you the best performance possible: no blocking of the CPU or GPU, no redundant memcpy operations and lower CPU load, at the cost of extra memory (note that you need no extra temporary buffer to store your data before sending it to the VBO). This is ideal for dynamic batching of sprites.

    update(1), draw(1), gpuworking(..............)
    update(2), draw(2), gpuworking(..............)
    update(3), draw(3), gpuworking(..............)
What you might not know
• Float is the GPU's native type…
• …which means any other type will be converted to float by USSE…
• …resulting in a few additional cycles.
• Thus it's your choice of trade-off between bandwidth/storage and additional cycles.
What you might know
• Use interleaved vertex data.
  - Align each vertex attribute to a 4-byte boundary.
What you might not know
• If you don't align your data, the driver will do it instead…
• …resulting in slower performance.
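A sketch of an interleaved vertex whose attributes all start on 4-byte boundaries, checked at compile time with C11 `_Static_assert`. The attribute set and `snorm16` helper are hypothetical; normals and colours use smaller types to save bandwidth, at the cost of the conversion cycles mentioned earlier, and an explicit pad keeps the colour aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: 24 bytes, versus 40 with all-float attributes. */
typedef struct {
    float    position[3];   /* offset 0,  12 bytes */
    int16_t  normal[3];     /* offset 12,  6 bytes, normalized GL_SHORT */
    int16_t  pad;           /* offset 18,  2 bytes, keeps the next attribute 4-byte aligned */
    uint8_t  color[4];      /* offset 20,  4 bytes, normalized GL_UNSIGNED_BYTE */
} Vertex;

_Static_assert(offsetof(Vertex, position) % 4 == 0, "position must be 4-byte aligned");
_Static_assert(offsetof(Vertex, normal)   % 4 == 0, "normal must be 4-byte aligned");
_Static_assert(offsetof(Vertex, color)    % 4 == 0, "color must be 4-byte aligned");
_Static_assert(sizeof(Vertex) % 4 == 0, "stride must be a multiple of 4");

/* Scales a normal component in [-1, 1] to the int16 range, rounding to nearest. */
int16_t snorm16(float v)
{
    if (v > 1.0f)  v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    return (int16_t)(v * 32767.0f + (v >= 0.0f ? 0.5f : -0.5f));
}
```

Without the `pad` member, `color` would start at offset 18, which is exactly the misalignment the driver would otherwise have to fix up.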
What you might not know
• PowerVR SGX 5XT series GPUs have a vertex cache for the last 12 vertex indices. Optimise your indexed geometry for this cache size.
• PowerVR Series 6 (XT) has 16k of vertex cache.
• Take a look at optimisers that use Tom Forsyth's algorithm: http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
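To judge how well an index order fits a post-transform vertex cache, you can simulate the cache and compute the average cache miss ratio (ACMR: vertex shader invocations per triangle). This sketch assumes a simple FIFO cache of 12 entries to match the SGX 5XT figure; the real replacement policy is not documented here, so treat the results as a relative measure for comparing index orders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 12   /* SGX 5XT figure from the slide; FIFO policy is an assumption */

/* Number of cache misses (vertex shader invocations) for a triangle-list
   index buffer under a FIFO cache model. */
size_t count_cache_misses(const uint16_t *indices, size_t index_count)
{
    int cache[CACHE_SIZE];
    size_t head = 0, misses = 0;
    for (size_t i = 0; i < CACHE_SIZE; ++i)
        cache[i] = -1;                          /* empty slot */

    for (size_t i = 0; i < index_count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < CACHE_SIZE; ++j)
            if (cache[j] == indices[i]) { hit = 1; break; }
        if (!hit) {
            cache[head] = indices[i];           /* overwrite the oldest entry */
            head = (head + 1) % CACHE_SIZE;
            ++misses;
        }
    }
    return misses;
}

/* ACMR = misses / triangles: 3.0 is the worst case, about 0.5 is ideal for large regular meshes. */
double acmr(const uint16_t *indices, size_t index_count)
{
    assert(index_count > 0 && index_count % 3 == 0);
    return (double)count_cache_misses(indices, index_count) / (double)(index_count / 3);
}
```

Running this before and after a Forsyth-style optimiser gives a quick check that the reordering actually helps.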
What you might know
• Split your vertex data into two parts:
  1. Static VBO: the one that will never be changed.
  2. Dynamic VBO: the one that needs to be updated frequently.
• Split your vertex data into a few VBOs when a few meshes share the same set of attributes.
TEXTURE DATA RECOMMENDATIONS
What you might know
• Bandwidth matters:
  1. Use lower precision formats: RGBA4444, RGBA5551.
  2. Use PVRTC compressed textures.
  3. Use atlases.
  4. Use mipmaps. They improve texture cache efficiency and quality.
What you might not know
• Avoid the RGB8 format: texture data has to be aligned, so the driver will pad RGB8 to RGBA8.
• Try to replace it with RGB565.
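Converting 8-bit-per-channel data to RGB565 (or to the RGBA4444 / RGBA5551 formats from the previous slide) is a simple bit-packing step, usually done offline. A sketch that truncates each channel to its top bits; a production converter would typically round or dither to reduce banding. The bit layouts match the GL_UNSIGNED_SHORT_5_6_5, _4_4_4_4 and _5_5_5_1 packed types.

```c
#include <assert.h>
#include <stdint.h>

/* RGB888 -> RGB565 (GL_RGB + GL_UNSIGNED_SHORT_5_6_5): red in the top 5 bits. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* RGBA8888 -> RGBA4444 (GL_RGBA + GL_UNSIGNED_SHORT_4_4_4_4). */
uint16_t pack_rgba4444(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 4) << 12) | ((g >> 4) << 8) | ((b >> 4) << 4) | (a >> 4));
}

/* RGBA8888 -> RGBA5551 (GL_RGBA + GL_UNSIGNED_SHORT_5_5_5_1): alpha becomes 1 bit. */
uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) | ((b >> 3) << 1) | (a >= 128 ? 1 : 0));
}
```

Each format stores a texel in 2 bytes instead of the 4 that a padded RGB8 texture ends up using.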
What you might not know
• Why PVRTC?
  1. PVRTC provides great compression, resulting in smaller texture size, better cache usage, saved bandwidth and decreased power consumption.
  2. PVRTC stores pixel data in the GPU's native order, i.e. BGRA instead of RGBA, in blocks optimised for the data access pattern.
What you might not know
• It doesn't matter whether your textures are in RGBA or BGRA format: the driver will still do internal processing on the texture data to improve memory access locality and cache efficiency.
What you might not know
• On PVR 6 (XT), the driver reserves memory for both the texture and its mip map chain, but commits memory only for mip level 0.
• If you decide to generate mip maps, the driver will commit the pages reserved for the mip chain.
• That's expected.
What you might not know
• On PVR 55MP (tested on iOS 4 to 7.1.1), the driver ALWAYS commits memory for mip maps, regardless of whether you requested them or not.
• That means you waste an extra 33% of each texture's memory!
• In most cases you don't need mip maps for 2D games, but you are forced to pay this overhead.
• That's too bad for 2D games. However, there is one workaround: make your textures NPOT (non-power-of-two).
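The 33% figure follows from the mip chain being a geometric series: each level has a quarter of the texels of the level above, so the chain adds 1/4 + 1/16 + ... (approaching 1/3) on top of the base level. A quick sketch that sums the actual chain; `mip_chain_bytes` is a hypothetical helper and assumes an uncompressed format with a fixed number of bytes per texel.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes of mip levels 1..N (excluding level 0) for a w x h texture,
   halving each dimension (minimum 1) down to the 1x1 level. */
size_t mip_chain_bytes(size_t w, size_t h, size_t bytes_per_texel)
{
    assert(w > 0 && h > 0);
    size_t total = 0;
    while (w > 1 || h > 1) {
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
        total += w * h * bytes_per_texel;
    }
    return total;
}
```

For a 1024x1024 RGBA8 texture, the 4MB base level carries 1,398,100 bytes (about 1.33MB) of mip levels, just under one third.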
What you might not know
• Luckily, there is a solution to this problem.
• Core OpenGL ES 2.0 doesn't support mip maps for NPoT (non-power-of-two) textures, so if you make your textures NPoT, you will not pay this memory overhead.
What you might not know
• Interesting notes:
  - The driver's glTexImage2D implementation has a function CheckFastPath. When you upload a PoT texture you hit this fast path; NPoT textures skip it.
  - When you upload a lot of textures, your VRAM gets fragmented, so the driver remaps memory, i.e. it creates one big buffer for a few small textures and moves them into that buffer.
What you might not know
• Let's take a look at the texture upload process.
• The usual way to do this:
  1. Load the texture into a temporary buffer in RAM.
     1.1. Decode the texture if it is stored in a compressed file format (JPG/PNG).
  2. Feed this buffer to glTexImage2D.
  3. Draw!
• Looks simple, but is it the fastest way?
What you might not know
• …NO!

    void* buf = malloc(TEXTURE_SIZE);   // 4MB for a 1024x1024 RGBA8 texture
    LoadTexture(textureName, buf);      // hypothetical loader: decodes the file into buf
    glBindTexture(GL_TEXTURE_2D, textureID);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);
    // buf is copied into an internal buffer created by the driver (that's obvious)...
    free(buf); // ...which is why buf can be freed immediately after glTexImage2D
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
    // the driver does some additional work to fully upload the texture the first time it is actually used!

• A lot of redundant work!
What you might not know
• The Jedi way to upload textures:

    // Assumes the file holds raw RGBA8 texel data (no header) of TEXTURE_SIZE bytes.
    // Requires <fcntl.h>, <sys/mman.h> and <unistd.h>.
    int fileHandle = open(filename, O_RDONLY);
    void* ptr = mmap(NULL, TEXTURE_SIZE, PROT_READ, MAP_PRIVATE, fileHandle, 0); // file mapping
    close(fileHandle); // the mapping stays valid after the descriptor is closed
    if (ptr != MAP_FAILED)
    {
        glBindTexture(GL_TEXTURE_2D, textureID);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr);
        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
        // the driver does some additional work to fully upload the texture the first time it is actually used!
        munmap(ptr, TEXTURE_SIZE);
    }

• File mapping does not copy your file data into RAM up front! It loads file data page by page, as it is accessed.
• Thus we eliminate one redundant copy, dramatically decrease texture upload time and decrease memory fragmentation.
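A self-contained sketch of the file-mapping part using only POSIX calls, which also takes the mapping size from the file itself rather than a constant. It assumes the file contains raw, already-decoded texel data with no header, as the slide's example implies; `MappedFile`, `map_texture_file` and `unmap_texture_file` are hypothetical helpers, and `data` is what would be passed to glTexImage2D.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct { void *data; size_t size; } MappedFile;

/* Maps a raw texel file read-only. Pages are loaded on first access,
   so no intermediate malloc + read copy is made. Returns 0 on success. */
int map_texture_file(const char *path, MappedFile *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

void unmap_texture_file(MappedFile *f)
{
    if (f->data != NULL)
        munmap(f->data, f->size);
    f->data = NULL;
    f->size = 0;
}
```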
What you might not know
• Keep in mind that textures are finally wired only when they are used for the first time. So draw them off-screen immediately after glTexImage2D; otherwise the first frame will take too long to render, and it will be nearly impossible to track down the cause.
What you might not know
• NPOT textures work only with the GL_CLAMP_TO_EDGE wrap mode (and, in core OpenGL ES 2.0, without mip maps).
• POT textures are preferable; they give you the best performance possible.
• Use NPOT textures with dimensions that are multiples of 32 pixels for best performance.
• The driver will pad the data of your NPOT texture to match the size of the closest POT values.
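The padding rule can be estimated with a few bit tricks. This sketch assumes that "closest POT" means the next power of two at or above each dimension; the driver's exact padding policy is internal, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v, for 1 <= v <= 2^31. */
uint32_t next_pot(uint32_t v)
{
    assert(v >= 1 && v <= 0x80000000u);
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
    return v + 1;
}

int is_pot(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* Texels stored for a w x h texture under the round-up padding assumption. */
uint64_t padded_texels(uint32_t w, uint32_t h)
{
    return (uint64_t)next_pot(w) * next_pot(h);
}

/* The slide's recommendation: NPOT dimensions should be multiples of 32. */
int npot_size_recommended(uint32_t w, uint32_t h)
{
    return w % 32 == 0 && h % 32 == 0;
}
```

Under this assumption, a 480x320 texture meets the multiple-of-32 rule but would occupy the storage of a 512x512 one.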
What you might not know
• Prefer OES_texture_half_float over OES_texture_float.
• Texture reads fetch only 32 bits per texel, so an RGBA float texture results in 4 texture reads.
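The arithmetic behind this: if each read returns at most 32 bits, a texel needs its size in bits divided by 32, rounded up, reads. A small sketch of that calculation, taking the 32-bit figure from the slide; `fetches_per_texel` is a hypothetical helper.

```c
#include <assert.h>

#define FETCH_BITS 32   /* bits returned by one texture read, per the slide */

/* Texture reads needed for one texel with `channels` components of `bits_per_channel` bits. */
int fetches_per_texel(int channels, int bits_per_channel)
{
    assert(channels >= 1 && channels <= 4 && bits_per_channel > 0);
    int bits = channels * bits_per_channel;
    return (bits + FETCH_BITS - 1) / FETCH_BITS;
}
```

So moving an RGBA texture from OES_texture_float to OES_texture_half_float halves the reads, from 4 to 2.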
What you might not know
• Always use glClear at the beginning of the frame…
• …and EXT_discard_framebuffer at the end.
• PVR GPU series have a fast on-chip depth/stencil buffer for each tile. If you forget to clear/discard the depth buffer, it will be copied between the on-chip tile memory and system memory.
What you might know
• Prefer multitexturing over multiple passes.
• Configure texture parameters before feeding image data to the driver.
SHADERS BEST PRACTICES
What you might know
• Be wise with precision hints.
• Avoid branching.
• Eliminate loops.
• Do not use discard. If you must, place the discard instruction as early as possible to avoid useless computations.
What you might not know
• Code inside a dynamic branch (one whose condition is a non-constant value) will be executed anyway, and its result will then be thrown away if the condition is false.
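A CPU-side illustration of what this means: with a dynamic branch, both sides are effectively computed and the unwanted result is dropped. Writing the shader in that form explicitly, the equivalent of GLSL `mix(a, b, step(edge, x))`, makes the real cost visible. This is a C emulation, not shader code, and the two path functions are hypothetical placeholders.

```c
#include <assert.h>

/* Placeholder computations standing in for the two sides of a shader branch. */
static float lit_path(float x)    { return x * 0.8f + 0.2f; }
static float shadow_path(float x) { return x * 0.3f; }

/* Branchy form, as it would be written in a shader. */
float shade_branchy(float x, float edge)
{
    if (x >= edge)
        return lit_path(x);
    return shadow_path(x);
}

/* Predicated form: both paths always run, then one result is selected.
   step(edge, x) is 1.0 when x >= edge, else 0.0; mix(a, b, t) = a*(1-t) + b*t. */
float shade_predicated(float x, float edge)
{
    float a = shadow_path(x);
    float b = lit_path(x);
    float t = x >= edge ? 1.0f : 0.0f;   /* GLSL: step(edge, x) */
    return a * (1.0f - t) + b * t;       /* GLSL: mix(a, b, t)  */
}
```

Both functions return the same values, but the predicated one always pays for both paths, which is what the hardware does with the branchy version anyway.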
What you might not know
• highp: represents a 32-bit floating point value.
• mediump: represents a 16-bit floating point value in the range [-65520, 65520].
• lowp: 10-bit fixed point values in the range [-2, 2] with a step of 1/256.
• Try to give the same precision to all your operands, because conversion takes some time.
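To get a feel for lowp's limits, here is a sketch that emulates the format as described above: 10-bit signed fixed point with a step of 1/256. The exact top of the range (assumed here to be 2 - 1/256, since 10 bits give 1024 steps) and round-to-nearest behaviour are assumptions; real hardware may round differently. `quantize_lowp` is a hypothetical helper.

```c
#include <assert.h>

/* Emulates lowp as 10-bit signed fixed point: integer steps in [-512, 511],
   each step 1/256, so representable values span [-2, 2 - 1/256]. */
float quantize_lowp(float v)
{
    assert(v == v);                          /* NaN has no lowp representation */
    float scaled = v * 256.0f;               /* value in units of 1/256 */
    if (scaled <= -512.0f) return -2.0f;
    if (scaled >= 511.0f)  return 511.0f / 256.0f;
    long steps = (long)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));  /* round half away from zero */
    return (float)steps / 256.0f;
}
```

Colour math in [0, 1] survives this well, but anything outside [-2, 2] clamps and small differences below 1/512 vanish, so positions and texture coordinates need mediump or highp.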
What you might not know
• On USSE1 (that's PVR 5), highp values are calculated on the scalar processor only:

    highp vec4 v1, v2;
    highp float s1, s2;

    v2 = (v1 * s1) * s2;
    // v1 * s1: 4 operations on the scalar processor; then this result is multiplied by s2,
    // again on the scalar processor: 4 additional operations (8 in total)

    v2 = v1 * (s1 * s2);
    // s1 * s2: 1 operation on the scalar processor; result * v1: 4 operations
    // on the scalar processor (5 in total)
HARDWARE FEATURES
What you might know
• Typical CPU found in mobile devices:
  1. ARMv7/ARMv8 architecture
  2. Cortex-Ax/Krait/Swift or Cyclone
  3. Up to 2300 MHz
  4. Up to 8 cores
  5. Thumb-2 instruction set
What you might not know
• The base ARMv7-A architecture does not require hardware integer division, and cores such as Cortex-A8 and Cortex-A9 lack it.
• VFPv3, VFPv4 FPU.
• NEON SIMD engine.
• Unaligned access is done in software on Cortex-A8, which means it is a hundred times slower.
• Cortex-A8 is an in-order CPU. Cortex-A9 and later are out-of-order.
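On cores without a hardware divide instruction, such as Cortex-A8 and A9, division by a constant is done with a multiply and a shift instead. Compilers apply this automatically for compile-time constant divisors; a hand-written sketch for unsigned division by 3 looks like this (0xAAAAAAAB is ceil(2^33 / 3), and the result is exact for every 32-bit input). `div3_u32` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* x / 3 without a divide instruction: multiply by ceil(2^33 / 3) and keep
   the high bits of the 64-bit product. Exact for all 32-bit x. */
static inline uint32_t div3_u32(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

The practical takeaway is to keep divisors in hot code constant (or powers of two) on these cores, so the compiler can emit this sequence instead of calling a software division routine.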
What you might not know
• Cortex-A9 and later cores have a full VFPv3 FPU, while Cortex-A8 has VFPLite. That means float operations take 1 cycle on A9 and 10 cycles on A8!
What you might not know
• NEON: 16 registers, 128 bits wide each. Supports operations on 8, 16, 32 and 64-bit integers and 32-bit float values.
• NEON can be used for:
  - Software geometry instancing;
  - Skinning;
  - As a general vertex processor;
  - Other typical applications for SIMD.
What you might not know
• There are 3 ways to use the NEON engine in your code:
  1. Intrinsics
     1.1. GLKMath
  2. Handwritten NEON assembly
  3. Autovectorization. Add -mllvm -vectorize -mllvm -bb-vectorize-aligned-only to Other C/C++ Flags in your project settings and you are ready to go.
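Whether the autovectorizer kicks in depends largely on how a loop is written. A sketch of a loop shape that compilers generally vectorize well: `restrict`-qualified pointers (no aliasing), a simple counted loop with unit stride, and no calls or branches in the body. `scale_add` is a hypothetical example, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = a[i] * scale + b[i]. Iterations are independent and the pointers
   cannot alias, so a compiler may process four floats per 128-bit NEON register. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * scale + b[i];
}
```

Note that GCC, for example, only uses NEON for float loops on ARMv7 when -funsafe-math-optimizations is also enabled, because NEON does not fully implement IEEE 754 (it flushes denormals to zero); check the generated assembly to confirm vectorization happened.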
What you might not know
• Intrinsics:

What you might not know
• Assembly:
What you might not know
• Summary:

    Implementation       Running time, ms   CPU usage, %
    Intrinsics           2764               19
    Assembly             3664               20
    FPU                  6209               25-28
    FPU autovectorized   5028               22-24

• Intrinsics got me a 25% speedup over assembly.
• Note that the speed of code generated from intrinsics will vary from compiler to compiler. Modern compilers are really good at this.
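Working the percentages out from the table (running times in ms):

```latex
\[
\frac{3664 - 2764}{3664} = \frac{900}{3664} \approx 0.246, \qquad
\frac{6209}{2764} \approx 2.25, \qquad
\frac{6209 - 5028}{6209} = \frac{1181}{6209} \approx 0.190
\]
```

So intrinsics cut the running time by about 25% relative to the hand-written assembly and run about 2.25x faster than plain FPU code, while autovectorization alone saves about 19% over plain FPU code.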
What you might not know
• Intrinsics advantages over assembly:
  - Higher level code;
  - Much simpler;
  - No need to manage registers;
  - You can vectorize basic building blocks and compose a solution for every new problem from them. With assembly, in contrast, you have to solve each new problem from scratch.
What you might not know
• Assembly advantages over intrinsics:
  - Code generated from intrinsics varies from compiler to compiler and can make a really big difference in speed. Assembly code will always be the same.
What you might not know

    #include <arm_neon.h>

    // result = mat * vec, where mat is column-major (mat->val[i] is column i).
    static inline __attribute__((always_inline)) void Matrix4ByVec4(const float32x4x4_t* __restrict__ mat,
                                                                  const float32x4_t* __restrict__ vec,
                                                                  float32x4_t* __restrict__ result)
    {
        (*result) = vmulq_n_f32((*mat).val[0], vgetq_lane_f32(*vec, 0));
        (*result) = vmlaq_n_f32((*result), (*mat).val[1], vgetq_lane_f32(*vec, 1));
        (*result) = vmlaq_n_f32((*result), (*mat).val[2], vgetq_lane_f32(*vec, 2));
        (*result) = vmlaq_n_f32((*result), (*mat).val[3], vgetq_lane_f32(*vec, 3));
    }
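When porting code like this to NEON, it helps to keep a plain-C reference to check results against. A sketch of the scalar equivalent of Matrix4ByVec4, assuming (as the intrinsics do) that the matrix is stored column-major as four columns of 4 floats; `matrix4_by_vec4_ref` is a hypothetical name.

```c
#include <assert.h>

/* result = mat * vec for a column-major 4x4 matrix (mat[col][row]).
   Each column is scaled by one vector component and accumulated, mirroring
   the vmulq_n_f32 / vmlaq_n_f32 sequence. result must not alias vec. */
void matrix4_by_vec4_ref(const float mat[4][4], const float vec[4], float result[4])
{
    for (int row = 0; row < 4; ++row) {
        result[row] = mat[0][row] * vec[0];
        for (int col = 1; col < 4; ++col)
            result[row] += mat[col][row] * vec[col];
    }
}
```

Comparing its output with the NEON version over random inputs (within a small tolerance) catches lane-ordering mistakes early.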
What you might not know

    // r = m1 * m2 for column-major matrices: column j of r is m1 * (column j of m2).
    static inline __attribute__((always_inline)) void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1,
                                                                     const float32x4x4_t* __restrict__ m2,
                                                                     float32x4x4_t* __restrict__ r)
    {
        (*r).val[0] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[0], 0));
        (*r).val[1] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[1], 0));
        (*r).val[2] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[2], 0));
        (*r).val[3] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[3], 0));

        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[1], vgetq_lane_f32((*m2).val[0], 1));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[1], vgetq_lane_f32((*m2).val[1], 1));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[1], vgetq_lane_f32((*m2).val[2], 1));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[1], vgetq_lane_f32((*m2).val[3], 1));

        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[2], vgetq_lane_f32((*m2).val[0], 2));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[2], vgetq_lane_f32((*m2).val[1], 2));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[2], vgetq_lane_f32((*m2).val[2], 2));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[2], vgetq_lane_f32((*m2).val[3], 2));

        (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[3], vgetq_lane_f32((*m2).val[0], 3));
        (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[3], vgetq_lane_f32((*m2).val[1], 3));
        (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[3], vgetq_lane_f32((*m2).val[2], 3));
        (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[3], vgetq_lane_f32((*m2).val[3], 3));
    }
What you might not know

    // Computes MVP = proj * modelView (column-major), then transforms the 4 vertices
    // in squareVertices by MVP and stores them in v1..v4.
    __asm__ volatile(
        "vldmia %6, { q0-q3 }\n\t"      // q0-q3 = modelView columns
        "vldmia %0, { q8-q11 }\n\t"     // q8-q11 = proj columns
        "vmul.f32 q12, q8, d0[0]\n\t"   // q12-q15 = proj * modelView
        "vmul.f32 q13, q8, d2[0]\n\t"
        "vmul.f32 q14, q8, d4[0]\n\t"
        "vmul.f32 q15, q8, d6[0]\n\t"
        "vmla.f32 q12, q9, d0[1]\n\t"
        "vmla.f32 q13, q9, d2[1]\n\t"
        "vmla.f32 q14, q9, d4[1]\n\t"
        "vmla.f32 q15, q9, d6[1]\n\t"
        "vmla.f32 q12, q10, d1[0]\n\t"
        "vmla.f32 q13, q10, d3[0]\n\t"
        "vmla.f32 q14, q10, d5[0]\n\t"
        "vmla.f32 q15, q10, d7[0]\n\t"
        "vmla.f32 q12, q11, d1[1]\n\t"
        "vmla.f32 q13, q11, d3[1]\n\t"
        "vmla.f32 q14, q11, d5[1]\n\t"
        "vmla.f32 q15, q11, d7[1]\n\t"
        "vldmia %1, { q0-q3 }\n\t"      // q0-q3 = the 4 vertices
        "vmul.f32 q8, q12, d0[0]\n\t"   // q8-q11 = MVP * vertex
        "vmul.f32 q9, q12, d2[0]\n\t"
        "vmul.f32 q10, q12, d4[0]\n\t"
        "vmul.f32 q11, q12, d6[0]\n\t"
        "vmla.f32 q8, q13, d0[1]\n\t"
        "vmla.f32 q8, q14, d1[0]\n\t"
        "vmla.f32 q8, q15, d1[1]\n\t"
        "vmla.f32 q9, q13, d2[1]\n\t"
        "vmla.f32 q9, q14, d3[0]\n\t"
        "vmla.f32 q9, q15, d3[1]\n\t"
        "vmla.f32 q10, q13, d4[1]\n\t"
        "vmla.f32 q10, q14, d5[0]\n\t"
        "vmla.f32 q10, q15, d5[1]\n\t"
        "vmla.f32 q11, q13, d6[1]\n\t"
        "vmla.f32 q11, q14, d7[0]\n\t"
        "vmla.f32 q11, q15, d7[1]\n\t"
        "vstmia %2, { q8 }\n\t"
        "vstmia %3, { q9 }\n\t"
        "vstmia %4, { q10 }\n\t"
        "vstmia %5, { q11 }"
        :
        : "r" (proj), "r" (squareVertices), "r" (v1), "r" (v2), "r" (v3), "r" (v4), "r" (modelView)
        : "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15");
What you might not know
• For a detailed explanation of intrinsics/assembly, see: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491e/CIHJBEFE.html


Optimizing Games for Mobiles

  • 1.
    Optimising games formobilesby Dmytro Vovk
  • 2.
    Mobile GPUs architecture•There are 3 major mobile GPU architectureson a market:• IMR (Immediate Mode Renderer)• TBR (Tile Based Renderer)• TBDR (Tile Based Deferred Renderer)2
  • 3.
    IMR• Renders anythingsent to the GPUimmediately. It makes no assumption aboutwhat is going to be submitted next.• Application has to sort opaque geometry frontto back.• It’s basically a brute force.• Nvidia, AMD.3
  • 4.
    TBR• Improves onIMR, but still is an IMR.• Bandwidth is a precious resource on mobilesand TBR tries to reduce data transfers as muchas possible.• Your geometry is split in to tiles and then it isprocessed per tile. Tiles have small amount ofmemory for colour, depthstencil buffers, sothey have no need to do transfers fromtosystem memory.• Qualcomm Adreno, ARM Mali 4
  • 5.
    TBDR• It isdeferred i.e. all the graphics is drawnsomewhere later.• And this is where all the magic happens!• The GPU is aware of context - it know’s what isgoing to be drawn in future and this allows itto employ some awesome optimisations,reduce power consumption, bandwidth and afillrate.• Imagination PowerVR.5
  • 6.
  • 7.
    What you mightknow• Batch, Batch, Batch!http://ce.u-sys.org/Veranstaltungen/Interaktive%20Computergraphik%20(Stamminger)/papers/BatchBatchBatch.pdf• Render from one thread only• Avoid synchronisations:1. glFlush/glFinish;2. Querying GL states;3. Accessing render targets;
  • 8.
  • 9.
    What you mightknow• Pixel perfect HSR (Hidden Surface Removal),Adreno and ARM does not feature this.• But still needs to sort transparent geometry!• Avoid doing alpha test. Use alpha blendinstead
  • 10.
    What you mightnot know• HSR still requires vertices to be processed!• …thus don’t forget to cull your geometry onCPU!• Prefer Stencil Test before Scissor.– Stencil test is performed in hardware on PowerVRGPUs.– Stencil mask is stored in fast on-chip memory– Stencil can be of any form in contrast to therectangular Scissor
  • 11.
    What you mightnot know• Why no alpha test?!o Alpha testdiscard requires fragment shader to run, before visibility forcurrent fragment can be determined. This will remove benefits of HSRo Even more! If shader code contains discard, than any geometry renderedwith this shader will suffer from alpha test drawbacks. Even if this key-wordis under condition, USSE (PVR’s shader engine) does assumes, that thiscondition may be hit.o Move discard into separate shadero Draw opaque geometry, than alpha tested one and alpha blended in the end
  • 12.
    What you mightknow• Bandwidth matters1. Use constant colour per object, instead of pervertex2. Simplify your models. Use smaller data types.3. Use indexed triangles or non-indexed trianglestrips4. Use VBO instead of client arrays5. Use VAO
  • 13.
    What you mightnot know• VBOs allocations are aligned by 4KB page size.That means, your small buffer for just acouple of triangles will occupy 4KB inmemory, - large amount of small VBOs candefragment and waste you memory.
  • 14.
    What you mightnot know• Updating your VBO data each frame:1. glBufferSubData. If it is used to update big part of theoriginal data it will harm performance. Try to avoidupdates to buffers, that are in use now2. glBufferData. It’s OK to completely overwrite originaldata. Old data will be orphaned by driver and a newdata storage will be allocated3. glMapBuffer with triple buffered VBO is preferred wayto update your data• EXT_map_buffer_range (iOS 6+ only), when you need toupdate only a subset of a buffer object.
  • 15.
    What you mightnot knowint bufferID = 0; //initializationfor (int i = 0; i < 3; ++i) // allocate data for 3 vbo only, do not upload it{glBindBuffer(vertexBuffer[i]);glBufferData(GL_ARRAY_BUFFER, 0, 0, GL_DYNAMIC_DRAW);}//...glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer[bufferID]);void* ptr = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);//update data hereglUnmapBufferOES(GL_ARRAY_BUFFER);++bufferID;if (bufferID == 3) //cycling through 3 buffers{bufferID = 0;}
  • 16.
    What you mightnot know• This scheme will give you the best performancepossible – without blocking CPU or GPU, noredundant memcpy operations, lower CPU load, butextra memory is used (note, that you will need noextra temporal buffer to store your data beforesending it to VBO). This is ideal for dynamicbatching of sprites.update(1), draw(1), gpuworking(..............)update(2), draw(2), gpuworking(..............)update(3), draw(3), gpuworking(..............)
  • 17.
    What you mightnot know• Float type is native to GPU• …that means any other type will be convertedto float by USSE• …resulting in few additional cycles• Thus it’s your choice of tradeoff betweenbandwidthstorage and additional cycles
  • 18.
    What you mightknow• Use interleaved vertex data– Align each vertex attribute by 4 bytes boundaries
  • 19.
    What you mightnot know• If you don’t align your data, driver will do thisinstead.• …resulting in slower performance.
  • 20.
    What you mightnot know• PowerVR SGX 5XT GPU series have a vertexcache for last 12 vertex indices. Optimise yourindexed geometry for this cache size.• PowerVR Series 6 (XT) has 16k of vertex cache• Take a look at optimisers, that use TomForsyth’s algorithmhttp://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
  • 21.
What you might know
• Split your vertex data into two parts:
1. Static VBO: the one that will never be changed
2. Dynamic VBO: the one that needs to be updated frequently
• Split your vertex data into a few VBOs when several meshes share the same set of attributes
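As a sketch of why the split pays off, assume a hypothetical sprite vertex whose texture coordinates and tint never change while its position is rewritten every frame. Keeping the two parts in separate VBOs halves the bytes re-uploaded per frame in this particular layout; the type and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Part that never changes: uploaded once to the static VBO. */
typedef struct {
    float   uv[2];    /* texture coordinates */
    uint8_t color[4]; /* tint */
} StaticVertex;       /* 12 bytes */

/* Part that is rewritten every frame: lives in the dynamic VBO. */
typedef struct {
    float position[3];
} DynamicVertex;      /* 12 bytes */

/* Bytes re-uploaded each frame for `count` vertices. */
static size_t per_frame_upload_split(size_t count)
{
    return count * sizeof(DynamicVertex);
}

static size_t per_frame_upload_combined(size_t count)
{
    return count * (sizeof(StaticVertex) + sizeof(DynamicVertex));
}
```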
  • 22.
  • 23.
What you might know
• Bandwidth matters
1. Use lower precision formats: RGBA4444, RGBA5551
2. Use PVRTC compressed textures
3. Use atlases
4. Use mipmaps. They improve texture cache efficiency and quality.
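Converting RGBA8 art to the lower-precision formats is a bit-packing step. This sketch simply truncates each channel (real converters usually add rounding or dithering) and produces the bit layouts expected by `GL_UNSIGNED_SHORT_4_4_4_4` and `GL_UNSIGNED_SHORT_5_5_5_1`, with red in the most significant bits. The function names are hypothetical.

```c
#include <stdint.h>

/* RGBA8 -> RGBA4444: keep the top 4 bits of each channel. */
static uint16_t pack_rgba4444(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 4) << 12) | ((g >> 4) << 8) | ((b >> 4) << 4) | (a >> 4));
}

/* RGBA8 -> RGBA5551: 5 bits per colour channel, 1 bit of alpha. */
static uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) | ((b >> 3) << 1) | (a >> 7));
}
```

Either format halves the memory and bandwidth of RGBA8; RGBA5551 keeps more colour precision but only on/off alpha.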
  • 24.
What you might not know
• Avoid the RGB8 format: texture data has to be aligned, so the driver will pad RGB8 to RGBA8.
• Try to replace it with RGB565.
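A small sketch of the memory side of this advice, assuming the padding behaviour described on the slide (3-byte RGB8 texels stored as 4 bytes). The format enum and function names are hypothetical; `pack_rgb565` shows the 5-6-5 bit layout used by `GL_UNSIGNED_SHORT_5_6_5` (red in the top bits).

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { FMT_RGBA8, FMT_RGB8, FMT_RGB565 } TexFormat;

/* Bytes stored per texel. RGB8 counts as 4 because, per the slide,
   the driver pads it to RGBA8. */
static size_t stored_bytes_per_texel(TexFormat f)
{
    switch (f) {
    case FMT_RGBA8:  return 4;
    case FMT_RGB8:   return 4; /* 3 bytes of data + 1 byte of padding */
    case FMT_RGB565: return 2;
    }
    return 0;
}

static size_t texture_bytes(TexFormat f, size_t width, size_t height)
{
    return width * height * stored_bytes_per_texel(f);
}

/* RGB8 -> RGB565 by truncation: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

For a 1024x1024 texture that is 4 MB as (padded) RGB8 versus 2 MB as RGB565.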
  • 25.
What you might not know
• Why PVRTC?
1. PVRTC provides great compression, resulting in smaller texture size, better cache usage, saved bandwidth and decreased power consumption.
2. PVRTC stores pixel data in the GPU’s native order, i.e. BGRA instead of RGBA, in blocks optimised for the data access pattern.
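For a feel of the savings, the sketch below computes the size of one PVRTC mip level. It assumes the commonly used PVRTC size rule: 8-byte blocks covering 4x4 texels at 4 bpp or 8x4 texels at 2 bpp, with a minimum of 2x2 blocks per level. The function name is hypothetical.

```c
#include <stddef.h>

/* Size in bytes of one PVRTC level (bpp is 4 or 2), under the
   block-size assumptions stated above. */
static size_t pvrtc_level_bytes(size_t width, size_t height, int bpp)
{
    size_t block_w = (bpp == 2) ? 8 : 4;
    size_t block_h = 4;
    size_t blocks_x = width / block_w;
    size_t blocks_y = height / block_h;
    if (blocks_x < 2) blocks_x = 2; /* at least 2x2 blocks per level */
    if (blocks_y < 2) blocks_y = 2;
    return blocks_x * blocks_y * 8; /* 64 bits per block */
}
```

For a 1024x1024 texture this gives 512 KB at 4 bpp and 256 KB at 2 bpp, against 4 MB for RGBA8.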
  • 26.
What you might not know
• It doesn’t matter whether your textures are in RGBA or BGRA format: the driver will still do internal processing on the texture data to improve memory access locality and cache efficiency.
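The slide does not say which layout the driver converts textures to. One widely used locality-friendly layout is Morton (Z-order) indexing, sketched below as an illustration of the idea, not as the driver's actual algorithm. The function names are hypothetical.

```c
#include <stdint.h>

/* Spreads the low 16 bits of x so that bit i moves to bit 2*i. */
static uint32_t part1by1(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Morton index of texel (x, y): x bits in even positions, y bits in odd
   positions, so texels that are close in 2D tend to be close in memory. */
static uint32_t morton2d(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}
```

In a row-major 1024-wide texture, texel (0, 1) is 1024 texels away from (0, 0); in Morton order it is index 2, so a 2x2 filtering footprint often lands in the same cache line.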
  • 27.
What you might not know
• On PVR 6 (XT) the driver will reserve memory for both the texture and its mip map chain, but it will commit memory only for mip level 0.
• If you decide to generate mip maps, the driver will commit the pages reserved for the mip chain.
• That’s expected.
  • 28.
What you might not know
• On PVR 5/5MP (tested on iOS 4 – 7.1.1) the driver will ALWAYS commit memory for mip maps, regardless of whether you requested them or not.
• That means you’ll waste 33% of memory!
• In most cases you don’t need mip maps for 2D games, but you are forced to pay this overhead.
• That’s too bad for 2D games. However, there is one workaround: make your textures NPOT (non-power of two).
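The 33% figure follows from the mip chain being a geometric series: each level has a quarter of the texels of the level above, so the extra levels add 1/4 + 1/16 + … ≈ 1/3 of level 0. A small helper (hypothetical name) makes that concrete.

```c
#include <stddef.h>

/* Total bytes for a texture plus its full mip chain down to 1x1. */
static size_t mip_chain_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    size_t total = 0;
    for (;;) {
        total += width * height * bytes_per_texel;
        if (width == 1 && height == 1) break;
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For a 1024x1024 RGBA8 texture, level 0 is 4,194,304 bytes and the whole chain is 5,592,404 bytes, so the mip levels alone cost 1,398,100 bytes, about 33% extra.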
  • 29.
What you might not know
• Luckily, there is one solution to this problem.
• Core OpenGL ES 2.0 doesn’t support mip maps for NPoT (non-power of two) textures, so if you make your textures NPoT, you will not pay this memory overhead.
  • 30.
What you might not know
• Interesting notes:
• The driver’s glTexImage2D implementation has a function CheckFastPath. When you upload a PoT texture you’ll hit this fast path; NPoT textures miss it.
• When you upload a lot of textures your VRAM gets fragmented, so the driver will remap memory, i.e. it will create one big buffer for a few small textures and move them into that buffer.
  • 31.
What you might not know
• Let’s take a look at the texture upload process.
• The usual way to do it:
1. Load the texture into a temporary buffer in RAM
   1.1 Decode the texture if it is stored in a compressed file format (JPG/PNG)
2. Feed this buffer to glTexImage2D
3. Draw!
• Looks simple, but is it the fastest way?
  • 32.
What you might not know
• …NO!
void* buf = malloc(TEXTURE_SIZE); // 4 MB for a 1024x1024 RGBA8 texture
LoadTexture(textureName, buf);    // hypothetical helper: reads and decodes the file into buf
glBindTexture(GL_TEXTURE_2D, textureID);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);
// buf is copied into an internal buffer created by the driver (that's obvious)...
free(buf); // ...which is why buf can be freed immediately after glTexImage2D
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
// the driver will do some additional work to fully upload the texture the first time it is actually used!
• A lot of redundant work!
  • 33.
What you might not know
• Jedi way to upload textures:
#include <fcntl.h>    // open
#include <sys/mman.h> // mmap, munmap
#include <unistd.h>   // close

// the file is assumed to hold raw, uncompressed RGBA8 pixel data
int fileHandle = open(filename, O_RDONLY);
void* ptr = mmap(NULL, TEXTURE_SIZE, PROT_READ, MAP_PRIVATE, fileHandle, 0); // file mapping
glBindTexture(GL_TEXTURE_2D, textureID);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, 0);
// the driver will do some additional work to fully upload the texture the first time it is actually used!
munmap(ptr, TEXTURE_SIZE);
close(fileHandle);
• File mapping does not copy your file data into RAM! It loads file data page by page, when it’s accessed.
• Thus we eliminated one redundant copy, dramatically decreased texture upload time and reduced memory fragmentation.
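The slide's snippet leaves out error handling and hard-codes TEXTURE_SIZE. Below is a sketch of just the mapping half, using only POSIX calls (`open`, `fstat`, `mmap`, `close`); `map_file` is a hypothetical helper name, and the file is assumed to already contain raw pixel data that glTexImage2D can take as is.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps a whole file read-only and stores its size in *size.
   Returns NULL on failure. The OS loads pages lazily on first access,
   so no up-front copy of the file is made. */
static void *map_file(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return NULL; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return NULL; }

    void *ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED) { perror("mmap"); return NULL; }

    *size = (size_t)st.st_size;
    return ptr;
}
```

The returned pointer would be passed to glTexImage2D in place of `ptr`, then released with `munmap(ptr, size)` once the texture has been uploaded.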
  • 34.
What you might not know
• Keep in mind that textures are finally wired only when they are used for the first time. So draw them off screen immediately after glTexImage2D; otherwise the first frame will take too long to render and it will be nearly impossible to track down the cause.
  • 35.
What you might not know
• NPOT textures work only with the GL_CLAMP_TO_EDGE wrap mode
• POT textures are preferable: they give you the best performance possible
• Use NPOT textures with dimensions that are multiples of 32 pixels for best performance
• The driver will pad the data of your NPOT texture to match the size of the closest POT value.
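Helpers for the sizing rules above, as a sketch with hypothetical names. We read "closest POT value" as the next power of two that is greater than or equal to the dimension, since padding can only grow the data; that reading is an assumption.

```c
#include <stdint.h>

static int is_pot(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two >= n (valid for n <= 2^31). */
static uint32_t next_pot(uint32_t n)
{
    if (n <= 1) return 1;
    n--;
    n |= n >> 1; n |= n >> 2; n |= n >> 4; n |= n >> 8; n |= n >> 16;
    return n + 1;
}

/* Rounds an NPOT dimension up to a multiple of 32 pixels. */
static uint32_t round_up_to_32(uint32_t n)
{
    return (n + 31u) & ~31u;
}
```

A 600-pixel-wide texture would be better authored as 608 pixels wide, and under the assumption above its data would be padded to 1024 internally.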
  • 36.
What you might not know
• Prefer OES_texture_half_float over OES_texture_float
• Texture reads fetch only 32 bits per texel, so an RGBA float texture will result in 4 texture reads
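Moving float data to OES_texture_half_float means converting it to 16-bit halves on the CPU before upload (with type `GL_HALF_FLOAT_OES`). Under the slide's 32-bits-per-fetch figure, an RGBA half texel (64 bits) needs 2 fetches instead of 4. A minimal conversion sketch follows; it truncates the mantissa and flushes values below the smallest normal half to zero, simplifications a production converter would replace with proper rounding and denormal handling. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* IEEE-754 float -> half (binary16), truncating the mantissa. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* read the float's bits without aliasing issues */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exp_field = (bits >> 23) & 0xFFu;
    int32_t exponent = (int32_t)exp_field - 127 + 15; /* rebias 8-bit to 5-bit exponent */
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exp_field == 0xFFu) /* Inf stays Inf, NaN becomes a quiet NaN */
        return (uint16_t)(sign | 0x7C00u | (mantissa ? 0x200u : 0u));
    if (exponent >= 31)     /* too large for half: +-Inf */
        return (uint16_t)(sign | 0x7C00u);
    if (exponent <= 0)      /* too small for a normal half: signed zero */
        return sign;
    return (uint16_t)(sign | ((uint32_t)exponent << 10) | (mantissa >> 13));
}
```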
  • 37.
What you might not know
• Always use glClear at the beginning of the frame…
• …and EXT_discard_framebuffer at the end.
• PVR GPU series have a fast on-chip depth/stencil buffer for each tile. If you forget to clear/discard the depth buffer, it will be uploaded from HW to SW.
  • 38.
What you might know
• Prefer multitexturing over multiple passes
• Configure texture parameters before feeding image data to the driver
  • 39.
  • 40.
What you might know
• Be wise with precision hints
• Avoid branching
• Eliminate loops
• Do not use discard. If you must, place the discard instruction as early as possible to avoid useless computations
  • 41.
What you might not know
• Code inside a dynamic branch (where the condition is a non-constant value) will be executed anyway, and its results will then be thrown away if the condition is false
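A CPU-side analogy of the point above: both sides of the "branch" are computed and a mask selects which result survives, so the cost is the sum of both paths rather than the cost of the taken one. The function and its shading formula are hypothetical; it models the behaviour described on the slide, not actual shader compiler output.

```c
/* Predicated "if (n_dot_l > 0) lit * n_dot_l else unlit": both sides are
   evaluated every time, then the condition picks one result. */
static float shade_predicated(float n_dot_l, float lit, float unlit)
{
    float then_value = lit * n_dot_l; /* always computed */
    float else_value = unlit;         /* always computed */
    float mask = (float)(n_dot_l > 0.0f); /* 1.0f or 0.0f */
    return mask * then_value + (1.0f - mask) * else_value;
}
```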
  • 42.
What you might not know
• highp – represents a 32-bit floating point value
• mediump – represents a 16-bit floating point value in the range [-65520, 65520]
• lowp – represents a 10-bit fixed point value in the range [-2, 2] with a step of 1/256
• Try to give the same precision to all your operands, because conversion takes some time
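To get a feel for the lowp numbers above, the quantisation can be simulated on the CPU. This sketch assumes a 10-bit two's-complement fixed-point value with 8 fractional bits, which gives the slide's step of 1/256 and a range of [-2, 2 - 1/256]; the rounding mode (round half away from zero) is also an assumption. The function name is hypothetical.

```c
/* Simulates storing x in a 10-bit fixed-point lowp value (8 fractional bits). */
static float to_lowp(float x)
{
    float scaled = x * 256.0f;
    if (scaled > 511.0f)  scaled = 511.0f;  /* largest code: 511/256 */
    if (scaled < -512.0f) scaled = -512.0f; /* smallest code: -2.0 */
    int q = (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f)); /* round half away from zero */
    return (float)q / 256.0f;
}
```

For example, 0.3 becomes 77/256 = 0.30078125, and anything below about 0.002 in magnitude collapses to zero.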
  • 43.
What you might not know
• On USSE1 (that’s PVR 5), highp values are calculated on the scalar processor only:
highp vec4 v1, v2;
highp float s1, s2;
v2 = (v1 * s1) * s2;
// the scalar processor executes v1 * s1 – 4 operations, and then this result is multiplied by s2 on
// the scalar processor again – 4 additional operations
v2 = v1 * (s1 * s2);
// s1 * s2 – 1 operation on the scalar processor; result * v1 – 4 operations on the scalar processor
  • 44.
  • 45.
What you might know
• Typical CPU found in mobile devices:
1. ARMv7/ARMv8 architecture
2. Cortex AX/Krait/Swift or Cyclone
3. Up to 2300 MHz
4. Up to 8 cores
5. Thumb-2 instruction set
  • 46.
What you might not know
• ARMv7 cores such as Cortex A8/A9 have no hardware support for integer division
• VFPv3, VFPv4 FPU
• NEON SIMD engine
• Unaligned access is done in software on Cortex A8. That means it is hundreds of times slower
• Cortex A8 is an in-order CPU. Cortex A9+ are out-of-order
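Two of these points in C. Without a divide instruction, division by a constant is usually replaced by a multiply and a shift, and reads from possibly unaligned addresses are best written with `memcpy` so the compiler picks a safe load. Both functions are illustrative sketches with hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* x / 3 without a divide instruction: multiply by ceil(2^33 / 3) = 0xAAAAAAAB
   and shift right by 33. This identity is exact for every 32-bit unsigned x. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Reads a 32-bit value from an address that may not be 4-byte aligned.
   Dereferencing a misaligned uint32_t pointer is undefined behaviour in C;
   memcpy lets the compiler choose a load that is safe on the target. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers already apply the multiply-shift trick to constant divisors; for divisors known only at run time, ARMv7 code without the divide instruction calls a runtime helper such as `__aeabi_uidiv`, which is much slower.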
  • 47.
What you might not know
• Cortex A9+ cores have a full VFPv3 FPU, while Cortex A8 has VFPLite. That means float operations take 1 cycle on A9 and 10 cycles on A8!
  • 48.
What you might not know
• NEON: 16 registers, 128 bits wide each. Supports operations on 8-, 16-, 32- and 64-bit integers and 32-bit float values
• NEON can be used for:
– Software geometry instancing;
– Skinning;
– As a general vertex processor;
– Other typical SIMD applications.
  • 49.
What you might not know
• There are 3 ways to use the NEON engine in your code:
1. Intrinsics
   1.1 GLKMath
2. Handwritten NEON assembly
3. Autovectorization. Add -mllvm -vectorize -mllvm -bb-vectorize-aligned-only to Other C/C++ Flags in the project settings and you are ready to go.
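A loop shaped the way autovectorizers generally like it: a simple counted loop, no function calls, and `restrict` pointers so the compiler may assume the arrays do not overlap. Whether a given compiler actually emits NEON for it depends on its version and flags, so it is worth checking the generated assembly. The function name is hypothetical.

```c
#include <stddef.h>

/* dst[i] = a[i] * s + b[i]. `restrict` promises the three arrays do not
   alias, which lets the compiler process several floats per instruction. */
static void scale_add(float *restrict dst, const float *restrict a,
                      const float *restrict b, float s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * s + b[i];
}
```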
  • 51.
What you might not know
• Intrinsics:
  • 52.
What you might not know
• Assembly:
  • 53.
What you might not know
• Summary:
  Implementation        Running time, ms   CPU usage, %
  Intrinsics            2764               19
  Assembly              3664               20
  FPU                   6209               25-28
  FPU autovectorized    5028               22-24
• Intrinsics cut the running time by about 25% compared with assembly.
• Note that the speed of code generated from intrinsics will vary from compiler to compiler. Modern compilers are really good at this.
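For reference, the relative time savings implied by the measured running times in this summary work out as follows (plain arithmetic on those numbers, nothing re-measured):

```latex
\begin{aligned}
\text{intrinsics vs. assembly:}\quad & \frac{3664 - 2764}{3664} = \frac{900}{3664} \approx 24.6\% \\
\text{intrinsics vs. FPU:}\quad & \frac{6209 - 2764}{6209} = \frac{3445}{6209} \approx 55.5\% \\
\text{FPU autovectorized vs. FPU:}\quad & \frac{6209 - 5028}{6209} = \frac{1181}{6209} \approx 19.0\%
\end{aligned}
```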
  • 54.
What you might not know
• Intrinsics advantages over assembly:
– Higher-level code;
– Much simpler;
– No need to manage registers;
– You can vectorize basic blocks and build a solution for every new problem from these blocks. With assembly, in contrast, you have to solve each new problem from scratch.
  • 55.
What you might not know
• Assembly advantages over intrinsics:
– Code generated from intrinsics varies from compiler to compiler and can give you a really big difference in speed. Assembly code will always be the same.
  • 56.
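What you might not know
#include <arm_neon.h>

// result = mat * vec, where mat.val[0..3] are the matrix columns
static inline __attribute__((always_inline)) void Matrix4ByVec4(const float32x4x4_t* __restrict__ mat, const float32x4_t* __restrict__ vec, float32x4_t* __restrict__ result)
{
    (*result) = vmulq_n_f32((*mat).val[0], vgetq_lane_f32(*vec, 0));            // col0 * v.x
    (*result) = vmlaq_n_f32((*result), (*mat).val[1], vgetq_lane_f32(*vec, 1)); // + col1 * v.y
    (*result) = vmlaq_n_f32((*result), (*mat).val[2], vgetq_lane_f32(*vec, 2)); // + col2 * v.z
    (*result) = vmlaq_n_f32((*result), (*mat).val[3], vgetq_lane_f32(*vec, 3)); // + col3 * v.w
}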
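A portable scalar reference for Matrix4ByVec4 is handy for checking NEON results on any machine. The sketch assumes a hypothetical column-major `Mat4` type whose `col[i]` plays the role of `mat.val[i]`.

```c
/* Hypothetical column-major 4x4 matrix, laid out like float32x4x4_t. */
typedef struct { float col[4][4]; } Mat4;

/* out = col0*v[0] + col1*v[1] + col2*v[2] + col3*v[3], the same sum the
   vmulq_n_f32/vmlaq_n_f32 sequence builds one column at a time. */
static void mat4_mul_vec4_ref(const Mat4 *m, const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row)
        out[row] = m->col[0][row] * v[0] + m->col[1][row] * v[1]
                 + m->col[2][row] * v[2] + m->col[3][row] * v[3];
}
```

When comparing against the NEON version, a small tolerance is advisable, since the compiler may fuse or reorder the scalar operations.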
  • 57.
What you might not know
// r = m1 * m2 (column-major): column j of r is m1 * (column j of m2)
static inline __attribute__((always_inline)) void Matrix4ByMatrix4(const float32x4x4_t* __restrict__ m1, const float32x4x4_t* __restrict__ m2, float32x4x4_t* __restrict__ r)
{
#ifdef INTRINSICS
    (*r).val[0] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[0], 0));
    (*r).val[1] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[1], 0));
    (*r).val[2] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[2], 0));
    (*r).val[3] = vmulq_n_f32((*m1).val[0], vgetq_lane_f32((*m2).val[3], 0));
    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[1], vgetq_lane_f32((*m2).val[0], 1));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[1], vgetq_lane_f32((*m2).val[1], 1));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[1], vgetq_lane_f32((*m2).val[2], 1));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[1], vgetq_lane_f32((*m2).val[3], 1));
    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[2], vgetq_lane_f32((*m2).val[0], 2));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[2], vgetq_lane_f32((*m2).val[1], 2));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[2], vgetq_lane_f32((*m2).val[2], 2));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[2], vgetq_lane_f32((*m2).val[3], 2));
    (*r).val[0] = vmlaq_n_f32((*r).val[0], (*m1).val[3], vgetq_lane_f32((*m2).val[0], 3));
    (*r).val[1] = vmlaq_n_f32((*r).val[1], (*m1).val[3], vgetq_lane_f32((*m2).val[1], 3));
    (*r).val[2] = vmlaq_n_f32((*r).val[2], (*m1).val[3], vgetq_lane_f32((*m2).val[2], 3));
    (*r).val[3] = vmlaq_n_f32((*r).val[3], (*m1).val[3], vgetq_lane_f32((*m2).val[3], 3));
#endif // INTRINSICS
}
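The matching scalar reference for Matrix4ByMatrix4, again assuming a hypothetical column-major `Mat4` whose `col[i]` corresponds to `val[i]`. Each `r.col[j][row]` is the same sum of `m1.val[k]` lanes times lane `k` of `m2.val[j]` that the intrinsics accumulate.

```c
/* Hypothetical column-major 4x4 matrix, laid out like float32x4x4_t. */
typedef struct { float col[4][4]; } Mat4;

/* r = m1 * m2 with column-major storage. */
static void mat4_mul_ref(const Mat4 *m1, const Mat4 *m2, Mat4 *r)
{
    for (int j = 0; j < 4; ++j)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += m1->col[k][row] * m2->col[j][k];
            r->col[j][row] = sum;
        }
}
```

Multiplying a uniform scale of 2 by a translation of (1, 2, 3), for example, yields a translation column of (2, 4, 6, 1).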
  • 58.
What you might not know
__asm__ volatile(
    // q0-q3 = modelView, q8-q11 = proj
    "vldmia %6, { q0-q3 }\n\t"
    "vldmia %0, { q8-q11 }\n\t"
    // q12-q15 = proj * modelView
    "vmul.f32 q12, q8, d0[0]\n\t"
    "vmul.f32 q13, q8, d2[0]\n\t"
    "vmul.f32 q14, q8, d4[0]\n\t"
    "vmul.f32 q15, q8, d6[0]\n\t"
    "vmla.f32 q12, q9, d0[1]\n\t"
    "vmla.f32 q13, q9, d2[1]\n\t"
    "vmla.f32 q14, q9, d4[1]\n\t"
    "vmla.f32 q15, q9, d6[1]\n\t"
    "vmla.f32 q12, q10, d1[0]\n\t"
    "vmla.f32 q13, q10, d3[0]\n\t"
    "vmla.f32 q14, q10, d5[0]\n\t"
    "vmla.f32 q15, q10, d7[0]\n\t"
    "vmla.f32 q12, q11, d1[1]\n\t"
    "vmla.f32 q13, q11, d3[1]\n\t"
    "vmla.f32 q14, q11, d5[1]\n\t"
    "vmla.f32 q15, q11, d7[1]\n\t"
    // q0-q3 = the 4 vertices of squareVertices
    "vldmia %1, { q0-q3 }\n\t"
    // q8-q11 = (proj * modelView) * vertex, one vertex per register
    "vmul.f32 q8, q12, d0[0]\n\t"
    "vmul.f32 q9, q12, d2[0]\n\t"
    "vmul.f32 q10, q12, d4[0]\n\t"
    "vmul.f32 q11, q12, d6[0]\n\t"
    "vmla.f32 q8, q13, d0[1]\n\t"
    "vmla.f32 q8, q14, d1[0]\n\t"
    "vmla.f32 q8, q15, d1[1]\n\t"
    "vmla.f32 q9, q13, d2[1]\n\t"
    "vmla.f32 q9, q14, d3[0]\n\t"
    "vmla.f32 q9, q15, d3[1]\n\t"
    "vmla.f32 q10, q13, d4[1]\n\t"
    "vmla.f32 q10, q14, d5[0]\n\t"
    "vmla.f32 q10, q15, d5[1]\n\t"
    "vmla.f32 q11, q13, d6[1]\n\t"
    "vmla.f32 q11, q14, d7[0]\n\t"
    "vmla.f32 q11, q15, d7[1]\n\t"
    // store the 4 transformed vertices
    "vstmia %2, { q8 }\n\t"
    "vstmia %3, { q9 }\n\t"
    "vstmia %4, { q10 }\n\t"
    "vstmia %5, { q11 }"
    :
    : "r" (proj), "r" (squareVertices), "r" (v1), "r" (v2), "r" (v3), "r" (v4), "r" (modelView)
    : "memory", "q0", "q1", "q2", "q3", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15");
  • 59.
What you might not know
• For a detailed explanation of intrinsics/assembly see: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491e/CIHJBEFE.html

Editor's Notes

  • #2 In this presentation I am going to talk mostly about Imagination Technologies GPUs, which account for at least 50% of the market. I ran all tests on iOS, but I assume you’ll get the same behaviour on Android. This presentation consists of a few parts, each dedicated to optimisation problems in one area.
  • #7 I’ll start with the most common recommendations.
