A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
a2flo/floor
This project provides a unified compute & graphics host API, as well as a unified compute & graphics C++ device language and library to enable same-source CUDA/Host/Metal/OpenCL/Vulkan programming and execution.
The unified host API is implemented at compute and graphics. All backends (CUDA/Host/Metal/OpenCL/Vulkan) currently provide compute support, while graphics support is limited to Metal and Vulkan.
To provide a unified device language, a clang/LLVM/libc++ 14.0 toolchain has been modified.
Certain parts of libfloor are used by both host and device code (math and constexpr). Additional device library code is located at device.
Advanced examples can be found in the floor_examples repository.
Let’s take the fairly simple C++ kernel below, which computes the body/body interactions in an N-body simulation, and compile it for each backend. Note that loop unrolling is omitted for conciseness.
```cpp
// define global constants
static constexpr constant const uint32_t NBODY_TILE_SIZE { 256u };
static constexpr constant const float NBODY_DAMPING { 0.999f };
static constexpr constant const float NBODY_SOFTENING { 0.01f };

// define a 1D kernel with a required local size of (NBODY_TILE_SIZE = 256, 1, 1)
kernel_1d(NBODY_TILE_SIZE) void simplified_nbody(buffer<const float4> in_positions, // read-only global memory buffer
                                                 buffer<float4> out_positions, // read-write global memory buffer
                                                 buffer<float3> inout_velocities, // read-write global memory buffer
                                                 param<float> time_delta) { // read-only parameter
	// each work-item represents/computes one body
	const auto position = in_positions[global_id.x];
	auto velocity = inout_velocities[global_id.x];
	float3 acceleration; // vectors are automatically zero-initialized
	local_buffer<float4, NBODY_TILE_SIZE> local_body_positions; // local memory array allocation
	// loop over all bodies
	for (uint32_t i = 0, tile = 0, count = global_size.x; i < count; i += NBODY_TILE_SIZE, ++tile) {
		// move resp. body position/mass from global to local memory
		local_body_positions[local_id.x] = in_positions[tile * NBODY_TILE_SIZE + local_id.x];
		local_barrier(); // barrier across all work-items in this work-group
		// loop over bodies in this work-group
		for (uint32_t j = 0; j < NBODY_TILE_SIZE; ++j) {
			const auto r = local_body_positions[j].xyz - position.xyz;
			const auto dist_sq = r.dot(r) + (NBODY_SOFTENING * NBODY_SOFTENING);
			const auto inv_dist = rsqrt(dist_sq);
			const auto s = local_body_positions[j].w * (inv_dist * inv_dist * inv_dist); // .w is mass
			acceleration += r * s;
		}
		local_barrier();
	}
	velocity = (velocity + acceleration * time_delta) * NBODY_DAMPING;
	out_positions[global_id.x].xyz += velocity * time_delta; // update XYZ position
	inout_velocities[global_id.x] = velocity; // update velocity
}
```
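To sanity-check the kernel's math on any machine, the same update rule can be written as a plain single-threaded C++ loop. The following is only a minimal sketch: `Vec3`, `Body`, `kDamping`, `kSoftening` and `nbody_step_reference` are hypothetical names introduced here for illustration and are not part of libfloor. It uses `1.0f / std::sqrt(...)` in place of the device `rsqrt`, and it does no tiling because there is no local memory on this path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-ins (not libfloor's float3/float4 types).
struct Vec3 { float x, y, z; };
struct Body { float x, y, z, mass; }; // mirrors a float4 whose .w component is the mass

constexpr float kDamping = 0.999f;  // same value as NBODY_DAMPING
constexpr float kSoftening = 0.01f; // same value as NBODY_SOFTENING

// Advances all bodies by one time step with the same per-body update as the kernel:
// acceleration_i = sum_j mass_j * r_ij / (|r_ij|^2 + softening^2)^(3/2)
void nbody_step_reference(const std::vector<Body>& in_positions,
                          std::vector<Body>& out_positions,
                          std::vector<Vec3>& inout_velocities,
                          float time_delta) {
	assert(in_positions.size() == out_positions.size());
	assert(in_positions.size() == inout_velocities.size());
	const std::size_t count = in_positions.size();
	for (std::size_t i = 0; i < count; ++i) {
		const Body& p = in_positions[i];
		Vec3 acc { 0.0f, 0.0f, 0.0f };
		for (std::size_t j = 0; j < count; ++j) {
			const Body& q = in_positions[j];
			const float rx = q.x - p.x;
			const float ry = q.y - p.y;
			const float rz = q.z - p.z;
			// softening keeps the i == j term finite (it contributes zero acceleration)
			const float dist_sq = rx * rx + ry * ry + rz * rz + kSoftening * kSoftening;
			const float inv_dist = 1.0f / std::sqrt(dist_sq);
			const float s = q.mass * (inv_dist * inv_dist * inv_dist);
			acc.x += rx * s;
			acc.y += ry * s;
			acc.z += rz * s;
		}
		Vec3& v = inout_velocities[i];
		v.x = (v.x + acc.x * time_delta) * kDamping;
		v.y = (v.y + acc.y * time_delta) * kDamping;
		v.z = (v.z + acc.z * time_delta) * kDamping;
		// like the kernel, this accumulates into whatever out_positions already holds
		out_positions[i].x += v.x * time_delta;
		out_positions[i].y += v.y * time_delta;
		out_positions[i].z += v.z * time_delta;
	}
}
```

Results from a device backend will typically differ from this reference in the last few bits, since the backends may use approximate reciprocal square roots and fused multiply-adds.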
Compiled output for each backend:
CUDA / PTX
You can download the PTX file here and the CUBIN file here (note that building CUBINs is optional and requires `ptxas`).

```
//
// Generated by LLVM NVPTX Back-End
//

.version 8.4
.target sm_86
.address_size 64

	// .globl simplified_nbody
// _ZZ16simplified_nbodyE20local_body_positions has been demoted

.visible .entry simplified_nbody(
	.param .u64 simplified_nbody_param_0,
	.param .u64 simplified_nbody_param_1,
	.param .u64 simplified_nbody_param_2,
	.param .f32 simplified_nbody_param_3
)
.reqntid 256, 1, 1
{
	.reg .pred %p<3>;
	.reg .b32 %r<25>;
	.reg .f32 %f<71>;
	.reg .b64 %rd<18>;
	// demoted variable
	.shared .align 4 .b8 _ZZ16simplified_nbodyE20local_body_positions[4096];
	mov.u32 %r1, %tid.x;
	mov.u32 %r11, %ntid.x;
	mov.u32 %r12, %ctaid.x;
	mad.lo.s32 %r13, %r12, %r11, %r1;
	cvt.u64.u32 %rd3, %r13;
	mul.wide.u32 %rd7, %r13, 12;
	ld.param.u64 %rd8, [simplified_nbody_param_2];
	cvta.to.global.u64 %rd9, %rd8;
	add.s64 %rd4, %rd9, %rd7;
	ld.global.f32 %f6, [%rd4+8];
	add.s64 %rd6, %rd4, 8;
	ld.global.f32 %f5, [%rd4+4];
	add.s64 %rd5, %rd4, 4;
	ld.global.f32 %f4, [%rd4];
	mul.wide.u32 %rd10, %r13, 16;
	ld.param.u64 %rd11, [simplified_nbody_param_0];
	cvta.to.global.u64 %rd2, %rd11;
	add.s64 %rd12, %rd2, %rd10;
	ld.global.nc.f32 %f3, [%rd12+8];
	ld.global.nc.f32 %f2, [%rd12+4];
	ld.global.nc.f32 %f1, [%rd12];
	mov.u32 %r14, %nctaid.x;
	mul.lo.s32 %r2, %r14, %r11;
	shl.b32 %r15, %r1, 4;
	mov.u32 %r16, _ZZ16simplified_nbodyE20local_body_positions;
	add.s32 %r3, %r16, %r15;
	ld.param.u64 %rd13, [simplified_nbody_param_1];
	cvta.to.global.u64 %rd1, %rd13;
	mov.f32 %f68, 0f00000000;
	mov.u32 %r10, 0;
	ld.param.f32 %f16, [simplified_nbody_param_3];
	mov.u32 %r22, %r10;
	mov.u32 %r23, %r10;
	mov.f32 %f69, %f68;
	mov.f32 %f70, %f68;
LBB0_1:
	shl.b32 %r18, %r23, 8;
	add.s32 %r19, %r18, %r1;
	mul.wide.u32 %rd14, %r19, 16;
	add.s64 %rd15, %rd2, %rd14;
	ld.global.nc.f32 %f18, [%rd15];
	st.shared.f32 [%r3], %f18;
	ld.global.nc.f32 %f19, [%rd15+4];
	st.shared.f32 [%r3+4], %f19;
	ld.global.nc.f32 %f20, [%rd15+8];
	st.shared.f32 [%r3+8], %f20;
	ld.global.nc.f32 %f21, [%rd15+12];
	st.shared.f32 [%r3+12], %f21;
	barrier.sync 0;
	mov.u32 %r24, %r10;
LBB0_2:
	add.s32 %r21, %r16, %r24;
	ld.shared.f32 %f22, [%r21+4];
	sub.f32 %f23, %f22, %f2;
	ld.shared.f32 %f24, [%r21];
	sub.f32 %f25, %f24, %f1;
	fma.rn.f32 %f26, %f25, %f25, 0f38D1B717;
	fma.rn.f32 %f27, %f23, %f23, %f26;
	ld.shared.f32 %f28, [%r21+8];
	sub.f32 %f29, %f28, %f3;
	fma.rn.f32 %f30, %f29, %f29, %f27;
	rsqrt.approx.ftz.f32 %f31, %f30;
	mul.f32 %f32, %f31, %f31;
	mul.f32 %f33, %f32, %f31;
	ld.shared.f32 %f34, [%r21+12];
	mul.f32 %f35, %f33, %f34;
	fma.rn.f32 %f36, %f35, %f29, %f68;
	ld.shared.f32 %f37, [%r21+20];
	sub.f32 %f38, %f37, %f2;
	ld.shared.f32 %f39, [%r21+16];
	sub.f32 %f40, %f39, %f1;
	fma.rn.f32 %f41, %f40, %f40, 0f38D1B717;
	fma.rn.f32 %f42, %f38, %f38, %f41;
	ld.shared.f32 %f43, [%r21+24];
	sub.f32 %f44, %f43, %f3;
	fma.rn.f32 %f45, %f44, %f44, %f42;
	rsqrt.approx.ftz.f32 %f46, %f45;
	mul.f32 %f47, %f46, %f46;
	mul.f32 %f48, %f47, %f46;
	ld.shared.f32 %f49, [%r21+28];
	mul.f32 %f50, %f48, %f49;
	fma.rn.f32 %f68, %f50, %f44, %f36;
	fma.rn.f32 %f51, %f35, %f23, %f69;
	fma.rn.f32 %f69, %f50, %f38, %f51;
	fma.rn.f32 %f52, %f35, %f25, %f70;
	fma.rn.f32 %f70, %f50, %f40, %f52;
	add.s32 %r24, %r24, 32;
	setp.eq.s32 %p1, %r24, 4096;
	@%p1 bra LBB0_3;
	bra.uni LBB0_2;
LBB0_3:
	add.s32 %r22, %r22, 256;
	setp.lt.u32 %p2, %r22, %r2;
	barrier.sync 0;
	add.s32 %r23, %r23, 1;
	@%p2 bra LBB0_1;
	fma.rn.f32 %f53, %f70, %f16, %f4;
	mul.f32 %f54, %f53, 0f3F7FBE77;
	shl.b64 %rd16, %rd3, 4;
	add.s64 %rd17, %rd1, %rd16;
	ld.global.f32 %f55, [%rd17];
	fma.rn.f32 %f56, %f54, %f16, %f55;
	st.global.f32 [%rd17], %f56;
	fma.rn.f32 %f57, %f69, %f16, %f5;
	mul.f32 %f58, %f57, 0f3F7FBE77;
	ld.global.f32 %f59, [%rd17+4];
	fma.rn.f32 %f60, %f58, %f16, %f59;
	st.global.f32 [%rd17+4], %f60;
	fma.rn.f32 %f61, %f68, %f16, %f6;
	mul.f32 %f62, %f61, 0f3F7FBE77;
	ld.global.f32 %f63, [%rd17+8];
	fma.rn.f32 %f64, %f62, %f16, %f63;
	st.global.f32 [%rd17+8], %f64;
	st.global.f32 [%rd4], %f54;
	st.global.f32 [%rd5], %f58;
	st.global.f32 [%rd6], %f62;
	ret;
}
```
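A few numbers in the PTX above can be traced back to the kernel source. The sketch below is a reading of the listing, not something the toolchain documents: it decodes the two `0f…` hexadecimal float literals and checks that they appear to match `NBODY_SOFTENING * NBODY_SOFTENING` and `NBODY_DAMPING`, and it checks that the 4096-byte shared array matches 256 entries of a 16-byte `float4`. The names `decode_ptx_f32`, `check_ptx_constants`, `tile_size` and `float4_bytes` are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterprets the 32 bits of a PTX "0f" hexadecimal literal as an IEEE-754 float.
inline float decode_ptx_f32(std::uint32_t bits) {
	static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
	float value;
	std::memcpy(&value, &bits, sizeof(value));
	return value;
}

constexpr std::uint32_t tile_size = 256u;                  // NBODY_TILE_SIZE, also the .reqntid x size
constexpr std::uint32_t float4_bytes = 4u * sizeof(float); // assumes a tightly packed 16-byte float4

// The shared array of 4096 bytes holds one float4 per work-item of the tile.
static_assert(tile_size * float4_bytes == 4096u, "local buffer size mismatch");
// The inner loop advances its byte offset by 32 per iteration (two float4 entries)
// until it reaches 4096, which suggests the compiler unrolled the loop by two.
static_assert(4096u / 32u == tile_size / 2u, "unexpected inner loop trip count");

// Checks the two literals against the constants from the kernel source.
inline void check_ptx_constants() {
	// 0f38D1B717 is added in the first fma of each distance computation
	assert(std::fabs(decode_ptx_f32(0x38D1B717u) - 0.01f * 0.01f) < 1e-9f);
	// 0f3F7FBE77 scales the updated velocity components
	assert(std::fabs(decode_ptx_f32(0x3F7FBE77u) - 0.999f) < 1e-6f);
}
```

Calling `check_ptx_constants()` from any host program is enough to run the comparison; the `static_assert` lines are evaluated at compile time.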
Host-Compute (x86 CPU)
Note that the compiler would usually directly output a `.bin` file (ELF format). The output below comes from disassembling it with `objdump -d`. Also note that this has been compiled for the `x86-5`
target (AVX-512+).nbody.bin: file formatelf64-x86-64Disassembly of section .text:0000000000000000 <simplified_nbody>:0:55push %rbp1:4889 e5mov %rsp,%rbp4:4157push %r156:4156push %r148:4155push %r13 a:4154push %r12 c:53push %rbx d:4883 e4 c0and$0xffffffffffffffc0,%rsp11:4881 ec40090000sub$0x940,%rsp18:48 8d05 f9 ff ff fflea-0x7(%rip),%rax #18 <simplified_nbody+0x18> 1f:49 be0000000000 movabs$0x0,%r1426:00000029:4889 4c2450mov %rcx,0x50(%rsp) 2e:4889742468mov %rsi,0x68(%rsp)33:4889 7c2448mov %rdi,0x48(%rsp)38:4901 c6add %rax,%r14 3b:48 b80000000000 movabs$0x0,%rax42:00000045:49 8b0406mov (%r14,%rax,1),%rax49:8b00mov (%rax),%eax 4b:48 8d 0c40lea (%rax,%rax,2),%rcx 4f:4889 c6mov %rax,%rsi52:48 c1 e604shl$0x4,%rsi56:4889742458mov %rsi,0x58(%rsp) 5b:48 8d04 8alea (%rdx,%rcx,4),%rax 5f:c5 fa1004 8avmovss (%rdx,%rcx,4),%xmm064:c5 f9 6e54 8a04vmovd0x4(%rdx,%rcx,4),%xmm2 6a:c5 fa10 4c 8a08vmovss0x8(%rdx,%rcx,4),%xmm170:4889442460mov %rax,0x60(%rsp)75:48 b80000000000 movabs$0x0,%rax 7c:000000 7f:49 8b0406mov (%r14,%rax,1),%rax83:8b18mov (%rax),%ebx85:c5 fa114424 3cvmovss %xmm0,0x3c(%rsp) 8b:c5 f9 7e542440vmovd %xmm2,0x40(%rsp)91:c5 fa11 4c2444vmovss %xmm1,0x44(%rsp)97:85 dbtest %ebx,%ebx99:0f84 f9160000je1798 <simplified_nbody+0x1798> 9f:48 8b442448mov0x48(%rsp),%rax a4:49 bd0000000000 movabs$0x0,%r13 ab:000000 ae:4531 ffxor %r15d,%r15d b1:c5 fa100430vmovss (%rax,%rsi,1),%xmm0 b6:c5 fa10 4c3004vmovss0x4(%rax,%rsi,1),%xmm1 bc:c5 fa10543008vmovss0x8(%rax,%rsi,1),%xmm2 c2:48 b80000000000 movabs$0x0,%rax c9:000000 cc:49 8b0406mov (%r14,%rax,1),%rax d0:4889442478mov %rax,0x78(%rsp) d5:4b 8d04 2elea (%r14,%r13,1),%rax d9:4889442470mov %rax,0x70(%rsp) de:48 b80000000000 movabs$0x0,%rax e5:000000 e8:62 f2 7d4818 c0vbroadcastss %xmm0,%zmm0 ee:4d 8b2406mov (%r14,%rax,1),%r12 f2:62 f2 7d4818 c9vbroadcastss %xmm1,%zmm1 f8:48 b80000000000 movabs$0x0,%rax ff:000000102:62 f1 7c48294424vmovaps %zmm0,0x700(%rsp)109:1c 10a:62 f2 7d4818 c2vbroadcastss %xmm2,%zmm0110:62 d2 fd48 5b1406 
vbroadcasti64x4 (%r14,%rax,1),%zmm2117:48 b80000000000 movabs$0x0,%rax 11e:000000121:62 f1 7c4829 4c24vmovaps %zmm1,0x6c0(%rsp)128:1b129:62 d2 fd48 5b 0c06 vbroadcasti64x4 (%r14,%rax,1),%zmm1130:48 b80000000000 movabs$0x0,%rax137:000000 13a:62 f1 7c48294424vmovaps %zmm0,0x680(%rsp)141:1a142:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0146:c5 f82984248000vmovaps %xmm0,0x80(%rsp) 14d:0000 14f:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x640(%rsp)156:19157:62 d2 fd48 5b1406 vbroadcasti64x4 (%r14,%rax,1),%zmm2 15e:48 b80000000000 movabs$0x0,%rax165:000000168:62 f1 fd48 7f 4c24 vmovdqa64 %zmm1,0x840(%rsp) 16f:21170:62 d2 7d4818 0c06vbroadcastss (%r14,%rax,1),%zmm1177:48 b80000000000 movabs$0x0,%rax 17e:000000181:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x800(%rsp)188:20189:62 d2 fd48 5b1406 vbroadcasti64x4 (%r14,%rax,1),%zmm2190:48 b80000000000 movabs$0x0,%rax197:000000 19a:62 f1 7c4829 4c24vmovaps %zmm1,0x600(%rsp) 1a1:18 1a2:62 d2 7d4818 0c06vbroadcastss (%r14,%rax,1),%zmm1 1a9:48 b80000000000 movabs$0x0,%rax 1b0:000000 1b3:62 d2 7d48180406vbroadcastss (%r14,%rax,1),%zmm0 1ba:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x7c0(%rsp) 1c1:1f 1c2:62 f1 7c4829 4c24vmovaps %zmm1,0x780(%rsp) 1c9:1e 1ca:62 f1 7c48294424vmovaps %zmm0,0x740(%rsp) 1d1:1d 1d2:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0 1d6:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp) 1dd:0000 1df:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0 1e3:c5 f82984240001vmovaps %xmm0,0x100(%rsp) 1ea:0000 1ec:0f 1f4000 nopl0x0(%rax) 1f0:48 8b442478mov0x78(%rsp),%rax 1f5:48 8b542448mov0x48(%rsp),%rdx 1fa:8b00mov (%rax),%eax 1fc:42 8d 0c38lea (%rax,%r15,1),%ecx200:48 c1 e004shl$0x4,%rax204:48 c1 e104shl$0x4,%rcx208:c5 f81004 0avmovups (%rdx,%rcx,1),%xmm0 20d:48 8b 4c2470mov0x70(%rsp),%rcx212:c5 f8290408vmovaps %xmm0,(%rax,%rcx,1)217:c5 f877vzeroupper 21a:41 ff d4call*%r12 21d:6291 7c4828 5c 2evmovaps0x80(%r14,%r13,1),%zmm3224:02225:62 f1 7c48286424vmovaps0x640(%rsp),%zmm4 22c:19 22d:6281 7c4828 5c 2evmovaps0xc0(%r14,%r13,1),%zmm19234:03235:6291 7c482854 2evmovaps0x180(%r14,%r13,1),%zmm2 
23c:06 23d:6211 7c4828 4c 2evmovaps0x100(%r14,%r13,1),%zmm9244:04245:6211 7c4828 6c 2evmovaps0x140(%r14,%r13,1),%zmm13 24c:05 24d:6281 7c4828 4c 2evmovaps0x1c0(%r14,%r13,1),%zmm17254:07255:6271 7c48287424vmovaps0x800(%rsp),%zmm14 25c:20 25d:6291 7c482804 2evmovaps (%r14,%r13,1),%zmm0264:6281 7c482854 2evmovaps0x40(%r14,%r13,1),%zmm18 26b:01 26c:62 f1 7c48287424vmovaps0x7c0(%rsp),%zmm6273:1f274:6201 7c482844 2evmovaps0x280(%r14,%r13,1),%zmm24 27b:0a 27c:6281 7c482874 2evmovaps0x200(%r14,%r13,1),%zmm22283:08284:6281 7c4828 6c 2evmovaps0x240(%r14,%r13,1),%zmm21 28b:09 28c:6281 7c4828 7c 2evmovaps0x2c0(%r14,%r13,1),%zmm23293:0b294:6201 7c482864 2evmovaps0x380(%r14,%r13,1),%zmm28 29b:0e 29c:6201 7c482854 2evmovaps0x300(%r14,%r13,1),%zmm26 2a3:0c 2a4:6201 7c4828 5c 2evmovaps0x3c0(%r14,%r13,1),%zmm27 2ab:0f 2ac:62 f1 7c4828 cbvmovaps %zmm3,%zmm1 2b2:62 e1 7c4828 e2vmovaps %zmm2,%zmm20 2b8:62 d1 7c4828 e9vmovaps %zmm9,%zmm5 2be:6261 7c4828 cavmovaps %zmm2,%zmm25 2c4:62 f1 7c4828 f8vmovaps %zmm0,%zmm7 2ca:6271 7c4828 fbvmovaps %zmm3,%zmm15 2d0:62 e1 7c4828 c0vmovaps %zmm0,%zmm16 2d6:6271 7c4828 c3vmovaps %zmm3,%zmm8 2dc:6271 7c4828 e0vmovaps %zmm0,%zmm12 2e2:6271 7c4828 d2vmovaps %zmm2,%zmm10 2e8:62 b2 4d48 7f db vpermt2ps %zmm19,%zmm6,%zmm3 2ee:62 b2 4d48 7f c2 vpermt2ps %zmm18,%zmm6,%zmm0 2f4:6261 7c4828 f4vmovaps %zmm4,%zmm30 2fa:62 b2 4d48 7f d1 vpermt2ps %zmm17,%zmm6,%zmm2300:6251 7c4828 d9vmovaps %zmm9,%zmm11306:6201 7c4828 e8vmovaps %zmm24,%zmm29 30c:6201 7c4828 fcvmovaps %zmm28,%zmm31312:62 b2 5d48 7f cb vpermt2ps %zmm19,%zmm4,%zmm1318:62 a2 5d48 7f e1 vpermt2ps %zmm17,%zmm4,%zmm20 31e:62 d2 5d48 7f ed vpermt2ps %zmm13,%zmm4,%zmm5324:6222 0d48 7f c9 vpermt2ps %zmm17,%zmm14,%zmm25 32a:62 b2 5d48 7f fa vpermt2ps %zmm18,%zmm4,%zmm7330:62 d1 7c4828 e1vmovaps %zmm9,%zmm4336:6232 0d48 7f fb vpermt2ps %zmm19,%zmm14,%zmm15 33c:62 a2 0d48 7f c2 vpermt2ps %zmm18,%zmm14,%zmm16342:6252 4d48 7f cd vpermt2ps %zmm13,%zmm6,%zmm9348:6252 0d48 7f dd vpermt2ps %zmm13,%zmm14,%zmm11 
34e:6291 7c4828 f2vmovaps %zmm26,%zmm6354:6222 0d40 7f ef vpermt2ps %zmm23,%zmm30,%zmm29 35a:62 f3 fd4823 c3 e4 vshuff64x2$0xe4,%zmm3,%zmm0,%zmm0361:6291 7c4828 dcvmovaps %zmm28,%zmm3367:62 f1 7c4829 4c24vmovaps %zmm1,0x140(%rsp) 36e:05 36f:62 f1 7c4828 4c24vmovaps0x840(%rsp),%zmm1376:21377:62 b3 d54823 ec e4 vshuff64x2$0xe4,%zmm20,%zmm5,%zmm5 37e:6261 7c4829 4c24vmovaps %zmm25,0x280(%rsp)385:0a386:6201 7c4828 4c 2evmovaps0x340(%r14,%r13,1),%zmm25 38d:0d 38e:62 a1 7c4828 e6vmovaps %zmm22,%zmm20394:62 f3 b54823 d2 e4 vshuff64x2$0xe4,%zmm2,%zmm9,%zmm2 39b:6271 7c4828 4c24vmovaps0x640(%rsp),%zmm9 3a2:19 3a3:6292 0d48 7f db vpermt2ps %zmm27,%zmm14,%zmm3 3a9:62 f3 c54823 7c24 vshuff64x2$0xe4,0x140(%rsp),%zmm7,%zmm7 3b0:05 e4 3b2:62 a2 0d48 7f e5 vpermt2ps %zmm21,%zmm14,%zmm20 3b8:62 f1 fd48294424vmovapd %zmm0,0x140(%rsp) 3bf:05 3c0:62 f1 fd4829 6c24vmovapd %zmm5,0x4c0(%rsp) 3c7:13 3c8:62 f1 7c4828 6c24vmovaps0x7c0(%rsp),%zmm5 3cf:1f 3d0:62 f1 fd48295424vmovapd %zmm2,0x500(%rsp) 3d7:14 3d8:62327548 7f c3 vpermt2ps %zmm19,%zmm1,%zmm8 3de:62327548 7f e2 vpermt2ps %zmm18,%zmm1,%zmm12 3e4:62 a1 7c4828 devmovaps %zmm22,%zmm19 3ea:6281 7c4828 d0vmovaps %zmm24,%zmm18 3f0:62327548 7f d1 vpermt2ps %zmm17,%zmm1,%zmm10 3f6:6281 7c4828 c8vmovaps %zmm24,%zmm17 3fc:62 d27548 7f e5 vpermt2ps %zmm13,%zmm1,%zmm4402:6211 7c4828 eevmovaps %zmm30,%zmm13408:6221 7c4828 f6vmovaps %zmm22,%zmm30 40e:62 a2 0d48 7f d7 vpermt2ps %zmm23,%zmm14,%zmm18414:62 a27548 7f cf vpermt2ps %zmm23,%zmm1,%zmm17 41a:62 a27548 7f dd vpermt2ps %zmm21,%zmm1,%zmm19420:62021548 7f fb vpermt2ps %zmm27,%zmm13,%zmm31426:62921548 7f f1 vpermt2ps %zmm25,%zmm13,%zmm6 42c:62221548 7f f5 vpermt2ps %zmm21,%zmm13,%zmm30432:6211 7c4828 ecvmovaps %zmm28,%zmm13438:62 f1 fd4829 7c24vmovapd %zmm7,0x240(%rsp) 43f:09440:62 f3 a54823 7c24 vshuff64x2$0xe4,0x280(%rsp),%zmm11,%zmm7447:0a e4449:62025548 7f e3 vpermt2ps %zmm27,%zmm5,%zmm28 44f:62225548 7f c7 vpermt2ps %zmm23,%zmm5,%zmm24455:62 a25548 7f f5 vpermt2ps %zmm21,%zmm5,%zmm22 
45b:62127548 7f eb vpermt2ps %zmm27,%zmm1,%zmm13461:6281 7c4828 7c 2evmovaps0x4c0(%r14,%r13,1),%zmm23468:13469:62 e1 7c4828 6c24vmovaps0x6c0(%rsp),%zmm21470:1b471:62 d3 dd4823 c2 e4 vshuff64x2$0xe4,%zmm10,%zmm4,%zmm0478:6253 fd4023 d7 e4 vshuff64x2$0xe4,%zmm15,%zmm16,%zmm10 47f:6211 7c4828 favmovaps %zmm26,%zmm15485:6253 9d4823 c0 e4 vshuff64x2$0xe4,%zmm8,%zmm12,%zmm8 48c:6211 7c4828 e2vmovaps %zmm26,%zmm12492:62025548 7f d1 vpermt2ps %zmm25,%zmm5,%zmm26498:6281 7c482844 2evmovaps0x540(%r14,%r13,1),%zmm16 49f:15 4a0:6233 e54023 d9 e4 vshuff64x2$0xe4,%zmm17,%zmm19,%zmm11 4a7:62 a3 dd4023 d2 e4 vshuff64x2$0xe4,%zmm18,%zmm20,%zmm18 4ae:6281 7c482864 2evmovaps0x580(%r14,%r13,1),%zmm20 4b5:16 4b6:6281 7c4828 4c 2evmovaps0x500(%r14,%r13,1),%zmm17 4bd:14 4be:6212 0d48 7f f9 vpermt2ps %zmm25,%zmm14,%zmm15 4c4:62127548 7f e1 vpermt2ps %zmm25,%zmm1,%zmm12 4ca:6201 7c4828 4c 2evmovaps0x5c0(%r14,%r13,1),%zmm25 4d1:17 4d2:6293 8d4023 d5 e4 vshuff64x2$0xe4,%zmm29,%zmm30,%zmm2 4d9:62 e1 7c4828 5c24vmovaps0x780(%rsp),%zmm19 4e0:1e 4e1:62 f1 fd48294424vmovapd %zmm0,0x440(%rsp) 4e8:11 4e9:6293 cd4823 c7 e4 vshuff64x2$0xe4,%zmm31,%zmm6,%zmm0 4f0:62 f1 fd48295424vmovapd %zmm2,0x200(%rsp) 4f7:08 4f8:62 f1 7c4828 d5vmovaps %zmm5,%zmm2 4fe:62 f1 fd48294424vmovapd %zmm0,0x400(%rsp)505:10506:6293 cd4023 c0 e4 vshuff64x2$0xe4,%zmm24,%zmm22,%zmm0 50d:6281 7c482874 2evmovaps0x400(%r14,%r13,1),%zmm22514:10515:6201 7c482844 2evmovaps0x480(%r14,%r13,1),%zmm24 51c:12 51d:62 f1 fd48294424vmovapd %zmm0,0x480(%rsp)524:12525:6293 ad4023 e4 e4 vshuff64x2$0xe4,%zmm28,%zmm26,%zmm4 52c:62 d3 9d4823 ed e4 vshuff64x2$0xe4,%zmm13,%zmm12,%zmm5533:62 f3854823 db e4 vshuff64x2$0xe4,%zmm3,%zmm15,%zmm3 53a:6221 7c4828 dcvmovaps %zmm20,%zmm27540:6221 7c4828 e1vmovaps %zmm17,%zmm28546:62 f1 fd48296424vmovapd %zmm4,0x280(%rsp) 54d:0a 54e:6291 7c482864 2evmovaps0x440(%r14,%r13,1),%zmm4555:11556:6221 7c4828 f4vmovaps %zmm20,%zmm30 55c:6221 7c4828 f9vmovaps %zmm17,%zmm31562:62023548 7f d9 vpermt2ps 
%zmm25,%zmm9,%zmm27568:62223548 7f e0 vpermt2ps %zmm16,%zmm9,%zmm28 56e:6202 0d48 7f f1 vpermt2ps %zmm25,%zmm14,%zmm30574:6222 0d48 7f f8 vpermt2ps %zmm16,%zmm14,%zmm31 57a:6201 7c4828 d0vmovaps %zmm24,%zmm26580:6231 7c4828 eevmovaps %zmm22,%zmm13586:6211 7c4828 f8vmovaps %zmm24,%zmm15 58c:6221 7c4828 eevmovaps %zmm22,%zmm29592:62223548 7f d7 vpermt2ps %zmm23,%zmm9,%zmm26598:62327548 7f ff vpermt2ps %zmm23,%zmm1,%zmm15 59e:6293 9d4023 f3 e4 vshuff64x2$0xe4,%zmm27,%zmm28,%zmm6 5a5:62723548 7f ec vpermt2ps %zmm4,%zmm9,%zmm13 5ab:6221 7c4828 e4vmovaps %zmm20,%zmm28 5b1:6262 0d48 7f ec vpermt2ps %zmm4,%zmm14,%zmm29 5b7:62027548 7f e1 vpermt2ps %zmm25,%zmm1,%zmm28 5bd:62 f1 fd48297424vmovapd %zmm6,0x1c0(%rsp) 5c4:07 5c5:62 b1 7c4828 f6vmovaps %zmm22,%zmm6 5cb:62 f27548 7f f4 vpermt2ps %zmm4,%zmm1,%zmm6 5d1:6293954823 c2 e4 vshuff64x2$0xe4,%zmm26,%zmm13,%zmm0 5d8:6271 7c4828 e9vmovaps %zmm1,%zmm13 5de:62 f1 fd48294424vmovapd %zmm0,0x180(%rsp) 5e5:06 5e6:62 d3 cd4823 c7 e4 vshuff64x2$0xe4,%zmm15,%zmm6,%zmm0 5ed:62 f1 7c48287424vmovaps0x600(%rsp),%zmm6 5f4:18 5f5:62 f1 fd48294424vmovapd %zmm0,0x300(%rsp) 5fc:0c 5fd:62 b1 7c4828 c1vmovaps %zmm17,%zmm0603:62 b27548 7f c0 vpermt2ps %zmm16,%zmm1,%zmm0609:62 f1 7c4828 4c24vmovaps0x240(%rsp),%zmm1610:09611:6293 fd4823 c4 e4 vshuff64x2$0xe4,%zmm28,%zmm0,%zmm0618:6261 7c4828 e2vmovaps %zmm2,%zmm28 61e:62 e2 1d40 7f f4 vpermt2ps %zmm4,%zmm28,%zmm22624:62 f1 7c48286424vmovaps0x4c0(%rsp),%zmm4 62b:13 62c:62 a2 1d40 7f c8 vpermt2ps %zmm16,%zmm28,%zmm17632:6282 1d40 7f e1 vpermt2ps %zmm25,%zmm28,%zmm20638:62 e1 7c48284424vmovaps0x1c0(%rsp),%zmm16 63f:07640:62 f1 fd48294424vmovapd %zmm0,0x2c0(%rsp)647:0b648:6291 7c4828 c0vmovaps %zmm24,%zmm0 64e:6222 6d48 7f c7 vpermt2ps %zmm23,%zmm2,%zmm24654:62 f1 7c48285424vmovaps0x680(%rsp),%zmm2 65b:1a 65c:62 b2 0d48 7f c7 vpermt2ps %zmm23,%zmm14,%zmm0662:62 e1 7c4828 7c24vmovaps0x740(%rsp),%zmm23669:1d 66a:62 a3 f54023 e4 e4 vshuff64x2$0xe4,%zmm20,%zmm17,%zmm20671:6283 cd4023 f0 e4 
vshuff64x2$0xe4,%zmm24,%zmm22,%zmm22678:62 f3954023 c0 e4 vshuff64x2$0xe4,%zmm0,%zmm29,%zmm0 67f:6203854023 ee e4 vshuff64x2$0xe4,%zmm30,%zmm31,%zmm29686:6221 3c48 5c f5vsubps %zmm21,%zmm8,%zmm30 68c:6271 7c48284424vmovaps0x440(%rsp),%zmm8693:11694:6261 2c48 5c favsubps %zmm2,%zmm10,%zmm31 69a:62614448 5c davsubps %zmm2,%zmm7,%zmm27 6a0:62 b1 7c4828 fbvmovaps %zmm19,%zmm7 6a6:62 f16448 5c davsubps %zmm2,%zmm3,%zmm3 6ac:62 f1 7c4829 5c24vmovaps %zmm3,0x240(%rsp) 6b3:09 6b4:62 f1 fd48294424vmovapd %zmm0,0x5c0(%rsp) 6bb:17 6bc:62 f1 7c48284424vmovaps0x700(%rsp),%zmm0 6c3:1c 6c4:6221 3c48 5c d5vsubps %zmm21,%zmm8,%zmm26 6ca:62717448 5c e0vsubps %zmm0,%zmm1,%zmm12 6d0:62 f1 5c48 5c e0vsubps %zmm0,%zmm4,%zmm4 6d6:62 e1 7c40 5c c0vsubps %zmm0,%zmm16,%zmm16 6dc:6251 7c4828 ccvmovaps %zmm12,%zmm9 6e2:6271 7c4828 c4vmovaps %zmm4,%zmm8 6e8:62 e1 7c48294424vmovaps %zmm16,0x340(%rsp) 6ef:0d 6f0:6272 1d48 a8 ce vfmadd213ps %zmm6,%zmm12,%zmm9 6f6:6272 5d48 a8 c6 vfmadd213ps %zmm6,%zmm4,%zmm8 6fc:6212 0d40 b8 ce vfmadd231ps %zmm30,%zmm30,%zmm9702:6212 2d40 b8 c2 vfmadd231ps %zmm26,%zmm26,%zmm8708:62120540 b8 cf vfmadd231ps %zmm31,%zmm31,%zmm9 70e:62122540 b8 c3 vfmadd231ps %zmm27,%zmm27,%zmm8714:6252 7d48 4e d1 vrsqrt14ps %zmm9,%zmm10 71a:6252 7d48 4e f8 vrsqrt14ps %zmm8,%zmm15720:6251344859 cavmulps %zmm10,%zmm9,%zmm9726:6251 3c4859 c7vmulps %zmm15,%zmm8,%zmm8 72c:6232 2d48 a8 cb vfmadd213ps %zmm19,%zmm10,%zmm9732:6231 2c4859 d7vmulps %zmm23,%zmm10,%zmm10738:62320548 a8 c3 vfmadd213ps %zmm19,%zmm15,%zmm8 73e:6251 2c4859 d1vmulps %zmm9,%zmm10,%zmm10744:6231044859 cfvmulps %zmm23,%zmm15,%zmm9 74a:6271 7c4828 7c24vmovaps0x200(%rsp),%zmm15751:08752:62 d1344859 c8vmulps %zmm8,%zmm9,%zmm1758:62312448 5c cdvsubps %zmm21,%zmm11,%zmm9 75e:6271 6c40 5c c2vsubps %zmm2,%zmm18,%zmm8764:6271 7c4829 4c24vmovaps %zmm9,0x200(%rsp) 76b:08 76c:6271 7c48294424vmovaps %zmm8,0x3c0(%rsp)773:0f774:62 e10448 5c d8vsubps %zmm0,%zmm15,%zmm19 77a:6231 7c4828 dbvmovaps %zmm19,%zmm11780:62726540 a8 de 
vfmadd213ps %zmm6,%zmm19,%zmm11786:62523548 b8 d9 vfmadd231ps %zmm9,%zmm9,%zmm11 78c:6271 7c4828 4c24vmovaps0x400(%rsp),%zmm9793:10794:6252 3d48 b8 d8 vfmadd231ps %zmm8,%zmm8,%zmm11 79a:62315448 5c c5vsubps %zmm21,%zmm5,%zmm8 7a0:62 c2 7d48 4e d3 vrsqrt14ps %zmm11,%zmm18 7a6:6271 7c48294424vmovaps %zmm8,0x380(%rsp) 7ad:0e 7ae:6231244859 davmulps %zmm18,%zmm11,%zmm11 7b4:6272 6d40 a8 df vfmadd213ps %zmm7,%zmm18,%zmm11 7ba:62 a1 6c4059 d7vmulps %zmm23,%zmm18,%zmm18 7c0:62 c1 6c4059 d3vmulps %zmm11,%zmm18,%zmm18 7c6:6261 6c40594424vmulps0x480(%rsp),%zmm18,%zmm24 7cd:12 7ce:62713448 5c f8vsubps %zmm0,%zmm9,%zmm15 7d4:62 d1 7c4828 efvmovaps %zmm15,%zmm5 7da:62 f20548 a8 ee vfmadd213ps %zmm6,%zmm15,%zmm5 7e0:62 d2 3d48 b8 e8 vfmadd231ps %zmm8,%zmm8,%zmm5 7e6:62717448594424vmulps0x500(%rsp),%zmm1,%zmm8 7ed:14 7ee:62 f1744859 c9vmulps %zmm1,%zmm1,%zmm1 7f4:62 f26548 b8 eb vfmadd231ps %zmm3,%zmm3,%zmm5 7fa:62 f1 2c4859 5c24vmulps0x140(%rsp),%zmm10,%zmm3801:05802:6251 2c4859 d2vmulps %zmm10,%zmm10,%zmm10808:6272 7d48 4e dd vrsqrt14ps %zmm5,%zmm11 80e:62 d1544859 ebvmulps %zmm11,%zmm5,%zmm5814:62 f22548 a8 ef vfmadd213ps %zmm7,%zmm11,%zmm5 81a:6231244859 dfvmulps %zmm23,%zmm11,%zmm11820:6251744859 c0vmulps %zmm8,%zmm1,%zmm8826:6291 7c4828 4c 2evmovaps0x780(%r14,%r13,1),%zmm1 82d:1e 82e:6261 2c4859 cbvmulps %zmm3,%zmm10,%zmm25834:c4412857 d2vxorps %xmm10,%xmm10,%xmm10839:c46329 0c 8c2400vblendps$0x1,0x100(%rsp),%xmm10,%xmm9840:01000001844:62 f1244859 edvmulps %zmm5,%zmm11,%zmm5 84a:c46329 0c 9c24 c0vblendps$0x1,0xc0(%rsp),%xmm10,%xmm11851:00000001855:c4 e329 0c 9c2480vblendps$0x1,0x80(%rsp),%xmm10,%xmm3 85c:00000001860:6271 7c48285424vmovaps0x180(%rsp),%zmm10867:06868:62 e1 2c48 5c c8vsubps %zmm0,%zmm10,%zmm17 86e:62 f1 7c48284424vmovaps0x300(%rsp),%zmm0875:0c876:6271 7c48285424vmovaps0x2c0(%rsp),%zmm10 87d:0b 87e:62 f1 7c4829 5c24vmovaps %zmm3,0x100(%rsp)885:04886:62 b1 6c4059 davmulps %zmm18,%zmm18,%zmm3 88c:62 e15448595424vmulps0x280(%rsp),%zmm5,%zmm18893:0a894:62 f1544859 
edvmulps %zmm5,%zmm5,%zmm5 89a:62123540 b8 de vfmadd231ps %zmm30,%zmm25,%zmm11 8a0:6201 7c482874 2evmovaps0x600(%r14,%r13,1),%zmm30 8a7:18 8a8:62523540 b8 cc vfmadd231ps %zmm12,%zmm25,%zmm9 8ae:6201644859 c0vmulps %zmm24,%zmm3,%zmm24 8b4:62 f11440 5c davsubps %zmm2,%zmm29,%zmm3 8ba:6221 7c4828 e8vmovaps %zmm16,%zmm29 8c0:62 e1 7c4829 4c24vmovaps %zmm17,0x80(%rsp) 8c7:02 8c8:62 e27540 a8 ce vfmadd213ps %zmm6,%zmm17,%zmm17 8ce:62621540 a8 ee vfmadd213ps %zmm6,%zmm29,%zmm29 8d4:62 f1 3c4859 f4vmulps %zmm4,%zmm8,%zmm6 8da:6232 3d40 b8 cb vfmadd231ps %zmm19,%zmm24,%zmm9 8e0:6281 7c4828 5c 2evmovaps0x700(%r14,%r13,1),%zmm19 8e7:1c 8e8:62 f1 7c4829 5c24vmovaps %zmm3,0x1c0(%rsp) 8ef:07 8f0:62 a1544859 c2vmulps %zmm18,%zmm5,%zmm16 8f6:62 e1 7c48285424vmovaps0x640(%rsp),%zmm18 8fd:19 8fe:62 d2 7d40 b8 f7 vfmadd231ps %zmm15,%zmm16,%zmm6904:6211 7c4828 fcvmovaps %zmm28,%zmm15 90a:62 b1 7c48 5c c5vsubps %zmm21,%zmm0,%zmm0910:6231 2c48 5c d5vsubps %zmm21,%zmm10,%zmm10916:62 e1 7c4828 6c24vmovaps0x5c0(%rsp),%zmm21 91d:17 91e:6211 7c4828 e6vmovaps %zmm30,%zmm12924:62 e2 7d48 b8 c8 vfmadd231ps %zmm0,%zmm0,%zmm17 92a:6242 2d48 b8 ea vfmadd231ps %zmm10,%zmm10,%zmm29930:62 f1 7c48294424vmovaps %zmm0,0xc0(%rsp)937:03938:6271 7c48295424vmovaps %zmm10,0x140(%rsp) 93f:05940:6251 7c4828 d3vmovaps %zmm11,%zmm10946:6271 7c4828 devmovaps %zmm6,%zmm11 94c:62626548 b8 eb vfmadd231ps %zmm3,%zmm3,%zmm29952:62 b1 7c4828 f3vmovaps %zmm19,%zmm6958:6292 7d48 4e c5 vrsqrt14ps %zmm29,%zmm0 95e:62 f1144059 e8vmulps %zmm0,%zmm29,%zmm5964:62 f2 7d48 a8 ef vfmadd213ps %zmm7,%zmm0,%zmm5 96a:62 e15440 5c eavsubps %zmm2,%zmm21,%zmm21970:62 a25540 b8 cd vfmadd231ps %zmm21,%zmm21,%zmm17976:62 e1 7c4829 6c24vmovaps %zmm21,0x180(%rsp) 97d:06 97e:6281 3c4859 eavmulps %zmm26,%zmm8,%zmm21984:6201 7c482854 2evmovaps0x940(%r14,%r13,1),%zmm26 98b:25 98c:62 b2 7d48 4e d1 vrsqrt14ps %zmm17,%zmm2992:62 e2 7d40 b8 6c24 vfmadd231ps0x380(%rsp),%zmm16,%zmm21999:0e 99a:62 f1744059 e2vmulps %zmm2,%zmm17,%zmm4 9a0:62 f2 6d48 a8 
e7 vfmadd213ps %zmm7,%zmm2,%zmm4 9a6:62 b1 6c4859 d7vmulps %zmm23,%zmm2,%zmm2 9ac:62 f1 6c4859 d4vmulps %zmm4,%zmm2,%zmm2 9b2:62 b1 7c4859 e7vmulps %zmm23,%zmm0,%zmm4 9b8:6281 3c4859 fbvmulps %zmm27,%zmm8,%zmm23 9be:6251 7c4828 c1vmovaps %zmm9,%zmm8 9c4:6201 7c4828 5c 2evmovaps0xb40(%r14,%r13,1),%zmm27 9cb:2d 9cc:6261 5c4859 edvmulps %zmm5,%zmm4,%zmm29 9d2:62 f1 7c4828 6c24vmovaps0x100(%rsp),%zmm5 9d9:04 9da:62 f1 6c4859 e2vmulps %zmm2,%zmm2,%zmm4 9e0:62 f1 4c4059 d2vmulps %zmm2,%zmm22,%zmm2 9e6:6281 7c482874 2evmovaps0x640(%r14,%r13,1),%zmm22 9ed:19 9ee:62 e1 5c4859 cavmulps %zmm2,%zmm4,%zmm17 9f4:6291 7c482854 2evmovaps0x6c0(%r14,%r13,1),%zmm2 9fb:1b 9fc:6291 7c482864 2evmovaps0x740(%r14,%r13,1),%zmm4 a03:1d a04:62727540 b84424 vfmadd231ps0x80(%rsp),%zmm17,%zmm8 a0b:02 a0c:6271 7c48294424vmovaps %zmm8,0x80(%rsp) a13:02 a14:6271 7c48284424vmovaps0x600(%rsp),%zmm8 a1b:18 a1c:62923540 b8 ef vfmadd231ps %zmm31,%zmm25,%zmm5 a22:6201 7c4828 7c 2evmovaps0x680(%r14,%r13,1),%zmm31 a29:1a a2a:6201 7c4828 4c 2evmovaps0x7c0(%r14,%r13,1),%zmm25 a31:1f a32:6232 6d40 7f e6 vpermt2ps %zmm22,%zmm18,%zmm12 a38:62 f21548 7f f4 vpermt2ps %zmm4,%zmm13,%zmm6 a3e:62 f2 3d40 b8 6c24 vfmadd231ps0x3c0(%rsp),%zmm24,%zmm5 a45:0f a46:62 f27540 b8 6c24 vfmadd231ps0x180(%rsp),%zmm17,%zmm5 a4d:06 a4e:6291 7c4828 ffvmovaps %zmm31,%zmm7 a54:6211 7c4828 cfvmovaps %zmm31,%zmm9 a5a:62 f2 6d40 7f fa vpermt2ps %zmm2,%zmm18,%zmm7 a60:62721548 7f ca vpermt2ps %zmm2,%zmm13,%zmm9 a66:62 f1 7c4829 6c24vmovaps %zmm5,0x100(%rsp) a6d:04 a6e:62 f1 7c4828 6c24vmovaps0x700(%rsp),%zmm5 a75:1c a76:62 f3 9d4823 c7 e4 vshuff64x2$0xe4,%zmm7,%zmm12,%zmm0 a7d:62 f1 7c4828 f9vmovaps %zmm1,%zmm7 a83:6231 7c4828 e3vmovaps %zmm19,%zmm12 a89:6292 6d40 7f f9 vpermt2ps %zmm25,%zmm18,%zmm7 a8f:6272 6d40 7f e4 vpermt2ps %zmm4,%zmm18,%zmm12 a95:62 f1 fd48294424vmovapd %zmm0,0x4c0(%rsp) a9c:13 a9d:62 f3 9d4823 df e4 vshuff64x2$0xe4,%zmm7,%zmm12,%zmm3 aa4:6271 7c4828 e1vmovaps %zmm1,%zmm12aaa:6291 7c4828 fevmovaps %zmm30,%zmm7 
ab0:62121548 7f e1 vpermt2ps %zmm25,%zmm13,%zmm12 ab6:62 b21548 7f fe vpermt2ps %zmm22,%zmm13,%zmm7 abc:62 f1 fd4829 5c24vmovapd %zmm3,0x500(%rsp) ac3:14 ac4:62 d3 cd4823 f4 e4 vshuff64x2$0xe4,%zmm12,%zmm6,%zmm6 acb:6253 c54823 c9 e4 vshuff64x2$0xe4,%zmm9,%zmm7,%zmm9 ad2:6291 7c4828 ffvmovaps %zmm31,%zmm7 ad8:6262 1d40 7f fa vpermt2ps %zmm2,%zmm28,%zmm31 ade:6211 7c482864 2evmovaps0x980(%r14,%r13,1),%zmm12 ae5:26 ae6:62 f2 0d48 7f fa vpermt2ps %zmm2,%zmm14,%zmm7 aec:62 f1 7c4828 d1vmovaps %zmm1,%zmm2 af2:6292 1d40 7f c9 vpermt2ps %zmm25,%zmm28,%zmm1 af8:62 f1 fd48297424vmovapd %zmm6,0x440(%rsp) aff:11 b00:6291 7c4828 f6vmovaps %zmm30,%zmm6 b06:6292 0d48 7f d1 vpermt2ps %zmm25,%zmm14,%zmm2 b0c:6222 1d40 7f f6 vpermt2ps %zmm22,%zmm28,%zmm30 b12:6201 7c4828 4c 2evmovaps0x880(%r14,%r13,1),%zmm25 b19:22 b1a:62 b2 0d48 7f f6 vpermt2ps %zmm22,%zmm14,%zmm6 b20:62 f3 cd4823 df e4 vshuff64x2$0xe4,%zmm7,%zmm6,%zmm3 b27:6291 5c4059 f5vmulps %zmm29,%zmm20,%zmm6 b2d:6293 8d4023 ff e4 vshuff64x2$0xe4,%zmm31,%zmm30,%zmm7 b34:6201 7c482874 2evmovaps0x9c0(%r14,%r13,1),%zmm30 b3b:27 b3c:62 c1 7c4828 e4vmovaps %zmm12,%zmm20 b42:6201 7c4828 7c 2evmovaps0xa80(%r14,%r13,1),%zmm31 b49:2a b4a:62 f1 fd4829 5c24vmovapd %zmm3,0x400(%rsp) b51:10 b52:62 d1 7c4828 davmovaps %zmm10,%zmm3 b58:6231 7c4828 d3vmovaps %zmm19,%zmm10 b5e:62 e2 1d40 7f dc vpermt2ps %zmm4,%zmm28,%zmm19 b64:62 f1 fd4829 7c24vmovapd %zmm7,0x280(%rsp) b6b:0a b6c:6272 0d48 7f d4 vpermt2ps %zmm4,%zmm14,%zmm10 b72:6291 7c482864 2evmovaps0x800(%r14,%r13,1),%zmm4 b79:20 b7a:62 f2 3d40 b8 5c24 vfmadd231ps0x200(%rsp),%zmm24,%zmm3 b81:08 b82:6201 7c482844 2evmovaps0x8c0(%r14,%r13,1),%zmm24 b89:23 b8a:62 f27540 b8 5c24 vfmadd231ps0xc0(%rsp),%zmm17,%zmm3 b91:03 b92:62821548 7f e6 vpermt2ps %zmm30,%zmm13,%zmm20 b98:6281 7c4828 cfvmovaps %zmm31,%zmm17 b9e:62 f3 e54023 c1 e4 vshuff64x2$0xe4,%zmm1,%zmm19,%zmm0 ba5:6291 7c4828 4c 2evmovaps0x840(%r14,%r13,1),%zmm1 bac:21 bad:6281 7c4828 d9vmovaps %zmm25,%zmm19 bb3:62 f3 ad4823 d2 e4 
vshuff64x2$0xe4,%zmm2,%zmm10,%zmm2 bba:6211144059 d5vmulps %zmm29,%zmm29,%zmm10 bc0:6201 7c4828 6c 2evmovaps0x900(%r14,%r13,1),%zmm29 bc7:24 bc8:6282 6d40 7f d8 vpermt2ps %zmm24,%zmm18,%zmm19 bce:62 f1 fd48294424vmovapd %zmm0,0x480(%rsp) bd5:12 bd6:62 f1 7c4829 5c24vmovaps %zmm3,0xc0(%rsp) bdd:03 bde:62 f1 fd48295424vmovapd %zmm2,0x200(%rsp) be5:08 be6:62 b1 7c4828 d7vmovaps %zmm23,%zmm2 bec:62 e1 2c4859 fevmulps %zmm6,%zmm10,%zmm23 bf2:62 f1 7c4828 f4vmovaps %zmm4,%zmm6 bf8:6251 7c4828 d4vmovaps %zmm12,%zmm10 bfe:6212 6d40 7f d6 vpermt2ps %zmm30,%zmm18,%zmm10 c04:62724540 b8 5c24 vfmadd231ps0x340(%rsp),%zmm23,%zmm11 c0b:0d c0c:62 f2 7d40 b85424 vfmadd231ps0x240(%rsp),%zmm16,%zmm2 c13:09 c14:62 e24540 b8 6c24 vfmadd231ps0x140(%rsp),%zmm23,%zmm21 c1b:05 c1c:62 f2 6d40 7f f1 vpermt2ps %zmm1,%zmm18,%zmm6 c22:62 f24540 b85424 vfmadd231ps0x1c0(%rsp),%zmm23,%zmm2 c29:07 c2a:62 e1 7c4829 6c24vmovaps %zmm21,0x140(%rsp) c31:05 c32:6271 7c4829 5c24vmovaps %zmm11,0x240(%rsp) c39:09 c3a:6281 7c4828 f5vmovaps %zmm29,%zmm22 c40:6291 7c4828 fdvmovaps %zmm29,%zmm7 c46:6282 6d40 7f f2 vpermt2ps %zmm26,%zmm18,%zmm22 c4c:62921548 7f fa vpermt2ps %zmm26,%zmm13,%zmm7 c52:62 f1 7c48295424vmovaps %zmm2,0x1c0(%rsp) c59:07 c5a:62 b3 cd4823 c3 e4 vshuff64x2$0xe4,%zmm19,%zmm6,%zmm0 c61:6291 7c4828 f1vmovaps %zmm25,%zmm6 c67:62921548 7f f0 vpermt2ps %zmm24,%zmm13,%zmm6 c6d:62 f1 fd48294424vmovapd %zmm0,0x380(%rsp) c74:0e c75:62 d3 cd4023 c2 e4 vshuff64x2$0xe4,%zmm10,%zmm22,%zmm0 c7c:6271 7c4828 d4vmovaps %zmm4,%zmm10 c82:6272 0d48 7f d1 vpermt2ps %zmm1,%zmm14,%zmm10 c88:62 f1 fd48294424vmovapd %zmm0,0x3c0(%rsp) c8f:0f c90:62 f1 7c4828 c4vmovaps %zmm4,%zmm0 c96:62 f2 1d40 7f e1 vpermt2ps %zmm1,%zmm28,%zmm4 c9c:62 f21548 7f c1 vpermt2ps %zmm1,%zmm13,%zmm0 ca2:62 f3 fd4823 c6 e4 vshuff64x2$0xe4,%zmm6,%zmm0,%zmm0 ca9:6291 7c482874 2evmovaps0xb80(%r14,%r13,1),%zmm6 cb0:2e cb1:62 f1 fd48294424vmovapd %zmm0,0x2c0(%rsp) cb8:0b cb9:62 b3 c54823 c4 e4 vshuff64x2$0xe4,%zmm20,%zmm7,%zmm0 cc0:62 f1 7c4828 
7c24vmovaps0x6c0(%rsp),%zmm7 cc7:1b cc8:62 f1 fd48294424vmovapd %zmm0,0x340(%rsp) ccf:0d cd0:6291 7c4828 c1vmovaps %zmm25,%zmm0 cd6:6202 1d40 7f c8 vpermt2ps %zmm24,%zmm28,%zmm25 cdc:6292 0d48 7f c0 vpermt2ps %zmm24,%zmm14,%zmm0 ce2:6241 7c4828 c4vmovaps %zmm12,%zmm24 ce8:6212 1d40 7f e6 vpermt2ps %zmm30,%zmm28,%zmm12 cee:6202 0d48 7f c6 vpermt2ps %zmm30,%zmm14,%zmm24 cf4:6201 7c482874 2evmovaps0xa40(%r14,%r13,1),%zmm30 cfb:29 cfc:6293 dd4823 c9 e4 vshuff64x2$0xe4,%zmm25,%zmm4,%zmm1 d03:6291 7c482864 2evmovaps0xbc0(%r14,%r13,1),%zmm4 d0a:2f d0b:62 e3 ad4823 f0 e4 vshuff64x2$0xe4,%zmm0,%zmm10,%zmm22 d12:6291 7c4828 c5vmovaps %zmm29,%zmm0 d18:6202 1d40 7f ea vpermt2ps %zmm26,%zmm28,%zmm29 d1e:6201 7c482864 2evmovaps0xa00(%r14,%r13,1),%zmm28 d25:28 d26:6211 7c482854 2evmovaps0xac0(%r14,%r13,1),%zmm10 d2d:2b d2e:6292 0d48 7f c2 vpermt2ps %zmm26,%zmm14,%zmm0 d34:62 f1 fd4829 4c24vmovapd %zmm1,0x300(%rsp) d3b:0c d3c:6291 7c4828 4c 2evmovaps0xb00(%r14,%r13,1),%zmm1 d43:2c d44:6261 7c48285424vmovaps0x780(%rsp),%zmm26 d4b:1e d4c:62 d3954023 dc e4 vshuff64x2$0xe4,%zmm12,%zmm29,%zmm3 d53:62713448 5c e7vsubps %zmm7,%zmm9,%zmm12 d59:6271 7c4828 4c24vmovaps0x680(%rsp),%zmm9 d60:1a d61:6281 7c4828 fcvmovaps %zmm28,%zmm23 d67:62 c2 6d40 7f ca vpermt2ps %zmm10,%zmm18,%zmm17 d6d:6283 fd4823 c0 e4 vshuff64x2$0xe4,%zmm24,%zmm0,%zmm16 d74:6201 7c4828 c7vmovaps %zmm31,%zmm24 d7a:6201 7c4828 ccvmovaps %zmm28,%zmm25 d80:6282 6d40 7f fe vpermt2ps %zmm30,%zmm18,%zmm23 d86:62421548 7f c2 vpermt2ps %zmm10,%zmm13,%zmm24 d8c:62021548 7f ce vpermt2ps %zmm30,%zmm13,%zmm25 d92:62 f1 fd4829 5c24vmovapd %zmm3,0x180(%rsp) d99:06 d9a:62 f1 7c4828 5c24vmovaps0x4c0(%rsp),%zmm3 da1:13 da2:62 c1 7c40 5c c1vsubps %zmm9,%zmm16,%zmm16 da8:62 b3 c54023 c1 e4 vshuff64x2$0xe4,%zmm17,%zmm23,%zmm0 daf:62 e1 7c4828 cevmovaps %zmm6,%zmm17 db5:62 e1 7c4828 f9vmovaps %zmm1,%zmm23 dbb:6293 b54023 d0 e4 vshuff64x2$0xe4,%zmm24,%zmm25,%zmm2 dc2:62 e2 6d40 7f cc vpermt2ps %zmm4,%zmm18,%zmm17 dc8:62821548 7f fb vpermt2ps 
%zmm27,%zmm13,%zmm23 dce:62716448 5c ddvsubps %zmm5,%zmm3,%zmm11 dd4:62 f1 fd48294424vmovapd %zmm0,0x540(%rsp) ddb:15 ddc:62 f1 7c4828 c1vmovaps %zmm1,%zmm0 de2:62 f1 fd48295424vmovapd %zmm2,0x8c0(%rsp) de9:23 dea:62 f1 7c48285424vmovaps0x400(%rsp),%zmm2 df1:10 df2:6292 6d40 7f c3 vpermt2ps %zmm27,%zmm18,%zmm0 df8:62 b3 fd4823 c1 e4 vshuff64x2$0xe4,%zmm17,%zmm0,%zmm0 dff:62 e1 7c4828 cevmovaps %zmm6,%zmm17 e05:62 c1 6c48 5c e1vsubps %zmm9,%zmm2,%zmm20 e0b:62 f1 7c48285424vmovaps0x440(%rsp),%zmm2 e12:11 e13:62 e1 7c48294424vmovaps %zmm16,0x440(%rsp) e1a:11 e1b:62 e21548 7f cc vpermt2ps %zmm4,%zmm13,%zmm17 e21:62 f1 fd48294424vmovapd %zmm0,0x880(%rsp) e28:22 e29:62 a3 c54023 e9 e4 vshuff64x2$0xe4,%zmm17,%zmm23,%zmm21 e30:6281 7c4828 cfvmovaps %zmm31,%zmm17 e36:62420548 7f fa vpermt2ps %zmm10,%zmm15,%zmm31 e3c:62 e1 7c4828 7c24vmovaps0x740(%rsp),%zmm23 e43:1d e44:62 f1 6c48 5c d7vsubps %zmm7,%zmm2,%zmm2 e4a:62 c2 0d48 7f ca vpermt2ps %zmm10,%zmm14,%zmm17 e50:6211 7c4828 d4vmovaps %zmm28,%zmm10 e56:62 f1 7c48295424vmovaps %zmm2,0x580(%rsp) e5d:16 e5e:62020548 7f e6 vpermt2ps %zmm30,%zmm15,%zmm28 e64:6212 0d48 7f d6 vpermt2ps %zmm30,%zmm14,%zmm10 e6a:6203 9d4023 f7 e4 vshuff64x2$0xe4,%zmm31,%zmm28,%zmm30 e71:62 a3 ad4823 d9 e4 vshuff64x2$0xe4,%zmm17,%zmm10,%zmm19 e78:6251 7c4828 d3vmovaps %zmm11,%zmm10 e7e:62522548 a8 d0 vfmadd213ps %zmm8,%zmm11,%zmm10 e84:6252 1d48 b8 d4 vfmadd231ps %zmm12,%zmm12,%zmm10 e8a:6232 5d40 b8 d4 vfmadd231ps %zmm20,%zmm20,%zmm10 e90:62 c2 7d48 4e ca vrsqrt14ps %zmm10,%zmm17 e96:6231 2c4859 d1vmulps %zmm17,%zmm10,%zmm10 e9c:62127540 a8 d2 vfmadd213ps %zmm26,%zmm17,%zmm10 ea2:62 a1744059 cfvmulps %zmm23,%zmm17,%zmm17 ea8:62 d1744059 davmulps %zmm10,%zmm17,%zmm3 eae:6271 7c48285424vmovaps0x500(%rsp),%zmm10 eb5:14 eb6:6261 2c48 5c cdvsubps %zmm5,%zmm10,%zmm25 ebc:6271 7c48285424vmovaps0x200(%rsp),%zmm10 ec3:08 ec4:6281 7c4828 c9vmovaps %zmm25,%zmm17 eca:62 c23540 a8 c8 vfmadd213ps %zmm8,%zmm25,%zmm17 ed0:62 e2 6d48 b8 ca vfmadd231ps 
%zmm2,%zmm2,%zmm17 ed6:62 d1 2c48 5c c1vsubps %zmm9,%zmm10,%zmm0 edc:6271 7c48285424vmovaps0x2c0(%rsp),%zmm10 ee3:0b ee4:62 e2 7d48 b8 c8 vfmadd231ps %zmm0,%zmm0,%zmm17 eea:62 f1 7c48294424vmovaps %zmm0,0x5c0(%rsp) ef1:17 ef2:62 d1 4c40 5c c1vsubps %zmm9,%zmm22,%zmm0 ef8:6222 7d48 4e c1 vrsqrt14ps %zmm17,%zmm24 efe:62 f1 7c48294424vmovaps %zmm0,0x200(%rsp) f05:08 f06:6281744059 c8vmulps %zmm24,%zmm17,%zmm17 f0c:6282 3d40 a8 ca vfmadd213ps %zmm26,%zmm24,%zmm17 f12:6221 3c4059 c7vmulps %zmm23,%zmm24,%zmm24 f18:62 b1 3c4059 d1vmulps %zmm17,%zmm24,%zmm2 f1e:62 e1 7c4828 4c24vmovaps0x380(%rsp),%zmm17 f25:0e f26:6271 2c48 5c d7vsubps %zmm7,%zmm10,%zmm10 f2c:6271 7c48295424vmovaps %zmm10,0x380(%rsp) f33:0e f34:62617440 5c c5vsubps %zmm5,%zmm17,%zmm24 f3a:6281 7c4828 c8vmovaps %zmm24,%zmm17 f40:62 c2 3d40 a8 c8 vfmadd213ps %zmm8,%zmm24,%zmm17 f46:62 c2 2d48 b8 ca vfmadd231ps %zmm10,%zmm10,%zmm17 f4c:6271 7c48285424vmovaps0x340(%rsp),%zmm10 f53:0d f54:62 e2 7d48 b8 c8 vfmadd231ps %zmm0,%zmm0,%zmm17 f5a:62 a2 7d48 4e f1 vrsqrt14ps %zmm17,%zmm22 f60:62 a1744059 cevmulps %zmm22,%zmm17,%zmm17 f66:6282 4d40 a8 ca vfmadd213ps %zmm26,%zmm22,%zmm17 f6c:62 a1 4c4059 f7vmulps %zmm23,%zmm22,%zmm22 f72:62 a1 4c4059 f1vmulps %zmm17,%zmm22,%zmm22 f78:62 e1 7c4828 4c24vmovaps0x3c0(%rsp),%zmm17 f7f:0f f80:6271 2c48 5c d7vsubps %zmm7,%zmm10,%zmm10 f86:6271 7c48295424vmovaps %zmm10,0x340(%rsp) f8d:0d f8e:62 e17440 5c cdvsubps %zmm5,%zmm17,%zmm17 f94:6221 7c4828 e9vmovaps %zmm17,%zmm29 f9a:62427540 a8 e8 vfmadd213ps %zmm8,%zmm17,%zmm29 fa0:6242 2d48 b8 ea vfmadd231ps %zmm10,%zmm10,%zmm29 fa6:62716448595424vmulps0x280(%rsp),%zmm3,%zmm10 fad:0a fae:62 f1644859 dbvmulps %zmm3,%zmm3,%zmm3 fb4:6222 7d40 b8 e8 vfmadd231ps %zmm16,%zmm16,%zmm29 fba:6292 7d48 4e c5 vrsqrt14ps %zmm29,%zmm0 fc0:62 e1144059 c0vmulps %zmm0,%zmm29,%zmm16 fc6:6261 7c4828 e9vmovaps %zmm1,%zmm29 fcc:62920548 7f cb vpermt2ps %zmm27,%zmm15,%zmm1 fd2:6282 7d48 a8 c2 vfmadd213ps %zmm26,%zmm0,%zmm16 fd8:62 b1 7c4859 c7vmulps 
%zmm23,%zmm0,%zmm0 fde:62 d1644859 davmulps %zmm10,%zmm3,%zmm3 fe4:6202 0d48 7f eb vpermt2ps %zmm27,%zmm14,%zmm29 fea:62 a1 7c4859 c0vmulps %zmm16,%zmm0,%zmm16 ff0:62 f1 7c4828 c6vmovaps %zmm6,%zmm0 ff6:62 f20548 7f f4 vpermt2ps %zmm4,%zmm15,%zmm6 ffc:62 f2 0d48 7f c4 vpermt2ps %zmm4,%zmm14,%zmm01002:62 f1 6c48596424vmulps0x480(%rsp),%zmm2,%zmm41009:12 100a:62 f1 6c4859 d2vmulps %zmm2,%zmm2,%zmm21010:6261 6c4859 e4vmulps %zmm4,%zmm2,%zmm281016:62 f1 4c40595424vmulps0x300(%rsp),%zmm22,%zmm2 101d:0c 101e:6273 f54823 d6 e4 vshuff64x2$0xe4,%zmm6,%zmm1,%zmm101025:62 b1 4c4059 cevmulps %zmm22,%zmm22,%zmm1 102b:62 d16440 5c f1vsubps %zmm9,%zmm19,%zmm61031:62 e1 7c4059 5c24vmulps0x180(%rsp),%zmm16,%zmm191038:061039:62 a1 7c4059 c0vmulps %zmm16,%zmm16,%zmm16 103f:6263954023 e8 e4 vshuff64x2$0xe4,%zmm0,%zmm29,%zmm291046:62 f1 7c48284424vmovaps0x880(%rsp),%zmm0 104d:22 104e:62 f1 7c48297424vmovaps %zmm6,0x280(%rsp)1055:0a1056:62 e1744859 f2vmulps %zmm2,%zmm1,%zmm22 105c:62 f1 7c48285424vmovaps0x540(%rsp),%zmm21063:151064:6291 7c4828 4c 2evmovaps0xd80(%r14,%r13,1),%zmm1 106b:36 106c:62511440 5c c9vsubps %zmm9,%zmm29,%zmm91072:6271 7c4829 4c24vmovaps %zmm9,0x4c0(%rsp)1079:13 107a:6261 7c48 5c ddvsubps %zmm5,%zmm0,%zmm271080:62 f1 7c4828 c7vmovaps %zmm7,%zmm01086:62 f1 6c48 5c e5vsubps %zmm5,%zmm2,%zmm4 108c:62 f1 7c48285424vmovaps0x8c0(%rsp),%zmm21093:231094:6291 7c4828 ebvmovaps %zmm27,%zmm5 109a:6261 7c4829 5c24vmovaps %zmm27,0x400(%rsp) 10a1:10 10a2:62 f1 7c48296424vmovaps %zmm4,0x3c0(%rsp) 10a9:0f 10aa:62 d2 5d48 a8 e0 vfmadd213ps %zmm8,%zmm4,%zmm4 10b0:62 d22540 a8 e8 vfmadd213ps %zmm8,%zmm27,%zmm5 10b6:6221 7c4059 dbvmulps %zmm19,%zmm16,%zmm27 10bc:6211 7c482844 2evmovaps0xcc0(%r14,%r13,1),%zmm8 10c3:33 10c4:6281 7c482844 2evmovaps0xc40(%r14,%r13,1),%zmm16 10cb:31 10cc:62 f1 6c48 5c ffvsubps %zmm7,%zmm2,%zmm7 10d2:62 f15440 5c d0vsubps %zmm0,%zmm21,%zmm2 10d8:62 e1 7c4828 e8vmovaps %zmm0,%zmm21 10de:62 f1 7c48284424vmovaps0x80(%rsp),%zmm0 10e5:02 10e6:62 f24548 b8 e7 
vfmadd231ps %zmm7,%zmm7,%zmm4 10ec:62 f1 7c4829 7c24vmovaps %zmm7,0x480(%rsp) 10f3:12 10f4:62 f2 6d48 b8 ea vfmadd231ps %zmm2,%zmm2,%zmm5 10fa:62 f1 7c48295424vmovaps %zmm2,0x500(%rsp)1101:141102:6291 7c482854 2evmovaps0xc00(%r14,%r13,1),%zmm21109:30 110a:62 f2 4d48 b8 e6 vfmadd231ps %zmm6,%zmm6,%zmm41110:62 d23548 b8 e9 vfmadd231ps %zmm9,%zmm9,%zmm51116:6291 7c482874 2evmovaps0xdc0(%r14,%r13,1),%zmm6 111d:37 111e:62 f2 7d48 4e fc vrsqrt14ps %zmm4,%zmm71124:62 f1 5c4859 e7vmulps %zmm7,%zmm4,%zmm4 112a:62924548 a8 e2 vfmadd213ps %zmm26,%zmm7,%zmm41130:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm71136:62 f1444859 e4vmulps %zmm4,%zmm7,%zmm4 113c:62 f2 7d48 4e fd vrsqrt14ps %zmm5,%zmm71142:62 f1544859 efvmulps %zmm7,%zmm5,%zmm51148:62 d26548 b8 c3 vfmadd231ps %zmm11,%zmm3,%zmm0 114e:62924548 a8 ea vfmadd213ps %zmm26,%zmm7,%zmm51154:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm7 115a:6271 7c4828 d9vmovaps %zmm1,%zmm111160:62 f1 7c48294424vmovaps %zmm0,0x80(%rsp)1167:021168:62 f1 7c48284424vmovaps0xc0(%rsp),%zmm0 116f:031170:6271444859 cdvmulps %zmm5,%zmm7,%zmm91176:6291 7c4828 6c 2evmovaps0xc80(%r14,%r13,1),%zmm5 117d:32 117e:62 f1 0c4059 fcvmulps %zmm4,%zmm30,%zmm71184:6261 7c4828 eavmovaps %zmm2,%zmm29 118a:62 f1 5c4859 e4vmulps %zmm4,%zmm4,%zmm41190:6261 7c4828 f2vmovaps %zmm2,%zmm301196:6261 5c4859 ffvmulps %zmm7,%zmm4,%zmm31 119c:6291 7c482864 2evmovaps0xd40(%r14,%r13,1),%zmm4 11a3:35 11a4:62221548 7f e8 vpermt2ps %zmm16,%zmm13,%zmm29 11aa:6222 6d40 7f f0 vpermt2ps %zmm16,%zmm18,%zmm30 11b0:62721548 7f de vpermt2ps %zmm6,%zmm13,%zmm11 11b6:6251 2c4859 d1vmulps %zmm9,%zmm10,%zmm10 11bc:6251344859 c9vmulps %zmm9,%zmm9,%zmm9 11c2:62 f1 7c4828 f9vmovaps %zmm1,%zmm7 11c8:62 f2 6d40 7f fe vpermt2ps %zmm6,%zmm18,%zmm7 11ce:62 d26548 b8 c4 vfmadd231ps %zmm12,%zmm3,%zmm0 11d4:6271 7c48286424vmovaps0x100(%rsp),%zmm12 11db:04 11dc:62 f1 7c48294424vmovaps %zmm0,0xc0(%rsp) 11e3:03 11e4:6291 7c482844 2evmovaps0xd00(%r14,%r13,1),%zmm0 11eb:34 11ec:62326548 b8 e4 vfmadd231ps 
%zmm20,%zmm3,%zmm12 11f2:62 e1 7c4828 e5vmovaps %zmm5,%zmm20 11f8:62 f1 7c4828 ddvmovaps %zmm5,%zmm3 11fe:62 c21548 7f e0 vpermt2ps %zmm8,%zmm13,%zmm201204:62 d2 6d40 7f d8 vpermt2ps %zmm8,%zmm18,%zmm3 120a:62 e1 7c4828 d8vmovaps %zmm0,%zmm191210:6272 4d40 b86424 vfmadd231ps0x200(%rsp),%zmm22,%zmm121217:081218:62 e2 6d40 7f dc vpermt2ps %zmm4,%zmm18,%zmm19 121e:62720540 b86424 vfmadd231ps0x280(%rsp),%zmm31,%zmm121225:0a1226:62 a3954023 e4 e4 vshuff64x2$0xe4,%zmm20,%zmm29,%zmm20 122d:6261 7c4828 e8vmovaps %zmm0,%zmm291233:6263 8d4023 f3 e4 vshuff64x2$0xe4,%zmm3,%zmm30,%zmm30 123a:62 d1344859 davmulps %zmm10,%zmm9,%zmm31240:6271 7c4828 d2vmovaps %zmm2,%zmm101246:62 b20548 7f d0 vpermt2ps %zmm16,%zmm15,%zmm2 124c:62621548 7f ec vpermt2ps %zmm4,%zmm13,%zmm291252:62 f1 7c4829 5c24vmovaps %zmm3,0x540(%rsp)1259:15 125a:62 f1 7c4828 ddvmovaps %zmm5,%zmm31260:6232 0d48 7f d0 vpermt2ps %zmm16,%zmm14,%zmm101266:62 e1 7c48284424vmovaps0x700(%rsp),%zmm16 126d:1c 126e:62 d20548 7f e8 vpermt2ps %zmm8,%zmm15,%zmm51274:62 f3 e54023 ff e4 vshuff64x2$0xe4,%zmm7,%zmm19,%zmm7 127b:6271 7c48296424vmovaps %zmm12,0x100(%rsp)1282:041283:62 d2 0d48 7f d8 vpermt2ps %zmm8,%zmm14,%zmm31289:6271 7c4828 c1vmovaps %zmm1,%zmm8 128f:62 f20548 7f ce vpermt2ps %zmm6,%zmm15,%zmm11295:6272 0d48 7f c6 vpermt2ps %zmm6,%zmm14,%zmm8 129b:6243954023 eb e4 vshuff64x2$0xe4,%zmm11,%zmm29,%zmm29 12a2:6271 7c4828 d8vmovaps %zmm0,%zmm11 12a8:62 f20548 7f c4 vpermt2ps %zmm4,%zmm15,%zmm0 12ae:62 f3 ed4823 ed e4 vshuff64x2$0xe4,%zmm5,%zmm2,%zmm5 12b5:62 b1 7c4828 d5vmovaps %zmm21,%zmm2 12bb:6273 ad4823 cb e4 vshuff64x2$0xe4,%zmm3,%zmm10,%zmm9 12c2:62 f1 7c4828 5c24vmovaps0x240(%rsp),%zmm3 12c9:09 12ca:62 b1 0c40 5c f0vsubps %zmm16,%zmm30,%zmm6 12d0:6221 7c4828 f0vmovaps %zmm16,%zmm30 12d6:6272 0d48 7f dc vpermt2ps %zmm4,%zmm14,%zmm11 12dc:62 f1 7c48286424vmovaps0x80(%rsp),%zmm4 12e3:02 12e4:62711440 5c d2vsubps %zmm2,%zmm29,%zmm10 12ea:6271 7c48295424vmovaps %zmm10,0x180(%rsp) 12f1:06 12f2:62 e3 fd4823 d9 e4 
vshuff64x2$0xe4,%zmm1,%zmm0,%zmm19 12f9:62 b14448 5c c0vsubps %zmm16,%zmm7,%zmm0 12ff:62 e1 7c48284424vmovaps0x600(%rsp),%zmm161306:181307:6253 a54823 c0 e4 vshuff64x2$0xe4,%zmm8,%zmm11,%zmm8 130e:6271 7c4828 5c24vmovaps0x140(%rsp),%zmm111315:051316:62 f1 7c48294424vmovaps %zmm0,0x2c0(%rsp) 131d:0b 131e:6292 1d40 b8 d9 vfmadd231ps %zmm25,%zmm28,%zmm31324:6221 5c40 5c cdvsubps %zmm21,%zmm20,%zmm25 132a:62 e1 7c48286424vmovaps0x680(%rsp),%zmm201331:1a1332:6292 4d40 b8 e0 vfmadd231ps %zmm24,%zmm22,%zmm41338:62 e1 7c4828 6c24vmovaps0x1c0(%rsp),%zmm21 133f:071340:6201 7c482844 2evmovaps0xfc0(%r14,%r13,1),%zmm241347:3f1348:62 b22540 b8 d9 vfmadd231ps %zmm17,%zmm27,%zmm3 134e:6261 7c4829 4c24vmovaps %zmm25,0x300(%rsp)1355:0c1356:62 f20540 b86424 vfmadd231ps0x3c0(%rsp),%zmm31,%zmm4 135d:0f 135e:62 f1 7c48296424vmovaps %zmm4,0x80(%rsp)1365:021366:62 b2 7d48 a8 c0 vfmadd213ps %zmm16,%zmm0,%zmm0 136c:62 d2 2d48 b8 c2 vfmadd231ps %zmm10,%zmm10,%zmm01372:6271 7c4828 d3vmovaps %zmm3,%zmm101378:62 f1 7c4828 5c24vmovaps0xc0(%rsp),%zmm3 137f:031380:6272 1d40 b8 5c24 vfmadd231ps0x580(%rsp),%zmm28,%zmm111387:161388:62 f1 7c48297424vmovaps %zmm6,0x580(%rsp) 138f:161390:62 b2 4d48 a8 f0 vfmadd213ps %zmm16,%zmm6,%zmm61396:62 b13448 5c fcvsubps %zmm20,%zmm9,%zmm7 139c:62923540 b8 f1 vfmadd231ps %zmm25,%zmm25,%zmm6 13a2:6201 7c4828 4c 2evmovaps0xf80(%r14,%r13,1),%zmm25 13a9:3e 13aa:6231 3c48 5c c4vsubps %zmm20,%zmm8,%zmm8 13b0:6211 7c4828 4c 2evmovaps0xf40(%r14,%r13,1),%zmm9 13b7:3d 13b8:62 e2 1d40 b8 6c24 vfmadd231ps0x5c0(%rsp),%zmm28,%zmm21 13bf:17 13c0:62 f24548 b8 f7 vfmadd231ps %zmm7,%zmm7,%zmm6 13c6:62 f1 7c4829 7c24vmovaps %zmm7,0x240(%rsp) 13cd:09 13ce:62 d2 3d48 b8 c0 vfmadd231ps %zmm8,%zmm8,%zmm0 13d4:6271 7c48294424vmovaps %zmm8,0x1c0(%rsp) 13db:07 13dc:62 f2 7d48 4e fe vrsqrt14ps %zmm6,%zmm7 13e2:62 f2 7d48 4e d0 vrsqrt14ps %zmm0,%zmm2 13e8:62 f1 4c4859 cfvmulps %zmm7,%zmm6,%zmm1 13ee:62 f1 7c4859 f2vmulps %zmm2,%zmm0,%zmm6 13f4:62722540 b8 5c24 
vfmadd231ps0x340(%rsp),%zmm27,%zmm11 13fb:0d 13fc:62924548 a8 ca vfmadd213ps %zmm26,%zmm7,%zmm11402:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm71408:6292 6d48 a8 f2 vfmadd213ps %zmm26,%zmm2,%zmm6 140e:62 e22540 b8 6c24 vfmadd231ps0x440(%rsp),%zmm27,%zmm211415:111416:62 f1444859 c9vmulps %zmm1,%zmm7,%zmm1 141c:62 b1 6c4859 ffvmulps %zmm23,%zmm2,%zmm71422:62 f1444859 f6vmulps %zmm6,%zmm7,%zmm61428:62 f1744859 c1vmulps %zmm1,%zmm1,%zmm0 142e:62 f1544859 c9vmulps %zmm1,%zmm5,%zmm11434:6291 7c4828 7c 2evmovaps0xe80(%r14,%r13,1),%zmm7 143b:3a 143c:6291 7c4828 6c 2evmovaps0xe00(%r14,%r13,1),%zmm51443:381444:62 f2 4d40 b8 5c24 vfmadd231ps0x380(%rsp),%zmm22,%zmm3 144b:0e 144c:6281 7c482874 2evmovaps0xf00(%r14,%r13,1),%zmm221453:3c1454:62 e1 7c4859 c9vmulps %zmm1,%zmm0,%zmm17 145a:62 f1644059 c6vmulps %zmm6,%zmm19,%zmm01460:6271 4c4859 c6vmulps %zmm6,%zmm6,%zmm81466:6291 7c4828 4c 2evmovaps0xe40(%r14,%r13,1),%zmm1 146d:39 146e:6291 7c482874 2evmovaps0xec0(%r14,%r13,1),%zmm61475:3b1476:6271 7c4829 5c24vmovaps %zmm11,0x140(%rsp) 147d:05 147e:62 f1 3c4859 d0vmulps %zmm0,%zmm8,%zmm21484:6211 7c4828 c1vmovaps %zmm25,%zmm8 148a:6212 6d40 7f c0 vpermt2ps %zmm24,%zmm18,%zmm81490:62 f20540 b8 5c24 vfmadd231ps0x480(%rsp),%zmm31,%zmm31497:121498:6261 7c4828 dfvmovaps %zmm7,%zmm27 149e:62 f1 7c4828 c5vmovaps %zmm5,%zmm0 14a4:6271 7c4828 dfvmovaps %zmm7,%zmm11 14aa:6271 7c4828 e7vmovaps %zmm7,%zmm12 14b0:62 e1 7c4828 ddvmovaps %zmm5,%zmm19 14b6:6221 7c4828 e6vmovaps %zmm22,%zmm28 14bc:6221 7c4828 eevmovaps %zmm22,%zmm29 14c2:6262 6d40 7f de vpermt2ps %zmm6,%zmm18,%zmm27 14c8:62 f2 6d40 7f c1 vpermt2ps %zmm1,%zmm18,%zmm0 14ce:62721548 7f de vpermt2ps %zmm6,%zmm13,%zmm11 14d4:6272 0d48 7f e6 vpermt2ps %zmm6,%zmm14,%zmm12 14da:62 e2 0d48 7f d9 vpermt2ps %zmm1,%zmm14,%zmm19 14e0:62 f20548 7f fe vpermt2ps %zmm6,%zmm15,%zmm7 14e6:6242 6d40 7f e1 vpermt2ps %zmm9,%zmm18,%zmm28 14ec:6281 7c4828 d1vmovaps %zmm25,%zmm18 14f2:62421548 7f e9 vpermt2ps %zmm9,%zmm13,%zmm29 14f8:62821548 7f d0 vpermt2ps 
%zmm24,%zmm13,%zmm18 14fe:6293 fd4823 e3 e4 vshuff64x2$0xe4,%zmm27,%zmm0,%zmm41505:62 f1 7c48284424vmovaps0x540(%rsp),%zmm0 150c:15 150d:6253 e54023 e4 e4 vshuff64x2$0xe4,%zmm12,%zmm19,%zmm121514:6281 7c4828 d9vmovaps %zmm25,%zmm19 151a:62020548 7f c8 vpermt2ps %zmm24,%zmm15,%zmm251520:6253 9d4023 c0 e4 vshuff64x2$0xe4,%zmm8,%zmm28,%zmm81527:6261 7c4828 e5vmovaps %zmm5,%zmm28 152d:62 f20548 7f e9 vpermt2ps %zmm1,%zmm15,%zmm51533:6282 0d48 7f d8 vpermt2ps %zmm24,%zmm14,%zmm191539:62621548 7f e1 vpermt2ps %zmm1,%zmm13,%zmm28 153f:6272 7d48 b85424 vfmadd231ps0x400(%rsp),%zmm0,%zmm101546:101547:62 e2 7d48 b8 6c24 vfmadd231ps0x4c0(%rsp),%zmm0,%zmm21 154e:13 154f:62 f3 d54823 cf e4 vshuff64x2$0xe4,%zmm7,%zmm5,%zmm11556:6291 3c48 5c eevsubps %zmm30,%zmm8,%zmm5 155c:6231 1c48 5c c4vsubps %zmm20,%zmm12,%zmm81562:6243 9d4023 db e4 vshuff64x2$0xe4,%zmm11,%zmm28,%zmm271569:6223954023 e2 e4 vshuff64x2$0xe4,%zmm18,%zmm29,%zmm281570:62 e1 7c48285424vmovaps0x140(%rsp),%zmm181577:051578:6271 7c4828 dbvmovaps %zmm3,%zmm11 157e:62 b1 7c4828 devmovaps %zmm22,%zmm31584:62 c20548 7f f1 vpermt2ps %zmm9,%zmm15,%zmm22 158a:62 d2 0d48 7f d9 vpermt2ps %zmm9,%zmm14,%zmm31590:6271 7c48287424vmovaps0x80(%rsp),%zmm141597:021598:6251 7c4828 ebvmovaps %zmm11,%zmm13 159e:6271 7c4828 5c24vmovaps0x100(%rsp),%zmm11 15a5:04 15a6:62727540 b8 6c24 vfmadd231ps0x300(%rsp),%zmm17,%zmm13 15ad:0c 15ae:6272 6d48 b85424 vfmadd231ps0x2c0(%rsp),%zmm2,%zmm10 15b5:0b 15b6:62 e2 6d48 b8 6c24 vfmadd231ps0x1c0(%rsp),%zmm2,%zmm21 15bd:07 15be:62 e2 7d48 b85424 vfmadd231ps0x500(%rsp),%zmm0,%zmm18 15c5:14 15c6:6291 5c48 5c c6vsubps %zmm30,%zmm4,%zmm0 15cc:62 f1 7c48286424vmovaps0x6c0(%rsp),%zmm4 15d3:1b 15d4:62 a3 e54823 db e4 vshuff64x2$0xe4,%zmm19,%zmm3,%zmm19 15db:6293 cd4023 d9 e4 vshuff64x2$0xe4,%zmm25,%zmm22,%zmm3 15e2:6271 7c4828 e0vmovaps %zmm0,%zmm12 15e8:62727540 b87424 vfmadd231ps0x580(%rsp),%zmm17,%zmm14 15ef:16 15f0:62727540 b8 5c24 vfmadd231ps0x240(%rsp),%zmm17,%zmm11 15f7:09 15f8:6232 7d48 a8 e0 
vfmadd213ps %zmm16,%zmm0,%zmm12 15fe:62316440 5c ccvsubps %zmm20,%zmm19,%zmm91604:62 e2 6d48 b85424 vfmadd231ps0x180(%rsp),%zmm2,%zmm18 160b:06 160c:62 f12440 5c f4vsubps %zmm4,%zmm27,%zmm61612:62 f1 1c40 5c fcvsubps %zmm4,%zmm28,%zmm71618:62 f1 7c4828 e5vmovaps %zmm5,%zmm4 161e:62 b25548 a8 e0 vfmadd213ps %zmm16,%zmm5,%zmm41624:6272 4d48 b8 e6 vfmadd231ps %zmm6,%zmm6,%zmm12 162a:62 f24548 b8 e7 vfmadd231ps %zmm7,%zmm7,%zmm41630:6252 3d48 b8 e0 vfmadd231ps %zmm8,%zmm8,%zmm121636:62 d23548 b8 e1 vfmadd231ps %zmm9,%zmm9,%zmm4 163c:62 c2 7d48 4e e4 vrsqrt14ps %zmm12,%zmm201642:62 e2 7d48 4e c4 vrsqrt14ps %zmm4,%zmm161648:6231 1c4859 e4vmulps %zmm20,%zmm12,%zmm12 164e:62 a1 5c4859 d8vmulps %zmm16,%zmm4,%zmm191654:6212 5d40 a8 e2 vfmadd213ps %zmm26,%zmm20,%zmm12 165a:62 a1 5c4059 e7vmulps %zmm23,%zmm20,%zmm201660:62 b1 7c4059 e7vmulps %zmm23,%zmm16,%zmm41666:6282 7d40 a8 da vfmadd213ps %zmm26,%zmm16,%zmm19 166c:6251 5c4059 e4vmulps %zmm12,%zmm20,%zmm121672:62 b1 5c4859 e3vmulps %zmm19,%zmm4,%zmm41678:62 c1 1c4859 c4vmulps %zmm12,%zmm12,%zmm16 167e:62 d1744859 ccvmulps %zmm12,%zmm1,%zmm11684:62 f1644859 d4vmulps %zmm4,%zmm3,%zmm2 168a:62 e1 5c4859 ccvmulps %zmm4,%zmm4,%zmm171690:62 f1 7c4059 c9vmulps %zmm1,%zmm16,%zmm11696:62 d1 7c4828 devmovaps %zmm14,%zmm3 169c:62 b1 7c4828 e2vmovaps %zmm18,%zmm4 16a2:62 f1744059 d2vmulps %zmm2,%zmm17,%zmm2 16a8:62 f27548 b8 d8 vfmadd231ps %zmm0,%zmm1,%zmm3 16ae:6272 6d48 b8 d5 vfmadd231ps %zmm5,%zmm2,%zmm10 16b4:62 f2 6d48 b8 e7 vfmadd231ps %zmm7,%zmm2,%zmm4 16ba:62 c2 6d48 b8 e9 vfmadd231ps %zmm9,%zmm2,%zmm21 16c0:62 f1 2c4858 c3vaddps %zmm3,%zmm10,%zmm0 16c6:62 d1 7c4828 ddvmovaps %zmm13,%zmm3 16cc:62 f3 fd481b c201 vextractf64x4$0x1,%zmm0,%ymm2 16d3:62 f27548 b8 de vfmadd231ps %zmm6,%zmm1,%zmm3 16d9:62 f1 5c4858 dbvaddps %zmm3,%zmm4,%zmm3 16df:62 d1 7c4828 e3vmovaps %zmm11,%zmm4 16e5:62 f1 7c4858 c2vaddps %zmm2,%zmm0,%zmm0 16eb:62 d27548 b8 e0 vfmadd231ps %zmm8,%zmm1,%zmm4 16f1:c4 e3 7d19 c201vextractf128$0x1,%ymm0,%xmm2 16f7:62 
f1544058 ccvaddps %zmm4,%zmm21,%zmm1 16fd:62 f3 fd481b dc01 vextractf64x4$0x1,%zmm3,%ymm41704:c5 f858 c2vaddps %xmm2,%xmm0,%xmm01708:62 f1644858 dcvaddps %zmm4,%zmm3,%zmm3 170e:c4 e3 7d19 dc01vextractf128$0x1,%ymm3,%xmm41714:c4 e37905 d001vpermilpd$0x1,%xmm0,%xmm2 171a:c5 e058 dcvaddps %xmm4,%xmm3,%xmm3 171e:c5 f858 c2vaddps %xmm2,%xmm0,%xmm01722:c5 fa16 d0vmovshdup %xmm0,%xmm21726:c5 fa58 c2vaddss %xmm2,%xmm0,%xmm0 172a:c5 f82984240001vmovaps %xmm0,0x100(%rsp)1731:00001733:c4 e37905 c301vpermilpd$0x1,%xmm3,%xmm01739:c5 e058 c0vaddps %xmm0,%xmm3,%xmm0 173d:62 f3 fd481b cb01 vextractf64x4$0x1,%zmm1,%ymm31744:62 f1744858 cbvaddps %zmm3,%zmm1,%zmm1 174a:c5 fa16 d0vmovshdup %xmm0,%xmm2 174e:c5 fa58 c2vaddss %xmm2,%xmm0,%xmm01752:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp)1759:0000 175b:c4 e3 7d19 c801vextractf128$0x1,%ymm1,%xmm01761:c5 f058 c0vaddps %xmm0,%xmm1,%xmm01765:c4 e37905 c801vpermilpd$0x1,%xmm0,%xmm1 176b:c5 f858 c1vaddps %xmm1,%xmm0,%xmm0 176f:c5 fa16 c8vmovshdup %xmm0,%xmm11773:c5 fa58 c1vaddss %xmm1,%xmm0,%xmm01777:c5 f82984248000vmovaps %xmm0,0x80(%rsp) 177e:00001780:c5 f877vzeroupper1783:41 ff d4call*%r121786:4181 c700010000add$0x100,%r15d 178d:4139 dfcmp %ebx,%r15d1790:0f82 5a ea ff ffjb 1f0 <simplified_nbody+0x1f0>1796:eb27jmp 17bf <simplified_nbody+0x17bf>1798:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0 179c:c5 f82984240001vmovaps %xmm0,0x100(%rsp) 17a3:0000 17a5:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0 17a9:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp) 17b0:0000 17b2:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0 17b6:c5 f82984248000vmovaps %xmm0,0x80(%rsp) 17bd:0000 17bf:48 8b442450mov0x50(%rsp),%rax 17c4:c5 f82894240001vmovaps0x100(%rsp),%xmm2 17cb:0000 17cd:c5 f828 9c24 c000vmovaps0xc0(%rsp),%xmm3 17d4:0000 17d6:c5 f828 a4248000vmovaps0x80(%rsp),%xmm4 17dd:0000 17df:48 8b 4c2458mov0x58(%rsp),%rcx 17e4:c5 fa1000vmovss (%rax),%xmm0 17e8:48 b80000000000 movabs$0x0,%rax 17ef:000000 17f2:c4 c1 7a10 0c06vmovss (%r14,%rax,1),%xmm1 17f8:48 8b442468mov0x68(%rsp),%rax 17fd:c4 e279 a95424 3c 
vfmadd213ss0x3c(%rsp),%xmm0,%xmm21804:c4 e279 a9 5c2440 vfmadd213ss0x40(%rsp),%xmm0,%xmm3 180b:c4 e279 a9642444 vfmadd213ss0x44(%rsp),%xmm0,%xmm41812:c5 ea59 d1vmulss %xmm1,%xmm2,%xmm21816:c5 e259 d9vmulss %xmm1,%xmm3,%xmm3 181a:c5 da59 c9vmulss %xmm1,%xmm4,%xmm1 181e:c5 fa102408vmovss (%rax,%rcx,1),%xmm41823:c4 e269 b9 e0 vfmadd231ss %xmm0,%xmm2,%xmm41828:c5 fa112408vmovss %xmm4,(%rax,%rcx,1) 182d:c5 fa10640804vmovss0x4(%rax,%rcx,1),%xmm41833:c4 e261 b9 e0 vfmadd231ss %xmm0,%xmm3,%xmm41838:c5 fa11640804vmovss %xmm4,0x4(%rax,%rcx,1) 183e:c4 e271 a9440808 vfmadd213ss0x8(%rax,%rcx,1),%xmm1,%xmm01845:c5 fa11440808vmovss %xmm0,0x8(%rax,%rcx,1) 184b:48 8b 4c2460mov0x60(%rsp),%rcx1850:c5 fa1111vmovss %xmm2,(%rcx)1854:c5 fa115904vmovss %xmm3,0x4(%rcx)1859:c5 fa114908vmovss %xmm1,0x8(%rcx) 185e:48 8d65 d8lea-0x28(%rbp),%rsp1862:5bpop %rbx1863:41 5cpop %r121865:41 5dpop %r131867:41 5epop %r141869:41 5fpop %r15 186b:5dpop %rbp 186c:c3ret
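The AVX-512 listing above repeatedly pairs vrsqrt14ps with a short vmulps / vfmadd213ps / vmulps sequence. That pattern is consistent with a hardware reciprocal square root estimate that is refined by one Newton-Raphson step. The refinement constants live in registers loaded outside this excerpt, so this reading is an assumption. A minimal scalar C++ sketch of such a step follows; refine_rsqrt and coarse_rsqrt are hypothetical names used for illustration only and are not part of libfloor.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
// Each step roughly squares the relative error of the estimate y0.
float refine_rsqrt(float x, float y0) {
	assert(x > 0.0f); // 1/sqrt(x) is only defined for positive x
	return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// Hypothetical stand-in for a hardware estimate instruction:
// the exact value perturbed by a relative error of about 2^-14.
float coarse_rsqrt(float x) {
	return (1.0f / std::sqrt(x)) * (1.0f + 6.1e-5f);
}
```

Calling refine_rsqrt(x, coarse_rsqrt(x)) should bring the estimate to within a few ULPs of 1.0f / std::sqrt(x), which is why a single step is usually enough for single-precision code compiled with fast-math.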
Host-Compute (ARM CPU)
Note that the compiler would usually directly output a .bin file (ELF format). The output below comes from disassembling it with objdump -d. Also note that this has been compiled for the arm-7
target (ARMv8.6 + FP16 + FP16FML, e.g. Apple M2+/A15+).nbody_aarch64.bin:file formatelf64-littleaarch64Disassembly of section .text:0000000000000000 <simplified_nbody>:0:d104c3ffsubsp,sp, #0x1304:90000008 adrpx8,0 <floor_global_idx>8:6d0a33ed stpd13, d12,[sp, #160] c:6d0b2beb stpd11, d10,[sp, #176]10:6d0c23e9 stpd9, d8,[sp, #192]14:a90d7bfd stpx29, x30,[sp, #208]18:910343fdaddx29,sp, #0xd0 1c:a90e6ffc stpx28, x27,[sp, #224]20:a90f67fa stpx26, x25,[sp, #240]24:a9105ff8 stpx24, x23,[sp, #256]28:a91157f6 stpx22, x21,[sp, #272] 2c:a9124ff4 stpx20, x19,[sp, #288]30:f9400108 ldrx8,[x8]34:b9400117 ldrw23,[x8]38:52800188movw8, #0xc // #12 3c:9b080af6 maddx22, x23, x8, x240:90000008 adrpx8,0 <floor_global_work_size>44:aa1603f8movx24, x2248:f9400108 ldrx8,[x8] 4c:fd4002c8 ldrd8,[x22]50:bc408f09 ldrs9,[x24, #8]!54:b9400119 ldrw25,[x8]58:34000d79 cbzw25,204 <simplified_nbody+0x204> 5c:2f00e403 movid3, #0x060:8b171008addx8, x0, x23,lsl #464:9000001c adrpx28,0 <floor_local_idx>68:a90007e3 stpx3, x1,[sp] 6c:90000013 adrpx19,0 <simplified_nbody>70:90000014 adrpx20,0 <host_compute_device_barrier>74:aa0003f5movx21, x078:2a1f03famovw26, wzr 7c:f940039c ldrx28,[x28]80:3c9a03a3 sturq3,[x29, #-96]84:2d400500 ldps0, s1,[x8]88:bd400902 ldrs2,[x8, #8] 8c:5296e2e8movw8, #0xb717 // #4687190:4f03f603 fmovv3.4s, #1.000000000000000000e+0094:72a71a28 movkw8, #0x38d1,lsl #1698:2a1f03fbmovw27, wzr 9c:3d8017e3strq3,[sp, #80] a0:4e040403 dupv3.4s, v0.s[0] a4:4e040d00 dupv0.4s, w8 a8:f9400273 ldrx19,[x19] ac:ad018fe0 stpq0, q3,[sp, #48] b0:2f00e400 movid0, #0x0 b4:4e040423 dupv3.4s, v1.s[0] b8:3d801be0strq0,[sp, #96] bc:2f00e400 movid0, #0x0 c0:3c9b03a0 sturq0,[x29, #-80] c4:4e040440 dupv0.4s, v2.s[0] c8:f9400294 ldrx20,[x20] cc:ad008fe0 stpq0, q3,[sp, #16] d0:b9400388 ldrw8,[x28] d4:0b1b2109addw9, w8, w27,lsl #8 d8:3ce95aa0 ldrq0,[x21, w9, uxtw #4] dc:3ca87a60strq0,[x19, x8,lsl #4] e0:d63f0280 blrx20 e4:6f00e400 moviv0.2d, #0x0 e8:3cda03a4 ldurq4,[x29, #-96] ec:6f00e402 moviv2.2d, #0x0 
f0:aa1f03e8movx8, xzr f4:6f00e403 moviv3.2d, #0x0 f8:ad41abeb ldpq11, q10,[sp, #48] fc:6e040480movv0.s[0], v4.s[0]100:6f00e401 moviv1.2d, #0x0104:6f00e405 moviv5.2d, #0x0108:ad4293ff ldpq31, q4,[sp, #80] 10c:6e040482movv2.s[0], v4.s[0]110:3cdb03a4 ldurq4,[x29, #-80]114:ad40b3ed ldpq13, q12,[sp, #16]118:6e040483movv3.s[0], v4.s[0] 11c:6f00e404 moviv4.2d, #0x0120:8b080269addx9, x19, x8124:91020108addx8, x8, #0x80128:4eab1d67movv7.16b, v11.16b 12c:f140051fcmpx8, #0x1,lsl #12130:4eab1d7bmovv27.16b, v11.16b134:4cdf0930 ld4{v16.4s-v19.4s},[x9], #64138:4eaad606fsubv6.4s, v16.4s, v10.4s 13c:4eacd638fsubv24.4s, v17.4s, v12.4s140:4eadd659fsubv25.4s, v18.4s, v13.4s144:4e26ccc7 fmlav7.4s, v6.4s, v6.4s148:4e38cf07 fmlav7.4s, v24.4s, v24.4s 14c:4c400934 ld4{v20.4s-v23.4s},[x9]150:4e39cf27 fmlav7.4s, v25.4s, v25.4s154:6ea1f8e7fsqrtv7.4s, v7.4s158:4eaad69afsubv26.4s, v20.4s, v10.4s 15c:4eacd6bcfsubv28.4s, v21.4s, v12.4s160:4eadd6ddfsubv29.4s, v22.4s, v13.4s164:4e3acf5b fmlav27.4s, v26.4s, v26.4s168:6e27ffe7fdivv7.4s, v31.4s, v7.4s 16c:4e3ccf9b fmlav27.4s, v28.4s, v28.4s170:4e3dcfbb fmlav27.4s, v29.4s, v29.4s174:6ea1fb7bfsqrtv27.4s, v27.4s178:6e27dcfefmulv30.4s, v7.4s, v7.4s 17c:6e27de67fmulv7.4s, v19.4s, v7.4s180:6e27dfc7fmulv7.4s, v30.4s, v7.4s184:6e3bfffbfdivv27.4s, v31.4s, v27.4s188:4e26cce3 fmlav3.4s, v7.4s, v6.4s 18c:4e38cce2 fmlav2.4s, v7.4s, v24.4s190:4e39cce0 fmlav0.4s, v7.4s, v25.4s194:6e3bdf70fmulv16.4s, v27.4s, v27.4s198:6e3bdef1fmulv17.4s, v23.4s, v27.4s 19c:6e31de10fmulv16.4s, v16.4s, v17.4s 1a0:4e3ace05 fmlav5.4s, v16.4s, v26.4s 1a4:4e3cce04 fmlav4.4s, v16.4s, v28.4s 1a8:4e3dce01 fmlav1.4s, v16.4s, v29.4s 1ac:54fffba1 b.ne120 <simplified_nbody+0x120> // b.any 1b0:4e23d4a3faddv3.4s, v5.4s, v3.4s 1b4:4e20d420faddv0.4s, v1.4s, v0.4s 1b8:4e22d482faddv2.4s, v4.4s, v2.4s 1bc:6e20d461faddpv1.4s, v3.4s, v0.4s 1c0:6e20d442faddpv2.4s, v2.4s, v0.4s 1c4:6e20d400faddpv0.4s, v0.4s, v0.4s 1c8:7e30d821faddps1, v1.2s 1cc:7e30d800faddps0, v0.2s 1d0:ad3d07a0 stpq0, q1,[x29, #-96] 
1d4:7e30d841faddps1, v2.2s 1d8:3d801be1strq1,[sp, #96] 1dc:d63f0280 blrx20 1e0:1104035aaddw26, w26, #0x100 1e4:1100077baddw27, w27, #0x1 1e8:6b19035fcmpw26, w25 1ec:54fff723 b.ccd0 <simplified_nbody+0xd0> // b.lo, b.ul, b.last 1f0:ad7d07a2 ldpq2, q1,[x29, #-96] 1f4:3dc01be0 ldrq0,[sp, #96] 1f8:a94007e3 ldpx3, x1,[sp] 1fc:6e0c0401movv1.s[1], v0.s[0]200:14000003 b20c <simplified_nbody+0x20c>204:2f00e401 movid1, #0x0208:2f00e402 movid2, #0x0 20c:5297cee8movw8, #0xbe77 // #48759210:bd400060 ldrs0,[x3]214:72a7efe8 movkw8, #0x3f7f,lsl #16218:8b171029addx9, x1, x23,lsl #4 21c:a9524ff4 ldpx20, x19,[sp, #288]220:0f801028 fmlav8.2s, v1.2s, v0.s[0]224:1f022402 fmadds2, s0, s2, s9228:0e040d01 dupv1.2s, w8 22c:1e270103 fmovs3, w8230:fd400124 ldrd4,[x9]234:a94f67fa ldpx26, x25,[sp, #240]238:1e230842fmuls2, s2, s3 23c:2e21dd01fmulv1.2s, v8.2s, v1.2s240:bd400923 ldrs3,[x9, #8]244:a94e6ffc ldpx28, x27,[sp, #224]248:bd000302strs2,[x24] 24c:0f801024 fmlav4.2s, v1.2s, v0.s[0]250:1f000c40 fmadds0, s2, s0, s3254:fd0002c1strd1,[x22]258:a95157f6 ldpx22, x21,[sp, #272] 25c:a9505ff8 ldpx24, x23,[sp, #256]260:fd000124strd4,[x9]264:a94d7bfd ldpx29, x30,[sp, #208]268:bd000920strs0,[x9, #8] 26c:6d4c23e9 ldpd9, d8,[sp, #192]270:6d4b2beb ldpd11, d10,[sp, #176]274:6d4a33ed ldpd13, d12,[sp, #160]278:9104c3ffaddsp,sp, #0x130 27c:d65f03c0ret
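The AArch64 listing above materializes two float constants with movw/movk pairs: #0xb717 with #0x38d1 (shifted left by 16), and #0xbe77 with #0x3f7f (shifted left by 16). Reinterpreting the combined 32-bit patterns as IEEE 754 single-precision values recovers the kernel constants. The sketch below shows the decoding; combine_halves, float_from_bits and check_kernel_constants are hypothetical helper names, not libfloor functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Combine the immediates of a movw (low half) / movk ..., lsl #16 (high half) pair.
std::uint32_t combine_halves(std::uint16_t low, std::uint16_t high) {
	return (static_cast<std::uint32_t>(high) << 16) | low;
}

// Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.
float float_from_bits(std::uint32_t bits) {
	static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
	float value;
	std::memcpy(&value, &bits, sizeof(value));
	return value;
}

// Checks the two constants seen in the disassembly against the kernel source.
void check_kernel_constants() {
	// NBODY_SOFTENING * NBODY_SOFTENING = 0.01f * 0.01f, folded to 1e-4f
	assert(float_from_bits(combine_halves(0xb717, 0x38d1)) == 1.0e-4f);
	// NBODY_DAMPING = 0.999f
	assert(float_from_bits(combine_halves(0xbe77, 0x3f7f)) == 0.999f);
}
```

The first constant shows that the compiler folded the squared softening term at compile time, so the inner loop only adds a precomputed 1e-4f to the squared distance.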
Metal / AIR
Note that the compiler would usually directly output a .metallib file. The output below comes from disassembling it with metallib-dis
(provided by thetoolchain).; ModuleID = 'bc_module'source_filename ="simplified_nbody"targetdatalayout ="e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"targettriple ="air64-apple-macosx14.0.0"%class.vector4 =type {%union.anon }%union.anon =type {%struct.anon }%struct.anon =type {float,float,float,float }%class.vector3 =type {%union.anon.8 }%union.anon.8 =type {%struct.anon.9 }%struct.anon.9 =type {float,float,float }@_ZZ16simplified_nbodyE20local_body_positions =internaladdrspace(3)unnamed_addrglobal [256 x%class.vector4]undef,align16; Function Attrs: nounwinddefinevoid@simplified_nbody(%class.vector4addrspace(1)*noaliasnocapturereadonly%0,%class.vector4addrspace(1)*noaliasnocapture%1,%class.vector3addrspace(1)*noaliasnocapture%2,floataddrspace(2)*noaliasnocapturereadonlyalign4dereferenceable(4)%3, <3 xi32>%4, <3 xi32>%5, <3 xi32>%6, <3 xi32>%7, <3 xi32>%8, <3 xi32>%9,i32%10,i32%11,i32%12,i32%13)local_unnamed_addr #0!reqd_work_group_size!33!kernel_dim!34 {%15 =extractelement <3 xi32>%4,i640%16 =zexti32%15toi64%17 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%16,i320,i320,i320%18 =bitcastfloataddrspace(1)*%17to <3 xfloat>addrspace(1)*%19 =load <3 xfloat>, <3 xfloat>addrspace(1)*%18,align4%20 =extractelement <3 xfloat>%19,i640%21 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%16,i320,i320,i320%22 =bitcastfloataddrspace(1)*%21to <3 xfloat>addrspace(1)*%23 =load <3 xfloat>, <3 xfloat>addrspace(1)*%22,align4%24 =extractelement <3 xi32>%5,i640%25 =extractelement <3 xi32>%6,i640%26 =zexti32%25toi64%27 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@_ZZ16simplified_nbodyE20local_body_positions,i640,i64%26,i320,i320,i320%28 =bitcastfloataddrspace(3)*%27to <4 xfloat>addrspace(3)*%29 =shufflevector <3 xfloat>%19, <3 xfloat>undef, 
<2 xi32> <i321,i322>brlabel%5730:; preds = %68%31 =extractelement <3 xfloat>%23,i640%32 =loadfloat,floataddrspace(2)*%3,align4%33 =fmul fastfloat%32,%100%34 =insertelement <2 xfloat>undef,float%32,i640%35 =shufflevector <2 xfloat>%34, <2 xfloat>undef, <2 xi32>zeroinitializer%36 =fmul fast <2 xfloat>%35,%101%37 =fadd fastfloat%33,%31%38 =shufflevector <3 xfloat>%23, <3 xfloat>undef, <2 xi32> <i321,i322>%39 =fadd fast <2 xfloat>%36,%38%40 =fmul fastfloat%37,0x3FEFF7CEE0000000%41 =fmul fast <2 xfloat>%39, <float0x3FEFF7CEE0000000,float0x3FEFF7CEE0000000>%42 =fmul fastfloat%40,%32%43 =fmul fast <2 xfloat>%41,%35%44 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%16,i320,i320,i320%45 =bitcastfloataddrspace(1)*%44to <3 xfloat>addrspace(1)*%46 =load <3 xfloat>, <3 xfloat>addrspace(1)*%45,align4,!tbaa!35%47 =extractelement <3 xfloat>%46,i640%48 =fadd fastfloat%42,%47%49 =shufflevector <3 xfloat>%46, <3 xfloat>undef, <2 xi32> <i321,i322>%50 =fadd fast <2 xfloat>%43,%49%51 =insertelement <3 xfloat>undef,float%48,i640%52 =shufflevector <2 xfloat>%50, <2 xfloat>undef, <3 xi32> <i320,i321,i32undef>%53 =shufflevector <3 xfloat>%51, <3 xfloat>%52, <3 xi32> <i320,i323,i324>store <3 xfloat>%53, <3 xfloat>addrspace(1)*%45,align4,!tbaa!35%54 =insertelement <3 xfloat>undef,float%40,i640%55 =shufflevector <2 xfloat>%41, <2 xfloat>undef, <3 xi32> <i320,i321,i32undef>%56 =shufflevector <3 xfloat>%54, <3 xfloat>%55, <3 xi32> <i320,i323,i324>store <3 xfloat>%56, <3 xfloat>addrspace(1)*%22,align4,!tbaa!35retvoid57:; preds = %68, %14%58 =phii32 [0,%14 ], [%69,%68 ]%59 =phii32 [0,%14 ], [%70,%68 ]%60 =phifloat [0.000000e+00,%14 ], [%100,%68 ]%61 =phi <2 xfloat> [zeroinitializer,%14 ], [%101,%68 ]%62 =shli32%59,8%63 =addi32%25,%62%64 =zexti32%63toi64%65 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%64,i320,i320,i320%66 =bitcastfloataddrspace(1)*%65to <4 xfloat>addrspace(1)*%67 =load <4 xfloat>, <4 xfloat>addrspace(1)*%66,align4store <4 
xfloat>%67, <4 xfloat>addrspace(3)*%28,align4,!tbaa!35tailcallvoid@air.wg.barrier(i322,i321) #3brlabel%7268:; preds = %72tailcallvoid@air.wg.barrier(i322,i321) #3%69 =addi32%58,256%70 =addi32%59,1%71 =icmpulti32%69,%24bri1%71,label%57,label%30,!llvm.loop!3872:; preds = %72, %57%73 =phii32 [0,%57 ], [%102,%72 ]%74 =phifloat [%60,%57 ], [%100,%72 ]%75 =phi <2 xfloat> [%61,%57 ], [%101,%72 ]%76 =zexti32%73toi64%77 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@_ZZ16simplified_nbodyE20local_body_positions,i640,i64%76,i320,i320,i320%78 =bitcastfloataddrspace(3)*%77to <4 xfloat>addrspace(3)*%79 =load <4 xfloat>, <4 xfloat>addrspace(3)*%78,align4%80 =extractelement <4 xfloat>%79,i640%81 =extractelement <4 xfloat>%79,i643%82 =fsub fastfloat%80,%20%83 =shufflevector <4 xfloat>%79, <4 xfloat>undef, <2 xi32> <i321,i322>%84 =fsub fast <2 xfloat>%83,%29%85 =fmul fastfloat%82,%82%86 =fmul fast <2 xfloat>%84,%84%87 =extractelement <2 xfloat>%86,i640%88 =extractelement <2 xfloat>%86,i641%89 =fadd fastfloat%85,0x3F1A36E2E0000000%90 =fadd fastfloat%89,%87%91 =fadd fastfloat%90,%88%92 =tailcall fastfloat@air.fast_rsqrt.f32(float%91) #4%93 =fmul fastfloat%92,%92%94 =fmul fastfloat%93,%92%95 =fmul fastfloat%94,%81%96 =fmul fastfloat%95,%82%97 =insertelement <2 xfloat>undef,float%95,i640%98 =shufflevector <2 xfloat>%97, <2 xfloat>undef, <2 xi32>zeroinitializer%99 =fmul fast <2 xfloat>%98,%84%100 =fadd fastfloat%96,%74%101 =fadd fast <2 xfloat>%99,%75%102 =addnuwnswi32%73,1%103 =icmpeqi32%102,256bri1%103,label%68,label%72,!llvm.loop!40}; Function Attrs: nounwind readnonedeclarefloat@air.fast_rsqrt.f32(float)local_unnamed_addr #1; Function Attrs: convergent noduplicatedeclarevoid@air.wg.barrier(i32,i32)local_unnamed_addr #2attributes #0 = 
{nounwind"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""uniform-work-group-size"="true""unsafe-fp-math"="true" }attributes #1 = {nounwindreadnone"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""unsafe-fp-math"="true" }attributes #2 = {convergentnoduplicate"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""unsafe-fp-math"="true" }attributes #3 = {convergentnoduplicatenounwind }attributes #4 = {nounwindreadnone }!air.kernel = !{!0}!air.version = !{!18}!air.language_version = !{!19}!air.compile_options = !{!20,!21,!22}!llvm.module.flags = !{!23,!24,!25,!26,!27,!28,!29,!30,!31}!llvm.ident = !{!32}!0 = !{void (%class.vector4addrspace(1)*,%class.vector4addrspace(1)*,%class.vector3addrspace(1)*,floataddrspace(2)*, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>,i32,i32,i32,i32)*@simplified_nbody,!1,!2,!17}!1 = !{}!2 = !{!3,!4,!5,!6,!7,!8,!9,!10,!11,!12,!13,!14,!15,!16}!3 = !{i320, !"air.buffer", !"air.location_index",i320,i321, !"air.read", !"air.address_space",i321, !"air.arg_type_size",i3216, !"air.arg_type_align_size",i3216, !"air.arg_type_name", !"float4", !"air.arg_name", !"in_positions"}!4 = !{i321, !"air.buffer", !"air.location_index",i321,i321, !"air.read_write", !"air.address_space",i321, !"air.arg_type_size",i3216, !"air.arg_type_align_size",i3216, !"air.arg_type_name", !"float4", !"air.arg_name", !"out_positions"}!5 = !{i322, !"air.buffer", !"air.location_index",i322,i321, !"air.read_write", !"air.address_space",i321, !"air.arg_type_size",i3212, 
!"air.arg_type_align_size",i3212, !"air.arg_type_name", !"float3", !"air.arg_name", !"inout_velocities"}!6 = !{i323, !"air.buffer", !"air.buffer_size",i324, !"air.location_index",i323,i321, !"air.read", !"air.address_space",i322, !"air.arg_type_size",i324, !"air.arg_type_align_size",i324, !"air.arg_type_name", !"float", !"air.arg_name", !"time_delta"}!7 = !{i324, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__global_id__"}!8 = !{i325, !"air.threads_per_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__global_size__"}!9 = !{i326, !"air.thread_position_in_threadgroup", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__local_id__"}!10 = !{i327, !"air.threads_per_threadgroup", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__local_size__"}!11 = !{i328, !"air.threadgroup_position_in_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__group_id__"}!12 = !{i329, !"air.threadgroups_per_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__group_size__"}!13 = !{i3210, !"air.simdgroup_index_in_threadgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_id__"}!14 = !{i3211, !"air.thread_index_in_simdgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_local_id__"}!15 = !{i3212, !"air.threads_per_simdgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_size__"}!16 = !{i3213, !"air.simdgroups_per_threadgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__num_sub_groups__"}!17 = !{!"air.max_work_group_size",i32256}!18 = !{i322,i326,i320}!19 = !{!"Metal",i323,i321,i320}!20 = !{!"air.compile.denorms_disable"}!21 = !{!"air.compile.fast_math_enable"}!22 = !{!"air.compile.framebuffer_fetch_enable"}!23 = !{i327, !"air.max_device_buffers",i3231}!24 = !{i327, !"air.max_constant_buffers",i3231}!25 = !{i327, !"air.max_threadgroup_buffers",i3231}!26 = !{i327, !"air.max_textures",i32128}!27 = 
!{i327, !"air.max_read_write_textures",i328}!28 = !{i327, !"air.max_samplers",i3216}!29 = !{i321, !"wchar_size",i324}!30 = !{i327, !"frame-pointer",i322}!31 = !{i322, !"SDK Version", [2 xi32] [i3214,i320]}!32 = !{!"Apple metal version 32023.155 (metalfe-32023.155)"}!33 = !{i32256,i321,i321}!34 = !{i321}!35 = !{!36,!36,i640}!36 = !{!"omnipotent char",!37,i640}!37 = !{!"Simple C++ TBAA"}!38 = distinct !{!38,!39}!39 = !{!"llvm.loop.mustprogress"}!40 = distinct !{!40,!39}
OpenCL / SPIR
Note that the compiler would usually directly output a .bc file. The output below comes from disassembling it with llvm-dis
(provided by thetoolchain). Also note that the bitcode file is exported in a LLVM 3.2 / SPIR 1.2 compatible format, but the output below uses LLVM 14.0 syntax.; ModuleID = 'spir.bc'source_filename ="spir.bc"targetdatalayout ="e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"targettriple ="spir64-unknown-unknown"%class.vector4 =type {%union.anon }%union.anon =type {%struct.anon }%struct.anon =type {float,float,float,float }%class.vector3 =type {%union.anon.8 }%union.anon.8 =type {%struct.anon.9 }%struct.anon.9 =type {float,float,float }@simplified_nbody.local_body_positions =internalunnamed_addraddrspace(3)global [256 x%class.vector4]undef,align4define floor_kernelvoid@simplified_nbody(%class.vector4addrspace(1)*%0,%class.vector4addrspace(1)*%1,%class.vector3addrspace(1)*%2,float%3) {%5 =tailcall floor_funci64@_Z13get_global_idj(i320),!range!14%6 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i320%7 =loadfloat,floataddrspace(1)*%6,align4%8 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i321%9 =loadfloat,floataddrspace(1)*%8,align4%10 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i322%11 =loadfloat,floataddrspace(1)*%10,align4%12 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i320%13 =loadfloat,floataddrspace(1)*%12,align4%14 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i321%15 =loadfloat,floataddrspace(1)*%14,align4%16 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i322%17 =loadfloat,floataddrspace(1)*%16,align4%18 =tailcall floor_funci64@_Z15get_global_sizej(i320),!range!15%19 =trunci64%18toi32,!range!16%20 =tailcall floor_funci64@_Z12get_local_idj(i320),!range!17%21 
=trunci64%20toi32,!range!18%22 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i320%23 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i321%24 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i322%25 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i323brlabel%4826:; preds = %65%27 =fmulfloat%98,%3%28 =fmulfloat%99,%3%29 =fmulfloat%100,%3%30 =faddfloat%27,%13%31 =faddfloat%28,%15%32 =faddfloat%29,%17%33 =fmulfloat%30,0x3FEFF7CEE0000000%34 =fmulfloat%31,0x3FEFF7CEE0000000%35 =fmulfloat%32,0x3FEFF7CEE0000000%36 =fmulfloat%33,%3%37 =fmulfloat%34,%3%38 =fmulfloat%35,%3%39 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i320%40 =loadfloat,floataddrspace(1)*%39,align4,!tbaa!19%41 =faddfloat%40,%36storefloat%41,floataddrspace(1)*%39,align4,!tbaa!19%42 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i321%43 =loadfloat,floataddrspace(1)*%42,align4,!tbaa!19%44 =faddfloat%43,%37storefloat%44,floataddrspace(1)*%42,align4,!tbaa!19%45 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i322%46 =loadfloat,floataddrspace(1)*%45,align4,!tbaa!19%47 =faddfloat%46,%38storefloat%47,floataddrspace(1)*%45,align4,!tbaa!19storefloat%33,floataddrspace(1)*%12,align4,!tbaa!19storefloat%34,floataddrspace(1)*%14,align4,!tbaa!19storefloat%35,floataddrspace(1)*%16,align4,!tbaa!19retvoid48:; preds = %65, %4%49 =phii32 [0,%4 ], [%66,%65 ]%50 =phii32 [0,%4 ], [%67,%65 ]%51 =phifloat [0.000000e+00,%4 ], [%100,%65 ]%52 =phifloat [0.000000e+00,%4 ], [%99,%65 ]%53 =phifloat [0.000000e+00,%4 ], [%98,%65 ]%54 =shli32%50,8%55 =addi32%54,%21%56 
=zexti32%55toi64%57 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i320%58 =loadfloat,floataddrspace(1)*%57,align4%59 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i321%60 =loadfloat,floataddrspace(1)*%59,align4%61 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i322%62 =loadfloat,floataddrspace(1)*%61,align4%63 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i323%64 =loadfloat,floataddrspace(1)*%63,align4storefloat%58,floataddrspace(3)*%22,align4,!tbaa!19storefloat%60,floataddrspace(3)*%23,align4,!tbaa!19storefloat%62,floataddrspace(3)*%24,align4,!tbaa!19storefloat%64,floataddrspace(3)*%25,align4,!tbaa!19tailcall floor_funcvoid@_Z7barrierj(i321)brlabel%6965:; preds = %69tailcall floor_funcvoid@_Z7barrierj(i321)%66 =addi32%49,256%67 =addi32%50,1%68 =icmpulti32%66,%19bri1%68,label%48,label%26,!llvm.loop!2269:; preds = %69, %48%70 =phii64 [0,%48 ], [%101,%69 ]%71 =phifloat [%51,%48 ], [%100,%69 ]%72 =phifloat [%52,%48 ], [%99,%69 ]%73 =phifloat [%53,%48 ], [%98,%69 ]%74 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i320%75 =loadfloat,floataddrspace(3)*%74,align4%76 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i321%77 =loadfloat,floataddrspace(3)*%76,align4%78 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i322%79 =loadfloat,floataddrspace(3)*%78,align4%80 =fsubfloat%75,%7%81 =fsubfloat%77,%9%82 =fsubfloat%79,%11%83 =fmulfloat%80,%80%84 =fmulfloat%81,%81%85 =fmulfloat%82,%82%86 =faddfloat%83,0x3F1A36E2E0000000%87 =faddfloat%86,%84%88 =faddfloat%87,%85%89 =tailcall floor_funcfloat@_Z5rsqrtf(float%88)%90 =getelementptrinbounds [256 
x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i323%91 =loadfloat,floataddrspace(3)*%90,align4,!tbaa!19%92 =fmulfloat%89,%89%93 =fmulfloat%92,%89%94 =fmulfloat%93,%91%95 =fmulfloat%94,%80%96 =fmulfloat%94,%81%97 =fmulfloat%94,%82%98 =faddfloat%95,%73%99 =faddfloat%96,%72%100 =faddfloat%97,%71%101 =addnuwnswi64%70,1%102 =icmpeqi64%101,256bri1%102,label%65,label%69,!llvm.loop!24}declare floor_funci64@_Z13get_global_idj(i32)declare floor_funci64@_Z15get_global_sizej(i32)declare floor_funci64@_Z12get_local_idj(i32)declare floor_funcfloat@_Z5rsqrtf(float)declare floor_funcvoid@_Z7barrierj(i32)!opencl.kernels = !{!0}!llvm.linker.options = !{}!llvm.module.flags = !{!7,!8}!opencl.ocl.version = !{!9}!opencl.spir.version = !{!9}!opencl.enable.FP_CONTRACT = !{}!opencl.used.extensions = !{!10}!opencl.used.optional.core.features = !{!11}!opencl.compiler.options = !{!12}!llvm.ident = !{!13}!0 = !{void (%class.vector4addrspace(1)*,%class.vector4addrspace(1)*,%class.vector3addrspace(1)*,float)*@simplified_nbody,!1,!2,!3,!4,!5,!6}!1 = !{!"kernel_arg_addr_space",i321,i321,i321,i320}!2 = !{!"kernel_arg_access_qual", !"none", !"none", !"none", !"none"}!3 = !{!"kernel_arg_type", !"compute_global_buffer<const float4>", !"compute_global_buffer<float4>", !"compute_global_buffer<float3>", !"param<float>"}!4 = !{!"kernel_arg_base_type", !"struct __class vector4<float>*", !"struct __class vector4<float>*", !"struct __class vector3<float>*", !"float"}!5 = !{!"kernel_arg_type_qual", !"restrict const", !"restrict", !"restrict", !"const"}!6 = !{!"kernel_arg_name", !"in_positions", !"out_positions", !"inout_velocities", !"time_delta"}!7 = !{i321, !"wchar_size",i324}!8 = !{i327, !"frame-pointer",i322}!9 = !{i321,i322}!10 = !{!"cl_khr_byte_addressable_store", !"cl_khr_global_int32_base_atomics", !"cl_khr_global_int32_extended_atomics", !"cl_khr_local_int32_base_atomics", !"cl_khr_local_int32_extended_atomics", !"cl_khr_fp64", 
!"cl_khr_fp16", !"cl_khr_gl_msaa_sharing"}!11 = !{!"cl_doubles"}!12 = !{!"-cl-kernel-arg-info", !"-cl-mad-enable", !"-cl-denorms-are-zero", !"-cl-unsafe-math-optimizations"}!13 = !{!"clang version 14.0.6 (https://github.com/a2flo/floor_llvm.git 85a83a4073c340ac03ca1c8fcd131db30339db24)"}!14 = !{i640,i644294967295}!15 = !{i641,i644294967295}!16 = !{i321,i32 -1}!17 = !{i640,i642048}!18 = !{i320,i322048}!19 = !{!20,!20,i640}!20 = !{!"omnipotent char",!21,i640}!21 = !{!"Simple C++ TBAA"}!22 = distinct !{!22,!23}!23 = !{!"llvm.loop.mustprogress"}!24 = distinct !{!24,!23}
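Two details of the listing above are easy to misread: LLVM prints float constants as the 64-bit hexadecimal pattern of the equivalent double, and the tile offset `tile * NBODY_TILE_SIZE` shows up as a left shift by 8. The following is a minimal sketch for checking both on the host. The helper names (`decode_llvm_float`, `tile_offset_mul`, `tile_offset_shl`) are hypothetical and not part of libfloor, and the sketch assumes an IEEE-754 host with 64-bit doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of libfloor): LLVM IR prints a `float`
// constant as the 64-bit pattern of the double with the same value.
// Reinterpret the bits as a double, then narrow to float (exact here).
inline float decode_llvm_float(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit IEEE-754 double");
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return static_cast<float>(value);
}

// Tile offset as written in the kernel source: tile * NBODY_TILE_SIZE + local_id.
inline std::uint32_t tile_offset_mul(std::uint32_t tile, std::uint32_t local_id) {
    return tile * 256u + local_id;
}

// Tile offset as it appears in the IR: a shift by 8, since 256 == 1 << 8.
inline std::uint32_t tile_offset_shl(std::uint32_t tile, std::uint32_t local_id) {
    return (tile << 8) + local_id;
}
```

Under these assumptions, `decode_llvm_float(0x3FEFF7CEE0000000)` yields `0.999f` (NBODY_DAMPING) and `decode_llvm_float(0x3F1A36E2E0000000)` yields `1e-4f`, which is consistent with the squared NBODY_SOFTENING term added to `dist_sq`.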
OpenCL / SPIR-V
Note that the compiler would usually directly output a .spv file. The output below comes from disassembling it with spirv-dis
(provided by thetoolchain). Also note that the output below has been generated with extended readability (--debug-asm).; SPIR-V; Version: 1.0; Generator: Khronos LLVM/SPIR-V Translator; 14; Bound: 153; Schema: 0 Capability Addresses Capability Linkage Capability Kernel Capability Int64%1 = ExtInstImport"OpenCL.std" MemoryModel Physical64 OpenCL EntryPoint Kernel%simplified_nbody"simplified_nbody"%__spirv_BuiltInGlobalInvocationId%__spirv_BuiltInGlobalSize%__spirv_BuiltInLocalInvocationId ExecutionMode%simplified_nbody LocalSize25611 SourceExtension"cl_khr_byte_addressable_store" SourceExtension"cl_khr_fp16" SourceExtension"cl_khr_fp64" SourceExtension"cl_khr_gl_msaa_sharing" SourceExtension"cl_khr_global_int32_base_atomics" SourceExtension"cl_khr_global_int32_extended_atomics" SourceExtension"cl_khr_local_int32_base_atomics" SourceExtension"cl_khr_local_int32_extended_atomics" Source OpenCL_C102000 Decorate%simplified_nbody.local_body_positions Alignment4 Decorate%19 FuncParamAttr NoAlias Decorate%19 FuncParamAttr NoCapture Decorate%19 FuncParamAttr NoWrite Decorate%20 FuncParamAttr NoAlias Decorate%20 FuncParamAttr NoCapture Decorate%21 FuncParamAttr NoAlias Decorate%21 FuncParamAttr NoCapture Decorate%__spirv_BuiltInGlobalInvocationId LinkageAttributes"__spirv_BuiltInGlobalInvocationId" Import Decorate%__spirv_BuiltInGlobalInvocationId Constant Decorate%__spirv_BuiltInGlobalInvocationId BuiltIn GlobalInvocationId Decorate%__spirv_BuiltInGlobalSize LinkageAttributes"__spirv_BuiltInGlobalSize" Import Decorate%__spirv_BuiltInGlobalSize Constant Decorate%__spirv_BuiltInGlobalSize BuiltIn GlobalSize Decorate%__spirv_BuiltInLocalInvocationId LinkageAttributes"__spirv_BuiltInLocalInvocationId" Import Decorate%__spirv_BuiltInLocalInvocationId Constant Decorate%__spirv_BuiltInLocalInvocationId BuiltIn LocalInvocationId Decorate%70 FPFastMathMode Fast Decorate%72 FPFastMathMode Fast Decorate%74 FPFastMathMode Fast Decorate%101 FPFastMathMode Fast Decorate%102 
FPFastMathMode Fast Decorate%103 FPFastMathMode Fast Decorate%104 FPFastMathMode Fast Decorate%105 FPFastMathMode Fast Decorate%106 FPFastMathMode Fast Decorate%108 FPFastMathMode Fast Decorate%109 FPFastMathMode Fast Decorate%110 FPFastMathMode Fast Decorate%114 FPFastMathMode Fast Decorate%115 FPFastMathMode Fast Decorate%116 FPFastMathMode Fast Decorate%117 FPFastMathMode Fast Decorate%118 FPFastMathMode Fast Decorate%119 FPFastMathMode Fast Decorate%131 FPFastMathMode Fast Decorate%132 FPFastMathMode Fast Decorate%133 FPFastMathMode Fast Decorate%134 FPFastMathMode Fast Decorate%135 FPFastMathMode Fast Decorate%136 FPFastMathMode Fast Decorate%138 FPFastMathMode Fast Decorate%139 FPFastMathMode Fast Decorate%140 FPFastMathMode Fast Decorate%141 FPFastMathMode Fast Decorate%142 FPFastMathMode Fast Decorate%143 FPFastMathMode Fast Decorate%146 FPFastMathMode Fast Decorate%149 FPFastMathMode Fast Decorate%152 FPFastMathMode Fast%ulong = TypeInt640%uint = TypeInt320%256ul = Constant%ulong256%0u = Constant%uint0%1u = Constant%uint1%2u = Constant%uint2%0ul = Constant%ulong0%3u = Constant%uint3%8u = Constant%uint8%272u = Constant%uint272%0ul_0 = Constant%ulong0%1ul = Constant%ulong1%256u = Constant%uint256%float = TypeFloat32%struct.anon = TypeStruct%float%float%float%float%union.anon = TypeStruct%struct.anon%class.vector4 = TypeStruct%union.anon%class.vector4[256ul] = TypeArray%class.vector4%256ul %(Workgroup)class.vector4[256ul]* = TypePointer Workgroup%class.vector4[256ul]%void = TypeVoid %(CrossWorkgroup)class.vector4* = TypePointer CrossWorkgroup%class.vector4%struct.anon.9 = TypeStruct%float%float%float%union.anon.8 = TypeStruct%struct.anon.9%class.vector3 = TypeStruct%union.anon.8 %(CrossWorkgroup)class.vector3* = TypePointer CrossWorkgroup%class.vector3%void(#4) = TypeFunction%void %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector3*%float %<3xulong> = TypeVector%ulong3 %(Input)<3xulong>* = TypePointer Input 
%<3xulong> %(CrossWorkgroup)float* = TypePointer CrossWorkgroup%float %(Workgroup)float* = TypePointer Workgroup%float%bool = TypeBool%simplified_nbody.local_body_positions = Variable %(Workgroup)class.vector4[256ul]* Workgroup%__spirv_BuiltInGlobalInvocationId = Variable %(Input)<3xulong>* Input%__spirv_BuiltInGlobalSize = Variable %(Input)<3xulong>* Input%__spirv_BuiltInLocalInvocationId = Variable %(Input)<3xulong>* Input%0.0f = Constant%float0%9.99999975e-05f = Constant%float9.99999975e-05%0.999000013f = Constant%float0.999000013functionvoid simplified_nbody (%void(#4) ) {%19 = FunctionParameter %(CrossWorkgroup)class.vector4*%20 = FunctionParameter %(CrossWorkgroup)class.vector4*%21 = FunctionParameter %(CrossWorkgroup)class.vector3*%22 = FunctionParameter%float23:%31 = Load %<3xulong>%__spirv_BuiltInGlobalInvocationId Aligned32%32 = CompositeExtract%ulong%310%36 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%0u%37 = Load%float%36 Aligned4%39 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%1u%40 = Load%float%39 Aligned4%42 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%2u%43 = Load%float%42 Aligned4%44 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%21%32%0u%0u%0u%45 = Load%float%44 Aligned4%46 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%21%32%0u%0u%1u%47 = Load%float%46 Aligned4%48 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%21%32%0u%0u%2u%49 = Load%float%48 Aligned4%51 = Load %<3xulong>%__spirv_BuiltInGlobalSize Aligned32%52 = CompositeExtract%ulong%510%53 = UConvert%uint%52%55 = Load %<3xulong>%__spirv_BuiltInLocalInvocationId Aligned32%56 = CompositeExtract%ulong%550%57 = UConvert%uint%56%60 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%0u%61 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%1u%62 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%2u%64 = 
InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%3u Branch%2424:%66 = Phi%uint (%65 <-%26,%0u <-%23 )%68 = Phi%uint (%67 <-%26,%0u <-%23 )%71 = Phi%float (%0.0f <-%23,%70 <-%26 )%73 = Phi%float (%0.0f <-%23,%72 <-%26 )%75 = Phi%float (%0.0f <-%23,%74 <-%26 )%77 = ShiftLeftLogical%uint%68%8u%78 = IAdd%uint%77%57%79 = UConvert%ulong%78%80 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%0u%81 = Load%float%80 Aligned4%82 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%1u%83 = Load%float%82 Aligned4%84 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%2u%85 = Load%float%84 Aligned4%86 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%3u%87 = Load%float%86 Aligned4 Store%60%81 Aligned4 Store%61%83 Aligned4 Store%62%85 Aligned4 Store%64%87 Aligned4 ControlBarrier%2u%2u%272u Branch%2525:%91 = Phi%ulong (%89 <-%25,%0ul_0 <-%24 )%92 = Phi%float (%71 <-%24,%70 <-%25 )%93 = Phi%float (%73 <-%24,%72 <-%25 )%94 = Phi%float (%75 <-%24,%74 <-%25 )%95 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%0u%96 = Load%float%95 Aligned4%97 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%1u%98 = Load%float%97 Aligned4%99 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%2u%100 = Load%float%99 Aligned4%101 = FSub%float%96%37%102 = FSub%float%98%40%103 = FSub%float%100%43%104 = FMul%float%101%101%105 = FMul%float%102%102%106 = FMul%float%103%103%108 = FAdd%float%104%9.99999975e-05f%109 = FAdd%float%108%105%110 = FAdd%float%109%106%111 = ExtInst%float%1 rsqrt%110%112 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%3u%113 = Load%float%112 Aligned4%114 = FMul%float%111%111%115 = FMul%float%114%111%116 = FMul%float%115%113%117 = FMul%float%116%101%118 = FMul%float%116%102%119 = FMul%float%116%103%74 = FAdd%float%117%94%72 = 
FAdd%float%118%93%70 = FAdd%float%119%92%89 = IAdd%ulong%91%1ul%126 = IEqual%bool%89%256ul BranchConditional%126%26%2526: ControlBarrier%2u%2u%272u%65 = IAdd%uint%66%256u%67 = IAdd%uint%68%1u%130 = ULessThan%bool%65%53 BranchConditional%130%24%2727:%131 = FMul%float%74%22%132 = FMul%float%72%22%133 = FMul%float%70%22%134 = FAdd%float%131%45%135 = FAdd%float%132%47%136 = FAdd%float%133%49%138 = FMul%float%134%0.999000013f%139 = FMul%float%135%0.999000013f%140 = FMul%float%136%0.999000013f%141 = FMul%float%138%22%142 = FMul%float%139%22%143 = FMul%float%140%22%144 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%0u%145 = Load%float%144 Aligned4%146 = FAdd%float%145%141 Store%144%146 Aligned4%147 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%1u%148 = Load%float%147 Aligned4%149 = FAdd%float%148%142 Store%147%149 Aligned4%150 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%2u%151 = Load%float%150 Aligned4%152 = FAdd%float%151%143 Store%150%152 Aligned4 Store%44%138 Aligned4 Store%46%139 Aligned4 Store%48%140 Aligned4 Return}
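In the listing above, each `local_barrier()` call appears as `ControlBarrier %2u %2u %272u`: an execution scope, a memory scope, and a memory semantics bit mask. The following is a minimal sketch that decodes these operands using the enumerant values from the SPIR-V specification. The function names (`scope_name`, `describe_memory_semantics`) are hypothetical helpers for illustration and are not provided by the toolchain.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: map a SPIR-V Scope <id> value to its name.
inline std::string scope_name(std::uint32_t scope) {
    switch (scope) {
        case 0: return "CrossDevice";
        case 1: return "Device";
        case 2: return "Workgroup";
        case 3: return "Subgroup";
        case 4: return "Invocation";
        default: return "Unknown";
    }
}

// Hypothetical helper: expand a SPIR-V Memory Semantics mask into the
// names of its set bits, joined by '|', in ascending bit order.
inline std::string describe_memory_semantics(std::uint32_t mask) {
    struct Bit { std::uint32_t value; const char* name; };
    static const Bit bits[] = {
        {0x2, "Acquire"},           {0x4, "Release"},
        {0x8, "AcquireRelease"},    {0x10, "SequentiallyConsistent"},
        {0x40, "UniformMemory"},    {0x80, "SubgroupMemory"},
        {0x100, "WorkgroupMemory"}, {0x200, "CrossWorkgroupMemory"},
        {0x400, "AtomicCounterMemory"}, {0x800, "ImageMemory"},
    };
    std::string result;
    for (const Bit& bit : bits) {
        if ((mask & bit.value) != 0) {
            if (!result.empty()) result += '|';
            result += bit.name;
        }
    }
    return result.empty() ? "None" : result;
}
```

With these tables, the operands `2, 2, 272` read as a workgroup-scope barrier with workgroup memory scope and `SequentiallyConsistent|WorkgroupMemory` semantics (272 = 0x10 + 0x100), which matches a barrier across all work-items in a work-group that orders local memory accesses.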
Vulkan / SPIR-V
Note that the compiler would usually directly output a .spvc file (a simple container format for multiple SPIR-V binaries). The output below comes from disassembling it with spirv-dis
(provided by thetoolchain). Also note that the output below has been generated with extended readability (--debug-asm).; SPIR-V; Version: 1.6; Generator: Khronos LLVM/SPIR-V Translator; 14; Bound: 210; Schema: 0 Capability Matrix Capability Shader Capability Int64 Capability GroupNonUniform Capability VariablePointersStorageBuffer Capability VariablePointers Capability ShaderNonUniform Capability UniformBufferArrayNonUniformIndexing Capability SampledImageArrayNonUniformIndexing Capability StorageBufferArrayNonUniformIndexing Capability StorageImageArrayNonUniformIndexing Capability VulkanMemoryModel Capability VulkanMemoryModelDeviceScope Capability PhysicalStorageBufferAddresses%1 = ExtInstImport"GLSL.std.450" MemoryModel PhysicalStorageBuffer64 Vulkan EntryPoint GLCompute %simplified_nbody "simplified_nbody" %simplified_nbody.vulkan_uniform. %simplified_nbody.vulkan_uniform..1 %simplified_nbody.vulkan_uniform..2 %simplified_nbody.vulkan_uniform..3 %simplified_nbody.vulkan_builtin_input. 
%simplified_nbody.vulkan_builtin_input..4 %simplified_nbody.vulkan_builtin_input..5 %simplified_nbody.vulkan_builtin_input..6 %simplified_nbody.vulkan_builtin_input..7 %simplified_nbody.vulkan_builtin_input..8 %vulkan.immutable_sampler_0 %vulkan.immutable_sampler_1 %vulkan.immutable_sampler_2 %vulkan.immutable_sampler_3 %vulkan.immutable_sampler_4 %vulkan.immutable_sampler_5 %vulkan.immutable_sampler_6 %vulkan.immutable_sampler_7 %vulkan.immutable_sampler_8 %vulkan.immutable_sampler_9 %vulkan.immutable_sampler_10 %vulkan.immutable_sampler_11 %vulkan.immutable_sampler_12 %vulkan.immutable_sampler_13 %vulkan.immutable_sampler_14 %vulkan.immutable_sampler_15 %vulkan.immutable_sampler_16 %vulkan.immutable_sampler_17 %vulkan.immutable_sampler_18 %vulkan.immutable_sampler_19 %vulkan.immutable_sampler_20 %vulkan.immutable_sampler_21 %vulkan.immutable_sampler_22 %vulkan.immutable_sampler_23 %vulkan.immutable_sampler_24 %vulkan.immutable_sampler_25 %vulkan.immutable_sampler_26 %vulkan.immutable_sampler_27 %vulkan.immutable_sampler_28 %vulkan.immutable_sampler_29 %vulkan.immutable_sampler_30 %vulkan.immutable_sampler_31 %vulkan.immutable_sampler_32 %vulkan.immutable_sampler_33 %vulkan.immutable_sampler_34 %vulkan.immutable_sampler_35 %vulkan.immutable_sampler_36 %vulkan.immutable_sampler_37 %vulkan.immutable_sampler_38 %vulkan.immutable_sampler_39 %vulkan.immutable_sampler_40 %vulkan.immutable_sampler_41 %vulkan.immutable_sampler_42 %vulkan.immutable_sampler_43 %vulkan.immutable_sampler_44 %vulkan.immutable_sampler_45 %vulkan.immutable_sampler_46 %vulkan.immutable_sampler_47 %_ZZ16simplified_nbodyE20local_body_positions ExecutionMode%simplified_nbody LocalSize25611 SourceExtension"vk_capability_int16" SourceExtension"vk_capability_int64" SourceExtension"vk_capability_multiview" Source GLSL450 Decorate%vulkan.immutable_sampler_0 DescriptorSet0 Decorate%vulkan.immutable_sampler_0 Binding0 Decorate%vulkan.immutable_sampler_1 DescriptorSet0 Decorate%vulkan.immutable_sampler_1 
Binding1 Decorate%vulkan.immutable_sampler_2 DescriptorSet0 Decorate%vulkan.immutable_sampler_2 Binding2 Decorate%vulkan.immutable_sampler_3 DescriptorSet0 Decorate%vulkan.immutable_sampler_3 Binding3 Decorate%vulkan.immutable_sampler_4 DescriptorSet0 Decorate%vulkan.immutable_sampler_4 Binding4 Decorate%vulkan.immutable_sampler_5 DescriptorSet0 Decorate%vulkan.immutable_sampler_5 Binding5 Decorate%vulkan.immutable_sampler_6 DescriptorSet0 Decorate%vulkan.immutable_sampler_6 Binding6 Decorate%vulkan.immutable_sampler_7 DescriptorSet0 Decorate%vulkan.immutable_sampler_7 Binding7 Decorate%vulkan.immutable_sampler_8 DescriptorSet0 Decorate%vulkan.immutable_sampler_8 Binding8 Decorate%vulkan.immutable_sampler_9 DescriptorSet0 Decorate%vulkan.immutable_sampler_9 Binding9 Decorate%vulkan.immutable_sampler_10 DescriptorSet0 Decorate%vulkan.immutable_sampler_10 Binding10 Decorate%vulkan.immutable_sampler_11 DescriptorSet0 Decorate%vulkan.immutable_sampler_11 Binding11 Decorate%vulkan.immutable_sampler_12 DescriptorSet0 Decorate%vulkan.immutable_sampler_12 Binding12 Decorate%vulkan.immutable_sampler_13 DescriptorSet0 Decorate%vulkan.immutable_sampler_13 Binding13 Decorate%vulkan.immutable_sampler_14 DescriptorSet0 Decorate%vulkan.immutable_sampler_14 Binding14 Decorate%vulkan.immutable_sampler_15 DescriptorSet0 Decorate%vulkan.immutable_sampler_15 Binding15 Decorate%vulkan.immutable_sampler_16 DescriptorSet0 Decorate%vulkan.immutable_sampler_16 Binding16 Decorate%vulkan.immutable_sampler_17 DescriptorSet0 Decorate%vulkan.immutable_sampler_17 Binding17 Decorate%vulkan.immutable_sampler_18 DescriptorSet0 Decorate%vulkan.immutable_sampler_18 Binding18 Decorate%vulkan.immutable_sampler_19 DescriptorSet0 Decorate%vulkan.immutable_sampler_19 Binding19 Decorate%vulkan.immutable_sampler_20 DescriptorSet0 Decorate%vulkan.immutable_sampler_20 Binding20 Decorate%vulkan.immutable_sampler_21 DescriptorSet0 Decorate%vulkan.immutable_sampler_21 Binding21 
Decorate%vulkan.immutable_sampler_22 DescriptorSet0 Decorate%vulkan.immutable_sampler_22 Binding22 Decorate%vulkan.immutable_sampler_23 DescriptorSet0 Decorate%vulkan.immutable_sampler_23 Binding23 Decorate%vulkan.immutable_sampler_24 DescriptorSet0 Decorate%vulkan.immutable_sampler_24 Binding24 Decorate%vulkan.immutable_sampler_25 DescriptorSet0 Decorate%vulkan.immutable_sampler_25 Binding25 Decorate%vulkan.immutable_sampler_26 DescriptorSet0 Decorate%vulkan.immutable_sampler_26 Binding26 Decorate%vulkan.immutable_sampler_27 DescriptorSet0 Decorate%vulkan.immutable_sampler_27 Binding27 Decorate%vulkan.immutable_sampler_28 DescriptorSet0 Decorate%vulkan.immutable_sampler_28 Binding28 Decorate%vulkan.immutable_sampler_29 DescriptorSet0 Decorate%vulkan.immutable_sampler_29 Binding29 Decorate%vulkan.immutable_sampler_30 DescriptorSet0 Decorate%vulkan.immutable_sampler_30 Binding30 Decorate%vulkan.immutable_sampler_31 DescriptorSet0 Decorate%vulkan.immutable_sampler_31 Binding31 Decorate%vulkan.immutable_sampler_32 DescriptorSet0 Decorate%vulkan.immutable_sampler_32 Binding32 Decorate%vulkan.immutable_sampler_33 DescriptorSet0 Decorate%vulkan.immutable_sampler_33 Binding33 Decorate%vulkan.immutable_sampler_34 DescriptorSet0 Decorate%vulkan.immutable_sampler_34 Binding34 Decorate%vulkan.immutable_sampler_35 DescriptorSet0 Decorate%vulkan.immutable_sampler_35 Binding35 Decorate%vulkan.immutable_sampler_36 DescriptorSet0 Decorate%vulkan.immutable_sampler_36 Binding36 Decorate%vulkan.immutable_sampler_37 DescriptorSet0 Decorate%vulkan.immutable_sampler_37 Binding37 Decorate%vulkan.immutable_sampler_38 DescriptorSet0 Decorate%vulkan.immutable_sampler_38 Binding38 Decorate%vulkan.immutable_sampler_39 DescriptorSet0 Decorate%vulkan.immutable_sampler_39 Binding39 Decorate%vulkan.immutable_sampler_40 DescriptorSet0 Decorate%vulkan.immutable_sampler_40 Binding40 Decorate%vulkan.immutable_sampler_41 DescriptorSet0 Decorate%vulkan.immutable_sampler_41 Binding41 
Decorate%vulkan.immutable_sampler_42 DescriptorSet0 Decorate%vulkan.immutable_sampler_42 Binding42 Decorate%vulkan.immutable_sampler_43 DescriptorSet0 Decorate%vulkan.immutable_sampler_43 Binding43 Decorate%vulkan.immutable_sampler_44 DescriptorSet0 Decorate%vulkan.immutable_sampler_44 Binding44 Decorate%vulkan.immutable_sampler_45 DescriptorSet0 Decorate%vulkan.immutable_sampler_45 Binding45 Decorate%vulkan.immutable_sampler_46 DescriptorSet0 Decorate%vulkan.immutable_sampler_46 Binding46 Decorate%vulkan.immutable_sampler_47 DescriptorSet0 Decorate%vulkan.immutable_sampler_47 Binding47 Decorate%class.vector4[256l] ArrayStride16 MemberDecorate%class.vector40 Offset0 MemberDecorate%union.anon0 Offset0 MemberDecorate%struct.anon0 Offset0 MemberDecorate%struct.anon1 Offset4 MemberDecorate%struct.anon2 Offset8 MemberDecorate%struct.anon3 Offset12 Decorate%enclose.class.vector4 Block MemberDecorate%enclose.class.vector40 Offset0 Decorate%class.vector4[] ArrayStride16 Decorate %(StorageBuffer)enclose.class.vector4* ArrayStride16 Decorate%simplified_nbody.vulkan_uniform. NonWritable Decorate%simplified_nbody.vulkan_uniform. DescriptorSet1 Decorate%simplified_nbody.vulkan_uniform. 
Binding0 Decorate%enclose.class.vector4_0 Block MemberDecorate%enclose.class.vector4_00 Offset0 Decorate%class.vector4[]_0 ArrayStride16 Decorate %(StorageBuffer)enclose.class.vector4_0* ArrayStride16 Decorate%simplified_nbody.vulkan_uniform..1 DescriptorSet1 Decorate%simplified_nbody.vulkan_uniform..1 Binding1 Decorate%enclose.class.vector3 Block MemberDecorate%enclose.class.vector30 Offset0 Decorate%class.vector3[] ArrayStride12 Decorate %(StorageBuffer)enclose.class.vector3* ArrayStride12 MemberDecorate%class.vector30 Offset0 MemberDecorate%union.anon.80 Offset0 MemberDecorate%struct.anon.90 Offset0 MemberDecorate%struct.anon.91 Offset4 MemberDecorate%struct.anon.92 Offset8 Decorate%simplified_nbody.vulkan_uniform..2 DescriptorSet1 Decorate%simplified_nbody.vulkan_uniform..2 Binding2 Decorate%enclose. Block MemberDecorate%enclose.0 Offset0 Decorate%simplified_nbody.vulkan_uniform..3 NonWritable Decorate%simplified_nbody.vulkan_uniform..3 Uniform Decorate%simplified_nbody.vulkan_uniform..3 DescriptorSet1 Decorate%simplified_nbody.vulkan_uniform..3 Binding3 Decorate%simplified_nbody.vulkan_builtin_input. 
BuiltIn WorkgroupId Decorate%simplified_nbody.vulkan_builtin_input..4 BuiltIn NumWorkgroups Decorate%simplified_nbody.vulkan_builtin_input..5 BuiltIn SubgroupId Decorate%simplified_nbody.vulkan_builtin_input..6 BuiltIn SubgroupLocalInvocationId Decorate%simplified_nbody.vulkan_builtin_input..7 BuiltIn SubgroupSize Decorate%simplified_nbody.vulkan_builtin_input..8 BuiltIn NumSubgroups Decorate %(Workgroup)class.vector4[256l]* ArrayStride4096 Decorate%155 NoSignedWrap Decorate%155 NoUnsignedWrap%ilong = TypeInt641%iint = TypeInt321%256l = Constant%ilong256%8i = Constant%iint8%0i = Constant%iint0%1i = Constant%iint1%2i = Constant%iint2%3i = Constant%iint3%2504i = Constant%iint2504%0l = Constant%ilong0%1l = Constant%ilong1%256i = Constant%iint256%Sampler = TypeSampler %(UniformConstant)Sampler* = TypePointer UniformConstant%Sampler%float = TypeFloat32%struct.anon = TypeStruct%float%float%float%float%union.anon = TypeStruct%struct.anon%class.vector4 = TypeStruct%union.anon%class.vector4[256l] = TypeArray%class.vector4%256l %(Workgroup)class.vector4[256l]* = TypePointer Workgroup%class.vector4[256l]%void = TypeVoid%void() = TypeFunction%void%class.vector4[] = TypeRuntimeArray%class.vector4%enclose.class.vector4 = TypeStruct%class.vector4[] %(StorageBuffer)enclose.class.vector4* = TypePointer StorageBuffer%enclose.class.vector4%class.vector4[]_0 = TypeRuntimeArray%class.vector4%enclose.class.vector4_0 = TypeStruct%class.vector4[]_0 %(StorageBuffer)enclose.class.vector4_0* = TypePointer StorageBuffer%enclose.class.vector4_0%struct.anon.9 = TypeStruct%float%float%float%union.anon.8 = TypeStruct%struct.anon.9%class.vector3 = TypeStruct%union.anon.8%class.vector3[] = TypeRuntimeArray%class.vector3%enclose.class.vector3 = TypeStruct%class.vector3[] %(StorageBuffer)enclose.class.vector3* = TypePointer StorageBuffer%enclose.class.vector3%enclose. = TypeStruct%float %(Uniform)enclose.* = TypePointer Uniform%enclose. 
%<3xiint> = TypeVector%iint3 %(Input)<3xiint>* = TypePointer Input %<3xiint> %(Input)iint* = TypePointer Input%iint %(StorageBuffer)float* = TypePointer StorageBuffer%float %(Workgroup)float* = TypePointer Workgroup%float%bool = TypeBool %(Uniform)float* = TypePointer Uniform%float%vulkan.immutable_sampler_0 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_1 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_2 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_3 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_4 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_5 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_6 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_7 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_8 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_9 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_10 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_11 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_12 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_13 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_14 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_15 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_16 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_17 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_18 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_19 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_20 = Variable %(UniformConstant)Sampler* 
UniformConstant%vulkan.immutable_sampler_21 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_22 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_23 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_24 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_25 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_26 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_27 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_28 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_29 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_30 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_31 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_32 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_33 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_34 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_35 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_36 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_37 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_38 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_39 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_40 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_41 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_42 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_43 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_44 = Variable %(UniformConstant)Sampler* 
UniformConstant%vulkan.immutable_sampler_45 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_46 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_47 = Variable %(UniformConstant)Sampler* UniformConstant%_ZZ16simplified_nbodyE20local_body_positions = Variable %(Workgroup)class.vector4[256l]* Workgroup%simplified_nbody.vulkan_uniform. = Variable %(StorageBuffer)enclose.class.vector4* StorageBuffer%simplified_nbody.vulkan_uniform..1 = Variable %(StorageBuffer)enclose.class.vector4_0* StorageBuffer%simplified_nbody.vulkan_uniform..2 = Variable %(StorageBuffer)enclose.class.vector3* StorageBuffer%simplified_nbody.vulkan_uniform..3 = Variable %(Uniform)enclose.* Uniform%simplified_nbody.vulkan_builtin_input. = Variable %(Input)<3xiint>* Input%simplified_nbody.vulkan_builtin_input..4 = Variable %(Input)<3xiint>* Input%simplified_nbody.vulkan_builtin_input..5 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..6 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..7 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..8 = Variable %(Input)iint* Input%0.0f = Constant%float0%9.99999975e-05f = Constant%float9.99999975e-05%0.999000013f = Constant%float0.999000013functionvoid simplified_nbody (%void() ) {92:%98 = Load %<3xiint>%simplified_nbody.vulkan_builtin_input. 
Aligned16%99 = CompositeExtract%iint%980%101 = ShiftLeftLogical%iint%99%8i%102 = Load%iint%simplified_nbody.vulkan_builtin_input..6 Aligned4%103 = Load%iint%simplified_nbody.vulkan_builtin_input..5 Aligned4%104 = Load%iint%simplified_nbody.vulkan_builtin_input..7 Aligned4%105 = IMul%iint%103%104%106 = IAdd%iint%105%102%107 = IAdd%iint%101%106%108 = Load %<3xiint>%simplified_nbody.vulkan_builtin_input..4 Aligned16%109 = CompositeExtract%iint%1080%110 = ShiftLeftLogical%iint%109%8i%113 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%0i%115 = Load%float%113 Aligned|MakePointerVisible|NonPrivatePointer4%1i%116 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%1i%117 = Load%float%116 Aligned|MakePointerVisible|NonPrivatePointer4%1i%119 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%2i%120 = Load%float%119 Aligned|MakePointerVisible|NonPrivatePointer4%1i%121 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%0i%122 = Load%float%121 Aligned|MakePointerVisible|NonPrivatePointer4%1i%123 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%1i%124 = Load%float%123 Aligned|MakePointerVisible|NonPrivatePointer4%1i%125 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%2i%126 = Load%float%125 Aligned|MakePointerVisible|NonPrivatePointer4%1i%128 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%0i%129 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%1i%130 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%2i%132 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%3i Branch%9393:%134 = Phi%iint (%133 <-%96,%0i <-%92 )%136 = Phi%iint (%135 <-%96,%0i <-%92 )%139 = Phi%float 
(%0.0f <-%92,%138 <-%96 )%141 = Phi%float (%0.0f <-%92,%140 <-%96 )%143 = Phi%float (%0.0f <-%92,%142 <-%96 )%144 = ShiftLeftLogical%iint%136%8i%145 = IAdd%iint%106%144%146 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%0i%147 = Load%float%146 Aligned|MakePointerVisible|NonPrivatePointer4%1i%148 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%1i%149 = Load%float%148 Aligned|MakePointerVisible|NonPrivatePointer4%1i%150 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%2i%151 = Load%float%150 Aligned|MakePointerVisible|NonPrivatePointer4%1i%152 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%3i%153 = Load%float%152 Aligned|MakePointerVisible|NonPrivatePointer4%1i Store%128%147 Aligned4 Store%129%149 Aligned4 Store%130%151 Aligned4 Store%132%153 Aligned4 ControlBarrier%2i%2i%2504i LoopMerge%97%96 None Branch%9494:%157 = Phi%ilong (%155 <-%94,%0l <-%93 )%158 = Phi%float (%139 <-%93,%138 <-%94 )%159 = Phi%float (%141 <-%93,%140 <-%94 )%160 = Phi%float (%143 <-%93,%142 <-%94 )%161 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%0i%162 = Load%float%161 Aligned4%163 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%1i%164 = Load%float%163 Aligned4%165 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%2i%166 = Load%float%165 Aligned4%167 = FSub%float%162%115%168 = FSub%float%164%117%169 = FSub%float%166%120%171 = ExtInst%float%1 Fma%167%167%9.99999975e-05f%172 = ExtInst%float%1 Fma%168%168%171%173 = ExtInst%float%1 Fma%169%169%172%174 = ExtInst%float%1 InverseSqrt%173%175 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%3i%176 = Load%float%175 Aligned4%177 = FMul%float%174%174%178 = FMul%float%177%174%179 = FMul%float%178%176%142 = 
ExtInst%float%1 Fma%179%167%160%140 = ExtInst%float%1 Fma%179%168%159%138 = ExtInst%float%1 Fma%179%169%158%155 = IAdd%ilong%157%1l%186 = IEqual%bool%155%256l LoopMerge%95%94 None BranchConditional%186%95%9495: Branch%9696: ControlBarrier%2i%2i%2504i%133 = IAdd%iint%134%256i%135 = IAdd%iint%136%1i%190 = ULessThan%bool%133%110 BranchConditional%190%93%9797:%192 = InBoundsAccessChain %(Uniform)float*%simplified_nbody.vulkan_uniform..3%0i%193 = Load%float%192 Aligned4%194 = ExtInst%float%1 Fma%193%142%122%195 = ExtInst%float%1 Fma%193%140%124%196 = ExtInst%float%1 Fma%193%138%126%198 = FMul%float%194%0.999000013f%199 = FMul%float%195%0.999000013f%200 = FMul%float%196%0.999000013f%201 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%0i%202 = Load%float%201 Aligned|MakePointerVisible|NonPrivatePointer4%1i%203 = ExtInst%float%1 Fma%198%193%202 Store%201%203 Aligned|MakePointerAvailable|NonPrivatePointer4%1i%204 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%1i%205 = Load%float%204 Aligned|MakePointerVisible|NonPrivatePointer4%1i%206 = ExtInst%float%1 Fma%199%193%205 Store%204%206 Aligned|MakePointerAvailable|NonPrivatePointer4%1i%207 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%2i%208 = Load%float%207 Aligned|MakePointerVisible|NonPrivatePointer4%1i%209 = ExtInst%float%1 Fma%200%193%208 Store%207%209 Aligned|MakePointerAvailable|NonPrivatePointer4%1i Store%121%198 Aligned|MakePointerAvailable|NonPrivatePointer4%1i Store%123%199 Aligned|MakePointerAvailable|NonPrivatePointer4%1i Store%125%200 Aligned|MakePointerAvailable|NonPrivatePointer4%1i Return}
OS:
only AMD64/Intel64/ARM64 are supported
Windows: NT 6.1+
macOS: 13.0+
iOS: 16.0+
Linux: any current x64 distribution
other Unix: if other requirements are met
compiler/toolchain:
libraries and optional requirements:
SDL3 3.1.3+
(opt) OpenCL: requires OpenCL 1.2+ SDK and CPU/GPU drivers (Intel, AMD)
(opt) CUDA: requires sm_50+/Maxwell+ GPU and CUDA 12.0+ drivers (CUDA SDK not required!)
(opt) Metal: requires iOS 16.0+ or macOS 13.0+, and a Metal 3.0 capable GPU
(opt) Host-Compute: requires just the compiler/toolchain that is stated above
(opt) Vulkan: requires 1.4.309+ ICD loader / headers / SDK, volk included as submodule
(opt) OpenVR: requires OpenVR
(opt) OpenXR: requires OpenXR
ensure git submodules are cloned and up-to-date:
git submodule update --init --recursive
run ./build.sh (use ./build.sh help to get a list of all options)
configuration of optional parts:
to disable OpenCL: define FLOOR_NO_OPENCL or ./build.sh no-opencl
to disable CUDA: define FLOOR_NO_CUDA or ./build.sh no-cuda
to disable Metal (only affects macOS/iOS builds): define FLOOR_NO_METAL or ./build.sh no-metal
to disable Host Compute: define FLOOR_NO_HOST_COMPUTE or ./build.sh no-host-compute
to disable Vulkan: define FLOOR_NO_VULKAN or ./build.sh no-vulkan
to disable OpenVR: define FLOOR_NO_OPENVR or ./build.sh no-openvr
to disable OpenXR: define FLOOR_NO_OPENXR or ./build.sh no-openxr
to build with libstdc++ (GCC 13.0+) instead of libc++:
./build.sh libstdc++
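As a rough illustration of how the FLOOR_NO_* defines listed above can be used, the sketch below gates code in a depending project on them. Only the macro names come from this document; the function enabled_optional_parts is hypothetical and not part of libfloor. It only looks at the macros and ignores platform restrictions (e.g. Metal being available on macOS/iOS only).

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of libfloor): lists the optional parts that
// have NOT been disabled through one of the FLOOR_NO_* defines.
std::vector<std::string> enabled_optional_parts() {
	std::vector<std::string> parts;
#if !defined(FLOOR_NO_OPENCL)
	parts.emplace_back("OpenCL");
#endif
#if !defined(FLOOR_NO_CUDA)
	parts.emplace_back("CUDA");
#endif
#if !defined(FLOOR_NO_METAL)
	parts.emplace_back("Metal");
#endif
#if !defined(FLOOR_NO_HOST_COMPUTE)
	parts.emplace_back("Host-Compute");
#endif
#if !defined(FLOOR_NO_VULKAN)
	parts.emplace_back("Vulkan");
#endif
#if !defined(FLOOR_NO_OPENVR)
	parts.emplace_back("OpenVR");
#endif
#if !defined(FLOOR_NO_OPENXR)
	parts.emplace_back("OpenXR");
#endif
	return parts;
}
```

Compiling the same file with e.g. -DFLOOR_NO_VULKAN would drop the "Vulkan" entry from the returned list.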
building with CMake is provided as an alternative to build.sh and Xcode
create a build folder and cd into it
run cmake -G "Ninja" -S "<path-to-libfloor>" <options>
options:
to build a static library instead of a shared/dynamic one:
-DBUILD_SHARED_LIBS=OFF
to explicitly use libc++:
-DWITH_LIBCXX=ON
to build with address sanitizer:
-DWITH_ASAN=ON
run ninja
open floor.xcodeproj and build
some notes:
almost all optional parts of floor are enabled here and you’ll have to install all dependencies or disable them manually
Homebrew is the recommended way to install additional dependencies:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(opt) download OpenVR and manually install it:
mkdir -p {/usr/local/include/openvr,/usr/local/lib}
cp openvr/headers/* /usr/local/include/openvr/
cp openvr/bin/osx32/libopenvr_api.dylib /usr/local/lib/
command line tools might be necessary, install them with:
xcode-select --install
on iOS, either copy dependencies into your iPhoneOS and iPhoneSimulator SDK, or floor/ios/deps/{include,lib}
iOS linker flags for a depending project:
-lSDL3 -lfloor
install Visual Studio 2022
in "Workloads" select "Desktop development with C++", in "Individual components" search for and select all clang packages
install and wait
install Vulkan SDK
install vcpkg (somewhere, not within libfloor):
cd vcpkg
.\bootstrap-vcpkg.bat -disableMetrics
.\vcpkg integrate install
install vcpkg packages:
.\vcpkg --triplet x64-windows install sdl3 OpenCL vulkan openvr openxr-loader
add a user (or system) environment variable VCPKG_ROOT that points to the vcpkg folder
in Visual Studio: Tools → Options → search for vcpkg and set the custom vcpkg.exe path
in Visual Studio: open folder floor (wait a little until build files are generated)
select Debug or Release configuration and build
NOTE: all dependencies (optional parts) are enabled here
NOTE: having other build environments/systems in PATH (e.g. MSYS2/MinGW) may result in install/build issues
sudo mkdir -p /opt/floor/include
sudo ln -sf /path/to/floor /opt/floor/include/floor
sudo ln -sf /path/to/floor/bin /opt/floor/lib
alternatively: copy these files/folders there
create a %%ProgramFiles%%/floor folder (C:/Program Files/floor)
inside this folder:
create a lib folder
VS2022:
copy everything from bin/ in there (dlls/lib/exp)
MinGW/MSYS2:
copy libfloor_static.a/libfloord_static.a there
create an include folder and copy the original floor folder in there (containing all floor source code)
automated builds for Linux, macOS and Windows can be found at: https://libfloor.org/builds/toolchain
NOTE: this requires a Unix environment with all LLVM build dependencies installed - use MSYS2 on Windows
NOTE: the absolute build path must not contain spaces
compile the toolchain:
cd floor/etc/llvm140/ && ./build.sh
if successful, package it (in addition to a .zip file, this also creates a folder with all necessary binaries and include files):
./pkg.sh
install the toolchain:
Unix:
automatic:
development: run ./deploy_dev.sh from the floor/etc/llvm140/ folder (this will create symlinks to everything in floor and floor/etc/llvm140)
release: run ./deploy_pkg.sh from inside the toolchain package folder (floor/etc/llvm140/toolchain_140006_*; this will copy everything)
manual:
copy the toolchain folder as toolchain to /opt/floor/ (should then be /opt/floor/toolchain/{bin,clang,libcxx})
inside /opt/floor/toolchain, add a symlink to the floor include folder: sudo ln -sf ../include floor
Windows:
copy the toolchain folder as toolchain to %%ProgramFiles%%/floor (should then be %%ProgramFiles%%/floor/toolchain/{bin,clang,libcxx})
inside %%ProgramFiles%%/floor/toolchain, copy the floor folder from the include folder above it into this folder
NOTE: this is the expected default setup - paths can be changed inside config.json (toolchain.generic.paths)
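To double-check a manual toolchain installation, the expected default layout described above can be verified with a few lines of C++. This is a minimal sketch: the function name missing_toolchain_dirs is hypothetical, and the list of folders (bin, clang, libcxx, plus the floor include folder or symlink) is taken from the steps above, not from libfloor's own lookup code.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper (not part of libfloor): returns the names of the
// expected sub-folders that do not exist below "toolchain_root".
// is_directory() follows symlinks, so a "floor" symlink counts as present.
std::vector<std::string> missing_toolchain_dirs(const std::filesystem::path& toolchain_root) {
	std::vector<std::string> missing;
	for (const char* name : { "bin", "clang", "libcxx", "floor" }) {
		std::error_code ec; // non-throwing overload: errors are treated as "missing"
		if (!std::filesystem::is_directory(toolchain_root / name, ec)) {
			missing.emplace_back(name);
		}
	}
	return missing;
}
```

With the default setup, this would be called as missing_toolchain_dirs("/opt/floor/toolchain"); an empty result means all four folders were found.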
when using X11 forwarding, set these env variables:
export SDL_VIDEO_X11_NODIRECTCOLOR=yes
depending on how your Linux distribution handles OpenCL headers and library, you might need to manually install OpenCL 1.2+ compatible ones
Host-Compute device execution requires locked/pinned memory, which may be very limited in default Linux configurations (usually 64KiB)
libfloor will try to increase the limit to 32MiB per logical CPU core, but this may fail if the max limit is too low
to increase the max limit, /etc/security/limits.conf must be modified
as a simple workaround, add the following line to it (replace user_name with your user name) and relog:
user_name hard memlock unlimited
NOTE: when using ssh, PAM must be enabled for this to apply
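The locked-memory behavior described above can be approximated with the POSIX getrlimit/setrlimit calls. The following is a sketch under assumptions, not libfloor's actual implementation: the function names are hypothetical, and only the 32MiB-per-logical-core target is taken from this document. It shows why a low hard ("max") limit makes the attempt fail: an unprivileged process can only raise its soft limit up to the hard limit.

```cpp
#include <sys/resource.h>
#include <algorithm>

// Target from the text: 32 MiB of lockable memory per logical CPU core.
inline rlim_t wanted_memlock_bytes(unsigned int logical_cores) {
	return rlim_t(32u) * 1024u * 1024u * std::max(logical_cores, 1u);
}

// Hypothetical helper: tries to raise the soft RLIMIT_MEMLOCK limit to "wanted".
// Returns true if the soft limit is (or has been made) at least "wanted".
inline bool try_raise_memlock(rlim_t wanted) {
	struct rlimit lim {};
	if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
		return false;
	}
	if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
		return true; // already sufficient
	}
	// the soft limit can only be raised up to the hard limit
	const rlim_t new_cur = (lim.rlim_max == RLIM_INFINITY ? wanted : std::min(wanted, lim.rlim_max));
	if (new_cur <= lim.rlim_cur) {
		return false; // hard limit leaves no room to raise the soft limit
	}
	lim.rlim_cur = new_cur;
	if (setrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
		return false;
	}
	return new_cur >= wanted;
}
```

A caller could pass std::thread::hardware_concurrency() as the core count. If try_raise_memlock returns false, the hard limit is the likely cause, and the limits.conf change above applies.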
depending on your Vulkan implementation, you may also need to increase the max amount of open files (usual default is 1024 files)
libfloor will try to increase the limit to 256 files per logical CPU core, but this may fail if the max limit is too low
to increase the max limit, /etc/security/limits.conf must be modified
as a simple workaround, add the following line to it (replace user_name with your user name) and relog:
user_name hard nofile unlimited
NOTE: when using ssh, PAM must be enabled for this to apply
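Whether the open-file limit needs the limits.conf change can be checked up front. The sketch below is an illustration with hypothetical names (limit_status, classify_limit, wanted_open_files, check_open_file_limit); only the 256-files-per-logical-core target and the typical default of 1024 come from this document. It classifies the current soft and hard RLIMIT_NOFILE values against the wanted value.

```cpp
#include <sys/resource.h>
#include <optional>

enum class limit_status {
	sufficient,        // soft limit already covers the wanted value
	raisable,          // soft limit is too low, but the hard limit allows raising it
	hard_limit_too_low // the hard ("max") limit must be increased first
};

// Classifies a soft/hard limit pair against a wanted value.
inline limit_status classify_limit(rlim_t soft, rlim_t hard, rlim_t wanted) {
	if (soft == RLIM_INFINITY || soft >= wanted) return limit_status::sufficient;
	if (hard == RLIM_INFINITY || hard >= wanted) return limit_status::raisable;
	return limit_status::hard_limit_too_low;
}

// Target from the text: 256 open files per logical CPU core.
inline rlim_t wanted_open_files(unsigned int logical_cores) {
	return rlim_t(256u) * (logical_cores > 0u ? logical_cores : 1u);
}

// Queries RLIMIT_NOFILE of the current process; empty result if the query fails.
inline std::optional<limit_status> check_open_file_limit(unsigned int logical_cores) {
	struct rlimit lim {};
	if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
		return std::nullopt;
	}
	return classify_limit(lim.rlim_cur, lim.rlim_max, wanted_open_files(logical_cores));
}
```

For example, on an 8-core machine the target is 2048 files: a soft limit of 1024 with a hard limit of 4096 is raisable, while a hard limit of 1024 requires the limits.conf change.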
floor_examples (dnn, nbody, warp, hlbvh, path tracer, other)
libwarp (image-space warping library)
obsolete: oclraster (Flexible Rasterizer in OpenCL)
obsolete: a2elight (Albion 2 Engine)
obsolete: unibot (IRC bot)