A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.

a2flo/floor

What is it?

This project provides a unified compute & graphics host API, as well as a unified compute & graphics C++ device language and library to enable same-source CUDA/Host/Metal/OpenCL/Vulkan programming and execution.

The unified host API is implemented at compute and graphics. All backends (CUDA/Host/Metal/OpenCL/Vulkan) currently provide compute support, while graphics support is limited to Metal and Vulkan.

To provide a unified device language, a clang/LLVM/libc++ 14.0 toolchain has been modified.

Certain parts of libfloor are used by both host and device code (math and constexpr). Additional device library code is located at device.

Advanced examples can be found in the floor_examples repository.

Example

Take the fairly simple C++ kernel below, which computes the body/body interactions in an N-body simulation, and compile it for each backend. Note that loop unrolling is omitted for conciseness.

    // define global constants
    static constexpr constant const uint32_t NBODY_TILE_SIZE { 256u };
    static constexpr constant const float NBODY_DAMPING { 0.999f };
    static constexpr constant const float NBODY_SOFTENING { 0.01f };

    // define a 1D kernel with a required local size of (NBODY_TILE_SIZE = 256, 1, 1)
    kernel_1d(NBODY_TILE_SIZE) void simplified_nbody(buffer<const float4> in_positions, // read-only global memory buffer
                                                     buffer<float4> out_positions, // read-write global memory buffer
                                                     buffer<float3> inout_velocities, // read-write global memory buffer
                                                     param<float> time_delta) { // read-only parameter
        // each work-item represents/computes one body
        const auto position = in_positions[global_id.x];
        auto velocity = inout_velocities[global_id.x];
        float3 acceleration; // vectors are automatically zero-initialized
        local_buffer<float4, NBODY_TILE_SIZE> local_body_positions; // local memory array allocation

        // loop over all bodies
        for (uint32_t i = 0, tile = 0, count = global_size.x; i < count; i += NBODY_TILE_SIZE, ++tile) {
            // move resp. body position/mass from global to local memory
            local_body_positions[local_id.x] = in_positions[tile * NBODY_TILE_SIZE + local_id.x];
            local_barrier(); // barrier across all work-items in this work-group

            // loop over bodies in this work-group
            for (uint32_t j = 0; j < NBODY_TILE_SIZE; ++j) {
                const auto r = local_body_positions[j].xyz - position.xyz;
                const auto dist_sq = r.dot(r) + (NBODY_SOFTENING * NBODY_SOFTENING);
                const auto inv_dist = rsqrt(dist_sq);
                const auto s = local_body_positions[j].w * (inv_dist * inv_dist * inv_dist); // .w is mass
                acceleration += r * s;
            }
            local_barrier();
        }

        velocity = (velocity + acceleration * time_delta) * NBODY_DAMPING;
        out_positions[global_id.x].xyz += velocity * time_delta; // update XYZ position
        inout_velocities[global_id.x] = velocity; // update velocity
    }
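For checking results on the host, the kernel's arithmetic can be mirrored in plain C++. The following is only a sketch and not part of libfloor: `vec3`, `vec4` and `nbody_step_reference` are hypothetical names, the tiling through local memory and the barriers are dropped because a single thread visits all bodies, and `rsqrt` is replaced by `1.0f / std::sqrt(...)`, so results may differ slightly from a device that uses an approximate reciprocal square root.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins; these are NOT libfloor's float3/float4 types.
struct vec3 { float x, y, z; };
struct vec4 { float x, y, z, w; }; // w holds the body mass, as in the kernel

constexpr float NBODY_DAMPING = 0.999f;
constexpr float NBODY_SOFTENING = 0.01f;

// Single-threaded reference of one simulation step (O(n^2) body/body interactions).
// Like the kernel, it adds the position delta onto the existing contents of out_positions.
void nbody_step_reference(const std::vector<vec4>& in_positions,
                          std::vector<vec4>& out_positions,
                          std::vector<vec3>& inout_velocities,
                          float time_delta) {
    const std::size_t count = in_positions.size();
    assert(out_positions.size() == count && inout_velocities.size() == count);

    for (std::size_t i = 0; i < count; ++i) {
        const vec4 p = in_positions[i];
        vec3 acc{0.0f, 0.0f, 0.0f};

        for (std::size_t j = 0; j < count; ++j) {
            const vec4 q = in_positions[j];
            const float rx = q.x - p.x;
            const float ry = q.y - p.y;
            const float rz = q.z - p.z;
            // softening keeps the distance non-zero, so the i == j term contributes 0
            const float dist_sq = rx * rx + ry * ry + rz * rz + NBODY_SOFTENING * NBODY_SOFTENING;
            const float inv_dist = 1.0f / std::sqrt(dist_sq);
            const float s = q.w * (inv_dist * inv_dist * inv_dist);
            acc.x += rx * s;
            acc.y += ry * s;
            acc.z += rz * s;
        }

        vec3 v = inout_velocities[i];
        v.x = (v.x + acc.x * time_delta) * NBODY_DAMPING;
        v.y = (v.y + acc.y * time_delta) * NBODY_DAMPING;
        v.z = (v.z + acc.z * time_delta) * NBODY_DAMPING;

        out_positions[i].x += v.x * time_delta;
        out_positions[i].y += v.y * time_delta;
        out_positions[i].z += v.z * time_delta;
        inout_velocities[i] = v;
    }
}
```

A typical use would be to run one step on a small input with both this function and a device backend, then compare positions and velocities within a tolerance rather than bit for bit.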

The output for each backend is shown below.

CUDA / PTX

You can download the PTX file here and the CUBIN file here (note that building CUBINs is optional and requires ptxas).

    //
    // Generated by LLVM NVPTX Back-End
    //

    .version 8.4
    .target sm_86
    .address_size 64

        // .globl simplified_nbody
    // _ZZ16simplified_nbodyE20local_body_positions has been demoted

    .visible .entry simplified_nbody(
        .param .u64 simplified_nbody_param_0,
        .param .u64 simplified_nbody_param_1,
        .param .u64 simplified_nbody_param_2,
        .param .f32 simplified_nbody_param_3
    )
    .reqntid 256, 1, 1
    {
        .reg .pred %p<3>;
        .reg .b32 %r<25>;
        .reg .f32 %f<71>;
        .reg .b64 %rd<18>;
        // demoted variable
        .shared .align 4 .b8 _ZZ16simplified_nbodyE20local_body_positions[4096];

        mov.u32 %r1, %tid.x;
        mov.u32 %r11, %ntid.x;
        mov.u32 %r12, %ctaid.x;
        mad.lo.s32 %r13, %r12, %r11, %r1;
        cvt.u64.u32 %rd3, %r13;
        mul.wide.u32 %rd7, %r13, 12;
        ld.param.u64 %rd8, [simplified_nbody_param_2];
        cvta.to.global.u64 %rd9, %rd8;
        add.s64 %rd4, %rd9, %rd7;
        ld.global.f32 %f6, [%rd4+8];
        add.s64 %rd6, %rd4, 8;
        ld.global.f32 %f5, [%rd4+4];
        add.s64 %rd5, %rd4, 4;
        ld.global.f32 %f4, [%rd4];
        mul.wide.u32 %rd10, %r13, 16;
        ld.param.u64 %rd11, [simplified_nbody_param_0];
        cvta.to.global.u64 %rd2, %rd11;
        add.s64 %rd12, %rd2, %rd10;
        ld.global.nc.f32 %f3, [%rd12+8];
        ld.global.nc.f32 %f2, [%rd12+4];
        ld.global.nc.f32 %f1, [%rd12];
        mov.u32 %r14, %nctaid.x;
        mul.lo.s32 %r2, %r14, %r11;
        shl.b32 %r15, %r1, 4;
        mov.u32 %r16, _ZZ16simplified_nbodyE20local_body_positions;
        add.s32 %r3, %r16, %r15;
        ld.param.u64 %rd13, [simplified_nbody_param_1];
        cvta.to.global.u64 %rd1, %rd13;
        mov.f32 %f68, 0f00000000;
        mov.u32 %r10, 0;
        ld.param.f32 %f16, [simplified_nbody_param_3];
        mov.u32 %r22, %r10;
        mov.u32 %r23, %r10;
        mov.f32 %f69, %f68;
        mov.f32 %f70, %f68;
    LBB0_1:
        shl.b32 %r18, %r23, 8;
        add.s32 %r19, %r18, %r1;
        mul.wide.u32 %rd14, %r19, 16;
        add.s64 %rd15, %rd2, %rd14;
        ld.global.nc.f32 %f18, [%rd15];
        st.shared.f32 [%r3], %f18;
        ld.global.nc.f32 %f19, [%rd15+4];
        st.shared.f32 [%r3+4], %f19;
        ld.global.nc.f32 %f20, [%rd15+8];
        st.shared.f32 [%r3+8], %f20;
        ld.global.nc.f32 %f21, [%rd15+12];
        st.shared.f32 [%r3+12], %f21;
        barrier.sync 0;
        mov.u32 %r24, %r10;
    LBB0_2:
        add.s32 %r21, %r16, %r24;
        ld.shared.f32 %f22, [%r21+4];
        sub.f32 %f23, %f22, %f2;
        ld.shared.f32 %f24, [%r21];
        sub.f32 %f25, %f24, %f1;
        fma.rn.f32 %f26, %f25, %f25, 0f38D1B717;
        fma.rn.f32 %f27, %f23, %f23, %f26;
        ld.shared.f32 %f28, [%r21+8];
        sub.f32 %f29, %f28, %f3;
        fma.rn.f32 %f30, %f29, %f29, %f27;
        rsqrt.approx.ftz.f32 %f31, %f30;
        mul.f32 %f32, %f31, %f31;
        mul.f32 %f33, %f32, %f31;
        ld.shared.f32 %f34, [%r21+12];
        mul.f32 %f35, %f33, %f34;
        fma.rn.f32 %f36, %f35, %f29, %f68;
        ld.shared.f32 %f37, [%r21+20];
        sub.f32 %f38, %f37, %f2;
        ld.shared.f32 %f39, [%r21+16];
        sub.f32 %f40, %f39, %f1;
        fma.rn.f32 %f41, %f40, %f40, 0f38D1B717;
        fma.rn.f32 %f42, %f38, %f38, %f41;
        ld.shared.f32 %f43, [%r21+24];
        sub.f32 %f44, %f43, %f3;
        fma.rn.f32 %f45, %f44, %f44, %f42;
        rsqrt.approx.ftz.f32 %f46, %f45;
        mul.f32 %f47, %f46, %f46;
        mul.f32 %f48, %f47, %f46;
        ld.shared.f32 %f49, [%r21+28];
        mul.f32 %f50, %f48, %f49;
        fma.rn.f32 %f68, %f50, %f44, %f36;
        fma.rn.f32 %f51, %f35, %f23, %f69;
        fma.rn.f32 %f69, %f50, %f38, %f51;
        fma.rn.f32 %f52, %f35, %f25, %f70;
        fma.rn.f32 %f70, %f50, %f40, %f52;
        add.s32 %r24, %r24, 32;
        setp.eq.s32 %p1, %r24, 4096;
        @%p1 bra LBB0_3;
        bra.uni LBB0_2;
    LBB0_3:
        add.s32 %r22, %r22, 256;
        setp.lt.u32 %p2, %r22, %r2;
        barrier.sync 0;
        add.s32 %r23, %r23, 1;
        @%p2 bra LBB0_1;
        fma.rn.f32 %f53, %f70, %f16, %f4;
        mul.f32 %f54, %f53, 0f3F7FBE77;
        shl.b64 %rd16, %rd3, 4;
        add.s64 %rd17, %rd1, %rd16;
        ld.global.f32 %f55, [%rd17];
        fma.rn.f32 %f56, %f54, %f16, %f55;
        st.global.f32 [%rd17], %f56;
        fma.rn.f32 %f57, %f69, %f16, %f5;
        mul.f32 %f58, %f57, 0f3F7FBE77;
        ld.global.f32 %f59, [%rd17+4];
        fma.rn.f32 %f60, %f58, %f16, %f59;
        st.global.f32 [%rd17+4], %f60;
        fma.rn.f32 %f61, %f68, %f16, %f6;
        mul.f32 %f62, %f61, 0f3F7FBE77;
        ld.global.f32 %f63, [%rd17+8];
        fma.rn.f32 %f64, %f62, %f16, %f63;
        st.global.f32 [%rd17+8], %f64;
        st.global.f32 [%rd4], %f54;
        st.global.f32 [%rd5], %f58;
        st.global.f32 [%rd6], %f62;
        ret;
    }

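The PTX above encodes single-precision constants as `0f` followed by eight hexadecimal digits, which are the raw IEEE-754 bits. To relate them back to the kernel source, a small host-side helper can decode them. This is an illustrative sketch only: `bits_to_float` and `parse_ptx_float` are hypothetical names and not part of libfloor or the CUDA toolchain. Decoding `0f3F7FBE77` gives `NBODY_DAMPING` (0.999), and `0f38D1B717` gives approximately 1e-4, which matches `NBODY_SOFTENING * NBODY_SOFTENING` folded at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Reinterprets 32 raw bits as an IEEE-754 single-precision value.
inline float bits_to_float(std::uint32_t bits) {
    float value;
    static_assert(sizeof(value) == sizeof(bits), "float is expected to be 32 bits wide");
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Parses a PTX single-precision literal of the form "0fXXXXXXXX" (hypothetical helper).
inline float parse_ptx_float(const std::string& literal) {
    assert(literal.size() == 10 && literal[0] == '0' && literal[1] == 'f');
    const auto bits = static_cast<std::uint32_t>(std::stoul(literal.substr(2), nullptr, 16));
    return bits_to_float(bits);
}
```

The same reading applies to the other constants in the listing: the 4096-byte `.shared` array corresponds to 256 `float4` elements of 16 bytes each, and the inner loop advances by 32 bytes per iteration, that is, two bodies at a time.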
Host-Compute (x86 CPU)

Note that the compiler would usually directly output a .bin file (ELF format). The output below comes from disassembling it with objdump -d. Also note that this has been compiled for the x86-5 target (AVX-512+).

nbody.bin:     file formatelf64-x86-64Disassembly of section .text:0000000000000000 <simplified_nbody>:0:55push   %rbp1:4889 e5mov    %rsp,%rbp4:4157push   %r156:4156push   %r148:4155push   %r13       a:4154push   %r12       c:53push   %rbx       d:4883 e4 c0and$0xffffffffffffffc0,%rsp11:4881 ec40090000sub$0x940,%rsp18:48 8d05 f9 ff ff fflea-0x7(%rip),%rax        #18 <simplified_nbody+0x18>      1f:49 be0000000000 movabs$0x0,%r1426:00000029:4889 4c2450mov    %rcx,0x50(%rsp)      2e:4889742468mov    %rsi,0x68(%rsp)33:4889 7c2448mov    %rdi,0x48(%rsp)38:4901 c6add    %rax,%r14      3b:48 b80000000000 movabs$0x0,%rax42:00000045:49 8b0406mov    (%r14,%rax,1),%rax49:8b00mov    (%rax),%eax      4b:48 8d 0c40lea    (%rax,%rax,2),%rcx      4f:4889 c6mov    %rax,%rsi52:48 c1 e604shl$0x4,%rsi56:4889742458mov    %rsi,0x58(%rsp)      5b:48 8d04 8alea    (%rdx,%rcx,4),%rax      5f:c5 fa1004 8avmovss (%rdx,%rcx,4),%xmm064:c5 f9 6e54 8a04vmovd0x4(%rdx,%rcx,4),%xmm2      6a:c5 fa10 4c 8a08vmovss0x8(%rdx,%rcx,4),%xmm170:4889442460mov    %rax,0x60(%rsp)75:48 b80000000000 movabs$0x0,%rax      7c:000000      7f:49 8b0406mov    (%r14,%rax,1),%rax83:8b18mov    (%rax),%ebx85:c5 fa114424 3cvmovss %xmm0,0x3c(%rsp)      8b:c5 f9 7e542440vmovd  %xmm2,0x40(%rsp)91:c5 fa11 4c2444vmovss %xmm1,0x44(%rsp)97:85 dbtest   %ebx,%ebx99:0f84 f9160000je1798 <simplified_nbody+0x1798>      9f:48 8b442448mov0x48(%rsp),%rax      a4:49 bd0000000000 movabs$0x0,%r13      ab:000000      ae:4531 ffxor    %r15d,%r15d      b1:c5 fa100430vmovss (%rax,%rsi,1),%xmm0      b6:c5 fa10 4c3004vmovss0x4(%rax,%rsi,1),%xmm1      bc:c5 fa10543008vmovss0x8(%rax,%rsi,1),%xmm2      c2:48 b80000000000 movabs$0x0,%rax      c9:000000      cc:49 8b0406mov    (%r14,%rax,1),%rax      d0:4889442478mov    %rax,0x78(%rsp)      d5:4b 8d04 2elea    (%r14,%r13,1),%rax      d9:4889442470mov    %rax,0x70(%rsp)      de:48 b80000000000 movabs$0x0,%rax      e5:000000      e8:62 f2 7d4818 c0vbroadcastss %xmm0,%zmm0      ee:4d 8b2406mov    
(%r14,%rax,1),%r12      f2:62 f2 7d4818 c9vbroadcastss %xmm1,%zmm1      f8:48 b80000000000 movabs$0x0,%rax      ff:000000102:62 f1 7c48294424vmovaps %zmm0,0x700(%rsp)109:1c     10a:62 f2 7d4818 c2vbroadcastss %xmm2,%zmm0110:62 d2 fd48 5b1406 vbroadcasti64x4 (%r14,%rax,1),%zmm2117:48 b80000000000 movabs$0x0,%rax     11e:000000121:62 f1 7c4829 4c24vmovaps %zmm1,0x6c0(%rsp)128:1b129:62 d2 fd48 5b 0c06 vbroadcasti64x4 (%r14,%rax,1),%zmm1130:48 b80000000000 movabs$0x0,%rax137:000000     13a:62 f1 7c48294424vmovaps %zmm0,0x680(%rsp)141:1a142:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0146:c5 f82984248000vmovaps %xmm0,0x80(%rsp)     14d:0000     14f:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x640(%rsp)156:19157:62 d2 fd48 5b1406 vbroadcasti64x4 (%r14,%rax,1),%zmm2     15e:48 b80000000000 movabs$0x0,%rax165:000000168:62 f1 fd48 7f 4c24 vmovdqa64 %zmm1,0x840(%rsp)     16f:21170:62 d2 7d4818 0c06vbroadcastss (%r14,%rax,1),%zmm1177:48 b80000000000 movabs$0x0,%rax     17e:000000181:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x800(%rsp)188:20189:62 d2 fd48 5b1406 vbroadcasti64x4 (%r14,%rax,1),%zmm2190:48 b80000000000 movabs$0x0,%rax197:000000     19a:62 f1 7c4829 4c24vmovaps %zmm1,0x600(%rsp)     1a1:18     1a2:62 d2 7d4818 0c06vbroadcastss (%r14,%rax,1),%zmm1     1a9:48 b80000000000 movabs$0x0,%rax     1b0:000000     1b3:62 d2 7d48180406vbroadcastss (%r14,%rax,1),%zmm0     1ba:62 f1 fd48 7f5424 vmovdqa64 %zmm2,0x7c0(%rsp)     1c1:1f     1c2:62 f1 7c4829 4c24vmovaps %zmm1,0x780(%rsp)     1c9:1e     1ca:62 f1 7c48294424vmovaps %zmm0,0x740(%rsp)     1d1:1d     1d2:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0     1d6:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp)     1dd:0000     1df:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0     1e3:c5 f82984240001vmovaps %xmm0,0x100(%rsp)     1ea:0000     1ec:0f 1f4000          nopl0x0(%rax)     1f0:48 8b442478mov0x78(%rsp),%rax     1f5:48 8b542448mov0x48(%rsp),%rdx     1fa:8b00mov    (%rax),%eax     1fc:42 8d 0c38lea    (%rax,%r15,1),%ecx200:48 c1 e004shl$0x4,%rax204:48 c1 
e104shl$0x4,%rcx208:c5 f81004 0avmovups (%rdx,%rcx,1),%xmm0     20d:48 8b 4c2470mov0x70(%rsp),%rcx212:c5 f8290408vmovaps %xmm0,(%rax,%rcx,1)217:c5 f877vzeroupper     21a:41 ff d4call*%r12     21d:6291 7c4828 5c 2evmovaps0x80(%r14,%r13,1),%zmm3224:02225:62 f1 7c48286424vmovaps0x640(%rsp),%zmm4     22c:19     22d:6281 7c4828 5c 2evmovaps0xc0(%r14,%r13,1),%zmm19234:03235:6291 7c482854 2evmovaps0x180(%r14,%r13,1),%zmm2     23c:06     23d:6211 7c4828 4c 2evmovaps0x100(%r14,%r13,1),%zmm9244:04245:6211 7c4828 6c 2evmovaps0x140(%r14,%r13,1),%zmm13     24c:05     24d:6281 7c4828 4c 2evmovaps0x1c0(%r14,%r13,1),%zmm17254:07255:6271 7c48287424vmovaps0x800(%rsp),%zmm14     25c:20     25d:6291 7c482804 2evmovaps (%r14,%r13,1),%zmm0264:6281 7c482854 2evmovaps0x40(%r14,%r13,1),%zmm18     26b:01     26c:62 f1 7c48287424vmovaps0x7c0(%rsp),%zmm6273:1f274:6201 7c482844 2evmovaps0x280(%r14,%r13,1),%zmm24     27b:0a     27c:6281 7c482874 2evmovaps0x200(%r14,%r13,1),%zmm22283:08284:6281 7c4828 6c 2evmovaps0x240(%r14,%r13,1),%zmm21     28b:09     28c:6281 7c4828 7c 2evmovaps0x2c0(%r14,%r13,1),%zmm23293:0b294:6201 7c482864 2evmovaps0x380(%r14,%r13,1),%zmm28     29b:0e     29c:6201 7c482854 2evmovaps0x300(%r14,%r13,1),%zmm26     2a3:0c     2a4:6201 7c4828 5c 2evmovaps0x3c0(%r14,%r13,1),%zmm27     2ab:0f     2ac:62 f1 7c4828 cbvmovaps %zmm3,%zmm1     2b2:62 e1 7c4828 e2vmovaps %zmm2,%zmm20     2b8:62 d1 7c4828 e9vmovaps %zmm9,%zmm5     2be:6261 7c4828 cavmovaps %zmm2,%zmm25     2c4:62 f1 7c4828 f8vmovaps %zmm0,%zmm7     2ca:6271 7c4828 fbvmovaps %zmm3,%zmm15     2d0:62 e1 7c4828 c0vmovaps %zmm0,%zmm16     2d6:6271 7c4828 c3vmovaps %zmm3,%zmm8     2dc:6271 7c4828 e0vmovaps %zmm0,%zmm12     2e2:6271 7c4828 d2vmovaps %zmm2,%zmm10     2e8:62 b2 4d48 7f db    vpermt2ps %zmm19,%zmm6,%zmm3     2ee:62 b2 4d48 7f c2    vpermt2ps %zmm18,%zmm6,%zmm0     2f4:6261 7c4828 f4vmovaps %zmm4,%zmm30     2fa:62 b2 4d48 7f d1    vpermt2ps %zmm17,%zmm6,%zmm2300:6251 7c4828 d9vmovaps %zmm9,%zmm11306:6201 7c4828 
e8vmovaps %zmm24,%zmm29     30c:6201 7c4828 fcvmovaps %zmm28,%zmm31312:62 b2 5d48 7f cb    vpermt2ps %zmm19,%zmm4,%zmm1318:62 a2 5d48 7f e1    vpermt2ps %zmm17,%zmm4,%zmm20     31e:62 d2 5d48 7f ed    vpermt2ps %zmm13,%zmm4,%zmm5324:6222 0d48 7f c9    vpermt2ps %zmm17,%zmm14,%zmm25     32a:62 b2 5d48 7f fa    vpermt2ps %zmm18,%zmm4,%zmm7330:62 d1 7c4828 e1vmovaps %zmm9,%zmm4336:6232 0d48 7f fb    vpermt2ps %zmm19,%zmm14,%zmm15     33c:62 a2 0d48 7f c2    vpermt2ps %zmm18,%zmm14,%zmm16342:6252 4d48 7f cd    vpermt2ps %zmm13,%zmm6,%zmm9348:6252 0d48 7f dd    vpermt2ps %zmm13,%zmm14,%zmm11     34e:6291 7c4828 f2vmovaps %zmm26,%zmm6354:6222 0d40 7f ef    vpermt2ps %zmm23,%zmm30,%zmm29     35a:62 f3 fd4823 c3 e4 vshuff64x2$0xe4,%zmm3,%zmm0,%zmm0361:6291 7c4828 dcvmovaps %zmm28,%zmm3367:62 f1 7c4829 4c24vmovaps %zmm1,0x140(%rsp)     36e:05     36f:62 f1 7c4828 4c24vmovaps0x840(%rsp),%zmm1376:21377:62 b3 d54823 ec e4 vshuff64x2$0xe4,%zmm20,%zmm5,%zmm5     37e:6261 7c4829 4c24vmovaps %zmm25,0x280(%rsp)385:0a386:6201 7c4828 4c 2evmovaps0x340(%r14,%r13,1),%zmm25     38d:0d     38e:62 a1 7c4828 e6vmovaps %zmm22,%zmm20394:62 f3 b54823 d2 e4 vshuff64x2$0xe4,%zmm2,%zmm9,%zmm2     39b:6271 7c4828 4c24vmovaps0x640(%rsp),%zmm9     3a2:19     3a3:6292 0d48 7f db    vpermt2ps %zmm27,%zmm14,%zmm3     3a9:62 f3 c54823 7c24 vshuff64x2$0xe4,0x140(%rsp),%zmm7,%zmm7     3b0:05 e4     3b2:62 a2 0d48 7f e5    vpermt2ps %zmm21,%zmm14,%zmm20     3b8:62 f1 fd48294424vmovapd %zmm0,0x140(%rsp)     3bf:05     3c0:62 f1 fd4829 6c24vmovapd %zmm5,0x4c0(%rsp)     3c7:13     3c8:62 f1 7c4828 6c24vmovaps0x7c0(%rsp),%zmm5     3cf:1f     3d0:62 f1 fd48295424vmovapd %zmm2,0x500(%rsp)     3d7:14     3d8:62327548 7f c3    vpermt2ps %zmm19,%zmm1,%zmm8     3de:62327548 7f e2    vpermt2ps %zmm18,%zmm1,%zmm12     3e4:62 a1 7c4828 devmovaps %zmm22,%zmm19     3ea:6281 7c4828 d0vmovaps %zmm24,%zmm18     3f0:62327548 7f d1    vpermt2ps %zmm17,%zmm1,%zmm10     3f6:6281 7c4828 c8vmovaps %zmm24,%zmm17     3fc:62 d27548 
7f e5    vpermt2ps %zmm13,%zmm1,%zmm4402:6211 7c4828 eevmovaps %zmm30,%zmm13408:6221 7c4828 f6vmovaps %zmm22,%zmm30     40e:62 a2 0d48 7f d7    vpermt2ps %zmm23,%zmm14,%zmm18414:62 a27548 7f cf    vpermt2ps %zmm23,%zmm1,%zmm17     41a:62 a27548 7f dd    vpermt2ps %zmm21,%zmm1,%zmm19420:62021548 7f fb    vpermt2ps %zmm27,%zmm13,%zmm31426:62921548 7f f1    vpermt2ps %zmm25,%zmm13,%zmm6     42c:62221548 7f f5    vpermt2ps %zmm21,%zmm13,%zmm30432:6211 7c4828 ecvmovaps %zmm28,%zmm13438:62 f1 fd4829 7c24vmovapd %zmm7,0x240(%rsp)     43f:09440:62 f3 a54823 7c24 vshuff64x2$0xe4,0x280(%rsp),%zmm11,%zmm7447:0a e4449:62025548 7f e3    vpermt2ps %zmm27,%zmm5,%zmm28     44f:62225548 7f c7    vpermt2ps %zmm23,%zmm5,%zmm24455:62 a25548 7f f5    vpermt2ps %zmm21,%zmm5,%zmm22     45b:62127548 7f eb    vpermt2ps %zmm27,%zmm1,%zmm13461:6281 7c4828 7c 2evmovaps0x4c0(%r14,%r13,1),%zmm23468:13469:62 e1 7c4828 6c24vmovaps0x6c0(%rsp),%zmm21470:1b471:62 d3 dd4823 c2 e4 vshuff64x2$0xe4,%zmm10,%zmm4,%zmm0478:6253 fd4023 d7 e4 vshuff64x2$0xe4,%zmm15,%zmm16,%zmm10     47f:6211 7c4828 favmovaps %zmm26,%zmm15485:6253 9d4823 c0 e4 vshuff64x2$0xe4,%zmm8,%zmm12,%zmm8     48c:6211 7c4828 e2vmovaps %zmm26,%zmm12492:62025548 7f d1    vpermt2ps %zmm25,%zmm5,%zmm26498:6281 7c482844 2evmovaps0x540(%r14,%r13,1),%zmm16     49f:15     4a0:6233 e54023 d9 e4 vshuff64x2$0xe4,%zmm17,%zmm19,%zmm11     4a7:62 a3 dd4023 d2 e4 vshuff64x2$0xe4,%zmm18,%zmm20,%zmm18     4ae:6281 7c482864 2evmovaps0x580(%r14,%r13,1),%zmm20     4b5:16     4b6:6281 7c4828 4c 2evmovaps0x500(%r14,%r13,1),%zmm17     4bd:14     4be:6212 0d48 7f f9    vpermt2ps %zmm25,%zmm14,%zmm15     4c4:62127548 7f e1    vpermt2ps %zmm25,%zmm1,%zmm12     4ca:6201 7c4828 4c 2evmovaps0x5c0(%r14,%r13,1),%zmm25     4d1:17     4d2:6293 8d4023 d5 e4 vshuff64x2$0xe4,%zmm29,%zmm30,%zmm2     4d9:62 e1 7c4828 5c24vmovaps0x780(%rsp),%zmm19     4e0:1e     4e1:62 f1 fd48294424vmovapd %zmm0,0x440(%rsp)     4e8:11     4e9:6293 cd4823 c7 e4 
vshuff64x2$0xe4,%zmm31,%zmm6,%zmm0     4f0:62 f1 fd48295424vmovapd %zmm2,0x200(%rsp)     4f7:08     4f8:62 f1 7c4828 d5vmovaps %zmm5,%zmm2     4fe:62 f1 fd48294424vmovapd %zmm0,0x400(%rsp)505:10506:6293 cd4023 c0 e4 vshuff64x2$0xe4,%zmm24,%zmm22,%zmm0     50d:6281 7c482874 2evmovaps0x400(%r14,%r13,1),%zmm22514:10515:6201 7c482844 2evmovaps0x480(%r14,%r13,1),%zmm24     51c:12     51d:62 f1 fd48294424vmovapd %zmm0,0x480(%rsp)524:12525:6293 ad4023 e4 e4 vshuff64x2$0xe4,%zmm28,%zmm26,%zmm4     52c:62 d3 9d4823 ed e4 vshuff64x2$0xe4,%zmm13,%zmm12,%zmm5533:62 f3854823 db e4 vshuff64x2$0xe4,%zmm3,%zmm15,%zmm3     53a:6221 7c4828 dcvmovaps %zmm20,%zmm27540:6221 7c4828 e1vmovaps %zmm17,%zmm28546:62 f1 fd48296424vmovapd %zmm4,0x280(%rsp)     54d:0a     54e:6291 7c482864 2evmovaps0x440(%r14,%r13,1),%zmm4555:11556:6221 7c4828 f4vmovaps %zmm20,%zmm30     55c:6221 7c4828 f9vmovaps %zmm17,%zmm31562:62023548 7f d9    vpermt2ps %zmm25,%zmm9,%zmm27568:62223548 7f e0    vpermt2ps %zmm16,%zmm9,%zmm28     56e:6202 0d48 7f f1    vpermt2ps %zmm25,%zmm14,%zmm30574:6222 0d48 7f f8    vpermt2ps %zmm16,%zmm14,%zmm31     57a:6201 7c4828 d0vmovaps %zmm24,%zmm26580:6231 7c4828 eevmovaps %zmm22,%zmm13586:6211 7c4828 f8vmovaps %zmm24,%zmm15     58c:6221 7c4828 eevmovaps %zmm22,%zmm29592:62223548 7f d7    vpermt2ps %zmm23,%zmm9,%zmm26598:62327548 7f ff    vpermt2ps %zmm23,%zmm1,%zmm15     59e:6293 9d4023 f3 e4 vshuff64x2$0xe4,%zmm27,%zmm28,%zmm6     5a5:62723548 7f ec    vpermt2ps %zmm4,%zmm9,%zmm13     5ab:6221 7c4828 e4vmovaps %zmm20,%zmm28     5b1:6262 0d48 7f ec    vpermt2ps %zmm4,%zmm14,%zmm29     5b7:62027548 7f e1    vpermt2ps %zmm25,%zmm1,%zmm28     5bd:62 f1 fd48297424vmovapd %zmm6,0x1c0(%rsp)     5c4:07     5c5:62 b1 7c4828 f6vmovaps %zmm22,%zmm6     5cb:62 f27548 7f f4    vpermt2ps %zmm4,%zmm1,%zmm6     5d1:6293954823 c2 e4 vshuff64x2$0xe4,%zmm26,%zmm13,%zmm0     5d8:6271 7c4828 e9vmovaps %zmm1,%zmm13     5de:62 f1 fd48294424vmovapd %zmm0,0x180(%rsp)     5e5:06     5e6:62 d3 cd4823 c7 
e4 vshuff64x2$0xe4,%zmm15,%zmm6,%zmm0     5ed:62 f1 7c48287424vmovaps0x600(%rsp),%zmm6     5f4:18     5f5:62 f1 fd48294424vmovapd %zmm0,0x300(%rsp)     5fc:0c     5fd:62 b1 7c4828 c1vmovaps %zmm17,%zmm0603:62 b27548 7f c0    vpermt2ps %zmm16,%zmm1,%zmm0609:62 f1 7c4828 4c24vmovaps0x240(%rsp),%zmm1610:09611:6293 fd4823 c4 e4 vshuff64x2$0xe4,%zmm28,%zmm0,%zmm0618:6261 7c4828 e2vmovaps %zmm2,%zmm28     61e:62 e2 1d40 7f f4    vpermt2ps %zmm4,%zmm28,%zmm22624:62 f1 7c48286424vmovaps0x4c0(%rsp),%zmm4     62b:13     62c:62 a2 1d40 7f c8    vpermt2ps %zmm16,%zmm28,%zmm17632:6282 1d40 7f e1    vpermt2ps %zmm25,%zmm28,%zmm20638:62 e1 7c48284424vmovaps0x1c0(%rsp),%zmm16     63f:07640:62 f1 fd48294424vmovapd %zmm0,0x2c0(%rsp)647:0b648:6291 7c4828 c0vmovaps %zmm24,%zmm0     64e:6222 6d48 7f c7    vpermt2ps %zmm23,%zmm2,%zmm24654:62 f1 7c48285424vmovaps0x680(%rsp),%zmm2     65b:1a     65c:62 b2 0d48 7f c7    vpermt2ps %zmm23,%zmm14,%zmm0662:62 e1 7c4828 7c24vmovaps0x740(%rsp),%zmm23669:1d     66a:62 a3 f54023 e4 e4 vshuff64x2$0xe4,%zmm20,%zmm17,%zmm20671:6283 cd4023 f0 e4 vshuff64x2$0xe4,%zmm24,%zmm22,%zmm22678:62 f3954023 c0 e4 vshuff64x2$0xe4,%zmm0,%zmm29,%zmm0     67f:6203854023 ee e4 vshuff64x2$0xe4,%zmm30,%zmm31,%zmm29686:6221 3c48 5c f5vsubps %zmm21,%zmm8,%zmm30     68c:6271 7c48284424vmovaps0x440(%rsp),%zmm8693:11694:6261 2c48 5c favsubps %zmm2,%zmm10,%zmm31     69a:62614448 5c davsubps %zmm2,%zmm7,%zmm27     6a0:62 b1 7c4828 fbvmovaps %zmm19,%zmm7     6a6:62 f16448 5c davsubps %zmm2,%zmm3,%zmm3     6ac:62 f1 7c4829 5c24vmovaps %zmm3,0x240(%rsp)     6b3:09     6b4:62 f1 fd48294424vmovapd %zmm0,0x5c0(%rsp)     6bb:17     6bc:62 f1 7c48284424vmovaps0x700(%rsp),%zmm0     6c3:1c     6c4:6221 3c48 5c d5vsubps %zmm21,%zmm8,%zmm26     6ca:62717448 5c e0vsubps %zmm0,%zmm1,%zmm12     6d0:62 f1 5c48 5c e0vsubps %zmm0,%zmm4,%zmm4     6d6:62 e1 7c40 5c c0vsubps %zmm0,%zmm16,%zmm16     6dc:6251 7c4828 ccvmovaps %zmm12,%zmm9     6e2:6271 7c4828 c4vmovaps %zmm4,%zmm8     6e8:62 e1 
7c48294424vmovaps %zmm16,0x340(%rsp)     6ef:0d     6f0:6272 1d48 a8 ce    vfmadd213ps %zmm6,%zmm12,%zmm9     6f6:6272 5d48 a8 c6    vfmadd213ps %zmm6,%zmm4,%zmm8     6fc:6212 0d40 b8 ce    vfmadd231ps %zmm30,%zmm30,%zmm9702:6212 2d40 b8 c2    vfmadd231ps %zmm26,%zmm26,%zmm8708:62120540 b8 cf    vfmadd231ps %zmm31,%zmm31,%zmm9     70e:62122540 b8 c3    vfmadd231ps %zmm27,%zmm27,%zmm8714:6252 7d48 4e d1    vrsqrt14ps %zmm9,%zmm10     71a:6252 7d48 4e f8    vrsqrt14ps %zmm8,%zmm15720:6251344859 cavmulps %zmm10,%zmm9,%zmm9726:6251 3c4859 c7vmulps %zmm15,%zmm8,%zmm8     72c:6232 2d48 a8 cb    vfmadd213ps %zmm19,%zmm10,%zmm9732:6231 2c4859 d7vmulps %zmm23,%zmm10,%zmm10738:62320548 a8 c3    vfmadd213ps %zmm19,%zmm15,%zmm8     73e:6251 2c4859 d1vmulps %zmm9,%zmm10,%zmm10744:6231044859 cfvmulps %zmm23,%zmm15,%zmm9     74a:6271 7c4828 7c24vmovaps0x200(%rsp),%zmm15751:08752:62 d1344859 c8vmulps %zmm8,%zmm9,%zmm1758:62312448 5c cdvsubps %zmm21,%zmm11,%zmm9     75e:6271 6c40 5c c2vsubps %zmm2,%zmm18,%zmm8764:6271 7c4829 4c24vmovaps %zmm9,0x200(%rsp)     76b:08     76c:6271 7c48294424vmovaps %zmm8,0x3c0(%rsp)773:0f774:62 e10448 5c d8vsubps %zmm0,%zmm15,%zmm19     77a:6231 7c4828 dbvmovaps %zmm19,%zmm11780:62726540 a8 de    vfmadd213ps %zmm6,%zmm19,%zmm11786:62523548 b8 d9    vfmadd231ps %zmm9,%zmm9,%zmm11     78c:6271 7c4828 4c24vmovaps0x400(%rsp),%zmm9793:10794:6252 3d48 b8 d8    vfmadd231ps %zmm8,%zmm8,%zmm11     79a:62315448 5c c5vsubps %zmm21,%zmm5,%zmm8     7a0:62 c2 7d48 4e d3    vrsqrt14ps %zmm11,%zmm18     7a6:6271 7c48294424vmovaps %zmm8,0x380(%rsp)     7ad:0e     7ae:6231244859 davmulps %zmm18,%zmm11,%zmm11     7b4:6272 6d40 a8 df    vfmadd213ps %zmm7,%zmm18,%zmm11     7ba:62 a1 6c4059 d7vmulps %zmm23,%zmm18,%zmm18     7c0:62 c1 6c4059 d3vmulps %zmm11,%zmm18,%zmm18     7c6:6261 6c40594424vmulps0x480(%rsp),%zmm18,%zmm24     7cd:12     7ce:62713448 5c f8vsubps %zmm0,%zmm9,%zmm15     7d4:62 d1 7c4828 efvmovaps %zmm15,%zmm5     7da:62 f20548 a8 ee    vfmadd213ps 
%zmm6,%zmm15,%zmm5     7e0:62 d2 3d48 b8 e8    vfmadd231ps %zmm8,%zmm8,%zmm5     7e6:62717448594424vmulps0x500(%rsp),%zmm1,%zmm8     7ed:14     7ee:62 f1744859 c9vmulps %zmm1,%zmm1,%zmm1     7f4:62 f26548 b8 eb    vfmadd231ps %zmm3,%zmm3,%zmm5     7fa:62 f1 2c4859 5c24vmulps0x140(%rsp),%zmm10,%zmm3801:05802:6251 2c4859 d2vmulps %zmm10,%zmm10,%zmm10808:6272 7d48 4e dd    vrsqrt14ps %zmm5,%zmm11     80e:62 d1544859 ebvmulps %zmm11,%zmm5,%zmm5814:62 f22548 a8 ef    vfmadd213ps %zmm7,%zmm11,%zmm5     81a:6231244859 dfvmulps %zmm23,%zmm11,%zmm11820:6251744859 c0vmulps %zmm8,%zmm1,%zmm8826:6291 7c4828 4c 2evmovaps0x780(%r14,%r13,1),%zmm1     82d:1e     82e:6261 2c4859 cbvmulps %zmm3,%zmm10,%zmm25834:c4412857 d2vxorps %xmm10,%xmm10,%xmm10839:c46329 0c 8c2400vblendps$0x1,0x100(%rsp),%xmm10,%xmm9840:01000001844:62 f1244859 edvmulps %zmm5,%zmm11,%zmm5     84a:c46329 0c 9c24 c0vblendps$0x1,0xc0(%rsp),%xmm10,%xmm11851:00000001855:c4 e329 0c 9c2480vblendps$0x1,0x80(%rsp),%xmm10,%xmm3     85c:00000001860:6271 7c48285424vmovaps0x180(%rsp),%zmm10867:06868:62 e1 2c48 5c c8vsubps %zmm0,%zmm10,%zmm17     86e:62 f1 7c48284424vmovaps0x300(%rsp),%zmm0875:0c876:6271 7c48285424vmovaps0x2c0(%rsp),%zmm10     87d:0b     87e:62 f1 7c4829 5c24vmovaps %zmm3,0x100(%rsp)885:04886:62 b1 6c4059 davmulps %zmm18,%zmm18,%zmm3     88c:62 e15448595424vmulps0x280(%rsp),%zmm5,%zmm18893:0a894:62 f1544859 edvmulps %zmm5,%zmm5,%zmm5     89a:62123540 b8 de    vfmadd231ps %zmm30,%zmm25,%zmm11     8a0:6201 7c482874 2evmovaps0x600(%r14,%r13,1),%zmm30     8a7:18     8a8:62523540 b8 cc    vfmadd231ps %zmm12,%zmm25,%zmm9     8ae:6201644859 c0vmulps %zmm24,%zmm3,%zmm24     8b4:62 f11440 5c davsubps %zmm2,%zmm29,%zmm3     8ba:6221 7c4828 e8vmovaps %zmm16,%zmm29     8c0:62 e1 7c4829 4c24vmovaps %zmm17,0x80(%rsp)     8c7:02     8c8:62 e27540 a8 ce    vfmadd213ps %zmm6,%zmm17,%zmm17     8ce:62621540 a8 ee    vfmadd213ps %zmm6,%zmm29,%zmm29     8d4:62 f1 3c4859 f4vmulps %zmm4,%zmm8,%zmm6     8da:6232 3d40 b8 cb    
vfmadd231ps %zmm19,%zmm24,%zmm9     8e0:6281 7c4828 5c 2evmovaps0x700(%r14,%r13,1),%zmm19     8e7:1c     8e8:62 f1 7c4829 5c24vmovaps %zmm3,0x1c0(%rsp)     8ef:07     8f0:62 a1544859 c2vmulps %zmm18,%zmm5,%zmm16     8f6:62 e1 7c48285424vmovaps0x640(%rsp),%zmm18     8fd:19     8fe:62 d2 7d40 b8 f7    vfmadd231ps %zmm15,%zmm16,%zmm6904:6211 7c4828 fcvmovaps %zmm28,%zmm15     90a:62 b1 7c48 5c c5vsubps %zmm21,%zmm0,%zmm0910:6231 2c48 5c d5vsubps %zmm21,%zmm10,%zmm10916:62 e1 7c4828 6c24vmovaps0x5c0(%rsp),%zmm21     91d:17     91e:6211 7c4828 e6vmovaps %zmm30,%zmm12924:62 e2 7d48 b8 c8    vfmadd231ps %zmm0,%zmm0,%zmm17     92a:6242 2d48 b8 ea    vfmadd231ps %zmm10,%zmm10,%zmm29930:62 f1 7c48294424vmovaps %zmm0,0xc0(%rsp)937:03938:6271 7c48295424vmovaps %zmm10,0x140(%rsp)     93f:05940:6251 7c4828 d3vmovaps %zmm11,%zmm10946:6271 7c4828 devmovaps %zmm6,%zmm11     94c:62626548 b8 eb    vfmadd231ps %zmm3,%zmm3,%zmm29952:62 b1 7c4828 f3vmovaps %zmm19,%zmm6958:6292 7d48 4e c5    vrsqrt14ps %zmm29,%zmm0     95e:62 f1144059 e8vmulps %zmm0,%zmm29,%zmm5964:62 f2 7d48 a8 ef    vfmadd213ps %zmm7,%zmm0,%zmm5     96a:62 e15440 5c eavsubps %zmm2,%zmm21,%zmm21970:62 a25540 b8 cd    vfmadd231ps %zmm21,%zmm21,%zmm17976:62 e1 7c4829 6c24vmovaps %zmm21,0x180(%rsp)     97d:06     97e:6281 3c4859 eavmulps %zmm26,%zmm8,%zmm21984:6201 7c482854 2evmovaps0x940(%r14,%r13,1),%zmm26     98b:25     98c:62 b2 7d48 4e d1    vrsqrt14ps %zmm17,%zmm2992:62 e2 7d40 b8 6c24 vfmadd231ps0x380(%rsp),%zmm16,%zmm21999:0e     99a:62 f1744059 e2vmulps %zmm2,%zmm17,%zmm4     9a0:62 f2 6d48 a8 e7    vfmadd213ps %zmm7,%zmm2,%zmm4     9a6:62 b1 6c4859 d7vmulps %zmm23,%zmm2,%zmm2     9ac:62 f1 6c4859 d4vmulps %zmm4,%zmm2,%zmm2     9b2:62 b1 7c4859 e7vmulps %zmm23,%zmm0,%zmm4     9b8:6281 3c4859 fbvmulps %zmm27,%zmm8,%zmm23     9be:6251 7c4828 c1vmovaps %zmm9,%zmm8     9c4:6201 7c4828 5c 2evmovaps0xb40(%r14,%r13,1),%zmm27     9cb:2d     9cc:6261 5c4859 edvmulps %zmm5,%zmm4,%zmm29     9d2:62 f1 7c4828 
6c24vmovaps0x100(%rsp),%zmm5     9d9:04     9da:62 f1 6c4859 e2vmulps %zmm2,%zmm2,%zmm4     9e0:62 f1 4c4059 d2vmulps %zmm2,%zmm22,%zmm2     9e6:6281 7c482874 2evmovaps0x640(%r14,%r13,1),%zmm22     9ed:19     9ee:62 e1 5c4859 cavmulps %zmm2,%zmm4,%zmm17     9f4:6291 7c482854 2evmovaps0x6c0(%r14,%r13,1),%zmm2     9fb:1b     9fc:6291 7c482864 2evmovaps0x740(%r14,%r13,1),%zmm4     a03:1d     a04:62727540 b84424 vfmadd231ps0x80(%rsp),%zmm17,%zmm8     a0b:02     a0c:6271 7c48294424vmovaps %zmm8,0x80(%rsp)     a13:02     a14:6271 7c48284424vmovaps0x600(%rsp),%zmm8     a1b:18     a1c:62923540 b8 ef    vfmadd231ps %zmm31,%zmm25,%zmm5     a22:6201 7c4828 7c 2evmovaps0x680(%r14,%r13,1),%zmm31     a29:1a     a2a:6201 7c4828 4c 2evmovaps0x7c0(%r14,%r13,1),%zmm25     a31:1f     a32:6232 6d40 7f e6    vpermt2ps %zmm22,%zmm18,%zmm12     a38:62 f21548 7f f4    vpermt2ps %zmm4,%zmm13,%zmm6     a3e:62 f2 3d40 b8 6c24 vfmadd231ps0x3c0(%rsp),%zmm24,%zmm5     a45:0f     a46:62 f27540 b8 6c24 vfmadd231ps0x180(%rsp),%zmm17,%zmm5     a4d:06     a4e:6291 7c4828 ffvmovaps %zmm31,%zmm7     a54:6211 7c4828 cfvmovaps %zmm31,%zmm9     a5a:62 f2 6d40 7f fa    vpermt2ps %zmm2,%zmm18,%zmm7     a60:62721548 7f ca    vpermt2ps %zmm2,%zmm13,%zmm9     a66:62 f1 7c4829 6c24vmovaps %zmm5,0x100(%rsp)     a6d:04     a6e:62 f1 7c4828 6c24vmovaps0x700(%rsp),%zmm5     a75:1c     a76:62 f3 9d4823 c7 e4 vshuff64x2$0xe4,%zmm7,%zmm12,%zmm0     a7d:62 f1 7c4828 f9vmovaps %zmm1,%zmm7     a83:6231 7c4828 e3vmovaps %zmm19,%zmm12     a89:6292 6d40 7f f9    vpermt2ps %zmm25,%zmm18,%zmm7     a8f:6272 6d40 7f e4    vpermt2ps %zmm4,%zmm18,%zmm12     a95:62 f1 fd48294424vmovapd %zmm0,0x4c0(%rsp)     a9c:13     a9d:62 f3 9d4823 df e4 vshuff64x2$0xe4,%zmm7,%zmm12,%zmm3     aa4:6271 7c4828 e1vmovaps %zmm1,%zmm12aaa:6291 7c4828 fevmovaps %zmm30,%zmm7     ab0:62121548 7f e1    vpermt2ps %zmm25,%zmm13,%zmm12     ab6:62 b21548 7f fe    vpermt2ps %zmm22,%zmm13,%zmm7     abc:62 f1 fd4829 5c24vmovapd %zmm3,0x500(%rsp)     ac3:14    
 ac4:62 d3 cd4823 f4 e4 vshuff64x2$0xe4,%zmm12,%zmm6,%zmm6     acb:6253 c54823 c9 e4 vshuff64x2$0xe4,%zmm9,%zmm7,%zmm9     ad2:6291 7c4828 ffvmovaps %zmm31,%zmm7     ad8:6262 1d40 7f fa    vpermt2ps %zmm2,%zmm28,%zmm31     ade:6211 7c482864 2evmovaps0x980(%r14,%r13,1),%zmm12     ae5:26     ae6:62 f2 0d48 7f fa    vpermt2ps %zmm2,%zmm14,%zmm7     aec:62 f1 7c4828 d1vmovaps %zmm1,%zmm2     af2:6292 1d40 7f c9    vpermt2ps %zmm25,%zmm28,%zmm1     af8:62 f1 fd48297424vmovapd %zmm6,0x440(%rsp)     aff:11     b00:6291 7c4828 f6vmovaps %zmm30,%zmm6     b06:6292 0d48 7f d1    vpermt2ps %zmm25,%zmm14,%zmm2     b0c:6222 1d40 7f f6    vpermt2ps %zmm22,%zmm28,%zmm30     b12:6201 7c4828 4c 2evmovaps0x880(%r14,%r13,1),%zmm25     b19:22     b1a:62 b2 0d48 7f f6    vpermt2ps %zmm22,%zmm14,%zmm6     b20:62 f3 cd4823 df e4 vshuff64x2$0xe4,%zmm7,%zmm6,%zmm3     b27:6291 5c4059 f5vmulps %zmm29,%zmm20,%zmm6     b2d:6293 8d4023 ff e4 vshuff64x2$0xe4,%zmm31,%zmm30,%zmm7     b34:6201 7c482874 2evmovaps0x9c0(%r14,%r13,1),%zmm30     b3b:27     b3c:62 c1 7c4828 e4vmovaps %zmm12,%zmm20     b42:6201 7c4828 7c 2evmovaps0xa80(%r14,%r13,1),%zmm31     b49:2a     b4a:62 f1 fd4829 5c24vmovapd %zmm3,0x400(%rsp)     b51:10     b52:62 d1 7c4828 davmovaps %zmm10,%zmm3     b58:6231 7c4828 d3vmovaps %zmm19,%zmm10     b5e:62 e2 1d40 7f dc    vpermt2ps %zmm4,%zmm28,%zmm19     b64:62 f1 fd4829 7c24vmovapd %zmm7,0x280(%rsp)     b6b:0a     b6c:6272 0d48 7f d4    vpermt2ps %zmm4,%zmm14,%zmm10     b72:6291 7c482864 2evmovaps0x800(%r14,%r13,1),%zmm4     b79:20     b7a:62 f2 3d40 b8 5c24 vfmadd231ps0x200(%rsp),%zmm24,%zmm3     b81:08     b82:6201 7c482844 2evmovaps0x8c0(%r14,%r13,1),%zmm24     b89:23     b8a:62 f27540 b8 5c24 vfmadd231ps0xc0(%rsp),%zmm17,%zmm3     b91:03     b92:62821548 7f e6    vpermt2ps %zmm30,%zmm13,%zmm20     b98:6281 7c4828 cfvmovaps %zmm31,%zmm17     b9e:62 f3 e54023 c1 e4 vshuff64x2$0xe4,%zmm1,%zmm19,%zmm0     ba5:6291 7c4828 4c 2evmovaps0x840(%r14,%r13,1),%zmm1     bac:21     bad:6281 
7c4828 d9vmovaps %zmm25,%zmm19     bb3:62 f3 ad4823 d2 e4 vshuff64x2$0xe4,%zmm2,%zmm10,%zmm2     bba:6211144059 d5vmulps %zmm29,%zmm29,%zmm10     bc0:6201 7c4828 6c 2evmovaps0x900(%r14,%r13,1),%zmm29     bc7:24     bc8:6282 6d40 7f d8    vpermt2ps %zmm24,%zmm18,%zmm19     bce:62 f1 fd48294424vmovapd %zmm0,0x480(%rsp)     bd5:12     bd6:62 f1 7c4829 5c24vmovaps %zmm3,0xc0(%rsp)     bdd:03     bde:62 f1 fd48295424vmovapd %zmm2,0x200(%rsp)     be5:08     be6:62 b1 7c4828 d7vmovaps %zmm23,%zmm2     bec:62 e1 2c4859 fevmulps %zmm6,%zmm10,%zmm23     bf2:62 f1 7c4828 f4vmovaps %zmm4,%zmm6     bf8:6251 7c4828 d4vmovaps %zmm12,%zmm10     bfe:6212 6d40 7f d6    vpermt2ps %zmm30,%zmm18,%zmm10     c04:62724540 b8 5c24 vfmadd231ps0x340(%rsp),%zmm23,%zmm11     c0b:0d     c0c:62 f2 7d40 b85424 vfmadd231ps0x240(%rsp),%zmm16,%zmm2     c13:09     c14:62 e24540 b8 6c24 vfmadd231ps0x140(%rsp),%zmm23,%zmm21     c1b:05     c1c:62 f2 6d40 7f f1    vpermt2ps %zmm1,%zmm18,%zmm6     c22:62 f24540 b85424 vfmadd231ps0x1c0(%rsp),%zmm23,%zmm2     c29:07     c2a:62 e1 7c4829 6c24vmovaps %zmm21,0x140(%rsp)     c31:05     c32:6271 7c4829 5c24vmovaps %zmm11,0x240(%rsp)     c39:09     c3a:6281 7c4828 f5vmovaps %zmm29,%zmm22     c40:6291 7c4828 fdvmovaps %zmm29,%zmm7     c46:6282 6d40 7f f2    vpermt2ps %zmm26,%zmm18,%zmm22     c4c:62921548 7f fa    vpermt2ps %zmm26,%zmm13,%zmm7     c52:62 f1 7c48295424vmovaps %zmm2,0x1c0(%rsp)     c59:07     c5a:62 b3 cd4823 c3 e4 vshuff64x2$0xe4,%zmm19,%zmm6,%zmm0     c61:6291 7c4828 f1vmovaps %zmm25,%zmm6     c67:62921548 7f f0    vpermt2ps %zmm24,%zmm13,%zmm6     c6d:62 f1 fd48294424vmovapd %zmm0,0x380(%rsp)     c74:0e     c75:62 d3 cd4023 c2 e4 vshuff64x2$0xe4,%zmm10,%zmm22,%zmm0     c7c:6271 7c4828 d4vmovaps %zmm4,%zmm10     c82:6272 0d48 7f d1    vpermt2ps %zmm1,%zmm14,%zmm10     c88:62 f1 fd48294424vmovapd %zmm0,0x3c0(%rsp)     c8f:0f     c90:62 f1 7c4828 c4vmovaps %zmm4,%zmm0     c96:62 f2 1d40 7f e1    vpermt2ps %zmm1,%zmm28,%zmm4     c9c:62 f21548 7f c1    
vpermt2ps %zmm1,%zmm13,%zmm0     ca2:62 f3 fd4823 c6 e4 vshuff64x2$0xe4,%zmm6,%zmm0,%zmm0     ca9:6291 7c482874 2evmovaps0xb80(%r14,%r13,1),%zmm6     cb0:2e     cb1:62 f1 fd48294424vmovapd %zmm0,0x2c0(%rsp)     cb8:0b     cb9:62 b3 c54823 c4 e4 vshuff64x2$0xe4,%zmm20,%zmm7,%zmm0     cc0:62 f1 7c4828 7c24vmovaps0x6c0(%rsp),%zmm7     cc7:1b     cc8:62 f1 fd48294424vmovapd %zmm0,0x340(%rsp)     ccf:0d     cd0:6291 7c4828 c1vmovaps %zmm25,%zmm0     cd6:6202 1d40 7f c8    vpermt2ps %zmm24,%zmm28,%zmm25     cdc:6292 0d48 7f c0    vpermt2ps %zmm24,%zmm14,%zmm0     ce2:6241 7c4828 c4vmovaps %zmm12,%zmm24     ce8:6212 1d40 7f e6    vpermt2ps %zmm30,%zmm28,%zmm12     cee:6202 0d48 7f c6    vpermt2ps %zmm30,%zmm14,%zmm24     cf4:6201 7c482874 2evmovaps0xa40(%r14,%r13,1),%zmm30     cfb:29     cfc:6293 dd4823 c9 e4 vshuff64x2$0xe4,%zmm25,%zmm4,%zmm1     d03:6291 7c482864 2evmovaps0xbc0(%r14,%r13,1),%zmm4     d0a:2f     d0b:62 e3 ad4823 f0 e4 vshuff64x2$0xe4,%zmm0,%zmm10,%zmm22     d12:6291 7c4828 c5vmovaps %zmm29,%zmm0     d18:6202 1d40 7f ea    vpermt2ps %zmm26,%zmm28,%zmm29     d1e:6201 7c482864 2evmovaps0xa00(%r14,%r13,1),%zmm28     d25:28     d26:6211 7c482854 2evmovaps0xac0(%r14,%r13,1),%zmm10     d2d:2b     d2e:6292 0d48 7f c2    vpermt2ps %zmm26,%zmm14,%zmm0     d34:62 f1 fd4829 4c24vmovapd %zmm1,0x300(%rsp)     d3b:0c     d3c:6291 7c4828 4c 2evmovaps0xb00(%r14,%r13,1),%zmm1     d43:2c     d44:6261 7c48285424vmovaps0x780(%rsp),%zmm26     d4b:1e     d4c:62 d3954023 dc e4 vshuff64x2$0xe4,%zmm12,%zmm29,%zmm3     d53:62713448 5c e7vsubps %zmm7,%zmm9,%zmm12     d59:6271 7c4828 4c24vmovaps0x680(%rsp),%zmm9     d60:1a     d61:6281 7c4828 fcvmovaps %zmm28,%zmm23     d67:62 c2 6d40 7f ca    vpermt2ps %zmm10,%zmm18,%zmm17     d6d:6283 fd4823 c0 e4 vshuff64x2$0xe4,%zmm24,%zmm0,%zmm16     d74:6201 7c4828 c7vmovaps %zmm31,%zmm24     d7a:6201 7c4828 ccvmovaps %zmm28,%zmm25     d80:6282 6d40 7f fe    vpermt2ps %zmm30,%zmm18,%zmm23     d86:62421548 7f c2    vpermt2ps 
%zmm10,%zmm13,%zmm24     d8c:62021548 7f ce    vpermt2ps %zmm30,%zmm13,%zmm25     d92:62 f1 fd4829 5c24vmovapd %zmm3,0x180(%rsp)     d99:06     d9a:62 f1 7c4828 5c24vmovaps0x4c0(%rsp),%zmm3     da1:13     da2:62 c1 7c40 5c c1vsubps %zmm9,%zmm16,%zmm16     da8:62 b3 c54023 c1 e4 vshuff64x2$0xe4,%zmm17,%zmm23,%zmm0     daf:62 e1 7c4828 cevmovaps %zmm6,%zmm17     db5:62 e1 7c4828 f9vmovaps %zmm1,%zmm23     dbb:6293 b54023 d0 e4 vshuff64x2$0xe4,%zmm24,%zmm25,%zmm2     dc2:62 e2 6d40 7f cc    vpermt2ps %zmm4,%zmm18,%zmm17     dc8:62821548 7f fb    vpermt2ps %zmm27,%zmm13,%zmm23     dce:62716448 5c ddvsubps %zmm5,%zmm3,%zmm11     dd4:62 f1 fd48294424vmovapd %zmm0,0x540(%rsp)     ddb:15     ddc:62 f1 7c4828 c1vmovaps %zmm1,%zmm0     de2:62 f1 fd48295424vmovapd %zmm2,0x8c0(%rsp)     de9:23     dea:62 f1 7c48285424vmovaps0x400(%rsp),%zmm2     df1:10     df2:6292 6d40 7f c3    vpermt2ps %zmm27,%zmm18,%zmm0     df8:62 b3 fd4823 c1 e4 vshuff64x2$0xe4,%zmm17,%zmm0,%zmm0     dff:62 e1 7c4828 cevmovaps %zmm6,%zmm17     e05:62 c1 6c48 5c e1vsubps %zmm9,%zmm2,%zmm20     e0b:62 f1 7c48285424vmovaps0x440(%rsp),%zmm2     e12:11     e13:62 e1 7c48294424vmovaps %zmm16,0x440(%rsp)     e1a:11     e1b:62 e21548 7f cc    vpermt2ps %zmm4,%zmm13,%zmm17     e21:62 f1 fd48294424vmovapd %zmm0,0x880(%rsp)     e28:22     e29:62 a3 c54023 e9 e4 vshuff64x2$0xe4,%zmm17,%zmm23,%zmm21     e30:6281 7c4828 cfvmovaps %zmm31,%zmm17     e36:62420548 7f fa    vpermt2ps %zmm10,%zmm15,%zmm31     e3c:62 e1 7c4828 7c24vmovaps0x740(%rsp),%zmm23     e43:1d     e44:62 f1 6c48 5c d7vsubps %zmm7,%zmm2,%zmm2     e4a:62 c2 0d48 7f ca    vpermt2ps %zmm10,%zmm14,%zmm17     e50:6211 7c4828 d4vmovaps %zmm28,%zmm10     e56:62 f1 7c48295424vmovaps %zmm2,0x580(%rsp)     e5d:16     e5e:62020548 7f e6    vpermt2ps %zmm30,%zmm15,%zmm28     e64:6212 0d48 7f d6    vpermt2ps %zmm30,%zmm14,%zmm10     e6a:6203 9d4023 f7 e4 vshuff64x2$0xe4,%zmm31,%zmm28,%zmm30     e71:62 a3 ad4823 d9 e4 vshuff64x2$0xe4,%zmm17,%zmm10,%zmm19     
e78:6251 7c4828 d3vmovaps %zmm11,%zmm10     e7e:62522548 a8 d0    vfmadd213ps %zmm8,%zmm11,%zmm10     e84:6252 1d48 b8 d4    vfmadd231ps %zmm12,%zmm12,%zmm10     e8a:6232 5d40 b8 d4    vfmadd231ps %zmm20,%zmm20,%zmm10     e90:62 c2 7d48 4e ca    vrsqrt14ps %zmm10,%zmm17     e96:6231 2c4859 d1vmulps %zmm17,%zmm10,%zmm10     e9c:62127540 a8 d2    vfmadd213ps %zmm26,%zmm17,%zmm10     ea2:62 a1744059 cfvmulps %zmm23,%zmm17,%zmm17     ea8:62 d1744059 davmulps %zmm10,%zmm17,%zmm3     eae:6271 7c48285424vmovaps0x500(%rsp),%zmm10     eb5:14     eb6:6261 2c48 5c cdvsubps %zmm5,%zmm10,%zmm25     ebc:6271 7c48285424vmovaps0x200(%rsp),%zmm10     ec3:08     ec4:6281 7c4828 c9vmovaps %zmm25,%zmm17     eca:62 c23540 a8 c8    vfmadd213ps %zmm8,%zmm25,%zmm17     ed0:62 e2 6d48 b8 ca    vfmadd231ps %zmm2,%zmm2,%zmm17     ed6:62 d1 2c48 5c c1vsubps %zmm9,%zmm10,%zmm0     edc:6271 7c48285424vmovaps0x2c0(%rsp),%zmm10     ee3:0b     ee4:62 e2 7d48 b8 c8    vfmadd231ps %zmm0,%zmm0,%zmm17     eea:62 f1 7c48294424vmovaps %zmm0,0x5c0(%rsp)     ef1:17     ef2:62 d1 4c40 5c c1vsubps %zmm9,%zmm22,%zmm0     ef8:6222 7d48 4e c1    vrsqrt14ps %zmm17,%zmm24     efe:62 f1 7c48294424vmovaps %zmm0,0x200(%rsp)     f05:08     f06:6281744059 c8vmulps %zmm24,%zmm17,%zmm17     f0c:6282 3d40 a8 ca    vfmadd213ps %zmm26,%zmm24,%zmm17     f12:6221 3c4059 c7vmulps %zmm23,%zmm24,%zmm24     f18:62 b1 3c4059 d1vmulps %zmm17,%zmm24,%zmm2     f1e:62 e1 7c4828 4c24vmovaps0x380(%rsp),%zmm17     f25:0e     f26:6271 2c48 5c d7vsubps %zmm7,%zmm10,%zmm10     f2c:6271 7c48295424vmovaps %zmm10,0x380(%rsp)     f33:0e     f34:62617440 5c c5vsubps %zmm5,%zmm17,%zmm24     f3a:6281 7c4828 c8vmovaps %zmm24,%zmm17     f40:62 c2 3d40 a8 c8    vfmadd213ps %zmm8,%zmm24,%zmm17     f46:62 c2 2d48 b8 ca    vfmadd231ps %zmm10,%zmm10,%zmm17     f4c:6271 7c48285424vmovaps0x340(%rsp),%zmm10     f53:0d     f54:62 e2 7d48 b8 c8    vfmadd231ps %zmm0,%zmm0,%zmm17     f5a:62 a2 7d48 4e f1    vrsqrt14ps %zmm17,%zmm22     f60:62 a1744059 
cevmulps %zmm22,%zmm17,%zmm17     f66:6282 4d40 a8 ca    vfmadd213ps %zmm26,%zmm22,%zmm17     f6c:62 a1 4c4059 f7vmulps %zmm23,%zmm22,%zmm22     f72:62 a1 4c4059 f1vmulps %zmm17,%zmm22,%zmm22     f78:62 e1 7c4828 4c24vmovaps0x3c0(%rsp),%zmm17     f7f:0f     f80:6271 2c48 5c d7vsubps %zmm7,%zmm10,%zmm10     f86:6271 7c48295424vmovaps %zmm10,0x340(%rsp)     f8d:0d     f8e:62 e17440 5c cdvsubps %zmm5,%zmm17,%zmm17     f94:6221 7c4828 e9vmovaps %zmm17,%zmm29     f9a:62427540 a8 e8    vfmadd213ps %zmm8,%zmm17,%zmm29     fa0:6242 2d48 b8 ea    vfmadd231ps %zmm10,%zmm10,%zmm29     fa6:62716448595424vmulps0x280(%rsp),%zmm3,%zmm10     fad:0a     fae:62 f1644859 dbvmulps %zmm3,%zmm3,%zmm3     fb4:6222 7d40 b8 e8    vfmadd231ps %zmm16,%zmm16,%zmm29     fba:6292 7d48 4e c5    vrsqrt14ps %zmm29,%zmm0     fc0:62 e1144059 c0vmulps %zmm0,%zmm29,%zmm16     fc6:6261 7c4828 e9vmovaps %zmm1,%zmm29     fcc:62920548 7f cb    vpermt2ps %zmm27,%zmm15,%zmm1     fd2:6282 7d48 a8 c2    vfmadd213ps %zmm26,%zmm0,%zmm16     fd8:62 b1 7c4859 c7vmulps %zmm23,%zmm0,%zmm0     fde:62 d1644859 davmulps %zmm10,%zmm3,%zmm3     fe4:6202 0d48 7f eb    vpermt2ps %zmm27,%zmm14,%zmm29     fea:62 a1 7c4859 c0vmulps %zmm16,%zmm0,%zmm16     ff0:62 f1 7c4828 c6vmovaps %zmm6,%zmm0     ff6:62 f20548 7f f4    vpermt2ps %zmm4,%zmm15,%zmm6     ffc:62 f2 0d48 7f c4    vpermt2ps %zmm4,%zmm14,%zmm01002:62 f1 6c48596424vmulps0x480(%rsp),%zmm2,%zmm41009:12    100a:62 f1 6c4859 d2vmulps %zmm2,%zmm2,%zmm21010:6261 6c4859 e4vmulps %zmm4,%zmm2,%zmm281016:62 f1 4c40595424vmulps0x300(%rsp),%zmm22,%zmm2    101d:0c    101e:6273 f54823 d6 e4 vshuff64x2$0xe4,%zmm6,%zmm1,%zmm101025:62 b1 4c4059 cevmulps %zmm22,%zmm22,%zmm1    102b:62 d16440 5c f1vsubps %zmm9,%zmm19,%zmm61031:62 e1 7c4059 5c24vmulps0x180(%rsp),%zmm16,%zmm191038:061039:62 a1 7c4059 c0vmulps %zmm16,%zmm16,%zmm16    103f:6263954023 e8 e4 vshuff64x2$0xe4,%zmm0,%zmm29,%zmm291046:62 f1 7c48284424vmovaps0x880(%rsp),%zmm0    104d:22    104e:62 f1 7c48297424vmovaps 
%zmm6,0x280(%rsp)1055:0a1056:62 e1744859 f2vmulps %zmm2,%zmm1,%zmm22    105c:62 f1 7c48285424vmovaps0x540(%rsp),%zmm21063:151064:6291 7c4828 4c 2evmovaps0xd80(%r14,%r13,1),%zmm1    106b:36    106c:62511440 5c c9vsubps %zmm9,%zmm29,%zmm91072:6271 7c4829 4c24vmovaps %zmm9,0x4c0(%rsp)1079:13    107a:6261 7c48 5c ddvsubps %zmm5,%zmm0,%zmm271080:62 f1 7c4828 c7vmovaps %zmm7,%zmm01086:62 f1 6c48 5c e5vsubps %zmm5,%zmm2,%zmm4    108c:62 f1 7c48285424vmovaps0x8c0(%rsp),%zmm21093:231094:6291 7c4828 ebvmovaps %zmm27,%zmm5    109a:6261 7c4829 5c24vmovaps %zmm27,0x400(%rsp)    10a1:10    10a2:62 f1 7c48296424vmovaps %zmm4,0x3c0(%rsp)    10a9:0f    10aa:62 d2 5d48 a8 e0    vfmadd213ps %zmm8,%zmm4,%zmm4    10b0:62 d22540 a8 e8    vfmadd213ps %zmm8,%zmm27,%zmm5    10b6:6221 7c4059 dbvmulps %zmm19,%zmm16,%zmm27    10bc:6211 7c482844 2evmovaps0xcc0(%r14,%r13,1),%zmm8    10c3:33    10c4:6281 7c482844 2evmovaps0xc40(%r14,%r13,1),%zmm16    10cb:31    10cc:62 f1 6c48 5c ffvsubps %zmm7,%zmm2,%zmm7    10d2:62 f15440 5c d0vsubps %zmm0,%zmm21,%zmm2    10d8:62 e1 7c4828 e8vmovaps %zmm0,%zmm21    10de:62 f1 7c48284424vmovaps0x80(%rsp),%zmm0    10e5:02    10e6:62 f24548 b8 e7    vfmadd231ps %zmm7,%zmm7,%zmm4    10ec:62 f1 7c4829 7c24vmovaps %zmm7,0x480(%rsp)    10f3:12    10f4:62 f2 6d48 b8 ea    vfmadd231ps %zmm2,%zmm2,%zmm5    10fa:62 f1 7c48295424vmovaps %zmm2,0x500(%rsp)1101:141102:6291 7c482854 2evmovaps0xc00(%r14,%r13,1),%zmm21109:30    110a:62 f2 4d48 b8 e6    vfmadd231ps %zmm6,%zmm6,%zmm41110:62 d23548 b8 e9    vfmadd231ps %zmm9,%zmm9,%zmm51116:6291 7c482874 2evmovaps0xdc0(%r14,%r13,1),%zmm6    111d:37    111e:62 f2 7d48 4e fc    vrsqrt14ps %zmm4,%zmm71124:62 f1 5c4859 e7vmulps %zmm7,%zmm4,%zmm4    112a:62924548 a8 e2    vfmadd213ps %zmm26,%zmm7,%zmm41130:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm71136:62 f1444859 e4vmulps %zmm4,%zmm7,%zmm4    113c:62 f2 7d48 4e fd    vrsqrt14ps %zmm5,%zmm71142:62 f1544859 efvmulps %zmm7,%zmm5,%zmm51148:62 d26548 b8 c3    vfmadd231ps %zmm11,%zmm3,%zmm0    
114e:62924548 a8 ea    vfmadd213ps %zmm26,%zmm7,%zmm51154:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm7    115a:6271 7c4828 d9vmovaps %zmm1,%zmm111160:62 f1 7c48294424vmovaps %zmm0,0x80(%rsp)1167:021168:62 f1 7c48284424vmovaps0xc0(%rsp),%zmm0    116f:031170:6271444859 cdvmulps %zmm5,%zmm7,%zmm91176:6291 7c4828 6c 2evmovaps0xc80(%r14,%r13,1),%zmm5    117d:32    117e:62 f1 0c4059 fcvmulps %zmm4,%zmm30,%zmm71184:6261 7c4828 eavmovaps %zmm2,%zmm29    118a:62 f1 5c4859 e4vmulps %zmm4,%zmm4,%zmm41190:6261 7c4828 f2vmovaps %zmm2,%zmm301196:6261 5c4859 ffvmulps %zmm7,%zmm4,%zmm31    119c:6291 7c482864 2evmovaps0xd40(%r14,%r13,1),%zmm4    11a3:35    11a4:62221548 7f e8    vpermt2ps %zmm16,%zmm13,%zmm29    11aa:6222 6d40 7f f0    vpermt2ps %zmm16,%zmm18,%zmm30    11b0:62721548 7f de    vpermt2ps %zmm6,%zmm13,%zmm11    11b6:6251 2c4859 d1vmulps %zmm9,%zmm10,%zmm10    11bc:6251344859 c9vmulps %zmm9,%zmm9,%zmm9    11c2:62 f1 7c4828 f9vmovaps %zmm1,%zmm7    11c8:62 f2 6d40 7f fe    vpermt2ps %zmm6,%zmm18,%zmm7    11ce:62 d26548 b8 c4    vfmadd231ps %zmm12,%zmm3,%zmm0    11d4:6271 7c48286424vmovaps0x100(%rsp),%zmm12    11db:04    11dc:62 f1 7c48294424vmovaps %zmm0,0xc0(%rsp)    11e3:03    11e4:6291 7c482844 2evmovaps0xd00(%r14,%r13,1),%zmm0    11eb:34    11ec:62326548 b8 e4    vfmadd231ps %zmm20,%zmm3,%zmm12    11f2:62 e1 7c4828 e5vmovaps %zmm5,%zmm20    11f8:62 f1 7c4828 ddvmovaps %zmm5,%zmm3    11fe:62 c21548 7f e0    vpermt2ps %zmm8,%zmm13,%zmm201204:62 d2 6d40 7f d8    vpermt2ps %zmm8,%zmm18,%zmm3    120a:62 e1 7c4828 d8vmovaps %zmm0,%zmm191210:6272 4d40 b86424 vfmadd231ps0x200(%rsp),%zmm22,%zmm121217:081218:62 e2 6d40 7f dc    vpermt2ps %zmm4,%zmm18,%zmm19    121e:62720540 b86424 vfmadd231ps0x280(%rsp),%zmm31,%zmm121225:0a1226:62 a3954023 e4 e4 vshuff64x2$0xe4,%zmm20,%zmm29,%zmm20    122d:6261 7c4828 e8vmovaps %zmm0,%zmm291233:6263 8d4023 f3 e4 vshuff64x2$0xe4,%zmm3,%zmm30,%zmm30    123a:62 d1344859 davmulps %zmm10,%zmm9,%zmm31240:6271 7c4828 d2vmovaps %zmm2,%zmm101246:62 b20548 
7f d0    vpermt2ps %zmm16,%zmm15,%zmm2    124c:62621548 7f ec    vpermt2ps %zmm4,%zmm13,%zmm291252:62 f1 7c4829 5c24vmovaps %zmm3,0x540(%rsp)1259:15    125a:62 f1 7c4828 ddvmovaps %zmm5,%zmm31260:6232 0d48 7f d0    vpermt2ps %zmm16,%zmm14,%zmm101266:62 e1 7c48284424vmovaps0x700(%rsp),%zmm16    126d:1c    126e:62 d20548 7f e8    vpermt2ps %zmm8,%zmm15,%zmm51274:62 f3 e54023 ff e4 vshuff64x2$0xe4,%zmm7,%zmm19,%zmm7    127b:6271 7c48296424vmovaps %zmm12,0x100(%rsp)1282:041283:62 d2 0d48 7f d8    vpermt2ps %zmm8,%zmm14,%zmm31289:6271 7c4828 c1vmovaps %zmm1,%zmm8    128f:62 f20548 7f ce    vpermt2ps %zmm6,%zmm15,%zmm11295:6272 0d48 7f c6    vpermt2ps %zmm6,%zmm14,%zmm8    129b:6243954023 eb e4 vshuff64x2$0xe4,%zmm11,%zmm29,%zmm29    12a2:6271 7c4828 d8vmovaps %zmm0,%zmm11    12a8:62 f20548 7f c4    vpermt2ps %zmm4,%zmm15,%zmm0    12ae:62 f3 ed4823 ed e4 vshuff64x2$0xe4,%zmm5,%zmm2,%zmm5    12b5:62 b1 7c4828 d5vmovaps %zmm21,%zmm2    12bb:6273 ad4823 cb e4 vshuff64x2$0xe4,%zmm3,%zmm10,%zmm9    12c2:62 f1 7c4828 5c24vmovaps0x240(%rsp),%zmm3    12c9:09    12ca:62 b1 0c40 5c f0vsubps %zmm16,%zmm30,%zmm6    12d0:6221 7c4828 f0vmovaps %zmm16,%zmm30    12d6:6272 0d48 7f dc    vpermt2ps %zmm4,%zmm14,%zmm11    12dc:62 f1 7c48286424vmovaps0x80(%rsp),%zmm4    12e3:02    12e4:62711440 5c d2vsubps %zmm2,%zmm29,%zmm10    12ea:6271 7c48295424vmovaps %zmm10,0x180(%rsp)    12f1:06    12f2:62 e3 fd4823 d9 e4 vshuff64x2$0xe4,%zmm1,%zmm0,%zmm19    12f9:62 b14448 5c c0vsubps %zmm16,%zmm7,%zmm0    12ff:62 e1 7c48284424vmovaps0x600(%rsp),%zmm161306:181307:6253 a54823 c0 e4 vshuff64x2$0xe4,%zmm8,%zmm11,%zmm8    130e:6271 7c4828 5c24vmovaps0x140(%rsp),%zmm111315:051316:62 f1 7c48294424vmovaps %zmm0,0x2c0(%rsp)    131d:0b    131e:6292 1d40 b8 d9    vfmadd231ps %zmm25,%zmm28,%zmm31324:6221 5c40 5c cdvsubps %zmm21,%zmm20,%zmm25    132a:62 e1 7c48286424vmovaps0x680(%rsp),%zmm201331:1a1332:6292 4d40 b8 e0    vfmadd231ps %zmm24,%zmm22,%zmm41338:62 e1 7c4828 6c24vmovaps0x1c0(%rsp),%zmm21    
133f:071340:6201 7c482844 2evmovaps0xfc0(%r14,%r13,1),%zmm241347:3f1348:62 b22540 b8 d9    vfmadd231ps %zmm17,%zmm27,%zmm3    134e:6261 7c4829 4c24vmovaps %zmm25,0x300(%rsp)1355:0c1356:62 f20540 b86424 vfmadd231ps0x3c0(%rsp),%zmm31,%zmm4    135d:0f    135e:62 f1 7c48296424vmovaps %zmm4,0x80(%rsp)1365:021366:62 b2 7d48 a8 c0    vfmadd213ps %zmm16,%zmm0,%zmm0    136c:62 d2 2d48 b8 c2    vfmadd231ps %zmm10,%zmm10,%zmm01372:6271 7c4828 d3vmovaps %zmm3,%zmm101378:62 f1 7c4828 5c24vmovaps0xc0(%rsp),%zmm3    137f:031380:6272 1d40 b8 5c24 vfmadd231ps0x580(%rsp),%zmm28,%zmm111387:161388:62 f1 7c48297424vmovaps %zmm6,0x580(%rsp)    138f:161390:62 b2 4d48 a8 f0    vfmadd213ps %zmm16,%zmm6,%zmm61396:62 b13448 5c fcvsubps %zmm20,%zmm9,%zmm7    139c:62923540 b8 f1    vfmadd231ps %zmm25,%zmm25,%zmm6    13a2:6201 7c4828 4c 2evmovaps0xf80(%r14,%r13,1),%zmm25    13a9:3e    13aa:6231 3c48 5c c4vsubps %zmm20,%zmm8,%zmm8    13b0:6211 7c4828 4c 2evmovaps0xf40(%r14,%r13,1),%zmm9    13b7:3d    13b8:62 e2 1d40 b8 6c24 vfmadd231ps0x5c0(%rsp),%zmm28,%zmm21    13bf:17    13c0:62 f24548 b8 f7    vfmadd231ps %zmm7,%zmm7,%zmm6    13c6:62 f1 7c4829 7c24vmovaps %zmm7,0x240(%rsp)    13cd:09    13ce:62 d2 3d48 b8 c0    vfmadd231ps %zmm8,%zmm8,%zmm0    13d4:6271 7c48294424vmovaps %zmm8,0x1c0(%rsp)    13db:07    13dc:62 f2 7d48 4e fe    vrsqrt14ps %zmm6,%zmm7    13e2:62 f2 7d48 4e d0    vrsqrt14ps %zmm0,%zmm2    13e8:62 f1 4c4859 cfvmulps %zmm7,%zmm6,%zmm1    13ee:62 f1 7c4859 f2vmulps %zmm2,%zmm0,%zmm6    13f4:62722540 b8 5c24 vfmadd231ps0x340(%rsp),%zmm27,%zmm11    13fb:0d    13fc:62924548 a8 ca    vfmadd213ps %zmm26,%zmm7,%zmm11402:62 b1444859 ffvmulps %zmm23,%zmm7,%zmm71408:6292 6d48 a8 f2    vfmadd213ps %zmm26,%zmm2,%zmm6    140e:62 e22540 b8 6c24 vfmadd231ps0x440(%rsp),%zmm27,%zmm211415:111416:62 f1444859 c9vmulps %zmm1,%zmm7,%zmm1    141c:62 b1 6c4859 ffvmulps %zmm23,%zmm2,%zmm71422:62 f1444859 f6vmulps %zmm6,%zmm7,%zmm61428:62 f1744859 c1vmulps %zmm1,%zmm1,%zmm0    142e:62 f1544859 c9vmulps 
%zmm1,%zmm5,%zmm11434:6291 7c4828 7c 2evmovaps0xe80(%r14,%r13,1),%zmm7    143b:3a    143c:6291 7c4828 6c 2evmovaps0xe00(%r14,%r13,1),%zmm51443:381444:62 f2 4d40 b8 5c24 vfmadd231ps0x380(%rsp),%zmm22,%zmm3    144b:0e    144c:6281 7c482874 2evmovaps0xf00(%r14,%r13,1),%zmm221453:3c1454:62 e1 7c4859 c9vmulps %zmm1,%zmm0,%zmm17    145a:62 f1644059 c6vmulps %zmm6,%zmm19,%zmm01460:6271 4c4859 c6vmulps %zmm6,%zmm6,%zmm81466:6291 7c4828 4c 2evmovaps0xe40(%r14,%r13,1),%zmm1    146d:39    146e:6291 7c482874 2evmovaps0xec0(%r14,%r13,1),%zmm61475:3b1476:6271 7c4829 5c24vmovaps %zmm11,0x140(%rsp)    147d:05    147e:62 f1 3c4859 d0vmulps %zmm0,%zmm8,%zmm21484:6211 7c4828 c1vmovaps %zmm25,%zmm8    148a:6212 6d40 7f c0    vpermt2ps %zmm24,%zmm18,%zmm81490:62 f20540 b8 5c24 vfmadd231ps0x480(%rsp),%zmm31,%zmm31497:121498:6261 7c4828 dfvmovaps %zmm7,%zmm27    149e:62 f1 7c4828 c5vmovaps %zmm5,%zmm0    14a4:6271 7c4828 dfvmovaps %zmm7,%zmm11    14aa:6271 7c4828 e7vmovaps %zmm7,%zmm12    14b0:62 e1 7c4828 ddvmovaps %zmm5,%zmm19    14b6:6221 7c4828 e6vmovaps %zmm22,%zmm28    14bc:6221 7c4828 eevmovaps %zmm22,%zmm29    14c2:6262 6d40 7f de    vpermt2ps %zmm6,%zmm18,%zmm27    14c8:62 f2 6d40 7f c1    vpermt2ps %zmm1,%zmm18,%zmm0    14ce:62721548 7f de    vpermt2ps %zmm6,%zmm13,%zmm11    14d4:6272 0d48 7f e6    vpermt2ps %zmm6,%zmm14,%zmm12    14da:62 e2 0d48 7f d9    vpermt2ps %zmm1,%zmm14,%zmm19    14e0:62 f20548 7f fe    vpermt2ps %zmm6,%zmm15,%zmm7    14e6:6242 6d40 7f e1    vpermt2ps %zmm9,%zmm18,%zmm28    14ec:6281 7c4828 d1vmovaps %zmm25,%zmm18    14f2:62421548 7f e9    vpermt2ps %zmm9,%zmm13,%zmm29    14f8:62821548 7f d0    vpermt2ps %zmm24,%zmm13,%zmm18    14fe:6293 fd4823 e3 e4 vshuff64x2$0xe4,%zmm27,%zmm0,%zmm41505:62 f1 7c48284424vmovaps0x540(%rsp),%zmm0    150c:15    150d:6253 e54023 e4 e4 vshuff64x2$0xe4,%zmm12,%zmm19,%zmm121514:6281 7c4828 d9vmovaps %zmm25,%zmm19    151a:62020548 7f c8    vpermt2ps %zmm24,%zmm15,%zmm251520:6253 9d4023 c0 e4 
vshuff64x2$0xe4,%zmm8,%zmm28,%zmm81527:6261 7c4828 e5vmovaps %zmm5,%zmm28    152d:62 f20548 7f e9    vpermt2ps %zmm1,%zmm15,%zmm51533:6282 0d48 7f d8    vpermt2ps %zmm24,%zmm14,%zmm191539:62621548 7f e1    vpermt2ps %zmm1,%zmm13,%zmm28    153f:6272 7d48 b85424 vfmadd231ps0x400(%rsp),%zmm0,%zmm101546:101547:62 e2 7d48 b8 6c24 vfmadd231ps0x4c0(%rsp),%zmm0,%zmm21    154e:13    154f:62 f3 d54823 cf e4 vshuff64x2$0xe4,%zmm7,%zmm5,%zmm11556:6291 3c48 5c eevsubps %zmm30,%zmm8,%zmm5    155c:6231 1c48 5c c4vsubps %zmm20,%zmm12,%zmm81562:6243 9d4023 db e4 vshuff64x2$0xe4,%zmm11,%zmm28,%zmm271569:6223954023 e2 e4 vshuff64x2$0xe4,%zmm18,%zmm29,%zmm281570:62 e1 7c48285424vmovaps0x140(%rsp),%zmm181577:051578:6271 7c4828 dbvmovaps %zmm3,%zmm11    157e:62 b1 7c4828 devmovaps %zmm22,%zmm31584:62 c20548 7f f1    vpermt2ps %zmm9,%zmm15,%zmm22    158a:62 d2 0d48 7f d9    vpermt2ps %zmm9,%zmm14,%zmm31590:6271 7c48287424vmovaps0x80(%rsp),%zmm141597:021598:6251 7c4828 ebvmovaps %zmm11,%zmm13    159e:6271 7c4828 5c24vmovaps0x100(%rsp),%zmm11    15a5:04    15a6:62727540 b8 6c24 vfmadd231ps0x300(%rsp),%zmm17,%zmm13    15ad:0c    15ae:6272 6d48 b85424 vfmadd231ps0x2c0(%rsp),%zmm2,%zmm10    15b5:0b    15b6:62 e2 6d48 b8 6c24 vfmadd231ps0x1c0(%rsp),%zmm2,%zmm21    15bd:07    15be:62 e2 7d48 b85424 vfmadd231ps0x500(%rsp),%zmm0,%zmm18    15c5:14    15c6:6291 5c48 5c c6vsubps %zmm30,%zmm4,%zmm0    15cc:62 f1 7c48286424vmovaps0x6c0(%rsp),%zmm4    15d3:1b    15d4:62 a3 e54823 db e4 vshuff64x2$0xe4,%zmm19,%zmm3,%zmm19    15db:6293 cd4023 d9 e4 vshuff64x2$0xe4,%zmm25,%zmm22,%zmm3    15e2:6271 7c4828 e0vmovaps %zmm0,%zmm12    15e8:62727540 b87424 vfmadd231ps0x580(%rsp),%zmm17,%zmm14    15ef:16    15f0:62727540 b8 5c24 vfmadd231ps0x240(%rsp),%zmm17,%zmm11    15f7:09    15f8:6232 7d48 a8 e0    vfmadd213ps %zmm16,%zmm0,%zmm12    15fe:62316440 5c ccvsubps %zmm20,%zmm19,%zmm91604:62 e2 6d48 b85424 vfmadd231ps0x180(%rsp),%zmm2,%zmm18    160b:06    160c:62 f12440 5c f4vsubps %zmm4,%zmm27,%zmm61612:62 f1 1c40 
5c fcvsubps %zmm4,%zmm28,%zmm71618:62 f1 7c4828 e5vmovaps %zmm5,%zmm4    161e:62 b25548 a8 e0    vfmadd213ps %zmm16,%zmm5,%zmm41624:6272 4d48 b8 e6    vfmadd231ps %zmm6,%zmm6,%zmm12    162a:62 f24548 b8 e7    vfmadd231ps %zmm7,%zmm7,%zmm41630:6252 3d48 b8 e0    vfmadd231ps %zmm8,%zmm8,%zmm121636:62 d23548 b8 e1    vfmadd231ps %zmm9,%zmm9,%zmm4    163c:62 c2 7d48 4e e4    vrsqrt14ps %zmm12,%zmm201642:62 e2 7d48 4e c4    vrsqrt14ps %zmm4,%zmm161648:6231 1c4859 e4vmulps %zmm20,%zmm12,%zmm12    164e:62 a1 5c4859 d8vmulps %zmm16,%zmm4,%zmm191654:6212 5d40 a8 e2    vfmadd213ps %zmm26,%zmm20,%zmm12    165a:62 a1 5c4059 e7vmulps %zmm23,%zmm20,%zmm201660:62 b1 7c4059 e7vmulps %zmm23,%zmm16,%zmm41666:6282 7d40 a8 da    vfmadd213ps %zmm26,%zmm16,%zmm19    166c:6251 5c4059 e4vmulps %zmm12,%zmm20,%zmm121672:62 b1 5c4859 e3vmulps %zmm19,%zmm4,%zmm41678:62 c1 1c4859 c4vmulps %zmm12,%zmm12,%zmm16    167e:62 d1744859 ccvmulps %zmm12,%zmm1,%zmm11684:62 f1644859 d4vmulps %zmm4,%zmm3,%zmm2    168a:62 e1 5c4859 ccvmulps %zmm4,%zmm4,%zmm171690:62 f1 7c4059 c9vmulps %zmm1,%zmm16,%zmm11696:62 d1 7c4828 devmovaps %zmm14,%zmm3    169c:62 b1 7c4828 e2vmovaps %zmm18,%zmm4    16a2:62 f1744059 d2vmulps %zmm2,%zmm17,%zmm2    16a8:62 f27548 b8 d8    vfmadd231ps %zmm0,%zmm1,%zmm3    16ae:6272 6d48 b8 d5    vfmadd231ps %zmm5,%zmm2,%zmm10    16b4:62 f2 6d48 b8 e7    vfmadd231ps %zmm7,%zmm2,%zmm4    16ba:62 c2 6d48 b8 e9    vfmadd231ps %zmm9,%zmm2,%zmm21    16c0:62 f1 2c4858 c3vaddps %zmm3,%zmm10,%zmm0    16c6:62 d1 7c4828 ddvmovaps %zmm13,%zmm3    16cc:62 f3 fd481b c201 vextractf64x4$0x1,%zmm0,%ymm2    16d3:62 f27548 b8 de    vfmadd231ps %zmm6,%zmm1,%zmm3    16d9:62 f1 5c4858 dbvaddps %zmm3,%zmm4,%zmm3    16df:62 d1 7c4828 e3vmovaps %zmm11,%zmm4    16e5:62 f1 7c4858 c2vaddps %zmm2,%zmm0,%zmm0    16eb:62 d27548 b8 e0    vfmadd231ps %zmm8,%zmm1,%zmm4    16f1:c4 e3 7d19 c201vextractf128$0x1,%ymm0,%xmm2    16f7:62 f1544058 ccvaddps %zmm4,%zmm21,%zmm1    16fd:62 f3 fd481b dc01 
vextractf64x4$0x1,%zmm3,%ymm41704:c5 f858 c2vaddps %xmm2,%xmm0,%xmm01708:62 f1644858 dcvaddps %zmm4,%zmm3,%zmm3    170e:c4 e3 7d19 dc01vextractf128$0x1,%ymm3,%xmm41714:c4 e37905 d001vpermilpd$0x1,%xmm0,%xmm2    171a:c5 e058 dcvaddps %xmm4,%xmm3,%xmm3    171e:c5 f858 c2vaddps %xmm2,%xmm0,%xmm01722:c5 fa16 d0vmovshdup %xmm0,%xmm21726:c5 fa58 c2vaddss %xmm2,%xmm0,%xmm0    172a:c5 f82984240001vmovaps %xmm0,0x100(%rsp)1731:00001733:c4 e37905 c301vpermilpd$0x1,%xmm3,%xmm01739:c5 e058 c0vaddps %xmm0,%xmm3,%xmm0    173d:62 f3 fd481b cb01 vextractf64x4$0x1,%zmm1,%ymm31744:62 f1744858 cbvaddps %zmm3,%zmm1,%zmm1    174a:c5 fa16 d0vmovshdup %xmm0,%xmm2    174e:c5 fa58 c2vaddss %xmm2,%xmm0,%xmm01752:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp)1759:0000    175b:c4 e3 7d19 c801vextractf128$0x1,%ymm1,%xmm01761:c5 f058 c0vaddps %xmm0,%xmm1,%xmm01765:c4 e37905 c801vpermilpd$0x1,%xmm0,%xmm1    176b:c5 f858 c1vaddps %xmm1,%xmm0,%xmm0    176f:c5 fa16 c8vmovshdup %xmm0,%xmm11773:c5 fa58 c1vaddss %xmm1,%xmm0,%xmm01777:c5 f82984248000vmovaps %xmm0,0x80(%rsp)    177e:00001780:c5 f877vzeroupper1783:41 ff d4call*%r121786:4181 c700010000add$0x100,%r15d    178d:4139 dfcmp    %ebx,%r15d1790:0f82 5a ea ff ffjb     1f0 <simplified_nbody+0x1f0>1796:eb27jmp    17bf <simplified_nbody+0x17bf>1798:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0    179c:c5 f82984240001vmovaps %xmm0,0x100(%rsp)    17a3:0000    17a5:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0    17a9:c5 f8298424 c000vmovaps %xmm0,0xc0(%rsp)    17b0:0000    17b2:c5 f857 c0vxorps %xmm0,%xmm0,%xmm0    17b6:c5 f82984248000vmovaps %xmm0,0x80(%rsp)    17bd:0000    17bf:48 8b442450mov0x50(%rsp),%rax    17c4:c5 f82894240001vmovaps0x100(%rsp),%xmm2    17cb:0000    17cd:c5 f828 9c24 c000vmovaps0xc0(%rsp),%xmm3    17d4:0000    17d6:c5 f828 a4248000vmovaps0x80(%rsp),%xmm4    17dd:0000    17df:48 8b 4c2458mov0x58(%rsp),%rcx    17e4:c5 fa1000vmovss (%rax),%xmm0    17e8:48 b80000000000 movabs$0x0,%rax    17ef:000000    17f2:c4 c1 7a10 0c06vmovss (%r14,%rax,1),%xmm1    17f8:48 
8b442468mov0x68(%rsp),%rax    17fd:c4 e279 a95424 3c vfmadd213ss0x3c(%rsp),%xmm0,%xmm21804:c4 e279 a9 5c2440 vfmadd213ss0x40(%rsp),%xmm0,%xmm3    180b:c4 e279 a9642444 vfmadd213ss0x44(%rsp),%xmm0,%xmm41812:c5 ea59 d1vmulss %xmm1,%xmm2,%xmm21816:c5 e259 d9vmulss %xmm1,%xmm3,%xmm3    181a:c5 da59 c9vmulss %xmm1,%xmm4,%xmm1    181e:c5 fa102408vmovss (%rax,%rcx,1),%xmm41823:c4 e269 b9 e0       vfmadd231ss %xmm0,%xmm2,%xmm41828:c5 fa112408vmovss %xmm4,(%rax,%rcx,1)    182d:c5 fa10640804vmovss0x4(%rax,%rcx,1),%xmm41833:c4 e261 b9 e0       vfmadd231ss %xmm0,%xmm3,%xmm41838:c5 fa11640804vmovss %xmm4,0x4(%rax,%rcx,1)    183e:c4 e271 a9440808 vfmadd213ss0x8(%rax,%rcx,1),%xmm1,%xmm01845:c5 fa11440808vmovss %xmm0,0x8(%rax,%rcx,1)    184b:48 8b 4c2460mov0x60(%rsp),%rcx1850:c5 fa1111vmovss %xmm2,(%rcx)1854:c5 fa115904vmovss %xmm3,0x4(%rcx)1859:c5 fa114908vmovss %xmm1,0x8(%rcx)    185e:48 8d65 d8lea-0x28(%rbp),%rsp1862:5bpop    %rbx1863:41 5cpop    %r121865:41 5dpop    %r131867:41 5epop    %r141869:41 5fpop    %r15    186b:5dpop    %rbp    186c:c3ret
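Each vrsqrt14ps in the listing above is followed by the same short sequence: a vmulps, a vfmadd213ps against %zmm26, a vmulps against %zmm23, and a final vmulps. This looks like one Newton-Raphson refinement step applied to the hardware reciprocal square root estimate. The scalar sketch below mirrors that instruction order. It assumes that %zmm26 and %zmm23 hold the broadcast constants -3.0 and -0.5; both are loaded from the stack, so their values are not visible in this listing. The function names are illustrative only and are not part of libfloor.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x), written in the same order as the
// vmulps / vfmadd213ps / vmulps / vmulps sequence that follows each vrsqrt14ps.
// ASSUMPTION: the two constants are -3.0 and -0.5; the listing only shows them
// as registers (%zmm26, %zmm23), not as values.
inline float refine_rsqrt(float x, float y0) {
    const float xy   = x * y0;             // vmulps:      x * y0
    const float t    = xy * y0 + (-3.0f);  // vfmadd213ps: x * y0 * y0 - 3
    const float half = y0 * (-0.5f);       // vmulps:      -0.5 * y0
    return half * t;                       // vmulps:      y0 * (1.5 - 0.5 * x * y0 * y0)
}

// Small self-check: a deliberately rough estimate of 1/sqrt(4) moves closer to 0.5.
inline void refine_rsqrt_self_check() {
    const float rough  = 0.49f;  // stand-in for the ~14-bit hardware estimate
    const float better = refine_rsqrt(4.0f, rough);
    assert(std::fabs(better - 0.5f) < std::fabs(rough - 0.5f));
}
```

Calling refine_rsqrt(4.0f, 0.5f) returns the input estimate unchanged, since an exact reciprocal square root is a fixed point of the iteration.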
Host-Compute (ARM CPU)

Note that the compiler would usually directly output a .bin file (ELF format). The output below comes from disassembling it with objdump -d. Also note that this has been compiled for the arm-7 target (ARMv8.6 + FP16 + FP16FML, e.g. Apple M2+/A15+).
nbody_aarch64.bin:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <simplified_nbody>:
  0: d104c3ff  sub sp, sp, #0x130
  4: 90000008  adrp x8, 0 <floor_global_idx>
  8: 6d0a33ed  stp d13, d12, [sp, #160]
  c: 6d0b2beb  stp d11, d10, [sp, #176]
 10: 6d0c23e9  stp d9, d8, [sp, #192]
 14: a90d7bfd  stp x29, x30, [sp, #208]
 18: 910343fd  add x29, sp, #0xd0
 1c: a90e6ffc  stp x28, x27, [sp, #224]
 20: a90f67fa  stp x26, x25, [sp, #240]
 24: a9105ff8  stp x24, x23, [sp, #256]
 28: a91157f6  stp x22, x21, [sp, #272]
 2c: a9124ff4  stp x20, x19, [sp, #288]
 30: f9400108  ldr x8, [x8]
 34: b9400117  ldr w23, [x8]
 38: 52800188  mov w8, #0xc                   // #12
 3c: 9b080af6  madd x22, x23, x8, x2
 40: 90000008  adrp x8, 0 <floor_global_work_size>
 44: aa1603f8  mov x24, x22
 48: f9400108  ldr x8, [x8]
 4c: fd4002c8  ldr d8, [x22]
 50: bc408f09  ldr s9, [x24, #8]!
 54: b9400119  ldr w25, [x8]
 58: 34000d79  cbz w25, 204 <simplified_nbody+0x204>
 5c: 2f00e403  movi d3, #0x0
 60: 8b171008  add x8, x0, x23, lsl #4
 64: 9000001c  adrp x28, 0 <floor_local_idx>
 68: a90007e3  stp x3, x1, [sp]
 6c: 90000013  adrp x19, 0 <simplified_nbody>
 70: 90000014  adrp x20, 0 <host_compute_device_barrier>
 74: aa0003f5  mov x21, x0
 78: 2a1f03fa  mov w26, wzr
 7c: f940039c  ldr x28, [x28]
 80: 3c9a03a3  stur q3, [x29, #-96]
 84: 2d400500  ldp s0, s1, [x8]
 88: bd400902  ldr s2, [x8, #8]
 8c: 5296e2e8  mov w8, #0xb717                // #46871
 90: 4f03f603  fmov v3.4s, #1.000000000000000000e+00
 94: 72a71a28  movk w8, #0x38d1, lsl #16
 98: 2a1f03fb  mov w27, wzr
 9c: 3d8017e3  str q3, [sp, #80]
 a0: 4e040403  dup v3.4s, v0.s[0]
 a4: 4e040d00  dup v0.4s, w8
 a8: f9400273  ldr x19, [x19]
 ac: ad018fe0  stp q0, q3, [sp, #48]
 b0: 2f00e400  movi d0, #0x0
 b4: 4e040423  dup v3.4s, v1.s[0]
 b8: 3d801be0  str q0, [sp, #96]
 bc: 2f00e400  movi d0, #0x0
 c0: 3c9b03a0  stur q0, [x29, #-80]
 c4: 4e040440  dup v0.4s, v2.s[0]
 c8: f9400294  ldr x20, [x20]
 cc: ad008fe0  stp q0, q3, [sp, #16]
 d0: b9400388  ldr w8, [x28]
 d4: 0b1b2109  add w9, w8, w27, lsl #8
 d8: 3ce95aa0  ldr q0, [x21, w9, uxtw #4]
 dc: 3ca87a60  str q0, [x19, x8, lsl #4]
 e0: d63f0280  blr x20
 e4: 6f00e400  movi v0.2d, #0x0
 e8: 3cda03a4  ldur q4, [x29, #-96]
 ec: 6f00e402  movi v2.2d, #0x0
 f0: aa1f03e8  mov x8, xzr
 f4: 6f00e403  movi v3.2d, #0x0
 f8: ad41abeb  ldp q11, q10, [sp, #48]
 fc: 6e040480  mov v0.s[0], v4.s[0]
100: 6f00e401  movi v1.2d, #0x0
104: 6f00e405  movi v5.2d, #0x0
108: ad4293ff  ldp q31, q4, [sp, #80]
10c: 6e040482  mov v2.s[0], v4.s[0]
110: 3cdb03a4  ldur q4, [x29, #-80]
114: ad40b3ed  ldp q13, q12, [sp, #16]
118: 6e040483  mov v3.s[0], v4.s[0]
11c: 6f00e404  movi v4.2d, #0x0
120: 8b080269  add x9, x19, x8
124: 91020108  add x8, x8, #0x80
128: 4eab1d67  mov v7.16b, v11.16b
12c: f140051f  cmp x8, #0x1, lsl #12
130: 4eab1d7b  mov v27.16b, v11.16b
134: 4cdf0930  ld4 {v16.4s-v19.4s}, [x9], #64
138: 4eaad606  fsub v6.4s, v16.4s, v10.4s
13c: 4eacd638  fsub v24.4s, v17.4s, v12.4s
140: 4eadd659  fsub v25.4s, v18.4s, v13.4s
144: 4e26ccc7  fmla v7.4s, v6.4s, v6.4s
148: 4e38cf07  fmla v7.4s, v24.4s, v24.4s
14c: 4c400934  ld4 {v20.4s-v23.4s}, [x9]
150: 4e39cf27  fmla v7.4s, v25.4s, v25.4s
154: 6ea1f8e7  fsqrt v7.4s, v7.4s
158: 4eaad69a  fsub v26.4s, v20.4s, v10.4s
15c: 4eacd6bc  fsub v28.4s, v21.4s, v12.4s
160: 4eadd6dd  fsub v29.4s, v22.4s, v13.4s
164: 4e3acf5b  fmla v27.4s, v26.4s, v26.4s
168: 6e27ffe7  fdiv v7.4s, v31.4s, v7.4s
16c: 4e3ccf9b  fmla v27.4s, v28.4s, v28.4s
170: 4e3dcfbb  fmla v27.4s, v29.4s, v29.4s
174: 6ea1fb7b  fsqrt v27.4s, v27.4s
178: 6e27dcfe  fmul v30.4s, v7.4s, v7.4s
17c: 6e27de67  fmul v7.4s, v19.4s, v7.4s
180: 6e27dfc7  fmul v7.4s, v30.4s, v7.4s
184: 6e3bfffb  fdiv v27.4s, v31.4s, v27.4s
188: 4e26cce3  fmla v3.4s, v7.4s, v6.4s
18c: 4e38cce2  fmla v2.4s, v7.4s, v24.4s
190: 4e39cce0  fmla v0.4s, v7.4s, v25.4s
194: 6e3bdf70  fmul v16.4s, v27.4s, v27.4s
198: 6e3bdef1  fmul v17.4s, v23.4s, v27.4s
19c: 6e31de10  fmul v16.4s, v16.4s, v17.4s
1a0: 4e3ace05  fmla v5.4s, v16.4s, v26.4s
1a4: 4e3cce04  fmla v4.4s, v16.4s, v28.4s
1a8: 4e3dce01  fmla v1.4s, v16.4s, v29.4s
1ac: 54fffba1  b.ne 120 <simplified_nbody+0x120>  // b.any
1b0: 4e23d4a3  fadd v3.4s, v5.4s, v3.4s
1b4: 4e20d420  fadd v0.4s, v1.4s, v0.4s
1b8: 4e22d482  fadd v2.4s, v4.4s, v2.4s
1bc: 6e20d461  faddp v1.4s, v3.4s, v0.4s
1c0: 6e20d442  faddp v2.4s, v2.4s, v0.4s
1c4: 6e20d400  faddp v0.4s, v0.4s, v0.4s
1c8: 7e30d821  faddp s1, v1.2s
1cc: 7e30d800  faddp s0, v0.2s
1d0: ad3d07a0  stp q0, q1, [x29, #-96]
1d4: 7e30d841  faddp s1, v2.2s
1d8: 3d801be1  str q1, [sp, #96]
1dc: d63f0280  blr x20
1e0: 1104035a  add w26, w26, #0x100
1e4: 1100077b  add w27, w27, #0x1
1e8: 6b19035f  cmp w26, w25
1ec: 54fff723  b.cc d0 <simplified_nbody+0xd0>  // b.lo, b.ul, b.last
1f0: ad7d07a2  ldp q2, q1, [x29, #-96]
1f4: 3dc01be0  ldr q0, [sp, #96]
1f8: a94007e3  ldp x3, x1, [sp]
1fc: 6e0c0401  mov v1.s[1], v0.s[0]
200: 14000003  b 20c <simplified_nbody+0x20c>
204: 2f00e401  movi d1, #0x0
208: 2f00e402  movi d2, #0x0
20c: 5297cee8  mov w8, #0xbe77                // #48759
210: bd400060  ldr s0, [x3]
214: 72a7efe8  movk w8, #0x3f7f, lsl #16
218: 8b171029  add x9, x1, x23, lsl #4
21c: a9524ff4  ldp x20, x19, [sp, #288]
220: 0f801028  fmla v8.2s, v1.2s, v0.s[0]
224: 1f022402  fmadd s2, s0, s2, s9
228: 0e040d01  dup v1.2s, w8
22c: 1e270103  fmov s3, w8
230: fd400124  ldr d4, [x9]
234: a94f67fa  ldp x26, x25, [sp, #240]
238: 1e230842  fmul s2, s2, s3
23c: 2e21dd01  fmul v1.2s, v8.2s, v1.2s
240: bd400923  ldr s3, [x9, #8]
244: a94e6ffc  ldp x28, x27, [sp, #224]
248: bd000302  str s2, [x24]
24c: 0f801024  fmla v4.2s, v1.2s, v0.s[0]
250: 1f000c40  fmadd s0, s2, s0, s3
254: fd0002c1  str d1, [x22]
258: a95157f6  ldp x22, x21, [sp, #272]
25c: a9505ff8  ldp x24, x23, [sp, #256]
260: fd000124  str d4, [x9]
264: a94d7bfd  ldp x29, x30, [sp, #208]
268: bd000920  str s0, [x9, #8]
26c: 6d4c23e9  ldp d9, d8, [sp, #192]
270: 6d4b2beb  ldp d11, d10, [sp, #176]
274: 6d4a33ed  ldp d13, d12, [sp, #160]
278: 9104c3ff  add sp, sp, #0x130
27c: d65f03c0  ret
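Two immediate pairs in the listing above can be traced back to the kernel source. The mov/movk pair #0xb717 and #0x38d1 builds the 32-bit pattern 0x38d1b717, and the pair #0xbe77 and #0x3f7f builds 0x3f7fbe77. Reinterpreted as IEEE-754 single-precision values, these are 1.0e-4 (NBODY_SOFTENING squared) and 0.999 (NBODY_DAMPING). The sketch below checks this on the host, assuming a 32-bit IEEE-754 float. The helper names float_from_bits, combine_mov_movk and check_kernel_constants are illustrative and not part of libfloor.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as an IEEE-754 single-precision float.
// (Hypothetical helper for this example only.)
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Combines the low half (mov) and the high half (movk ..., lsl #16) of an immediate pair.
inline std::uint32_t combine_mov_movk(std::uint32_t low16, std::uint32_t high16) {
    return (high16 << 16) | low16;
}

// Checks that the two immediate pairs decode to the constants used by the kernel.
inline void check_kernel_constants() {
    const float softening_sq = float_from_bits(combine_mov_movk(0xb717u, 0x38d1u));
    const float damping      = float_from_bits(combine_mov_movk(0xbe77u, 0x3f7fu));
    assert(softening_sq == 1.0e-4f);  // NBODY_SOFTENING * NBODY_SOFTENING, folded at compile time
    assert(damping == 0.999f);        // NBODY_DAMPING
}
```

The same decoding explains why the softening term never appears as a multiplication in the loop: the compiler folds the squared constant and broadcasts it once with dup before the loop starts.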
Metal / AIR

Note that the compiler would usually directly output a .metallib file. The output below comes from disassembling it with metallib-dis (provided by the toolchain).
; ModuleID = 'bc_module'
source_filename = "simplified_nbody"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.0.0"

%class.vector4 = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { float, float, float, float }
%class.vector3 = type { %union.anon.8 }
%union.anon.8 = type { %struct.anon.9 }
%struct.anon.9 = type { float, float, float }

@_ZZ16simplified_nbodyE20local_body_positions = internal addrspace(3) unnamed_addr global [256 x %class.vector4] undef, align 16

; Function Attrs: nounwind
define void @simplified_nbody(%class.vector4 addrspace(1)* noalias nocapture readonly %0, %class.vector4 addrspace(1)* noalias nocapture %1, %class.vector3 addrspace(1)* noalias nocapture %2, float addrspace(2)* noalias nocapture readonly align 4 dereferenceable(4) %3, <3 x i32> %4, <3 x i32> %5, <3 x i32> %6, <3 x i32> %7, <3 x i32> %8, <3 x i32> %9, i32 %10, i32 %11, i32 %12, i32 %13) local_unnamed_addr #0 !reqd_work_group_size !33 !kernel_dim !34 {
  %15 = extractelement <3 x i32> %4, i64 0
  %16 = zext i32 %15 to i64
  %17 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %16, i32 0, i32 0, i32 0
  %18 = bitcast float addrspace(1)* %17 to <3 x float> addrspace(1)*
  %19 = load <3 x float>, <3 x float> addrspace(1)* %18, align 4
  %20 = extractelement <3 x float> %19, i64 0
  %21 = getelementptr inbounds %class.vector3, %class.vector3 addrspace(1)* %2, i64 %16, i32 0, i32 0, i32 0
  %22 = bitcast float addrspace(1)* %21 to <3 x float> addrspace(1)*
  %23 = load <3 x float>, <3 x float> addrspace(1)* %22, align 4
  %24 = extractelement <3 x i32> %5, i64 0
  %25 = extractelement <3 x i32> %6, i64 0
  %26 = zext i32 %25 to i64
  %27 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @_ZZ16simplified_nbodyE20local_body_positions, i64 0, i64 %26, i32 0, i32 0, i32 0
  %28 = bitcast float addrspace(3)* %27 to <4 x float> addrspace(3)*
  %29 = shufflevector <3 x float> %19, <3 x float> undef, <2 x i32> <i32 1, i32 2>
  br label %57

30:                                               ; preds = %68
  %31 = extractelement <3 x float> %23, i64 0
  %32 = load float, float addrspace(2)* %3, align 4
  %33 = fmul fast float %32, %100
  %34 = insertelement <2 x float> undef, float %32, i64 0
  %35 = shufflevector <2 x float> %34, <2 x float> undef, <2 x i32> zeroinitializer
  %36 = fmul fast <2 x float> %35, %101
  %37 = fadd fast float %33, %31
  %38 = shufflevector <3 x float> %23, <3 x float> undef, <2 x i32> <i32 1, i32 2>
  %39 = fadd fast <2 x float> %36, %38
  %40 = fmul fast float %37, 0x3FEFF7CEE0000000
  %41 = fmul fast <2 x float> %39, <float 0x3FEFF7CEE0000000, float 0x3FEFF7CEE0000000>
  %42 = fmul fast float %40, %32
  %43 = fmul fast <2 x float> %41, %35
  %44 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %1, i64 %16, i32 0, i32 0, i32 0
  %45 = bitcast float addrspace(1)* %44 to <3 x float> addrspace(1)*
  %46 = load <3 x float>, <3 x float> addrspace(1)* %45, align 4, !tbaa !35
  %47 = extractelement <3 x float> %46, i64 0
  %48 = fadd fast float %42, %47
  %49 = shufflevector <3 x float> %46, <3 x float> undef, <2 x i32> <i32 1, i32 2>
  %50 = fadd fast <2 x float> %43, %49
  %51 = insertelement <3 x float> undef, float %48, i64 0
  %52 = shufflevector <2 x float> %50, <2 x float> undef, <3 x i32> <i32 0, i32 1, i32 undef>
  %53 = shufflevector <3 x float> %51, <3 x float> %52, <3 x i32> <i32 0, i32 3, i32 4>
  store <3 x float> %53, <3 x float> addrspace(1)* %45, align 4, !tbaa !35
  %54 = insertelement <3 x float> undef, float %40, i64 0
  %55 = shufflevector <2 x float> %41, <2 x float> undef, <3 x i32> <i32 0, i32 1, i32 undef>
  %56 = shufflevector <3 x float> %54, <3 x float> %55, <3 x i32> <i32 0, i32 3, i32 4>
  store <3 x float> %56, <3 x float> addrspace(1)* %22, align 4, !tbaa !35
  ret void

57:                                               ; preds = %68, %14
  %58 = phi i32 [ 0, %14 ], [ %69, %68 ]
  %59 = phi i32 [ 0, %14 ], [ %70, %68 ]
  %60 = phi float [ 0.000000e+00, %14 ], [ %100, %68 ]
  %61 = phi <2 x float> [ zeroinitializer, %14 ], [ %101, %68 ]
  %62 = shl i32 %59, 8
  %63 = add i32 %25, %62
  %64 = zext i32 %63 to i64
  %65 = getelementptr inbounds %class.vector4, %class.vector4 addrspace(1)* %0, i64 %64, i32 0, i32 0, i32 0
  %66 = bitcast float addrspace(1)* %65 to <4 x float> addrspace(1)*
  %67 = load <4 x float>, <4 x float> addrspace(1)* %66, align 4
  store <4 x float> %67, <4 x float> addrspace(3)* %28, align 4, !tbaa !35
  tail call void @air.wg.barrier(i32 2, i32 1) #3
  br label %72

68:                                               ; preds = %72
  tail call void @air.wg.barrier(i32 2, i32 1) #3
  %69 = add i32 %58, 256
  %70 = add i32 %59, 1
  %71 = icmp ult i32 %69, %24
  br i1 %71, label %57, label %30, !llvm.loop !38

72:                                               ; preds = %72, %57
  %73 = phi i32 [ 0, %57 ], [ %102, %72 ]
  %74 = phi float [ %60, %57 ], [ %100, %72 ]
  %75 = phi <2 x float> [ %61, %57 ], [ %101, %72 ]
  %76 = zext i32 %73 to i64
  %77 = getelementptr inbounds [256 x %class.vector4], [256 x %class.vector4] addrspace(3)* @_ZZ16simplified_nbodyE20local_body_positions, i64 0, i64 %76, i32 0, i32 0, i32 0
  %78 = bitcast float addrspace(3)* %77 to <4 x float> addrspace(3)*
  %79 = load <4 x float>, <4 x float> addrspace(3)* %78, align 4
  %80 = extractelement <4 x float> %79, i64 0
  %81 = extractelement <4 x float> %79, i64 3
  %82 = fsub fast float %80, %20
  %83 = shufflevector <4 x float> %79, <4 x float> undef, <2 x i32> <i32 1, i32 2>
  %84 = fsub fast <2 x float> %83, %29
  %85 = fmul fast float %82, %82
  %86 = fmul fast <2 x float> %84, %84
  %87 = extractelement <2 x float> %86, i64 0
  %88 = extractelement <2 x float> %86, i64 1
  %89 = fadd fast float %85, 0x3F1A36E2E0000000
  %90 = fadd fast float %89, %87
  %91 = fadd fast float %90, %88
  %92 = tail call fast float @air.fast_rsqrt.f32(float %91) #4
  %93 = fmul fast float %92, %92
  %94 = fmul fast float %93, %92
  %95 = fmul fast float %94, %81
  %96 = fmul fast float %95, %82
  %97 = insertelement <2 x float> undef, float %95, i64 0
  %98 = shufflevector <2 x float> %97, <2 x float> undef, <2 x i32> zeroinitializer
  %99 = fmul fast <2 x float> %98, %84
  %100 = fadd fast float %96, %74
  %101 = fadd fast <2 x float> %99, %75
  %102 = add nuw nsw i32 %73, 1
  %103 = icmp eq i32 %102, 256
  br i1 %103, label %68, label %72, !llvm.loop !40
}

; Function Attrs: nounwind readnone
declare float @air.fast_rsqrt.f32(float) local_unnamed_addr #1

; Function Attrs: convergent noduplicate
declare void @air.wg.barrier(i32, i32) local_unnamed_addr #2

attributes #0 = 
{nounwind"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""uniform-work-group-size"="true""unsafe-fp-math"="true" }attributes #1 = {nounwindreadnone"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""unsafe-fp-math"="true" }attributes #2 = {convergentnoduplicate"approx-func-fp-math"="true""frame-pointer"="all""less-precise-fpmad"="true""no-infs-fp-math"="true""no-nans-fp-math"="true""no-signed-zeros-fp-math"="true""no-trapping-math"="true""stack-protector-buffer-size"="8""unsafe-fp-math"="true" }attributes #3 = {convergentnoduplicatenounwind }attributes #4 = {nounwindreadnone }!air.kernel = !{!0}!air.version = !{!18}!air.language_version = !{!19}!air.compile_options = !{!20,!21,!22}!llvm.module.flags = !{!23,!24,!25,!26,!27,!28,!29,!30,!31}!llvm.ident = !{!32}!0 = !{void (%class.vector4addrspace(1)*,%class.vector4addrspace(1)*,%class.vector3addrspace(1)*,floataddrspace(2)*, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>, <3 xi32>,i32,i32,i32,i32)*@simplified_nbody,!1,!2,!17}!1 = !{}!2 = !{!3,!4,!5,!6,!7,!8,!9,!10,!11,!12,!13,!14,!15,!16}!3 = !{i320, !"air.buffer", !"air.location_index",i320,i321, !"air.read", !"air.address_space",i321, !"air.arg_type_size",i3216, !"air.arg_type_align_size",i3216, !"air.arg_type_name", !"float4", !"air.arg_name", !"in_positions"}!4 = !{i321, !"air.buffer", !"air.location_index",i321,i321, !"air.read_write", !"air.address_space",i321, !"air.arg_type_size",i3216, !"air.arg_type_align_size",i3216, !"air.arg_type_name", !"float4", !"air.arg_name", !"out_positions"}!5 = !{i322, !"air.buffer", !"air.location_index",i322,i321, !"air.read_write", !"air.address_space",i321, !"air.arg_type_size",i3212, 
!"air.arg_type_align_size",i3212, !"air.arg_type_name", !"float3", !"air.arg_name", !"inout_velocities"}!6 = !{i323, !"air.buffer", !"air.buffer_size",i324, !"air.location_index",i323,i321, !"air.read", !"air.address_space",i322, !"air.arg_type_size",i324, !"air.arg_type_align_size",i324, !"air.arg_type_name", !"float", !"air.arg_name", !"time_delta"}!7 = !{i324, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__global_id__"}!8 = !{i325, !"air.threads_per_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__global_size__"}!9 = !{i326, !"air.thread_position_in_threadgroup", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__local_id__"}!10 = !{i327, !"air.threads_per_threadgroup", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__local_size__"}!11 = !{i328, !"air.threadgroup_position_in_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__group_id__"}!12 = !{i329, !"air.threadgroups_per_grid", !"air.arg_type_name", !"uint3", !"air.arg_name", !"__metal__group_size__"}!13 = !{i3210, !"air.simdgroup_index_in_threadgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_id__"}!14 = !{i3211, !"air.thread_index_in_simdgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_local_id__"}!15 = !{i3212, !"air.threads_per_simdgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__sub_group_size__"}!16 = !{i3213, !"air.simdgroups_per_threadgroup", !"air.arg_type_name", !"uint", !"air.arg_name", !"__metal__num_sub_groups__"}!17 = !{!"air.max_work_group_size",i32256}!18 = !{i322,i326,i320}!19 = !{!"Metal",i323,i321,i320}!20 = !{!"air.compile.denorms_disable"}!21 = !{!"air.compile.fast_math_enable"}!22 = !{!"air.compile.framebuffer_fetch_enable"}!23 = !{i327, !"air.max_device_buffers",i3231}!24 = !{i327, !"air.max_constant_buffers",i3231}!25 = !{i327, !"air.max_threadgroup_buffers",i3231}!26 = !{i327, !"air.max_textures",i32128}!27 = 
!{i327, !"air.max_read_write_textures",i328}!28 = !{i327, !"air.max_samplers",i3216}!29 = !{i321, !"wchar_size",i324}!30 = !{i327, !"frame-pointer",i322}!31 = !{i322, !"SDK Version", [2 xi32] [i3214,i320]}!32 = !{!"Apple metal version 32023.155 (metalfe-32023.155)"}!33 = !{i32256,i321,i321}!34 = !{i321}!35 = !{!36,!36,i640}!36 = !{!"omnipotent char",!37,i640}!37 = !{!"Simple C++ TBAA"}!38 = distinct !{!38,!39}!39 = !{!"llvm.loop.mustprogress"}!40 = distinct !{!40,!39}
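The two hexadecimal literals in the disassembly above, `0x3FEFF7CEE0000000` and `0x3F1A36E2E0000000`, are the kernel's constants `NBODY_DAMPING` and `NBODY_SOFTENING * NBODY_SOFTENING`. To my understanding, LLVM prints a `float` constant that has no short exact decimal form as the bit pattern of the value widened to `double`. The following is a minimal sketch for reproducing that spelling on the host; the helper names `llvm_float_hex` and `check_against_disassembly` are made up for this illustration and are not part of libfloor or the toolchain.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns the LLVM IR style spelling of a 32-bit float,
// i.e. the value widened to double with its raw 64 bits printed as 16 hex digits.
std::string llvm_float_hex(float value) {
	// Widening is exact: every float is representable as a double.
	const double widened = static_cast<double>(value);
	std::uint64_t bits = 0;
	static_assert(sizeof(bits) == sizeof(widened), "expects a 64-bit double");
	std::memcpy(&bits, &widened, sizeof(bits));
	char text[2 + 16 + 1]; // "0x" + 16 hex digits + terminator
	std::snprintf(text, sizeof(text), "0x%016llX", static_cast<unsigned long long>(bits));
	return text;
}

// Compares the kernel's folded constants against the literals in the disassembly.
void check_against_disassembly() {
	assert(llvm_float_hex(0.999f) == "0x3FEFF7CEE0000000");        // NBODY_DAMPING
	assert(llvm_float_hex(0.01f * 0.01f) == "0x3F1A36E2E0000000"); // NBODY_SOFTENING squared
}
```

Calling `check_against_disassembly()` from a host program is enough to confirm that both literals correspond to the constants defined at the top of the kernel source.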
OpenCL / SPIR

Note that the compiler would usually directly output a .bc file. The output below comes from disassembling it with llvm-dis (provided by the toolchain). Also note that the bitcode file is exported in an LLVM 3.2 / SPIR 1.2 compatible format, but the output below uses LLVM 14.0 syntax.
; ModuleID = 'spir.bc'source_filename ="spir.bc"targetdatalayout ="e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"targettriple ="spir64-unknown-unknown"%class.vector4 =type {%union.anon }%union.anon =type {%struct.anon }%struct.anon =type {float,float,float,float }%class.vector3 =type {%union.anon.8 }%union.anon.8 =type {%struct.anon.9 }%struct.anon.9 =type {float,float,float }@simplified_nbody.local_body_positions =internalunnamed_addraddrspace(3)global [256 x%class.vector4]undef,align4define floor_kernelvoid@simplified_nbody(%class.vector4addrspace(1)*%0,%class.vector4addrspace(1)*%1,%class.vector3addrspace(1)*%2,float%3) {%5 =tailcall floor_funci64@_Z13get_global_idj(i320),!range!14%6 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i320%7 =loadfloat,floataddrspace(1)*%6,align4%8 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i321%9 =loadfloat,floataddrspace(1)*%8,align4%10 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%5,i320,i320,i322%11 =loadfloat,floataddrspace(1)*%10,align4%12 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i320%13 =loadfloat,floataddrspace(1)*%12,align4%14 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i321%15 =loadfloat,floataddrspace(1)*%14,align4%16 =getelementptrinbounds%class.vector3,%class.vector3addrspace(1)*%2,i64%5,i320,i320,i322%17 =loadfloat,floataddrspace(1)*%16,align4%18 =tailcall floor_funci64@_Z15get_global_sizej(i320),!range!15%19 =trunci64%18toi32,!range!16%20 =tailcall floor_funci64@_Z12get_local_idj(i320),!range!17%21 =trunci64%20toi32,!range!18%22 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i320%23 
=getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i321%24 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i322%25 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%20,i320,i320,i323brlabel%4826:; preds = %65%27 =fmulfloat%98,%3%28 =fmulfloat%99,%3%29 =fmulfloat%100,%3%30 =faddfloat%27,%13%31 =faddfloat%28,%15%32 =faddfloat%29,%17%33 =fmulfloat%30,0x3FEFF7CEE0000000%34 =fmulfloat%31,0x3FEFF7CEE0000000%35 =fmulfloat%32,0x3FEFF7CEE0000000%36 =fmulfloat%33,%3%37 =fmulfloat%34,%3%38 =fmulfloat%35,%3%39 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i320%40 =loadfloat,floataddrspace(1)*%39,align4,!tbaa!19%41 =faddfloat%40,%36storefloat%41,floataddrspace(1)*%39,align4,!tbaa!19%42 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i321%43 =loadfloat,floataddrspace(1)*%42,align4,!tbaa!19%44 =faddfloat%43,%37storefloat%44,floataddrspace(1)*%42,align4,!tbaa!19%45 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%1,i64%5,i320,i320,i322%46 =loadfloat,floataddrspace(1)*%45,align4,!tbaa!19%47 =faddfloat%46,%38storefloat%47,floataddrspace(1)*%45,align4,!tbaa!19storefloat%33,floataddrspace(1)*%12,align4,!tbaa!19storefloat%34,floataddrspace(1)*%14,align4,!tbaa!19storefloat%35,floataddrspace(1)*%16,align4,!tbaa!19retvoid48:; preds = %65, %4%49 =phii32 [0,%4 ], [%66,%65 ]%50 =phii32 [0,%4 ], [%67,%65 ]%51 =phifloat [0.000000e+00,%4 ], [%100,%65 ]%52 =phifloat [0.000000e+00,%4 ], [%99,%65 ]%53 =phifloat [0.000000e+00,%4 ], [%98,%65 ]%54 =shli32%50,8%55 =addi32%54,%21%56 =zexti32%55toi64%57 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i320%58 =loadfloat,floataddrspace(1)*%57,align4%59 
=getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i321%60 =loadfloat,floataddrspace(1)*%59,align4%61 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i322%62 =loadfloat,floataddrspace(1)*%61,align4%63 =getelementptrinbounds%class.vector4,%class.vector4addrspace(1)*%0,i64%56,i320,i320,i323%64 =loadfloat,floataddrspace(1)*%63,align4storefloat%58,floataddrspace(3)*%22,align4,!tbaa!19storefloat%60,floataddrspace(3)*%23,align4,!tbaa!19storefloat%62,floataddrspace(3)*%24,align4,!tbaa!19storefloat%64,floataddrspace(3)*%25,align4,!tbaa!19tailcall floor_funcvoid@_Z7barrierj(i321)brlabel%6965:; preds = %69tailcall floor_funcvoid@_Z7barrierj(i321)%66 =addi32%49,256%67 =addi32%50,1%68 =icmpulti32%66,%19bri1%68,label%48,label%26,!llvm.loop!2269:; preds = %69, %48%70 =phii64 [0,%48 ], [%101,%69 ]%71 =phifloat [%51,%48 ], [%100,%69 ]%72 =phifloat [%52,%48 ], [%99,%69 ]%73 =phifloat [%53,%48 ], [%98,%69 ]%74 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i320%75 =loadfloat,floataddrspace(3)*%74,align4%76 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i321%77 =loadfloat,floataddrspace(3)*%76,align4%78 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i322%79 =loadfloat,floataddrspace(3)*%78,align4%80 =fsubfloat%75,%7%81 =fsubfloat%77,%9%82 =fsubfloat%79,%11%83 =fmulfloat%80,%80%84 =fmulfloat%81,%81%85 =fmulfloat%82,%82%86 =faddfloat%83,0x3F1A36E2E0000000%87 =faddfloat%86,%84%88 =faddfloat%87,%85%89 =tailcall floor_funcfloat@_Z5rsqrtf(float%88)%90 =getelementptrinbounds [256 x%class.vector4], [256 x%class.vector4]addrspace(3)*@simplified_nbody.local_body_positions,i640,i64%70,i320,i320,i323%91 
=loadfloat,floataddrspace(3)*%90,align4,!tbaa!19%92 =fmulfloat%89,%89%93 =fmulfloat%92,%89%94 =fmulfloat%93,%91%95 =fmulfloat%94,%80%96 =fmulfloat%94,%81%97 =fmulfloat%94,%82%98 =faddfloat%95,%73%99 =faddfloat%96,%72%100 =faddfloat%97,%71%101 =addnuwnswi64%70,1%102 =icmpeqi64%101,256bri1%102,label%65,label%69,!llvm.loop!24}declare floor_funci64@_Z13get_global_idj(i32)declare floor_funci64@_Z15get_global_sizej(i32)declare floor_funci64@_Z12get_local_idj(i32)declare floor_funcfloat@_Z5rsqrtf(float)declare floor_funcvoid@_Z7barrierj(i32)!opencl.kernels = !{!0}!llvm.linker.options = !{}!llvm.module.flags = !{!7,!8}!opencl.ocl.version = !{!9}!opencl.spir.version = !{!9}!opencl.enable.FP_CONTRACT = !{}!opencl.used.extensions = !{!10}!opencl.used.optional.core.features = !{!11}!opencl.compiler.options = !{!12}!llvm.ident = !{!13}!0 = !{void (%class.vector4addrspace(1)*,%class.vector4addrspace(1)*,%class.vector3addrspace(1)*,float)*@simplified_nbody,!1,!2,!3,!4,!5,!6}!1 = !{!"kernel_arg_addr_space",i321,i321,i321,i320}!2 = !{!"kernel_arg_access_qual", !"none", !"none", !"none", !"none"}!3 = !{!"kernel_arg_type", !"compute_global_buffer<const float4>", !"compute_global_buffer<float4>", !"compute_global_buffer<float3>", !"param<float>"}!4 = !{!"kernel_arg_base_type", !"struct __class vector4<float>*", !"struct __class vector4<float>*", !"struct __class vector3<float>*", !"float"}!5 = !{!"kernel_arg_type_qual", !"restrict const", !"restrict", !"restrict", !"const"}!6 = !{!"kernel_arg_name", !"in_positions", !"out_positions", !"inout_velocities", !"time_delta"}!7 = !{i321, !"wchar_size",i324}!8 = !{i327, !"frame-pointer",i322}!9 = !{i321,i322}!10 = !{!"cl_khr_byte_addressable_store", !"cl_khr_global_int32_base_atomics", !"cl_khr_global_int32_extended_atomics", !"cl_khr_local_int32_base_atomics", !"cl_khr_local_int32_extended_atomics", !"cl_khr_fp64", !"cl_khr_fp16", !"cl_khr_gl_msaa_sharing"}!11 = !{!"cl_doubles"}!12 = !{!"-cl-kernel-arg-info", !"-cl-mad-enable", 
!"-cl-denorms-are-zero", !"-cl-unsafe-math-optimizations"}!13 = !{!"clang version 14.0.6 (https://github.com/a2flo/floor_llvm.git 85a83a4073c340ac03ca1c8fcd131db30339db24)"}!14 = !{i640,i644294967295}!15 = !{i641,i644294967295}!16 = !{i321,i32 -1}!17 = !{i640,i642048}!18 = !{i320,i322048}!19 = !{!20,!20,i640}!20 = !{!"omnipotent char",!21,i640}!21 = !{!"Simple C++ TBAA"}!22 = distinct !{!22,!23}!23 = !{!"llvm.loop.mustprogress"}!24 = distinct !{!24,!23}
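The SPIR output above refers to the OpenCL work-item and math functions through Itanium-mangled names such as `_Z13get_global_idj` and `_Z7barrierj`. As a small sketch, assuming a GCC or Clang host toolchain that ships `<cxxabi.h>`, these names can be turned back into readable signatures. The wrapper `demangle` is a hypothetical helper written for this example, not a libfloor function.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangles an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
	int status = 0;
	char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
	if (status != 0 || text == nullptr) {
		return mangled;
	}
	std::string result(text);
	std::free(text); // the buffer returned by __cxa_demangle is malloc-allocated
	return result;
}
```

For example, `demangle("_Z13get_global_idj")` yields `get_global_id(unsigned int)` and `demangle("_Z5rsqrtf")` yields `rsqrt(float)`, which map directly onto `global_id` and `rsqrt` in the kernel source.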
OpenCL / SPIR-V

Note that the compiler would usually directly output a .spv file. The output below comes from disassembling it with spirv-dis (provided by the toolchain). Also note that the output below has been generated with extended readability (--debug-asm).
; SPIR-V; Version: 1.0; Generator: Khronos LLVM/SPIR-V Translator; 14; Bound: 153; Schema: 0                                         Capability Addresses                                         Capability Linkage                                         Capability Kernel                                         Capability Int64%1 = ExtInstImport"OpenCL.std"                                         MemoryModel Physical64 OpenCL                                         EntryPoint Kernel%simplified_nbody"simplified_nbody"%__spirv_BuiltInGlobalInvocationId%__spirv_BuiltInGlobalSize%__spirv_BuiltInLocalInvocationId                                         ExecutionMode%simplified_nbody LocalSize25611                                         SourceExtension"cl_khr_byte_addressable_store"                                         SourceExtension"cl_khr_fp16"                                         SourceExtension"cl_khr_fp64"                                         SourceExtension"cl_khr_gl_msaa_sharing"                                         SourceExtension"cl_khr_global_int32_base_atomics"                                         SourceExtension"cl_khr_global_int32_extended_atomics"                                         SourceExtension"cl_khr_local_int32_base_atomics"                                         SourceExtension"cl_khr_local_int32_extended_atomics"                                         Source OpenCL_C102000                                         Decorate%simplified_nbody.local_body_positions Alignment4                                         Decorate%19 FuncParamAttr NoAlias                                         Decorate%19 FuncParamAttr NoCapture                                         Decorate%19 FuncParamAttr NoWrite                                         Decorate%20 FuncParamAttr NoAlias                                         Decorate%20 FuncParamAttr NoCapture                                         Decorate%21 FuncParamAttr NoAlias                      
                   Decorate%21 FuncParamAttr NoCapture                                         Decorate%__spirv_BuiltInGlobalInvocationId LinkageAttributes"__spirv_BuiltInGlobalInvocationId" Import                                         Decorate%__spirv_BuiltInGlobalInvocationId Constant                                         Decorate%__spirv_BuiltInGlobalInvocationId BuiltIn GlobalInvocationId                                         Decorate%__spirv_BuiltInGlobalSize LinkageAttributes"__spirv_BuiltInGlobalSize" Import                                         Decorate%__spirv_BuiltInGlobalSize Constant                                         Decorate%__spirv_BuiltInGlobalSize BuiltIn GlobalSize                                         Decorate%__spirv_BuiltInLocalInvocationId LinkageAttributes"__spirv_BuiltInLocalInvocationId" Import                                         Decorate%__spirv_BuiltInLocalInvocationId Constant                                         Decorate%__spirv_BuiltInLocalInvocationId BuiltIn LocalInvocationId                                         Decorate%70 FPFastMathMode Fast                                         Decorate%72 FPFastMathMode Fast                                         Decorate%74 FPFastMathMode Fast                                         Decorate%101 FPFastMathMode Fast                                         Decorate%102 FPFastMathMode Fast                                         Decorate%103 FPFastMathMode Fast                                         Decorate%104 FPFastMathMode Fast                                         Decorate%105 FPFastMathMode Fast                                         Decorate%106 FPFastMathMode Fast                                         Decorate%108 FPFastMathMode Fast                                         Decorate%109 FPFastMathMode Fast                                         Decorate%110 FPFastMathMode Fast                                         Decorate%114 FPFastMathMode Fast          
                               Decorate%115 FPFastMathMode Fast                                         Decorate%116 FPFastMathMode Fast                                         Decorate%117 FPFastMathMode Fast                                         Decorate%118 FPFastMathMode Fast                                         Decorate%119 FPFastMathMode Fast                                         Decorate%131 FPFastMathMode Fast                                         Decorate%132 FPFastMathMode Fast                                         Decorate%133 FPFastMathMode Fast                                         Decorate%134 FPFastMathMode Fast                                         Decorate%135 FPFastMathMode Fast                                         Decorate%136 FPFastMathMode Fast                                         Decorate%138 FPFastMathMode Fast                                         Decorate%139 FPFastMathMode Fast                                         Decorate%140 FPFastMathMode Fast                                         Decorate%141 FPFastMathMode Fast                                         Decorate%142 FPFastMathMode Fast                                         Decorate%143 FPFastMathMode Fast                                         Decorate%146 FPFastMathMode Fast                                         Decorate%149 FPFastMathMode Fast                                         Decorate%152 FPFastMathMode Fast%ulong = TypeInt640%uint = TypeInt320%256ul = Constant%ulong256%0u = Constant%uint0%1u = Constant%uint1%2u = Constant%uint2%0ul = Constant%ulong0%3u = Constant%uint3%8u = Constant%uint8%272u = Constant%uint272%0ul_0 = Constant%ulong0%1ul = Constant%ulong1%256u = Constant%uint256%float = TypeFloat32%struct.anon = TypeStruct%float%float%float%float%union.anon = TypeStruct%struct.anon%class.vector4 = TypeStruct%union.anon%class.vector4[256ul] = TypeArray%class.vector4%256ul     %(Workgroup)class.vector4[256ul]* = TypePointer 
Workgroup%class.vector4[256ul]%void = TypeVoid       %(CrossWorkgroup)class.vector4* = TypePointer CrossWorkgroup%class.vector4%struct.anon.9 = TypeStruct%float%float%float%union.anon.8 = TypeStruct%struct.anon.9%class.vector3 = TypeStruct%union.anon.8       %(CrossWorkgroup)class.vector3* = TypePointer CrossWorkgroup%class.vector3%void(#4) = TypeFunction%void %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector4* %(CrossWorkgroup)class.vector3*%float                            %<3xulong> = TypeVector%ulong3                    %(Input)<3xulong>* = TypePointer Input %<3xulong>               %(CrossWorkgroup)float* = TypePointer CrossWorkgroup%float                    %(Workgroup)float* = TypePointer Workgroup%float%bool = TypeBool%simplified_nbody.local_body_positions = Variable %(Workgroup)class.vector4[256ul]* Workgroup%__spirv_BuiltInGlobalInvocationId = Variable %(Input)<3xulong>* Input%__spirv_BuiltInGlobalSize = Variable %(Input)<3xulong>* Input%__spirv_BuiltInLocalInvocationId = Variable %(Input)<3xulong>* Input%0.0f = Constant%float0%9.99999975e-05f = Constant%float9.99999975e-05%0.999000013f = Constant%float0.999000013functionvoid simplified_nbody (%void(#4) ) {%19 = FunctionParameter %(CrossWorkgroup)class.vector4*%20 = FunctionParameter %(CrossWorkgroup)class.vector4*%21 = FunctionParameter %(CrossWorkgroup)class.vector3*%22 = FunctionParameter%float23:%31 = Load %<3xulong>%__spirv_BuiltInGlobalInvocationId Aligned32%32 = CompositeExtract%ulong%310%36 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%0u%37 = Load%float%36 Aligned4%39 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%1u%40 = Load%float%39 Aligned4%42 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%32%0u%0u%2u%43 = Load%float%42 Aligned4%44 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%21%32%0u%0u%0u%45 = Load%float%44 Aligned4%46 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%21%32%0u%0u%1u%47 = Load%float%46 Aligned4%48 = InBoundsPtrAccessChain 
%(CrossWorkgroup)float*%21%32%0u%0u%2u%49 = Load%float%48 Aligned4%51 = Load %<3xulong>%__spirv_BuiltInGlobalSize Aligned32%52 = CompositeExtract%ulong%510%53 = UConvert%uint%52%55 = Load %<3xulong>%__spirv_BuiltInLocalInvocationId Aligned32%56 = CompositeExtract%ulong%550%57 = UConvert%uint%56%60 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%0u%61 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%1u%62 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%2u%64 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%56%0u%0u%3u                                         Branch%2424:%66 = Phi%uint (%65 <-%26,%0u <-%23 )%68 = Phi%uint (%67 <-%26,%0u <-%23 )%71 = Phi%float (%0.0f <-%23,%70 <-%26 )%73 = Phi%float (%0.0f <-%23,%72 <-%26 )%75 = Phi%float (%0.0f <-%23,%74 <-%26 )%77 = ShiftLeftLogical%uint%68%8u%78 = IAdd%uint%77%57%79 = UConvert%ulong%78%80 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%0u%81 = Load%float%80 Aligned4%82 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%1u%83 = Load%float%82 Aligned4%84 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%2u%85 = Load%float%84 Aligned4%86 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%19%79%0u%0u%3u%87 = Load%float%86 Aligned4                                         Store%60%81 Aligned4                                         Store%61%83 Aligned4                                         Store%62%85 Aligned4                                         Store%64%87 Aligned4                                         ControlBarrier%2u%2u%272u                                         Branch%2525:%91 = Phi%ulong (%89 <-%25,%0ul_0 <-%24 )%92 = Phi%float (%71 <-%24,%70 <-%25 )%93 = Phi%float (%73 <-%24,%72 <-%25 )%94 = Phi%float (%75 <-%24,%74 <-%25 )%95 = InBoundsPtrAccessChain 
%(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%0u%96 = Load%float%95 Aligned4%97 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%1u%98 = Load%float%97 Aligned4%99 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%2u%100 = Load%float%99 Aligned4%101 = FSub%float%96%37%102 = FSub%float%98%40%103 = FSub%float%100%43%104 = FMul%float%101%101%105 = FMul%float%102%102%106 = FMul%float%103%103%108 = FAdd%float%104%9.99999975e-05f%109 = FAdd%float%108%105%110 = FAdd%float%109%106%111 = ExtInst%float%1 rsqrt%110%112 = InBoundsPtrAccessChain %(Workgroup)float*%simplified_nbody.local_body_positions%0ul%91%0u%0u%3u%113 = Load%float%112 Aligned4%114 = FMul%float%111%111%115 = FMul%float%114%111%116 = FMul%float%115%113%117 = FMul%float%116%101%118 = FMul%float%116%102%119 = FMul%float%116%103%74 = FAdd%float%117%94%72 = FAdd%float%118%93%70 = FAdd%float%119%92%89 = IAdd%ulong%91%1ul%126 = IEqual%bool%89%256ul                                         BranchConditional%126%26%2526:                                         ControlBarrier%2u%2u%272u%65 = IAdd%uint%66%256u%67 = IAdd%uint%68%1u%130 = ULessThan%bool%65%53                                         BranchConditional%130%24%2727:%131 = FMul%float%74%22%132 = FMul%float%72%22%133 = FMul%float%70%22%134 = FAdd%float%131%45%135 = FAdd%float%132%47%136 = FAdd%float%133%49%138 = FMul%float%134%0.999000013f%139 = FMul%float%135%0.999000013f%140 = FMul%float%136%0.999000013f%141 = FMul%float%138%22%142 = FMul%float%139%22%143 = FMul%float%140%22%144 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%0u%145 = Load%float%144 Aligned4%146 = FAdd%float%145%141                                         Store%144%146 Aligned4%147 = InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%1u%148 = Load%float%147 Aligned4%149 = FAdd%float%148%142                                         Store%147%149 Aligned4%150 = 
InBoundsPtrAccessChain %(CrossWorkgroup)float*%20%32%0u%0u%2u%151 = Load%float%150 Aligned4%152 = FAdd%float%151%143                                         Store%150%152 Aligned4                                         Store%44%138 Aligned4                                         Store%46%139 Aligned4                                         Store%48%140 Aligned4                                         Return}
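Each `local_barrier()` call in the kernel shows up above as `ControlBarrier %2u %2u %272u`. The three operands are the execution scope, the memory scope and a memory semantics bit mask. The sketch below decodes them using the enumerant values from the SPIR-V specification; the function names `scope_name` and `memory_semantics_names` are invented for this example and only cover the enumerants listed in the code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: maps a SPIR-V Scope value to its name.
std::string scope_name(std::uint32_t scope) {
	switch (scope) {
		case 0: return "CrossDevice";
		case 1: return "Device";
		case 2: return "Workgroup";
		case 3: return "Subgroup";
		case 4: return "Invocation";
		default: return "Unknown";
	}
}

// Hypothetical helper: expands a SPIR-V Memory Semantics mask into flag names.
// Bits that are not in the table below are silently ignored.
std::string memory_semantics_names(std::uint32_t mask) {
	struct flag_entry {
		std::uint32_t bit;
		const char* name;
	};
	static const flag_entry flags[] = {
		{ 0x2u, "Acquire" },
		{ 0x4u, "Release" },
		{ 0x8u, "AcquireRelease" },
		{ 0x10u, "SequentiallyConsistent" },
		{ 0x40u, "UniformMemory" },
		{ 0x80u, "SubgroupMemory" },
		{ 0x100u, "WorkgroupMemory" },
		{ 0x200u, "CrossWorkgroupMemory" },
		{ 0x400u, "AtomicCounterMemory" },
		{ 0x800u, "ImageMemory" },
	};
	std::string out;
	for (const auto& entry : flags) {
		if ((mask & entry.bit) != 0u) {
			if (!out.empty()) {
				out += " | ";
			}
			out += entry.name;
		}
	}
	return out.empty() ? "None" : out;
}
```

With these helpers, `scope_name(2)` gives `Workgroup` and `memory_semantics_names(272)` gives `SequentiallyConsistent | WorkgroupMemory` (272 is `0x110`), i.e. a work-group wide barrier that orders local memory accesses, which is what the comment next to `local_barrier()` in the kernel describes.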
Vulkan / SPIR-V

Note that the compiler would usually directly output a .spvc file (a simple container format for multiple SPIR-V binaries). The output below comes from disassembling it with spirv-dis (provided by the toolchain). Also note that the output below has been generated with extended readability (--debug-asm).
; SPIR-V; Version: 1.6; Generator: Khronos LLVM/SPIR-V Translator; 14; Bound: 210; Schema: 0                                                Capability Matrix                                                Capability Shader                                                Capability Int64                                                Capability GroupNonUniform                                                Capability VariablePointersStorageBuffer                                                Capability VariablePointers                                                Capability ShaderNonUniform                                                Capability UniformBufferArrayNonUniformIndexing                                                Capability SampledImageArrayNonUniformIndexing                                                Capability StorageBufferArrayNonUniformIndexing                                                Capability StorageImageArrayNonUniformIndexing                                                Capability VulkanMemoryModel                                                Capability VulkanMemoryModelDeviceScope                                                Capability PhysicalStorageBufferAddresses%1 = ExtInstImport"GLSL.std.450"                                                MemoryModel PhysicalStorageBuffer64 Vulkan                                                EntryPoint GLCompute %simplified_nbody "simplified_nbody" %simplified_nbody.vulkan_uniform. %simplified_nbody.vulkan_uniform..1 %simplified_nbody.vulkan_uniform..2 %simplified_nbody.vulkan_uniform..3 %simplified_nbody.vulkan_builtin_input. 
%simplified_nbody.vulkan_builtin_input..4 %simplified_nbody.vulkan_builtin_input..5 %simplified_nbody.vulkan_builtin_input..6 %simplified_nbody.vulkan_builtin_input..7 %simplified_nbody.vulkan_builtin_input..8 %vulkan.immutable_sampler_0 %vulkan.immutable_sampler_1 %vulkan.immutable_sampler_2 %vulkan.immutable_sampler_3 %vulkan.immutable_sampler_4 %vulkan.immutable_sampler_5 %vulkan.immutable_sampler_6 %vulkan.immutable_sampler_7 %vulkan.immutable_sampler_8 %vulkan.immutable_sampler_9 %vulkan.immutable_sampler_10 %vulkan.immutable_sampler_11 %vulkan.immutable_sampler_12 %vulkan.immutable_sampler_13 %vulkan.immutable_sampler_14 %vulkan.immutable_sampler_15 %vulkan.immutable_sampler_16 %vulkan.immutable_sampler_17 %vulkan.immutable_sampler_18 %vulkan.immutable_sampler_19 %vulkan.immutable_sampler_20 %vulkan.immutable_sampler_21 %vulkan.immutable_sampler_22 %vulkan.immutable_sampler_23 %vulkan.immutable_sampler_24 %vulkan.immutable_sampler_25 %vulkan.immutable_sampler_26 %vulkan.immutable_sampler_27 %vulkan.immutable_sampler_28 %vulkan.immutable_sampler_29 %vulkan.immutable_sampler_30 %vulkan.immutable_sampler_31 %vulkan.immutable_sampler_32 %vulkan.immutable_sampler_33 %vulkan.immutable_sampler_34 %vulkan.immutable_sampler_35 %vulkan.immutable_sampler_36 %vulkan.immutable_sampler_37 %vulkan.immutable_sampler_38 %vulkan.immutable_sampler_39 %vulkan.immutable_sampler_40 %vulkan.immutable_sampler_41 %vulkan.immutable_sampler_42 %vulkan.immutable_sampler_43 %vulkan.immutable_sampler_44 %vulkan.immutable_sampler_45 %vulkan.immutable_sampler_46 %vulkan.immutable_sampler_47 %_ZZ16simplified_nbodyE20local_body_positions                                                ExecutionMode%simplified_nbody LocalSize25611                                                SourceExtension"vk_capability_int16"                                                SourceExtension"vk_capability_int64"                                                SourceExtension"vk_capability_multiview"              
                                  Source GLSL450                                                Decorate%vulkan.immutable_sampler_0 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_0 Binding0                                                Decorate%vulkan.immutable_sampler_1 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_1 Binding1                                                Decorate%vulkan.immutable_sampler_2 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_2 Binding2                                                Decorate%vulkan.immutable_sampler_3 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_3 Binding3                                                Decorate%vulkan.immutable_sampler_4 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_4 Binding4                                                Decorate%vulkan.immutable_sampler_5 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_5 Binding5                                                Decorate%vulkan.immutable_sampler_6 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_6 Binding6                                                Decorate%vulkan.immutable_sampler_7 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_7 Binding7                                                Decorate%vulkan.immutable_sampler_8 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_8 Binding8                                                Decorate%vulkan.immutable_sampler_9 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_9 Binding9                                                
Decorate%vulkan.immutable_sampler_10 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_10 Binding10                                                Decorate%vulkan.immutable_sampler_11 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_11 Binding11                                                Decorate%vulkan.immutable_sampler_12 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_12 Binding12                                                Decorate%vulkan.immutable_sampler_13 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_13 Binding13                                                Decorate%vulkan.immutable_sampler_14 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_14 Binding14                                                Decorate%vulkan.immutable_sampler_15 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_15 Binding15                                                Decorate%vulkan.immutable_sampler_16 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_16 Binding16                                                Decorate%vulkan.immutable_sampler_17 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_17 Binding17                                                Decorate%vulkan.immutable_sampler_18 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_18 Binding18                                                Decorate%vulkan.immutable_sampler_19 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_19 Binding19                                                Decorate%vulkan.immutable_sampler_20 DescriptorSet0                   
                             Decorate%vulkan.immutable_sampler_20 Binding20                                                Decorate%vulkan.immutable_sampler_21 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_21 Binding21                                                Decorate%vulkan.immutable_sampler_22 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_22 Binding22                                                Decorate%vulkan.immutable_sampler_23 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_23 Binding23                                                Decorate%vulkan.immutable_sampler_24 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_24 Binding24                                                Decorate%vulkan.immutable_sampler_25 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_25 Binding25                                                Decorate%vulkan.immutable_sampler_26 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_26 Binding26                                                Decorate%vulkan.immutable_sampler_27 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_27 Binding27                                                Decorate%vulkan.immutable_sampler_28 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_28 Binding28                                                Decorate%vulkan.immutable_sampler_29 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_29 Binding29                                                Decorate%vulkan.immutable_sampler_30 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_30 
Binding30                                                Decorate%vulkan.immutable_sampler_31 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_31 Binding31                                                Decorate%vulkan.immutable_sampler_32 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_32 Binding32                                                Decorate%vulkan.immutable_sampler_33 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_33 Binding33                                                Decorate%vulkan.immutable_sampler_34 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_34 Binding34                                                Decorate%vulkan.immutable_sampler_35 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_35 Binding35                                                Decorate%vulkan.immutable_sampler_36 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_36 Binding36                                                Decorate%vulkan.immutable_sampler_37 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_37 Binding37                                                Decorate%vulkan.immutable_sampler_38 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_38 Binding38                                                Decorate%vulkan.immutable_sampler_39 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_39 Binding39                                                Decorate%vulkan.immutable_sampler_40 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_40 Binding40                                                
Decorate%vulkan.immutable_sampler_41 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_41 Binding41                                                Decorate%vulkan.immutable_sampler_42 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_42 Binding42                                                Decorate%vulkan.immutable_sampler_43 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_43 Binding43                                                Decorate%vulkan.immutable_sampler_44 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_44 Binding44                                                Decorate%vulkan.immutable_sampler_45 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_45 Binding45                                                Decorate%vulkan.immutable_sampler_46 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_46 Binding46                                                Decorate%vulkan.immutable_sampler_47 DescriptorSet0                                                Decorate%vulkan.immutable_sampler_47 Binding47                                                Decorate%class.vector4[256l] ArrayStride16                                                MemberDecorate%class.vector40 Offset0                                                MemberDecorate%union.anon0 Offset0                                                MemberDecorate%struct.anon0 Offset0                                                MemberDecorate%struct.anon1 Offset4                                                MemberDecorate%struct.anon2 Offset8                                                MemberDecorate%struct.anon3 Offset12                                                Decorate%enclose.class.vector4 Block                       
                         MemberDecorate%enclose.class.vector40 Offset0                                                Decorate%class.vector4[] ArrayStride16                                                Decorate %(StorageBuffer)enclose.class.vector4* ArrayStride16                                                Decorate%simplified_nbody.vulkan_uniform. NonWritable                                                Decorate%simplified_nbody.vulkan_uniform. DescriptorSet1                                                Decorate%simplified_nbody.vulkan_uniform. Binding0                                                Decorate%enclose.class.vector4_0 Block                                                MemberDecorate%enclose.class.vector4_00 Offset0                                                Decorate%class.vector4[]_0 ArrayStride16                                                Decorate %(StorageBuffer)enclose.class.vector4_0* ArrayStride16                                                Decorate%simplified_nbody.vulkan_uniform..1 DescriptorSet1                                                Decorate%simplified_nbody.vulkan_uniform..1 Binding1                                                Decorate%enclose.class.vector3 Block                                                MemberDecorate%enclose.class.vector30 Offset0                                                Decorate%class.vector3[] ArrayStride12                                                Decorate %(StorageBuffer)enclose.class.vector3* ArrayStride12                                                MemberDecorate%class.vector30 Offset0                                                MemberDecorate%union.anon.80 Offset0                                                MemberDecorate%struct.anon.90 Offset0                                                MemberDecorate%struct.anon.91 Offset4                                                MemberDecorate%struct.anon.92 Offset8                                                
Decorate%simplified_nbody.vulkan_uniform..2 DescriptorSet1                                                Decorate%simplified_nbody.vulkan_uniform..2 Binding2                                                Decorate%enclose. Block                                                MemberDecorate%enclose.0 Offset0                                                Decorate%simplified_nbody.vulkan_uniform..3 NonWritable                                                Decorate%simplified_nbody.vulkan_uniform..3 Uniform                                                Decorate%simplified_nbody.vulkan_uniform..3 DescriptorSet1                                                Decorate%simplified_nbody.vulkan_uniform..3 Binding3                                                Decorate%simplified_nbody.vulkan_builtin_input. BuiltIn WorkgroupId                                                Decorate%simplified_nbody.vulkan_builtin_input..4 BuiltIn NumWorkgroups                                                Decorate%simplified_nbody.vulkan_builtin_input..5 BuiltIn SubgroupId                                                Decorate%simplified_nbody.vulkan_builtin_input..6 BuiltIn SubgroupLocalInvocationId                                                Decorate%simplified_nbody.vulkan_builtin_input..7 BuiltIn SubgroupSize                                                Decorate%simplified_nbody.vulkan_builtin_input..8 BuiltIn NumSubgroups                                                Decorate %(Workgroup)class.vector4[256l]* ArrayStride4096                                                Decorate%155 NoSignedWrap                                                Decorate%155 NoUnsignedWrap%ilong = TypeInt641%iint = TypeInt321%256l = Constant%ilong256%8i = Constant%iint8%0i = Constant%iint0%1i = Constant%iint1%2i = Constant%iint2%3i = Constant%iint3%2504i = Constant%iint2504%0l = Constant%ilong0%1l = Constant%ilong1%256i = Constant%iint256%Sampler = TypeSampler                   
%(UniformConstant)Sampler* = TypePointer UniformConstant%Sampler%float = TypeFloat32%struct.anon = TypeStruct%float%float%float%float%union.anon = TypeStruct%struct.anon%class.vector4 = TypeStruct%union.anon%class.vector4[256l] = TypeArray%class.vector4%256l             %(Workgroup)class.vector4[256l]* = TypePointer Workgroup%class.vector4[256l]%void = TypeVoid%void() = TypeFunction%void%class.vector4[] = TypeRuntimeArray%class.vector4%enclose.class.vector4 = TypeStruct%class.vector4[]       %(StorageBuffer)enclose.class.vector4* = TypePointer StorageBuffer%enclose.class.vector4%class.vector4[]_0 = TypeRuntimeArray%class.vector4%enclose.class.vector4_0 = TypeStruct%class.vector4[]_0     %(StorageBuffer)enclose.class.vector4_0* = TypePointer StorageBuffer%enclose.class.vector4_0%struct.anon.9 = TypeStruct%float%float%float%union.anon.8 = TypeStruct%struct.anon.9%class.vector3 = TypeStruct%union.anon.8%class.vector3[] = TypeRuntimeArray%class.vector3%enclose.class.vector3 = TypeStruct%class.vector3[]       %(StorageBuffer)enclose.class.vector3* = TypePointer StorageBuffer%enclose.class.vector3%enclose. = TypeStruct%float                          %(Uniform)enclose.* = TypePointer Uniform%enclose.                                    
%<3xiint> = TypeVector%iint3                            %(Input)<3xiint>* = TypePointer Input %<3xiint>                                %(Input)iint* = TypePointer Input%iint                       %(StorageBuffer)float* = TypePointer StorageBuffer%float                           %(Workgroup)float* = TypePointer Workgroup%float%bool = TypeBool                             %(Uniform)float* = TypePointer Uniform%float%vulkan.immutable_sampler_0 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_1 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_2 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_3 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_4 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_5 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_6 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_7 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_8 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_9 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_10 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_11 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_12 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_13 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_14 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_15 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_16 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_17 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_18 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_19 = 
Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_20 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_21 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_22 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_23 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_24 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_25 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_26 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_27 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_28 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_29 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_30 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_31 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_32 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_33 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_34 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_35 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_36 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_37 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_38 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_39 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_40 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_41 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_42 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_43 = Variable 
%(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_44 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_45 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_46 = Variable %(UniformConstant)Sampler* UniformConstant%vulkan.immutable_sampler_47 = Variable %(UniformConstant)Sampler* UniformConstant%_ZZ16simplified_nbodyE20local_body_positions = Variable %(Workgroup)class.vector4[256l]* Workgroup%simplified_nbody.vulkan_uniform. = Variable %(StorageBuffer)enclose.class.vector4* StorageBuffer%simplified_nbody.vulkan_uniform..1 = Variable %(StorageBuffer)enclose.class.vector4_0* StorageBuffer%simplified_nbody.vulkan_uniform..2 = Variable %(StorageBuffer)enclose.class.vector3* StorageBuffer%simplified_nbody.vulkan_uniform..3 = Variable %(Uniform)enclose.* Uniform%simplified_nbody.vulkan_builtin_input. = Variable %(Input)<3xiint>* Input%simplified_nbody.vulkan_builtin_input..4 = Variable %(Input)<3xiint>* Input%simplified_nbody.vulkan_builtin_input..5 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..6 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..7 = Variable %(Input)iint* Input%simplified_nbody.vulkan_builtin_input..8 = Variable %(Input)iint* Input%0.0f = Constant%float0%9.99999975e-05f = Constant%float9.99999975e-05%0.999000013f = Constant%float0.999000013functionvoid simplified_nbody (%void() ) {92:%98 = Load %<3xiint>%simplified_nbody.vulkan_builtin_input. 
Aligned16%99 = CompositeExtract%iint%980%101 = ShiftLeftLogical%iint%99%8i%102 = Load%iint%simplified_nbody.vulkan_builtin_input..6 Aligned4%103 = Load%iint%simplified_nbody.vulkan_builtin_input..5 Aligned4%104 = Load%iint%simplified_nbody.vulkan_builtin_input..7 Aligned4%105 = IMul%iint%103%104%106 = IAdd%iint%105%102%107 = IAdd%iint%101%106%108 = Load %<3xiint>%simplified_nbody.vulkan_builtin_input..4 Aligned16%109 = CompositeExtract%iint%1080%110 = ShiftLeftLogical%iint%109%8i%113 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%0i%115 = Load%float%113 Aligned|MakePointerVisible|NonPrivatePointer4%1i%116 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%1i%117 = Load%float%116 Aligned|MakePointerVisible|NonPrivatePointer4%1i%119 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%107%0i%0i%2i%120 = Load%float%119 Aligned|MakePointerVisible|NonPrivatePointer4%1i%121 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%0i%122 = Load%float%121 Aligned|MakePointerVisible|NonPrivatePointer4%1i%123 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%1i%124 = Load%float%123 Aligned|MakePointerVisible|NonPrivatePointer4%1i%125 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..2%0i%0i%107%0i%0i%2i%126 = Load%float%125 Aligned|MakePointerVisible|NonPrivatePointer4%1i%128 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%0i%129 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%1i%130 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%2i%132 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%106%0i%0i%3i                                                Branch%9393:%134 = Phi%iint (%133 <-%96,%0i <-%92 )%136 = 
Phi%iint (%135 <-%96,%0i <-%92 )%139 = Phi%float (%0.0f <-%92,%138 <-%96 )%141 = Phi%float (%0.0f <-%92,%140 <-%96 )%143 = Phi%float (%0.0f <-%92,%142 <-%96 )%144 = ShiftLeftLogical%iint%136%8i%145 = IAdd%iint%106%144%146 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%0i%147 = Load%float%146 Aligned|MakePointerVisible|NonPrivatePointer4%1i%148 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%1i%149 = Load%float%148 Aligned|MakePointerVisible|NonPrivatePointer4%1i%150 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%2i%151 = Load%float%150 Aligned|MakePointerVisible|NonPrivatePointer4%1i%152 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform.%0i%0i%145%0i%0i%3i%153 = Load%float%152 Aligned|MakePointerVisible|NonPrivatePointer4%1i                                                Store%128%147 Aligned4                                                Store%129%149 Aligned4                                                Store%130%151 Aligned4                                                Store%132%153 Aligned4                                                ControlBarrier%2i%2i%2504i                                                LoopMerge%97%96 None                                                Branch%9494:%157 = Phi%ilong (%155 <-%94,%0l <-%93 )%158 = Phi%float (%139 <-%93,%138 <-%94 )%159 = Phi%float (%141 <-%93,%140 <-%94 )%160 = Phi%float (%143 <-%93,%142 <-%94 )%161 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%0i%162 = Load%float%161 Aligned4%163 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%1i%164 = Load%float%163 Aligned4%165 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%2i%166 = Load%float%165 Aligned4%167 = FSub%float%162%115%168 = FSub%float%164%117%169 = FSub%float%166%120%171 
= ExtInst%float%1 Fma%167%167%9.99999975e-05f%172 = ExtInst%float%1 Fma%168%168%171%173 = ExtInst%float%1 Fma%169%169%172%174 = ExtInst%float%1 InverseSqrt%173%175 = PtrAccessChain %(Workgroup)float*%_ZZ16simplified_nbodyE20local_body_positions%0i%157%0i%0i%3i%176 = Load%float%175 Aligned4%177 = FMul%float%174%174%178 = FMul%float%177%174%179 = FMul%float%178%176%142 = ExtInst%float%1 Fma%179%167%160%140 = ExtInst%float%1 Fma%179%168%159%138 = ExtInst%float%1 Fma%179%169%158%155 = IAdd%ilong%157%1l%186 = IEqual%bool%155%256l                                                LoopMerge%95%94 None                                                BranchConditional%186%95%9495:                                                Branch%9696:                                                ControlBarrier%2i%2i%2504i%133 = IAdd%iint%134%256i%135 = IAdd%iint%136%1i%190 = ULessThan%bool%133%110                                                BranchConditional%190%93%9797:%192 = InBoundsAccessChain %(Uniform)float*%simplified_nbody.vulkan_uniform..3%0i%193 = Load%float%192 Aligned4%194 = ExtInst%float%1 Fma%193%142%122%195 = ExtInst%float%1 Fma%193%140%124%196 = ExtInst%float%1 Fma%193%138%126%198 = FMul%float%194%0.999000013f%199 = FMul%float%195%0.999000013f%200 = FMul%float%196%0.999000013f%201 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%0i%202 = Load%float%201 Aligned|MakePointerVisible|NonPrivatePointer4%1i%203 = ExtInst%float%1 Fma%198%193%202                                                Store%201%203 Aligned|MakePointerAvailable|NonPrivatePointer4%1i%204 = PtrAccessChain %(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%1i%205 = Load%float%204 Aligned|MakePointerVisible|NonPrivatePointer4%1i%206 = ExtInst%float%1 Fma%199%193%205                                                Store%204%206 Aligned|MakePointerAvailable|NonPrivatePointer4%1i%207 = PtrAccessChain 
%(StorageBuffer)float*%simplified_nbody.vulkan_uniform..1%0i%0i%107%0i%0i%2i%208 = Load%float%207 Aligned|MakePointerVisible|NonPrivatePointer4%1i%209 = ExtInst%float%1 Fma%200%193%208                                                Store%207%209 Aligned|MakePointerAvailable|NonPrivatePointer4%1i                                                Store%121%198 Aligned|MakePointerAvailable|NonPrivatePointer4%1i                                                Store%123%199 Aligned|MakePointerAvailable|NonPrivatePointer4%1i                                                Store%125%200 Aligned|MakePointerAvailable|NonPrivatePointer4%1i                                                Return}
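To follow the arithmetic of the generated code without tracing the SPIR-V by hand, the inner loop can be mirrored on the host. The following is a scalar C++ sketch and not libfloor code: the `Body` and `Vec3` structs and both function names are hypothetical, and `std::fma` and `1.0f / std::sqrt` stand in for the `Fma` and `InverseSqrt` instructions seen in the disassembly, so the last bits may differ from a GPU result.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side stand-ins for float4 (xyz + mass) and float3.
struct Body { float x, y, z, mass; };
struct Vec3 { float x, y, z; };

// Constants taken from the kernel source above.
constexpr float kSoftening = 0.01f;
constexpr float kDamping = 0.999f;

// Sums the acceleration that all `bodies` exert on `self`,
// using the same chained-FMA structure as the disassembly.
Vec3 accumulate_acceleration(const Body& self, const std::vector<Body>& bodies) {
    Vec3 acc{0.0f, 0.0f, 0.0f};
    for (const Body& other : bodies) {
        const float rx = other.x - self.x;
        const float ry = other.y - self.y;
        const float rz = other.z - self.z;
        // dist_sq = dot(r, r) + softening^2, built as three chained FMAs
        float dist_sq = std::fma(rx, rx, kSoftening * kSoftening);
        dist_sq = std::fma(ry, ry, dist_sq);
        dist_sq = std::fma(rz, rz, dist_sq);
        // the softening term keeps dist_sq strictly positive
        assert(dist_sq > 0.0f);
        const float inv_dist = 1.0f / std::sqrt(dist_sq);
        const float s = (inv_dist * inv_dist * inv_dist) * other.mass;
        acc.x = std::fma(s, rx, acc.x);
        acc.y = std::fma(s, ry, acc.y);
        acc.z = std::fma(s, rz, acc.z);
    }
    return acc;
}

// velocity = (velocity + acceleration * time_delta) * damping
Vec3 integrate_velocity(const Vec3& v, const Vec3& acc, float time_delta) {
    return {
        std::fma(time_delta, acc.x, v.x) * kDamping,
        std::fma(time_delta, acc.y, v.y) * kDamping,
        std::fma(time_delta, acc.z, v.z) * kDamping,
    };
}
```

A body interacting only with itself yields zero acceleration because `r` is zero, which is why the kernel does not need to special-case the `i == j` pair.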

Requirements

  • OS:

    • only AMD64/Intel64/ARM64 are supported

    • Windows: NT 6.1+

    • macOS: 13.0+

    • iOS: 16.0+

    • Linux: any current x64 distribution

    • other Unix: if other requirements are met

  • compiler/toolchain:

  • libraries and optional requirements:

    • SDL3 3.1.3+

    • (opt) OpenCL: requires OpenCL 1.2+ SDK and CPU/GPU drivers (Intel, AMD)

    • (opt) CUDA: requires sm_50+/Maxwell+ GPU and CUDA 12.0+ drivers (CUDA SDK not required!)

    • (opt) Metal: requires iOS 16.0+ or macOS 13.0+, and a Metal 3.0 capable GPU

    • (opt) Host-Compute: requires just the compiler/toolchain that is stated above

    • (opt) Vulkan: requires 1.4.309+ ICD loader / headers / SDK, volk included as submodule

    • (opt) OpenVR: requires OpenVR

    • (opt) OpenXR: requires OpenXR
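As an illustration of how a minimum such as Vulkan 1.4.309 can be compared against a reported version, here is a small sketch. It reproduces the bit layout that Vulkan uses for packed API versions (variant, major, minor, patch) without including the Vulkan headers. The function names are hypothetical, and whether libfloor performs a check of this kind is an assumption, not something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Packs a version the way Vulkan API versions are laid out:
// 3 bits variant, 7 bits major, 10 bits minor, 12 bits patch.
constexpr std::uint32_t make_api_version(std::uint32_t variant, std::uint32_t major,
                                         std::uint32_t minor, std::uint32_t patch) {
    assert(variant < 8u && major < 128u && minor < 1024u && patch < 4096u);
    return (variant << 29u) | (major << 22u) | (minor << 12u) | patch;
}

// Minimum from the requirements list above (1.4.309).
constexpr std::uint32_t kMinVulkanVersion = make_api_version(0u, 1u, 4u, 309u);

// Plain integer comparison is valid as long as both versions use variant 0.
constexpr bool meets_minimum(std::uint32_t packed_version) {
    return packed_version >= kMinVulkanVersion;
}

static_assert(meets_minimum(make_api_version(0u, 1u, 4u, 309u)), "exact minimum is accepted");
static_assert(!meets_minimum(make_api_version(0u, 1u, 3u, 4000u)), "older minor is rejected");
```

Because major sits above minor, and minor above patch, a 1.3 version with a large patch number still compares lower than 1.4.309.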

Build Instructions

  • ensure git submodules are cloned and up-to-date: git submodule update --init --recursive

General / CLI

  • run ./build.sh (use ./build.sh help to get a list of all options)

  • configuration of optional parts:

    • to disable OpenCL: define FLOOR_NO_OPENCL or ./build.sh no-opencl

    • to disable CUDA: define FLOOR_NO_CUDA or ./build.sh no-cuda

    • to disable Metal (only affects macOS/iOS builds): define FLOOR_NO_METAL or ./build.sh no-metal

    • to disable Host Compute: define FLOOR_NO_HOST_COMPUTE or ./build.sh no-host-compute

    • to disable Vulkan: define FLOOR_NO_VULKAN or ./build.sh no-vulkan

    • to disable OpenVR: define FLOOR_NO_OPENVR or ./build.sh no-openvr

    • to disable OpenXR: define FLOOR_NO_OPENXR or ./build.sh no-openxr

    • to build with libstdc++ (GCC 13.0+) instead of libc++: ./build.sh libstdc++
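Code that depends on libfloor may want to guard backend-specific paths with the same defines. The sketch below shows one way to do that with the preprocessor. Only the `FLOOR_NO_*` macro names come from the list above; the helper functions are hypothetical, and how libfloor itself consumes these defines internally is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lists the compute backends that were not disabled at compile time.
// Platform restrictions (e.g. Metal only on macOS/iOS) are not modeled.
std::vector<std::string> enabled_backends() {
    std::vector<std::string> backends;
#if !defined(FLOOR_NO_OPENCL)
    backends.emplace_back("OpenCL");
#endif
#if !defined(FLOOR_NO_CUDA)
    backends.emplace_back("CUDA");
#endif
#if !defined(FLOOR_NO_METAL)
    backends.emplace_back("Metal");
#endif
#if !defined(FLOOR_NO_HOST_COMPUTE)
    backends.emplace_back("Host-Compute");
#endif
#if !defined(FLOOR_NO_VULKAN)
    backends.emplace_back("Vulkan");
#endif
    return backends;
}

// Picks the first backend that is still enabled.
std::string first_enabled_backend() {
    const auto backends = enabled_backends();
    assert(!backends.empty() && "all compute backends were disabled");
    return backends.front();
}
```

Compiling this with, for example, `-DFLOOR_NO_OPENCL` removes the OpenCL entry, so `first_enabled_backend()` would then return "CUDA".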

CMake / ninja / CLI

  • this is provided as an alternative to build.sh and Xcode

  • create a build folder and cd into it

  • run cmake -G "Ninja" -S "<path-to-libfloor>" <options>

  • options:

    • to build a static library instead of a shared/dynamic one: -DBUILD_SHARED_LIBS=OFF

    • to explicitly use libc++: -DWITH_LIBCXX=ON

    • to build with address sanitizer: -DWITH_ASAN=ON

  • run ninja

Xcode (macOS / iOS)

  • open floor.xcodeproj and build

  • some notes:

    • almost all optional parts of floor are enabled here and you’ll have to install all dependencies or disable them manually

    • Homebrew is the recommended way to install additional dependencies:
      /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

    • (opt) download OpenVR and manually install it:

      • mkdir -p {/usr/local/include/openvr,/usr/local/lib}

      • cp openvr/headers/* /usr/local/include/openvr/

      • cp openvr/bin/osx32/libopenvr_api.dylib /usr/local/lib/

    • command line tools might be necessary, install them with: xcode-select --install

    • on iOS, either copy dependencies into your iPhoneOS and iPhoneSimulator SDK, or floor/ios/deps/{include,lib}

    • iOS linker flags for a depending project: -lSDL3 -lfloor

Visual Studio (Windows / CMake / vcpkg)

  • install Visual Studio 2022

  • in "Workloads" select "Desktop development with C++", in "Individual components" search for and select all clang packages

  • install and wait

  • install Vulkan SDK

  • install vcpkg (somewhere, not within libfloor):

  • install vcpkg packages:

    • .\vcpkg --triplet x64-windows install sdl3 OpenCL vulkan openvr openxr-loader

  • add a user (or system) environment variable VCPKG_ROOT that points to the vcpkg folder

  • in Visual Studio: Tools → Options → search for vcpkg and set the custom vcpkg.exe path

  • in Visual Studio: open folder floor (wait a little until build files are generated)

  • select Debug or Release configuration and build

  • NOTE: all dependencies (optional parts) are enabled here

  • NOTE: having other build environments/systems in PATH (e.g. MSYS2/MinGW) may result in install/build issues

Installation

Installation (Unix / macOS)

  • sudo mkdir -p /opt/floor/include

  • sudo ln -sf /path/to/floor /opt/floor/include/floor

  • sudo ln -sf /path/to/floor/bin /opt/floor/lib

  • alternatively: copy these files/folders there

Installation (Windows)

  • create a %%ProgramFiles%%/floor folder (C:/Program Files/floor)

  • inside this folder:

    • create a lib folder

    • VS2022:

      • copy everything from bin/ in there (dlls/lib/exp)

    • MinGW/MSYS2:

      • copy libfloor_static.a/libfloord_static.a there

    • create an include folder and copy the original floor folder in there (containing all floor source code)

Compute/Graphics Toolchain

  • automated builds for Linux, macOS and Windows can be found at: https://libfloor.org/builds/toolchain

  • NOTE: this requires a Unix environment with all LLVM build dependencies installed - use MSYS2 on Windows

  • NOTE: the absolute build path must not contain spaces

  • compile the toolchain:

    • cd floor/etc/llvm140/ && ./build.sh

    • if successful, package it (in addition to a .zip file, this also creates a folder with all necessary binaries and include files): ./pkg.sh

  • install the toolchain:

    • Unix:

      • automatic:

        • development: run ./deploy_dev.sh from the floor/etc/llvm140/ folder (this will create symlinks to everything in floor and floor/etc/llvm140)

        • release: run ./deploy_pkg.sh from inside the toolchain package folder (floor/etc/llvm140/toolchain_140006_*; this will copy everything)

      • manual:

        • copy the toolchain folder as toolchain to /opt/floor/ (should then be /opt/floor/toolchain/{bin,clang,libcxx})

        • inside /opt/floor/toolchain, add a symlink to the floor include folder: sudo ln -sf ../include floor

    • Windows:

      • copy the toolchain folder as toolchain to %%ProgramFiles%%/floor (should then be %%ProgramFiles%%/floor/toolchain/{bin,clang,libcxx})

      • inside %%ProgramFiles%%/floor/toolchain, copy the floor folder from the include folder above it into this folder

  • NOTE: this is the expected default setup - paths can be changed inside config.json (toolchain.generic.paths)
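
To double-check the default Unix setup described above, a small C++17 sketch like the following can list which of the expected folders or symlinks are missing below /opt/floor. This is only an illustration and not part of libfloor: the function name missing_floor_paths is hypothetical, and the relative paths are taken from the installation and toolchain steps in this README. Dangling symlinks are reported as missing, since std::filesystem::exists follows symlinks.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper (not a libfloor API): returns the expected entries
// that do not exist below `root`, in the order they are checked.
inline std::vector<std::string> missing_floor_paths(const fs::path& root) {
    // relative paths of the default Unix layout described in this README
    static const char* const expected[] = {
        "include/floor",    // libfloor sources/headers (or a symlink to them)
        "lib",              // libfloor binaries (or a symlink to floor/bin)
        "toolchain/bin",    // toolchain binaries
        "toolchain/clang",  // clang headers
        "toolchain/libcxx", // libc++ headers
        "toolchain/floor",  // symlink to ../include
    };
    std::vector<std::string> missing;
    for (const auto* rel : expected) {
        std::error_code ec; // use the non-throwing overload
        if (!fs::exists(root / rel, ec)) {
            missing.emplace_back(rel);
        }
    }
    return missing;
}
```

Calling missing_floor_paths("/opt/floor") and printing the returned entries gives a quick overview of what still needs to be copied or symlinked. If the paths were changed in config.json, the list above would need to be adjusted accordingly.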

Misc Hints

  • when using X11 forwarding, set these env variables:

    • export SDL_VIDEO_X11_NODIRECTCOLOR=yes

  • depending on how your Linux distribution handles OpenCL headers and library, you might need to manually install OpenCL 1.2+ compatible ones

  • Host-Compute device execution requires locked/pinned memory, which may be very limited in default Linux configurations (usually 64KiB)

    • libfloor will try to increase the limit to 32MiB per logical CPU core, but this may fail if the max limit is too low

    • to increase the max limit, /etc/security/limits.conf must be modified

    • as a simple workaround, add the following line to it (replace user_name with your user name) and relog:

      • user_name hard memlock unlimited

    • NOTE: when using ssh, PAM must be enabled for this to apply

  • depending on your Vulkan implementation, you may also need to increase the max amount of open files (usual default is 1024 files)

    • libfloor will try to increase the limit to 256 files per logical CPU core, but this may fail if the max limit is too low

    • to increase the max limit, /etc/security/limits.conf must be modified

    • as a simple workaround, add the following line to it (replace user_name with your user name) and relog:

      • user_name hard nofile unlimited

    • NOTE: when using ssh, PAM must be enabled for this to apply
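
The two limit adjustments described above can be sketched with the POSIX getrlimit/setrlimit calls. This is a minimal illustration and not libfloor's actual implementation: the helper names compute_soft_limit and try_raise_limit are hypothetical, and only the per-core targets (32MiB of locked memory, 256 open files) are taken from the hints above. The sketch also shows why a low max limit makes the attempt fall short: an unprivileged process can raise its soft limit only up to the hard ("max") limit.

```cpp
#include <sys/resource.h> // getrlimit, setrlimit, rlim_t (POSIX)
#include <algorithm>
#include <cstdint>
#include <thread>

// Hypothetical helper: computes the soft limit to request.
// The result is never below the current soft limit and never above the hard limit.
inline rlim_t compute_soft_limit(rlim_t cur, rlim_t max,
                                 std::uint64_t per_core, unsigned int cores) {
    if (cur == RLIM_INFINITY) {
        return cur; // already unlimited, nothing to raise
    }
    const rlim_t wanted = static_cast<rlim_t>(per_core) * std::max(cores, 1u);
    const rlim_t raised = std::max(cur, wanted);
    return (max == RLIM_INFINITY) ? raised : std::min(raised, max);
}

// Hypothetical helper: tries to raise the soft limit of `resource` to
// `per_core` units per logical CPU core. Returns true only if the resulting
// soft limit covers the full target, false if a call fails or the hard limit is too low.
inline bool try_raise_limit(int resource, std::uint64_t per_core) {
    struct rlimit lim {};
    if (getrlimit(resource, &lim) != 0) {
        return false;
    }
    // hardware_concurrency() may return 0 if the core count is unknown
    const unsigned int cores = std::max(std::thread::hardware_concurrency(), 1u);
    const rlim_t target = compute_soft_limit(lim.rlim_cur, lim.rlim_max, per_core, cores);
    if (target != lim.rlim_cur) {
        lim.rlim_cur = target; // the hard limit is left untouched
        if (setrlimit(resource, &lim) != 0) {
            return false;
        }
    }
    return target == RLIM_INFINITY || target >= static_cast<rlim_t>(per_core) * cores;
}
```

With these helpers, try_raise_limit(RLIMIT_MEMLOCK, 32ull * 1024ull * 1024ull) would correspond to the locked-memory case and try_raise_limit(RLIMIT_NOFILE, 256) to the open-files case. A false result is the situation in which the hard limit in /etc/security/limits.conf needs to be increased.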

Projects and Examples using libfloor

