Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add addition, subtraction, multiplication, and compare operations forf128#606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Amanieu merged 8 commits intorust-lang:masterfromtgross35:f16-f128-intrinsics-min
May 16, 2024

Conversation

@tgross35
Copy link
Contributor

@tgross35tgross35 commentedMay 10, 2024
edited
Loading

Division is not yet working, but all others were straigtforward to add so I split them out of#587.

This includes some bigger changes that are split per commit:

  • SplitInt intoInt andMinInt so we can use traits with bigint types
  • Add 256-bit bigint types for widening operations on 128-bit integers
  • Refactor test macros so that systems without implementations test againstrustc_apfloat rather than just being skipped. I needed this so I can actually debug.
  • Add implementations and tests for the new operations
  • Change powerpc symbol names to match what LLVM emits, fromhttps://gcc.gnu.org/wiki/Ieee128PowerPC

@tgross35tgross35force-pushed thef16-f128-intrinsics-min branch 7 times, most recently from2e9bb45 to7871c81CompareMay 11, 2024 05:43
@tgross35
Copy link
ContributorAuthor

tgross35 commentedMay 11, 2024
edited
Loading

Powerpc seems to be hitting a SIGILL onstxvd2x vs34,0,r9. Fromhttps://bugzilla.redhat.com/show_bug.cgi?id=1045384 andhttps://bugzilla.redhat.com/show_bug.cgi?id=1002077 it seems like this may be a limitation of qemu, but information is scarce. I may just need to use the apfloat fallback.

Also looks like symbol names may be different, I need to double checkhttps://gcc.gnu.org/wiki/Ieee128PowerPC

@tgross35
Copy link
ContributorAuthor

tgross35 commentedMay 11, 2024
edited
Loading

Ok, some more info. Compiling C for 32-bit powerpc seems to always emit__gcc_qadd regardless of-mlong-double flag. When forced to use__addkf3 it segfaults as expected because of thestxvd2x instruction. I think we are okay just ignoring this one as a likely limitation of qemu.

Compiling C for powerpc64 seems to use an ifunc resolver for__addkf3, which goes to__addkf3_hw for the single-instructionxsaddqp. Compiling Rust seems to do the exact same thing but I am getting different results, __addtf3(0x00000000000000000000000000000000, 0x00000000000000000000000000000001): std: 0x00000000000000000000000000000000, builtins: 0x00000000000000000000000000000001.I get the same results when calling__addkf3 directly in Cedit: no I don't. Maybe the operation format of xsaddqp needs to be set somehow to choose between ppc doubledouble and ieee f128?

@tgross35
Copy link
ContributorAuthor

tgross35 commentedMay 12, 2024
edited
Loading

Some asm for anyone who understands it better:

Assembly generated from Rust (incorrect result)
0000000000010d00 <.add_entry>:   10d00:   7c0802 a6     mflr    r0   10d04:   f821 ff91     stdu    r1,-112(r1)   10d08:   f8010080std     r0,128(r1)   10d0c:   4b ff c695bl      d3a0 <00000143.plt_call.__addkf3>   10d10:   e8410028     ld      r2,40(r1)   10d14:38210070     addi    r1,r1,112   10d18:   e8010010     ld      r0,16(r1)   10d1c:   7c0803 a6     mtlr    r0   10d20:   4e800020     blr000000000000d3a0 <00000143.plt_call.__addkf3>:    d3a0:       f8410028std     r2,40(r1)    d3a4:       3d62 ff ff     addisr11,r2,-1    d3a8:       e9 8b 7f58     ldr12,32600(r11)    d3ac:       7d8903 a6     mtctrr12    d3b0:       e8 4b 7f60     ld      r2,32608(r11)    d3b4:       4e800420     bctr0000000000056790 <.__addkf3_resolve>:56790:81 2d 8f 9c     lwzr9,-28772(r13)56794:75290040     andis.r9,r9,6456798:41820018     beq     567b0 <.__addkf3_resolve+0x20>   5679c:       e8628010     ld      r3,-32752(r2)   567a0:       4e800020     blr   567a4:60000000nop   567a8:60000000nop   567ac:60420000     ori     r2,r2,0   567b0:       e8628018     ld      r3,-32744(r2)   567b4:       4e800020     blr        ...   567c4:60000000nop   567c8:60000000nop   567cc:60420000     ori     r2,r2,0000000000005e6e0 <.__addkf3_hw>:   5e6e0:       fc421808     xsaddqp v2,v2,v3   5e6e4:       4e800020     blr        ...   5e6f4:60000000nop   5e6f8:60000000nop   5e6fc:60000000nop0000000000057050 <.__addkf3_sw>:57050:       fb41 ff d0std     r26,-48(r1)57054:       fb61 ff d8std     r27,-40(r1)57058:       fb81 ff e0std     r28,-32(r1)   5705c:       fb a1 ff e8std     r29,-24(r1)57060:       fb c1 ff f0std     r30,-16(r1)57064:       fb e1 ff f8std     r31,-8(r1)57068:       f821 ff41     stdu    r1,-192(r1)   5706c:39210070     addir9,r1,11257070:39410070     addir10,r1,112   // ... full sw implementation
Assembly generated from C (correct result)
0000000010000af8 <.add_entry>:    10000af8:   7c0802 a6     mflr    r0    10000afc:   f8010010std     r0,16(r1)    10000b00:   fb e1 ff f8std     r31,-8(r1)    10000b04:   f821 ff81     stdu    r1,-128(r1)    10000b08:   7c 3f0b78     mr      r31,r1    10000b0c:39200030     lir9,48    10000b10:39 5f0080     addir10,r31,128    10000b14:   7c 4a 4f99     stxvd2x vs34,r10,r9    10000b18:39200040     lir9,64    10000b1c:39 5f0080     addir10,r31,128    10000b20:   7c 6a 4f99     stxvd2x vs35,r10,r9    10000b24:39200040     lir9,64    10000b28:39 5f0080     addir10,r31,128    10000b2c:   7c 6a 4e99     lxvd2x  vs35,r10,r9    10000b30:39200030     lir9,48    10000b34:39 5f0080     addir10,r31,128    10000b38:   7c 4a 4e99     lxvd2x  vs34,r10,r9    10000b3c:   4b ff fb a5bl      100006e0 <00000019.plt_call.__addkf3>    10000b40:   e8410028     ld      r2,40(r1)    10000b44:   f0021496     xxmr    vs0,vs34    10000b48:   f0400491     xxmr    vs34,vs0    10000b4c:38 3f0080     addi    r1,r31,128    10000b50:   e8010010     ld      r0,16(r1)    10000b54:   7c0803 a6     mtlr    r0    10000b58:   eb e1 ff f8     ld      r31,-8(r1)    10000b5c:   4e800020     blr    10000b60:00000000     .long0x0    10000b64:00000001     .long0x1    10000b68:80010001     lwz     r0,1(r1)0000000010000c40 <.__addkf3_resolve>:    10000c40:81 2d 8f 9c     lwzr9,-28772(r13)    10000c44:75290040     andis.r9,r9,64    10000c48:41820018     beq     10000c60 <.__addkf3_resolve+0x20>    10000c4c:   e8628010     ld      r3,-32752(r2)    10000c50:   4e800020     blr    10000c54:60000000nop    10000c58:60000000nop    10000c5c:60420000     ori     r2,r2,0    10000c60:   e8628018     ld      r3,-32744(r2)    10000c64:   4e800020     blr        ...    10000c74:60000000nop    10000c78:60000000nop    10000c7c:60420000     ori     r2,r2,00000000010008b90 <.__addkf3_hw>:    10008b90:   fc421808     xsaddqp v2,v2,v3    10008b94:   4e800020     blr        ...    10008ba4:60000000nop    10008ba8:60000000nop    10008bac:60000000nop0000000010001500 <.__addkf3_sw>:10001500:   fb41 ff d0std     r26,-48(r1)10001504:   fb61 ff d8std     r27,-40(r1)10001508:   fb81 ff e0std     r28,-32(r1)    1000150c:   fb a1 ff e8std     r29,-24(r1)10001510:   fb c1 ff f0std     r30,-16(r1)10001514:   fb e1 ff f8std     r31,-8(r1)10001518:   f821 ff41     stdu    r1,-192(r1)    1000151c:39210070     addir9,r1,11210001520:39410070     addir10,r1,112    // ...
Rust code
#![feature(f128)]#[no_mangle]#[inline(never)]fnadd_entry(a:f128,b:f128) ->f128{    a + b}fnmain(){let a = f128::from_bits(0x0);let b = f128::from_bits(0x1);dbg!(a, b);let c =add_entry(a, b);dbg!(c);}
C code
#include<stdio.h>#include<stdlib.h>#include<inttypes.h>#define_Float128 __float128typedefstruct {#if__BYTE_ORDER==__LITTLE_ENDIANuint64_tlower,upper;#elif__BYTE_ORDER==__BIG_ENDIANuint64_tupper,lower;#else#error missing endian check#endif} __attribute__((aligned(_Alignof(_Float128))))u128;_Float128__addkf3(_Float128,_Float128);voidf128_print(_Float128val) {u128ival=*((u128*)(&val));printf("%#018"PRIx64"%016"PRIx64" %lf\n",ival.upper,ival.lower, (double)val);}_Float128new_f128(uint64_tupper,uint64_tlower) {u128val= { .lower=lower, .upper=upper };return*((_Float128*)(&val));}_Float128add_entry(_Float128a,_Float128b) {#ifdefUSE_ADDKF3return__addkf3(a,b);#elsereturna+b;#endif}intmain() {_Float128a=new_f128(0x0000000000000000,0x0000000000000000);_Float128b=new_f128(0x0000000000000000,0x0000000000000001);f128_print(a);f128_print(b);_Float128c=add_entry(a,b);f128_print(c);return0;}

@tgross35tgross35force-pushed thef16-f128-intrinsics-min branch 11 times, most recently from5119b1a to3133d8fCompareMay 12, 2024 22:25
@tgross35
Copy link
ContributorAuthor

I don't have any further insight on powerpc64, so I'll just disable testing against the system for now unless you have any suggestions@Amanieu. We're still testing against apfloat so our compiler-builtins is correct, but using the system symbols may cause precision issues for users. I think we can probably just address this more after everything merges.

This PR is probably best reviewed by-commit.

@tgross35tgross35 marked this pull request as ready for reviewMay 12, 2024 22:58
@Amanieu
Copy link
Member

@lu-zero Maybe you have some insight into the powerpc issues?

@tgross35
Copy link
ContributorAuthor

Discussed some more athttps://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/f128.20system.20libraries.20noncompliant.20platforms/near/438486364,@ecnelises is going to take a look at some point. This PR doesn't break anything new so I don't think it is blocked, openedrust-lang/rust#125109 to track the issue further.

@lu-zero
Copy link

@Amanieu I can build and try on real hardware. We can ask for real runners, I think.

@tgross35
Copy link
ContributorAuthor

tgross35 commentedMay 14, 2024
edited
Loading

@lu-zero could you try the code at#606 (comment) on powerpc64? I put more about how I am building atrust-lang/rust#125109

Less pressing but it would also be nice if you have a way to confirm thatstxvd2x does not sigill on realpowerpc-* targets, as in#606 (comment)

@lu-zero
Copy link

so this is what I'm getting.

$ cargo run   Compiling test_f128 v0.1.0 (/home/lu_zero/test_f128)    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s     Running `target/debug/test_f128`[src/main.rs:12:5] a = 0x00000000000000000000000000000000[src/main.rs:12:5] b = 0x00000000000000000000000000000001[src/main.rs:14:5] c = 0x00000000000000000000000000000001

@tgross35
Copy link
ContributorAuthor

Well that is interesting. That is 64-bit, correct?

@lu-zero
Copy link

lu-zero commentedMay 14, 2024
edited
Loading

Yes, tested on power9, used as ppc64le

@tgross35
Copy link
ContributorAuthor

LE seems to work fine in qemu, the green test in CI isn't using the fallback. Onlypowerpc- andpowerpc64- are causing the issueshttps://github.com/rust-lang/compiler-builtins/blob/3133d8f555580d93645eb763f94ec4cb59ac5880/testcrate/build.rs

@lu-zero
Copy link

Sadly less and less people care about BE and it is fairly annoying to set up a BE environment =/

@tgross35
Copy link
ContributorAuthor

Ah, well thanks for taking a look! I'll dig a little bit more but I think we are probably okay to not worry about it unless rust-lang/rust unit f128 unit tests have failures (we currently can't test much without this)

@lu-zero
Copy link

Either qemu gets fixed for BE or we'll need a BE runner.

@tgross35
Copy link
ContributorAuthor

I don't think the ppc64 BE is a qemu issue since the C version works fine. 32-bit hopefully should be just qemu.

Agreed that native runners wouldn't be a bad thing.

`MinInt` contains the basic methods that are only needed by integersinvolved in widening operations, i.e. big integers. `Int` retains allother operations and convenience methods.
Change float test macros to fall back to testing against `rustc_apfloat`when system implementations are not available, rather than just skippingtests.This allows for easier debugging where operations may not be supported.
@tgross35tgross35force-pushed thef16-f128-intrinsics-min branch from3133d8f to6a847abCompareMay 15, 2024 12:19
@Amanieu
Copy link
Member

This took a while to fully review, but LGTM!

@AmanieuAmanieu merged commit449643f intorust-lang:masterMay 16, 2024
@tgross35
Copy link
ContributorAuthor

Thanks for taking a look!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@tgross35@Amanieu@lu-zero

[8]ページ先頭

©2009-2025 Movatter.jp