Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

CPU feature detection in core#3469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
Amanieu wants to merge4 commits intorust-lang:masterfromAmanieu:core-detect

Conversation

@Amanieu
Copy link
Member

@AmanieuAmanieu commentedAug 3, 2023
edited by rustbot
Loading

Rendered

This RFC moves theis_*_feature_detected macros intocore, but keeps the logic for actually doing feature detection (which requires OS support) instd.

boehs, lebensterben, and mcobzarenco reacted with thumbs up emojitgross35, programmerjake, clarfonthey, mkroening, ChayimFriedman2, soqb, Freax13, paolobarbolini, andylizi, BurntSushi, and 12 more reacted with heart emoji
@ehussehuss added the T-libs-apiRelevant to the library API team, which will review and decide on the RFC. labelAug 4, 2023
@programmerjake
Copy link
Member

programmerjake commentedAug 4, 2023
edited
Loading

I think it would be useful to have theis_*_feature_detected macros instead run code like:

macro_rules! is_x86_feature_detected{("sse") =>{ feature_detect(X86Feature::SSE)};("sse2") =>{ feature_detect(X86Feature::SSE2)};("avx") =>{ feature_detect(X86Feature::AVX)};("avx2") =>{ feature_detect(X86Feature::AVX2)};// ...}#[repr(u16)]pubenumX86Feature{// arbitrary order since I didn't bother to look it up,// we'd probably want to base these off `cpuid` bit positions// or something similar, since that makes `rust_cpu_feature_detect` much simplerSSE,SSE2,AVX,AVX2,// ... lots of elided featuresAVX512F,// assume this is the last one}implX86Feature{pubconstMAX:Self =Self::AVX512F;// assume this is the last one}#[derive(Copy,Clone,Eq,PartialEq,Debug,Default)]pubstructX86Features(pub[usize;Self::ARRAY_SIZE]);implX86Features{pubconstARRAY_SIZE:usize =X86Feature::MAXasusize / usize::BITSasusize +1;}extern"Rust"{// this should have some kind of weak linking or somethingfnrust_cpu_feature_detect() ->X86Features;}#[inline]pubfnfeature_detect(feature:X86Feature) ->bool{constZ:AtomicUsize =AtomicUsize::new(0);staticCACHE:[AtomicUsize;X86Features::ARRAY_SIZE] =[Z;X86Features::ARRAY_SIZE];staticCACHE_VALID:AtomicBool =AtomicBool::new(false);#[cold]fnfill_cache(){for(cache, v)inCACHE.iter().zip(&unsafe{rust_cpu_feature_detect()}.0){            cache.store(*v,Ordering::Relaxed);}CACHE_VALID.store(true,Ordering::Release);}// intentionally only use atomic store/load to avoid needing cmpxchg or similar for cpus without support for thatif !CACHE_VALID.load(Ordering::Acquire){fill_cache();}let index = featureasusize;let bit = index % usize::BITSasusize;let index = index / usize::BITSasusize;(CACHE[index].load(Ordering::Relaxed) >> bit)&1 !=0}

@Lokathor
Copy link
Contributor

That's approximately howstd_detect works already, could you explain what in particular is different about your example code?

@programmerjake
Copy link
Member

programmerjake commentedAug 4, 2023
edited
Loading

That's approximately howstd_detect works already, could you explain what in particular is different about your example code?

it's not what@Amanieu proposed? in particular the code has the initialization driven by the threads callingis_*_feature_detected instead of depending onmain or similar to fill it in and hope that's before you need it.

it also needs much less cache space for platforms without atomicfetch_or

@Lokathor
Copy link
Contributor

Ah, well when you put it like that I see the difference :3


## Using a lang item to call back into `std`

Instead of having `std` "push" the CPU features to `core` at initialization time, an alternative design would be for `core` to "pull" this information from `std` by calling a lang item defined in `std`. The problem with this approach is that it doesn't provide a clear path for how this would be exposed to no-std programs which want to do their own feature detection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A lang item with two "grades" - weak and strong, core defines a weak version of the lang item so users are notrequired to provide their own lang item definition, but anyone can override it with a strong version of the same lang item (libstd will do that as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

(A lang item can have a stable alias like#[panic_handler].)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Weak lang items is already used as name for lang items like#[panic_handler] that need not be defined when it is used, but does need to be defined when linking if it was used anywhere. Maybe use preemptible lang items as name here?

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
@Amanieu
Copy link
MemberAuthor

I think it would be useful to have theis_*_feature_detected macros instead run code like:

This is what I described in alternative designshere.

The main issues are:

  • this requires a much larger API surface, which makes stabilization more difficult.
  • this is actually slightly slower since it requires an atomic check that the CPU features have been initialized.

it also needs much less cache space for platforms without atomicfetch_or

This couldalso be solved by requiring thatmark_*_feature_as_detected is only called before multiple threads access the CPU features (since it's unsafe anyways). This works for static initializers, even for dynamic libraries, since they are executed before any other code.

@programmerjake
Copy link
Member

I think it would be useful to have theis_*_feature_detected macros instead run code like:

This is what I described in alternative designshere.

The main issues are:

  • this requires a much larger API surface, which makes stabilization more difficult.

we don't have to have all that API surface stabilized, we can just stabilize the existence ofrust_cpu_feature_detect andX86Features with justARRAY_SIZE and the field.0 and stabilizing the location of each feature bit (copying cpuid locations would make that relatively uncontroversial), which imho is a rather small API surface for what it does.

  • this is actually slightly slower since it requires an atomic check that the CPU features have been initialized.

we'll likely want an atomic check anyway so we can be sure all the features are correctly synchronized with each other.
e.g. if we checkis_aarch64_feature_detected!("v8.1a") followed byis_aarch64_feature_detected!("v8.2a") then, assuming the function computing the cpu features always returns the same value within each program run,!v8_1a && v8_2a should never be true. if we don't have a acquire-load from the same location as the function computing cpu features release-stored to after writing all features to memory, then wecan end up with inconsistent features since relaxed-loads to different locations can be reordered with each other.

it also needs much less cache space for platforms without atomicfetch_or

This couldalso be solved by requiring thatmark_*_feature_as_detected is only called before multiple threads access the CPU features (since it's unsafe anyways). This works for static initializers, even for dynamic libraries, since they are executed before any other code.

the problem with that is that users are often specifically trying toavoid static initializers due to ordering issues and stuff:
rust-lang/rust#111921

@lyphyser
Copy link

There's also the option of including all detection code in libcore, using raw system calls in assembly if OS interaction is needed.

It makes libcore OS-specific, but I think that might be fine as long as it doesn't use any library.

@bjorn3
Copy link
Member

bjorn3 commentedAug 10, 2023
edited
Loading

Apart from Linux most OSes don't allow direct syscalls. Go tried, but had to revert back to using libc on most platforms after macOS changed the ABI of one syscall breaking every Go program in existence and OpenBSD added an exploit mitigation that makes your process crash if you try to do a syscall while outside of libc.

elichai, ghishadow, and danc86 reacted with thumbs up emoji

@lyphyser
Copy link

lyphyser commentedAug 10, 2023
edited
Loading

Apart from Linux most OSes don't allow direct syscalls. Go tried, but had to revert back to using libc on most platforms after macOS changed the ABI of one syscall breaking every Go program in existence and OpenBSD added an exploit mitigation that makes your process crash if you try to do a syscall while outside of libc.

I guess on those systems libcore can just link the C library. After all, it seems that no useful program can exist on those systems without linking the C library (since without system calls, the only things a program can do are to loop infinitely, waste a number of CPU cycles and then crash, or try to exploit the CPU or kernel), so might as well link it from libcore.

There's also the issue that in some cases CPU feature detection needs to access the ELF auxiliary values, but argv and envp are passed to static initializers, and the ELF auxiliary entries are guaranteed to be after the environment pointers which are after the argument pointers (at least on x86-64, but I guess all ABIs are like that), so there should be no need to call getauxval to access them.

@YurySolovyov
Copy link

might be worth thinking about something likeAVX10 which uses point versioning like 10.1, 10.2, etc.

@the8472
Copy link
Member

AVX10 will be more complicated than that because cores can have different capabilities. We currently have no way to represent those differences. We don't even have thread pinning in std which would be a prerequisite to make that work.

@YurySolovyov
Copy link

I thought the whole idea of AVX10 was to give cores identical capabilities 🤔

@the8472
Copy link
Member

the8472 commentedAug 22, 2023
edited
Loading

I thought the whole idea of AVX10 was to give cores identical capabilities 🤔

https://cdrdv2.intel.com/v1/dl/getContent/784267 Introduction section on page 14 as of revision 2.0 of the document

The sameinstructions will be supported across all cores. But maximum register width may vary across cores. Specifically P-cores may support ZMM registers while E-cores may be limited to YMM.

And in the CPUID table on page 15 I'm not seeing anything to query the minimum set across all cores instead of the current CPU...

BurntSushi, YurySolovyov, elichai, and ghishadow reacted with eyes emoji

@AmanieuAmanieu added the I-libs-api-nominatedIndicates that an issue has been nominated for prioritizing at the next libs-api team meeting. labelJul 23, 2024
Co-authored-by: Nilstrieb <48135649+Nilstrieb@users.noreply.github.com>
@m-ou-se
Copy link
Member

@rfcbot merge

@rfcbot
Copy link

rfcbot commentedJul 30, 2024
edited by Amanieu
Loading

Team member@m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

Seethis document for info about what commands tagged team members can give me.

@rfcbotrfcbot added proposed-final-comment-periodCurrently awaiting signoff of all team members in order to enter the final comment period. disposition-mergeThis RFC is in PFCP or FCP with a disposition to merge it. labelsJul 30, 2024
@joshtriplett
Copy link
Member

Could this RFC please add an unresolved question about the efficiency of having (say) a hundred of these at startup, and whether this can be reasonably optimized?

clarfonthey, kennytm, and ghishadow reacted with thumbs up emoji

@rfcbotrfcbot added the final-comment-periodWill be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labelJul 30, 2024
@rfcbot
Copy link

🔔This is now entering its final comment period, as per thereview above. 🔔

@rfcbotrfcbot removed the proposed-final-comment-periodCurrently awaiting signoff of all team members in order to enter the final comment period. labelJul 30, 2024
@AmanieuAmanieu removed the I-libs-api-nominatedIndicates that an issue has been nominated for prioritizing at the next libs-api team meeting. labelAug 6, 2024
@rfcbotrfcbot added finished-final-comment-periodThe final comment period is finished for this RFC. and removed final-comment-periodWill be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labelsAug 9, 2024
@rfcbot
Copy link

The final comment period, with a disposition tomerge, as per thereview above, is nowcomplete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

lqd, DianaNites, programmerjake, and Kobzol reacted with hooray emoji

@programmerjake
Copy link
Member

programmerjake commentedAug 9, 2024
edited
Loading

in unresolved questions, can you add the problem that some features imply other features (e.g.avx2 impliesavx) and we should try to make updates synchronized so we don't see a transient state whereavx2 is enabled butavx is disabled.

(I would have posted this on the tracking issue, but there isn't one yet...)

@Lokathor
Copy link
Contributor

The implementation of feature detection bystd sounds separate from ifcore-only code can call upon feature detection and expect other linked code to provide it.

@programmerjake
Copy link
Member

The implementation of feature detection bystd sounds separate from ifcore-only code can call upon feature detection and expect other linked code to provide it.

If this is in reply to#3469 (comment), I meant that the proposed implementation is just reading from some atomics incore, so we should synchronize updates to those atomics to prevent transient inconsistent features.

@clarfonthey
Copy link

Feels like multiple problems could be solved at once if the features were represented using some kind of bitflags-type struct that was applied over a slice of atomics. Things like transitive features (avx2 -> avx) are pretty naturally representable this way, since you can just make each feature set the bits it affects. If you make these relations between features clear to the people who need to set them, they should in theory be able to do a minimum number of set calls, rather than say, setting 100 flags at once like people were worried about.

@Lokathor
Copy link
Contributor

@programmerjake yes that was my intention.

My point is that this RFC is extending an existing and stable part ofstd to also work incore, and that's all it's doing. Any concerns abouthow thestd detection system works should be tracked separately in other issues or rfcs or whatever appropriate location.

@programmerjake
Copy link
Member

@programmerjake yes that was my intention.

My point is that this RFC is extending an existing and stable part ofstd to also work incore, and that's all it's doing.

no, it's also adding thecore::arch::mark_*_feature_as_detected APIs...my concern is about how those are implemented, since if they just naivelyfetch_or the one bit for each feature and don't think about which order to set the features in, it's very easy to end up readingavx2 as true since it happened to be set first butavx as false since that was set slightly later.

Lokathor, the8472, and xvln reacted with thumbs up emojikennytm reacted with confused emoji

@clarfonthey
Copy link

clarfonthey commentedAug 10, 2024
edited
Loading

@programmerjake yes that was my intention.
My point is that this RFC is extending an existing and stable part ofstd to also work incore, and that's all it's doing.

no, it's also adding thecore::arch::mark_*_feature_as_detected APIs...my concern is about how those are implemented, since if they just naivelyfetch_or the one bit for each feature and don't think about which order to set the features in, it's very easy to end up readingavx2 as true since it happened to be set first butavx as false since that was set slightly later.

Right, which is why I mentioned bitfields. If your atomics are integers instead of bools, then you can just set both bits at once and it works as expected.

Of course, this does require explicitly specifying these interdependencies between features, but I'd argue we should be doing that even without this RFC. It doesn't make sense foravx to be ever unavailable whenavx2 is available, and I'd imagine many people already rely on this.

@tgross35
Copy link
Contributor

@Amanieu since FCP completed is there anything blocking this from merging?

lebensterben reacted with thumbs up emoji

@the8472
Copy link
Member

Since the desire for externally implementable items was mentioned in the lang/libs meeting, is this RFC forward-compatible with switching from a static initializer to a pull-approach using such an item?

@tgross35
Copy link
Contributor

Since the desire for externally implementable items was mentioned in the lang/libs meeting, is this RFC forward-compatible with switching from a static initializer to a pull-approach using such an item?

I believe Amanieu does plan to make some changes here to account for externally implementable items (based on all hands discussion). My impression was that this would be less of an extension and more of a redesign of some parts.

@Amanieu
Copy link
MemberAuthor

While the libs-api team has completed an FCP on this RFC, this wasn't a commitment to a specific design but more to indicate that we are interested in having feature detection available in core with the ability for#![no_std] users to provide their own "feature provider backend".

Honestly the whole RFC probably needs to be rewritten with a new design based on externally implementable items (rust-lang/rust#125418) and with a pull-style API. That is, rather than requiring features to be initialized statically, it should use a callback-style API to have features initialized on first use like the current system for feature detection.

Since the design in this RFC is pretty outdated, I'm going to close this and encourage anyone interested in working on this to collaborate to design a new RFC for this.

I'll jot down some nodes which might be helpful:

  • It would be nice to share information about target features between the compiler and standard library. Notably details about target feature dependencies and target feature groups (e.g.v8.1-a on AArch64).
  • ACP: efficient runtime checking of multiple target features libs-team#585 has raised issues with the performance of the current feature detection API. This could be address by a better API (e.g. returning a struct containing a bit mask) which allows querying multiple features. This could be done as part of this redesign.
lebensterben reacted with thumbs up emojikennytm reacted with heart emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@petrochenkovpetrochenkovpetrochenkov left review comments

@bjorn3bjorn3bjorn3 left review comments

@NoratriebNoratriebNoratrieb left review comments

+1 more reviewer

@konsumlammkonsumlammkonsumlamm left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

disposition-mergeThis RFC is in PFCP or FCP with a disposition to merge it.finished-final-comment-periodThe final comment period is finished for this RFC.T-libs-apiRelevant to the library API team, which will review and decide on the RFC.to-announce

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

16 participants

@Amanieu@programmerjake@Lokathor@lyphyser@bjorn3@YurySolovyov@the8472@m-ou-se@rfcbot@joshtriplett@clarfonthey@tgross35@petrochenkov@konsumlamm@Noratrieb@ehuss

[8]ページ先頭

©2009-2025 Movatter.jp