Inspired by rust-lang/rust#121960, I'm looking for SIMD intrinsics that are not inlined in the generated code.
https://github.com/BurntSushi/aho-corasick/blob/master/src/packed/vector.rs#L19C1-L27C59:
/// # Safety
///
/// All methods are not safe since they are intended to be implemented using
/// vendor intrinsics, which are also not safe. Callers must ensure that
/// the appropriate target features are enabled in the calling function,
/// and that the current CPU supports them. All implementations should
/// avoid marking the routines with `#[target_feature]` and instead mark
/// them as `#[inline(always)]` to ensure they get appropriately inlined.
/// (`inline(always)` cannot be used with target_feature.)
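A minimal sketch of the convention that doc comment describes, assuming hypothetical helpers `load_unaligned`, `count_byte_avx2`, and `count_byte` (illustrative only, not aho-corasick's code). The wrapper is `#[inline(always)]` with no `#[target_feature]`, and only the caller enables AVX2. Per this issue, the `_mm256_loadu_si256` call inside such a wrapper can remain an out-of-line call in the final binary.

```rust
// Hypothetical sketch of the documented convention; not aho-corasick's code.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8, _mm256_set1_epi8,
};

/// Unaligned 32-byte load. Following the documented convention, this wrapper
/// is `#[inline(always)]` and has no `#[target_feature]`; the caller must have
/// AVX enabled and the CPU must support it.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
unsafe fn load_unaligned(ptr: *const u8) -> __m256i {
    unsafe { _mm256_loadu_si256(ptr as *const __m256i) }
}

/// Counts `needle` in `haystack`, 32 bytes at a time, with AVX2 enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(haystack: &[u8], needle: u8) -> usize {
    unsafe {
        let splat = _mm256_set1_epi8(needle as i8);
        let mut chunks = haystack.chunks_exact(32);
        let mut count = 0usize;
        for chunk in &mut chunks {
            // Lanes equal to `needle` become 0xFF; movemask packs their top bits.
            let eq = _mm256_cmpeq_epi8(load_unaligned(chunk.as_ptr()), splat);
            count += (_mm256_movemask_epi8(eq) as u32).count_ones() as usize;
        }
        // Scalar tail for the final (< 32) bytes.
        count + chunks.remainder().iter().filter(|&&b| b == needle).count()
    }
}

/// Safe entry point: takes the AVX2 path only if the running CPU supports it.
pub fn count_byte(haystack: &[u8], needle: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { count_byte_avx2(haystack, needle) };
        }
    }
    haystack.iter().filter(|&&b| b == needle).count()
}
```

Because `count_byte` dispatches on runtime detection, the sketch also runs (via the scalar fallback) on CPUs or targets without AVX2.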
The doc comment is not entirely accurate: if the routines are not marked with `#[target_feature]`, LLVM refuses to inline the intrinsics they call, since it cannot tell whether inlining would cause ABI issues. So we need to use both `#[target_feature]` and `#[inline(always)]`.
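A sketch of the adjustment argued for here, using hypothetical helpers `load_unaligned`, `eq32_avx2`, and `eq32`: the wrapper now carries `#[target_feature(enable = "avx")]`, matching the intrinsic it calls, so the intrinsic's features are a subset of the wrapper's. Since the quoted doc comment notes that `inline(always)` cannot be combined with `target_feature`, this sketch uses plain `#[inline]`, which is only a hint; whether LLVM actually inlines the load should still be checked by disassembling the binary.

```rust
// Hypothetical sketch: wrapper carries the same target feature as the intrinsic.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8};

/// Unaligned 32-byte load tagged with AVX, so `_mm256_loadu_si256` is eligible
/// for inlining here. `#[inline]` is used because `#[inline(always)]` is not
/// accepted together with `#[target_feature]`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
#[inline]
unsafe fn load_unaligned(ptr: *const u8) -> __m256i {
    unsafe { _mm256_loadu_si256(ptr as *const __m256i) }
}

/// Compares two 32-byte blocks. Enabling AVX2 implies AVX, so
/// `load_unaligned` is also eligible for inlining into this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn eq32_avx2(a: &[u8; 32], b: &[u8; 32]) -> bool {
    unsafe {
        let eq = _mm256_cmpeq_epi8(load_unaligned(a.as_ptr()), load_unaligned(b.as_ptr()));
        // All 32 lanes equal -> all 32 mask bits set -> -1 as an i32.
        _mm256_movemask_epi8(eq) == -1
    }
}

/// Safe entry point with a scalar fallback.
pub fn eq32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 (and therefore AVX) support was verified at runtime.
            return unsafe { eq32_avx2(a, b) };
        }
    }
    a == b
}
```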
I found that `_mm256_loadu_si256` is not inlined in my project, and the same applies to the released `cargo` binary. At first I thought it was another rustc bug, but `objdump` eventually led me here.
Steps to reproduce:
Copy and paste the example from the README into a new binary crate, build it in release mode, and disassemble the result:
$ objdump ./target/release/play_rust -D --demangle | grep "core_arch"
    e0f1: e8 7a fd 06 00 call 7de70 <core::core_arch::x86::xsave::_xgetbv>
0000000000029f70 <core::ptr::drop_in_place<&aho_corasick::packed::teddy::generic::Mask<core::core_arch::x86::__m128i>>>:
000000000002eba0 <core::ptr::drop_in_place<core::core_arch::x86::__m128i>>:
000000000002ebb0 <core::ptr::drop_in_place<core::core_arch::x86::__m256i>>:
000000000002f0d0 <<core::core_arch::x86::__m128i as core::fmt::Debug>::fmt>:
000000000002f120 <<core::core_arch::x86::__m256i as core::fmt::Debug>::fmt>:
0000000000031810 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,1_usize>>>:
0000000000031820 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,2_usize>>>:
0000000000031830 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,3_usize>>>:
0000000000031840 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,4_usize>>>:
   48651: e8 7a 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48678: e8 53 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   486c6: e8 05 06 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   486ea: e8 e1 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48744: e8 87 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48762: e8 69 05 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48861: e8 6a 04 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48888: e8 43 04 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   488d4: e8 f7 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   488f2: e8 d9 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   489a5: e8 26 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   489c3: e8 08 03 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48a51: e8 7a 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48a78: e8 53 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48ac6: e8 05 02 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48aea: e8 e1 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48b44: e8 87 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48b68: e8 63 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48bc2: e8 09 01 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
   48be0: e8 eb 00 00 00 call 48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>
0000000000048cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>:
000000000007de70 <core::core_arch::x86::xsave::_xgetbv>:
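The `grep` for `core_arch` also matches symbol definitions and `drop_in_place`/`Debug` glue, not only call sites. A small filter that keeps only `call` instructions whose target lives in `core::core_arch` makes out-of-line intrinsics easier to spot. This is a hypothetical helper (`outlined_core_arch_calls`), a sketch assuming objdump's usual `address: bytes  mnemonic  target <symbol>` line layout.

```rust
/// Hypothetical helper: from `objdump -D --demangle` output, collect
/// (call-site address, callee) for every call into `core::core_arch`.
pub fn outlined_core_arch_calls(disasm: &str) -> Vec<(String, String)> {
    let mut hits = Vec::new();
    for line in disasm.lines() {
        // Instruction lines begin with a bare hex address followed by ':'.
        let Some((addr, rest)) = line.split_once(':') else { continue };
        let addr = addr.trim();
        if addr.is_empty() || !addr.chars().all(|c| c.is_ascii_hexdigit()) {
            continue; // symbol headers like `0000000000048cd0 <...>:` land here
        }
        // Keep only call instructions (some objdump versions print `callq`).
        if !rest.split_whitespace().any(|tok| tok == "call" || tok == "callq") {
            continue;
        }
        // The demangled callee sits between the first '<' and the last '>'.
        let (Some(start), Some(end)) = (rest.find('<'), rest.rfind('>')) else { continue };
        if end <= start {
            continue;
        }
        let callee = &rest[start + 1..end];
        if callee.starts_with("core::core_arch::") {
            hits.push((addr.to_string(), callee.to_string()));
        }
    }
    hits
}
```

Applied to the full disassembly, this would report each `_mm256_loadu_si256` call site and the `_xgetbv` call, while skipping the symbol-definition lines.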