Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

_mm256_loadu_si256 is failed to be inlined for ABI issues #140

Open
@usamoi

Description

@usamoi

Inspired byrust-lang/rust#121960, I'm looking for SIMD intrinsics that are not inlined in generated code.

https://github.com/BurntSushi/aho-corasick/blob/master/src/packed/vector.rs#L19C1-L27C59:

/// # Safety////// All methods are not safe since they are intended to be implemented using/// vendor intrinsics, which are also not safe. Callers must ensure that/// the appropriate target features are enabled in the calling function,/// and that the current CPU supports them. All implementations should/// avoid marking the routines with `#[target_feature]` and instead mark/// them as `#[inline(always)]` to ensure they get appropriately inlined./// (`inline(always)` cannot be used with target_feature.)

It's not fully true: if you do not mark the routines with#[target_feature], LLVM will reject to inline them since it does not know if inlining causes ABI issues. So we need to use both#[target_feature] and#[inline(always)].

I find_mm256_loadu_si256 is failed to be inlined in my project and it also applies to the releasedcargo binary. I think it's another rustc bug at first but finallyobjdump leads me here.

Step to reproduce it:
Copy & paste the example in readme.

objdump ./target/release/play_rust -D --demangle | grep "core_arch"    e0f1:       e8 7a fd 06 00          call   7de70 <core::core_arch::x86::xsave::_xgetbv>0000000000029f70 <core::ptr::drop_in_place<&aho_corasick::packed::teddy::generic::Mask<core::core_arch::x86::__m128i>>>:000000000002eba0 <core::ptr::drop_in_place<core::core_arch::x86::__m128i>>:000000000002ebb0 <core::ptr::drop_in_place<core::core_arch::x86::__m256i>>:000000000002f0d0 <<core::core_arch::x86::__m128i as core::fmt::Debug>::fmt>:000000000002f120 <<core::core_arch::x86::__m256i as core::fmt::Debug>::fmt>:0000000000031810 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,1_usize>>>:0000000000031820 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,2_usize>>>:0000000000031830 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,3_usize>>>:0000000000031840 <core::ptr::drop_in_place<aho_corasick::packed::teddy::generic::Slim<core::core_arch::x86::__m128i,4_usize>>>:   48651:       e8 7a 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48678:       e8 53 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   486c6:       e8 05 06 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   486ea:       e8 e1 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48744:       e8 87 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48762:       e8 69 05 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48861:       e8 6a 04 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48888:       e8 43 04 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   488d4:       e8 f7 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   488f2:       e8 d9 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   489a5:       e8 26 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   489c3:       e8 08 03 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48a51:       e8 7a 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48a78:       e8 53 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48ac6:       e8 05 02 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48aea:       e8 e1 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48b44:       e8 87 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48b68:       e8 63 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48bc2:       e8 09 01 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>   48be0:       e8 eb 00 00 00          call   48cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>0000000000048cd0 <core::core_arch::x86::avx::_mm256_loadu_si256>:000000000007de70 <core::core_arch::x86::xsave::_xgetbv>:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions


      [8]ページ先頭

      ©2009-2025 Movatter.jp