moving x86-64 feature baseline to SSE4.2? #27851

Open
Labels
component: SIMD - Issues in SIMD (fast instruction sets) code or machinery
@rgommers

Description

As of today, the SIMD "baseline" that we compile for goes up to SSE3, and any higher features are opt-in and runtime dispatched. SSE3 has been the maximum assumed feature for quite a while. We haven't reviewed this choice recently. At some point in the past we determined a rule of thumb saying that we could drop support for a particular feature (or lack thereof) if support for it dropped below 0.5%. That seems to be the case now for systems without SSE4.1 and SSE4.2.

Here is the full list of dispatchable targets and the features we currently build for each one, in the format "header: enabled target list", e.g.:

Generating multi-targets for "_umath_tests.dispatch.h"   Enabled targets: AVX2, SSE41, baseline

Full set of dispatchable targets:

Generating multi-targets for "_umath_tests.dispatch.h"   Enabled targets: AVX2, SSE41, baseline
Generating multi-targets for "argfunc.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, SSE42, baseline
Generating multi-targets for "x86_simd_argsort.dispatch.h"   Enabled targets: AVX512_SKX, AVX2
Generating multi-targets for "x86_simd_qsort.dispatch.h"   Enabled targets: AVX512_SKX, AVX2
Generating multi-targets for "x86_simd_qsort_16bit.dispatch.h"   Enabled targets: AVX512_SPR, AVX512_ICL
Generating multi-targets for "highway_qsort.dispatch.h"   Enabled targets:
Generating multi-targets for "highway_qsort_16bit.dispatch.h"   Enabled targets:
Generating multi-targets for "loops_arithm_fp.dispatch.h"   Enabled targets: AVX2, baseline
Generating multi-targets for "loops_arithmetic.dispatch.h"   Enabled targets: AVX512_SKX, AVX512F, AVX2, SSE41, baseline
Generating multi-targets for "loops_comparison.dispatch.h"   Enabled targets: AVX512_SKX, AVX512F, AVX2, SSE42, baseline
Generating multi-targets for "loops_exponent_log.dispatch.h"   Enabled targets: AVX512_SKX, AVX512F, AVX2, baseline
Generating multi-targets for "loops_hyperbolic.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_logical.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_minmax.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_modulo.dispatch.h"   Enabled targets: baseline
Generating multi-targets for "loops_trigonometric.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_umath_fp.dispatch.h"   Enabled targets: AVX512_SKX, baseline
Generating multi-targets for "loops_unary.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, baseline
Generating multi-targets for "loops_unary_fp.dispatch.h"   Enabled targets: SSE41, baseline
Generating multi-targets for "loops_unary_fp_le.dispatch.h"   Enabled targets: SSE41, baseline
Generating multi-targets for "loops_unary_complex.dispatch.h"   Enabled targets: AVX512F, AVX2, baseline
Generating multi-targets for "loops_autovec.dispatch.h"   Enabled targets: AVX2, baseline
Generating multi-targets for "_simd.dispatch.h"   Enabled targets: SSE42, AVX2, FMA3, AVX512F, AVX512_SKX, baseline
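To get a rough idea of which dispatch targets a higher baseline would make redundant, the build log can be parsed mechanically. The following is a minimal sketch, not part of NumPy's build system: the helper names (`parse_targets`, `redundant_targets`) are made up for illustration, the log excerpt is copied from the list above, and it assumes that a dispatch target at or below the baseline simply folds into the baseline build.

```python
import re

# A few lines copied from the build log above (not the full list).
LOG = '''\
Generating multi-targets for "_umath_tests.dispatch.h"   Enabled targets: AVX2, SSE41, baseline
Generating multi-targets for "argfunc.dispatch.h"   Enabled targets: AVX512_SKX, AVX2, SSE42, baseline
Generating multi-targets for "x86_simd_qsort.dispatch.h"   Enabled targets: AVX512_SKX, AVX2
Generating multi-targets for "loops_unary_fp.dispatch.h"   Enabled targets: SSE41, baseline
Generating multi-targets for "loops_modulo.dispatch.h"   Enabled targets: baseline
'''

# Assumption: with an SSE4.2 baseline, these targets no longer need
# a separate dispatched build.
ABSORBED = {"SSE41", "SSE42"}

LINE_RE = re.compile(
    r'Generating multi-targets for "([^"]+)"\s+Enabled targets:\s*(.*)'
)


def parse_targets(log):
    """Map each dispatch header to its list of enabled targets."""
    result = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        header, targets = match.groups()
        result[header] = [t.strip() for t in targets.split(",") if t.strip()]
    return result


def redundant_targets(parsed):
    """Return, per header, the targets that the new baseline would absorb."""
    return {
        header: [t for t in targets if t in ABSORBED]
        for header, targets in parsed.items()
        if ABSORBED & set(targets)
    }


parsed = parse_targets(LOG)
redundant = redundant_targets(parsed)
for header, targets in redundant.items():
    print(f"{header}: {', '.join(targets)} would fold into the baseline")
```

Run against the full list above, the same filter matches 7 of the 23 headers (those listing SSE41 or SSE42). How much build time and binary size that actually saves would still need to be measured.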

The most widely used data source for determining what hardware is out there is, I believe, https://store.steampowered.com/hwsurvey/?platform=combined. That currently says that SSE3 is at 100%, SSE4.1 at 99.78% and SSE4.2 at 99.70%. This means that if we bump the baseline up to SSE4.2, we'd only be dropping support for ~0.3% of systems, all with really old CPUs.
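The arithmetic behind the 0.5% rule of thumb is simple enough to write down. This sketch hard-codes the survey percentages quoted above; the names `SURVEY`, `THRESHOLD` and `share_without` are illustrative only.

```python
# Support percentages as quoted from the Steam hardware survey above.
SURVEY = {"SSE3": 100.00, "SSE4.1": 99.78, "SSE4.2": 99.70}

# Rule of thumb from this issue: a feature can be assumed once the share
# of systems lacking it drops below 0.5%.
THRESHOLD = 0.5


def share_without(feature):
    """Percentage of surveyed systems that lack the given feature."""
    return round(100.0 - SURVEY[feature], 2)


for feature in SURVEY:
    missing = share_without(feature)
    verdict = "below" if missing < THRESHOLD else "not below"
    print(f"{feature}: {missing:.2f}% of systems lack it ({verdict} the threshold)")
```

Both SSE4.1 (0.22% missing) and SSE4.2 (0.30% missing) fall under the threshold, with the caveat that the survey reflects gaming machines rather than all NumPy users.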

For more context, SSE4.2 was introduced in 2008, and even Windows 11 (version 24H2) now requires it (xref https://en.wikipedia.org/wiki/SSE4#SSE4.2).
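For anyone wondering whether a particular Linux machine would be affected, the CPU flags in `/proc/cpuinfo` give a quick answer. This is a stdlib-only sketch under the assumption of an x86 Linux system, where the kernel reports SSE4.1 and SSE4.2 as `sse4_1` and `sse4_2`; the function name `missing_flags` is hypothetical and this is not how NumPy itself detects CPU features.

```python
from pathlib import Path

# Kernel flag names for the two features discussed in this issue.
REQUIRED_FLAGS = {"sse4_1", "sse4_2"}


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the sorted required flags absent from the first 'flags' line.

    Returns None if the text has no 'flags' line (for example on non-x86
    systems), since nothing can be concluded in that case.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(set(required) - set(value.split()))
    return None


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    missing = missing_flags(cpuinfo.read_text())
    if missing is None:
        print("No x86 'flags' line found; cannot tell.")
    elif missing:
        print("Missing:", ", ".join(missing))
    else:
        print("This CPU reports SSE4.1 and SSE4.2.")
```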

Now the other side of this coin is - what do we gain by making this change? I haven't quantified each item, but the basic answer is:

  • Reduces build time on x86-64: 40% of build targets (206/517) on my 6-year-old Intel CPU with AVX512 are SIMD targets. We can trim off a decent fraction of those.
  • Reduces binary size: numpy/_core/_simd.so is currently 3.1 MB out of 39.9 MB on disk for a Linux release build. Looking at the multi-targets list higher up, it looks like we can trim that a fair bit.
  • Reduces the number of variations that should be tested in CI (linux_simd.yml). Given the current config, we can't actually drop a job, but we do increase test coverage (there are currently zero test configs for baseline + SSE4.1/2).

I'd suggest making the change in main this release cycle, meaning for numpy 2.3.0, which will probably be released in June 2025.

Hat tip to @itamarst for bringing up this topic (xref scientific-python/faster-scientific-python-ideas#11).
