ENH: Modulate dispatched x86 CPU features #28896


Open
seiko2plus wants to merge 4 commits into numpy:main from seiko2plus:modulate_x86_features

@seiko2plus (Member) commented May 4, 2025 (edited)

Overview

This PR reorganizes NumPy's CPU build options by replacing individual x86 features with microarchitecture levels. This change aligns with the Google Highway project requirements and common Linux distribution practices.

The default setting for cpu-baseline on x86 has been raised to the x86-64-v2 microarchitecture level: it is 2025, and maintaining SIMD compatibility for CPUs from before 2009 is no longer practical or efficient.
The default can be changed to cpu-baseline=none at build time to support older CPUs, though manual SIMD optimizations for pre-2009 processors are no longer supported. This change improves performance and reduces binary
size while only affecting hardware that is over 15 years old.

Key Changes

  • Consolidated into Microarchitecture Groups: Replaced individual features with the X86_V2, X86_V3, and X86_V4 groups
  • Adjusted Baseline: Set to microarchitecture level 2 (X86_V2), covering features from CPUs since 2009.
    This improves performance and reduces binary size
  • Improved - Operator: Corrected to properly exclude successor features
  • Backward Compatibility: Added redirection via meson for removed feature names
  • Stricter Compatibility: Features like AVX512 without full mask operations are now considered unsupported rather than handled through fallbacks

Detailed CPU Feature Changes

  • Removed individual features (SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT) → now in X86_V2
  • Removed AMD legacy features (XOP, FMA4)
  • Removed Xeon Phi support (AVX512_KNL, AVX512_KNM)
  • Removed individual features (AVX, AVX2, FMA3, F16C) → now in X86_V3
  • Removed AVX512F, AVX512CD (from dropping Xeon Phi support)
  • Renamed AVX512_SKX to X86_V4
  • Removed redundant groups AVX512_CLX and AVX512_CNL
  • Updated AVX512_ICL to include VAES, GFNI, VPCLMULQDQ
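
To illustrate the idea of redirecting removed feature names, here is a minimal Python sketch. It is not the meson implementation from this PR: the `REDIRECTS` table and the `redirect_features` helper are hypothetical names, and the table lists only the mappings that the description above states explicitly (targets for the other removed names are not given here, so they are left out).

```python
# Hypothetical redirect table: removed feature name -> replacement group.
# Only the mappings stated in the PR description are included.
REDIRECTS = {
    **dict.fromkeys(
        ["SSE", "SSE2", "SSE3", "SSSE3", "SSE4_1", "SSE4_2", "POPCNT"], "X86_V2"
    ),
    **dict.fromkeys(["AVX", "AVX2", "FMA3", "F16C"], "X86_V3"),
    "AVX512_SKX": "X86_V4",
}


def redirect_features(names):
    """Map removed names to their groups, dropping duplicates but keeping order."""
    result = []
    for name in names:
        # Names without a redirect entry pass through unchanged (upper-cased).
        target = REDIRECTS.get(name.upper(), name.upper())
        if target not in result:
            result.append(target)
    return result


print(redirect_features(["sse4_2", "avx2", "fma3", "avx512_skx"]))
# -> ['X86_V2', 'X86_V3', 'X86_V4']
```

Names that are not in the table, such as AVX512_ICL, are returned as-is in this sketch.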

New Feature Group Hierarchy

| Name | Implies | Includes |
|------|---------|----------|
| X86_V2 | | SSE SSE2 SSE3 SSSE3 SSE4_1 SSE4_2 POPCNT CX16 LAHF |
| X86_V3 | X86_V2 | AVX AVX2 FMA3 BMI BMI2 LZCNT F16C MOVBE |
| X86_V4 | X86_V3 | AVX512F AVX512CD AVX512VL AVX512BW AVX512DQ |
| AVX512_ICL | X86_V4 | AVX512VBMI AVX512VBMI2 AVX512VNNI AVX512BITALG AVX512VPOPCNTDQ AVX512IFMA VAES GFNI VPCLMULQDQ |
| AVX512_SPR | AVX512_ICL | AVX512FP16 |
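
The hierarchy above can be read as a chain in which each group carries its own features plus everything it implies. The following Python sketch models that reading; the `GROUPS` dict and both helper functions are illustrative names, not part of NumPy's build system.

```python
# Group table copied from the PR description:
# name -> (implied group or None, features the group itself includes).
GROUPS = {
    "X86_V2": (None, "SSE SSE2 SSE3 SSSE3 SSE4_1 SSE4_2 POPCNT CX16 LAHF"),
    "X86_V3": ("X86_V2", "AVX AVX2 FMA3 BMI BMI2 LZCNT F16C MOVBE"),
    "X86_V4": ("X86_V3", "AVX512F AVX512CD AVX512VL AVX512BW AVX512DQ"),
    "AVX512_ICL": (
        "X86_V4",
        "AVX512VBMI AVX512VBMI2 AVX512VNNI AVX512BITALG "
        "AVX512VPOPCNTDQ AVX512IFMA VAES GFNI VPCLMULQDQ",
    ),
    "AVX512_SPR": ("AVX512_ICL", "AVX512FP16"),
}


def implied_chain(group):
    """Return the group followed by every group it implies, nearest first."""
    chain = []
    while group is not None:
        chain.append(group)
        group = GROUPS[group][0]
    return chain


def all_features(group):
    """Collect the features of a group and of everything it implies."""
    feats = set()
    for name in implied_chain(group):
        feats.update(GROUPS[name][1].split())
    return feats


print(implied_chain("AVX512_ICL"))
# -> ['AVX512_ICL', 'X86_V4', 'X86_V3', 'X86_V2']
print(len(all_features("X86_V3")))
# -> 17 (9 features from X86_V2 plus 8 from X86_V3)
```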

CPU Generation Mapping

  • X86_V2: x86-64-v2 microarchitectures (CPUs since 2009)
  • X86_V3: x86-64-v3 microarchitectures (CPUs since 2015)
  • X86_V4: x86-64-v4 microarchitectures (AVX-512 capable CPUs)
  • AVX512_ICL: Intel Ice Lake and similar CPUs
  • AVX512_SPR: Intel Sapphire Rapids and newer CPUs

Note: On 32-bit x86, cx16 is excluded from X86_V2.
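
For readers who want to know whether a given Linux machine meets the new baseline, a rough check against the kernel's CPU flags could look like the sketch below. This is an assumption-laden illustration, not something the PR ships: the helper names are hypothetical, and the flag spellings are the ones Linux uses in /proc/cpuinfo (SSE3 appears as `pni`, LAHF as `lahf_lm`).

```python
# Linux /proc/cpuinfo flag names for the X86_V2 features listed in the PR.
# The kernel reports SSE3 as "pni" and LAHF/SAHF as "lahf_lm".
X86_V2_FLAGS = {
    "sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2", "popcnt", "cx16", "lahf_lm",
}


def missing_v2_flags(flags_line, is_64bit=True):
    """Return the X86_V2 flags absent from a space-separated flags string."""
    required = set(X86_V2_FLAGS)
    if not is_64bit:
        # Per the note in the PR, CX16 is not part of X86_V2 on 32-bit x86.
        required.discard("cx16")
    return sorted(required - set(flags_line.split()))


def read_cpuinfo_flags(path="/proc/cpuinfo"):
    """Read the first 'flags' line from /proc/cpuinfo (Linux only)."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# A made-up flags string for an older CPU without SSE4.2 and POPCNT.
sample = "fpu sse sse2 pni ssse3 cx16 sse4_1 lahf_lm"
print(missing_v2_flags(sample))
# -> ['popcnt', 'sse4_2']
```

On a real Linux system, `missing_v2_flags(read_cpuinfo_flags())` returning an empty list would suggest the CPU covers the X86_V2 feature set; a non-empty list names what is missing.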

Documentation

Documentation has been updated to reflect these changes and to fit the current meson build system.

closes #27851

r-devulap reacted with hooray emoji; rgommers reacted with rocket emoji
@seiko2plus force-pushed the modulate_x86_features branch 9 times, most recently from e1ee011 to 1cd1a0e (May 14, 2025 16:42)
@seiko2plus changed the title from "WIP: MAINT: Modulate dispatched x86 CPU features" to "ENH: Modulate dispatched x86 CPU features" (May 14, 2025)
@seiko2plus marked this pull request as ready for review (May 14, 2025 17:09)
@jorenham (Member) commented:

needs a rebase

Copilot AI left a comment:

Pull Request Overview

This PR reorganizes NumPy’s CPU build options by replacing individual x86 features with consolidated microarchitecture groups and bumps the baseline to x86-64-v2. Key changes include the introduction of X86_V2, X86_V3, and X86_V4 feature groups, updated enum values and CPU feature detection logic, and corresponding updates in tests and build configurations.

Reviewed Changes

Copilot reviewed 10 out of 19 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| numpy/_core/tests/test_cpu_features.py | Updates to feature groups and test definitions reflecting the new model. |
| numpy/_core/tests/test_cpu_dispatcher.py | Adjusts dispatcher tests to use new group names. |
| numpy/_core/src/common/npy_cpu_features.h | Revises enum definitions to include new CPU groups. |
| numpy/_core/src/common/npy_cpu_features.c | Updates CPU feature detection logic and mappings. |
| meson_cpu/x86/test_x86_v[2-4].c | Introduces new tests for the respective microarchitecture groups. |
| doc/source/reference/simd/gen_features.py | Removes deprecated code generation for CPU features documentation. |
| .github/workflows/linux_simd.yml | Modifies build flags and cpu-dispatch settings to integrate new groups. |

Files not reviewed (9)
  • doc/release/upcoming_changes/28896.change.rst: Language not supported
  • doc/source/reference/simd/generated_tables/compilers-diff.inc: Language not supported
  • doc/source/reference/simd/generated_tables/cpu_features.inc: Language not supported
  • doc/source/reference/simd/log_example.txt: Language not supported
  • meson.options: Language not supported
  • meson_cpu/meson.build: Language not supported
  • meson_cpu/x86/meson.build: Language not supported
  • numpy/_core/meson.build: Language not supported
  • numpy/_core/src/umath/loops_autovec.dispatch.c.src: Language not supported
Comments suppressed due to low confidence (1)

numpy/_core/src/common/npy_cpu_features.c:515

  • Review the modified condition for AVX512 OS support; verify that incorporating the avx_os check correctly handles systems without AVX OS support without unintended side effects.
if (!avx512_os && avx_os) {

@seiko2plus added the "component: SIMD" and "36 - Build" labels and removed the "25 - WIP" label (May 16, 2025)
@tacaswell (Contributor) commented:

If I am understanding

> build from source and change the default baseline to cpu-baseline=none.

correctly, then the limitation on older machines depends on how the binaries are built?

For any of the downstream packagers (Linux distros, conda-forge, etc.), that is their responsibility to sort out. For wheels, I think there is a case to go more aggressively newer and push users towards better packaging ecosystems if they need to support older chips.

@seiko2plus (Member, Author) commented:

You're right: we're bumping the default cpu-baseline, and users can change it. I agree. Downstream packagers need to decide their own compatibility targets, but we will no longer need to provide SIMD kernels below x86-64-v2. See the updated documentation for clarification:

https://github.com/numpy/numpy/blob/729d61c84d087dccf17e4ef960391227aaa0c5b3/doc/source/reference/simd/build-options.rst#targeting-older-cpus

**IMPORTANT NOTE**: The default setting for `cpu-baseline` on x86 has been raised to the `x86-64-v2` microarchitecture. This can be changed to `cpu-baseline=none` during build time to support older CPUs, though manual SIMD optimizations for pre-2009 processors are no longer supported.

This patch reorganizes CPU build options by replacing individual x86 features with microarchitecture levels. This change aligns with the Google Highway requirements and common Linux distribution practices.

This patch:

- Removes all individual x86 features and replaces them with three microarchitecture levels (`X86_V2`, `X86_V3`, `X86_V4`) commonly used by Linux distributions
- Raises the baseline to microarchitecture level 2 (replacing `SSE3`) since all known x86 CPUs since 2009 support it. This improves performance and reduces binary size
- Updates documentation to reflect these changes and to fit the current meson build system.
- Corrects the behavior of the `-` operator, which now excludes successor features that imply the excluded feature
- Adds redirection via meson for removed feature names to avoid breaking builds
- Removes compiler compatibility workarounds, so features like AVX512 without full mask operations will be considered unsupported rather than providing fallbacks

Detailed CPU features changes:

- Removes individual features (`SSE`, `SSE2`, `SSE3`, `SSSE3`, `SSE4_1`, `SSE4_2`, `POPCNT`) which are now part of the new group `X86_V2`
- Removes AMD legacy features (`XOP`, `FMA4`)
- Removes Xeon Phi support (`AVX512_KNL`, `AVX512_KNM`) which Intel has discontinued
- Removes individual features (`AVX`, `AVX2`, `FMA3`, `F16C`) which are now part of the new group `X86_V3`
- Removes individual features `AVX512F`, `AVX512CD` as a result of dropping Xeon Phi support
- Renames group `AVX512_SKX` to `X86_V4` to align with microarchitecture level naming
- Removes groups `AVX512_CLX` and `AVX512_CNL` (features available via `AVX512_ICL`)
- Updates `AVX512_ICL` to include features (`VAES`, `GFNI`, `VPCLMULQDQ`) for alignment with Highway

New Feature Group Hierarchy:

```
Name          | Implies     | Includes
--------------|-------------|-----------------------------------------------------------
X86_V2        |             | SSE SSE2 SSE3 SSSE3 SSE4_1 SSE4_2 POPCNT CX16 LAHF
X86_V3        | X86_V2      | AVX AVX2 FMA3 BMI BMI2 LZCNT F16C MOVBE
X86_V4        | X86_V3      | AVX512F AVX512CD AVX512VL AVX512BW AVX512DQ
AVX512_ICL    | X86_V4      | AVX512VBMI AVX512VBMI2 AVX512VNNI AVX512BITALG
              |             | AVX512VPOPCNTDQ AVX512IFMA VAES GFNI VPCLMULQDQ
AVX512_SPR    | AVX512_ICL  | AVX512FP16
```

These groups correspond to CPU generations:

- `X86_V2`: x86-64-v2 microarchitectures (CPUs since 2009)
- `X86_V3`: x86-64-v3 microarchitectures (CPUs since 2015)
- `X86_V4`: x86-64-v4 microarchitectures (AVX-512 capable CPUs)
- `AVX512_ICL`: Intel Ice Lake and similar CPUs
- `AVX512_SPR`: Intel Sapphire Rapids and newer CPUs

Note: On 32-bit x86, `cx16` is excluded from `X86_V2`.

…64-v2 baseline

Prevents "invalid value encountered in left_shift" warnings on clang-cl when testing bit shifts with long types under x86-64-v2 baseline.

Force SSE-based floating-point on 32-bit x86 systems to fix inconsistent results between einsum and other math functions. Prevents test failures with int16 operations by avoiding the x87 FPU's extended precision.

@seiko2plus (Member, Author) commented:

@tacaswell, I've updated the release note to be more clear about the CPU baseline change.

tacaswell reacted with thumbs up emoji

@seiko2plus (Member, Author) left a comment:

Three cases require changes; the rest are backport notes and comments that may need your decision.

Comment on lines +605 to +620
npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNL] = npy__cpu_have[NPY_CPU_FEATURE_AVX512F] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512CD] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512ER] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512PF];

npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNM] = npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNL] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX5124FMAPS] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX5124VNNIW] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VPOPCNTDQ];

npy__cpu_have[NPY_CPU_FEATURE_AVX512_CLX] = npy__cpu_have[NPY_CPU_FEATURE_X86_V4] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VNNI];

npy__cpu_have[NPY_CPU_FEATURE_AVX512_CNL] = npy__cpu_have[NPY_CPU_FEATURE_X86_V4] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512IFMA] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VBMI];
seiko2plus (Member, Author):

Suggested change
npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNL] = npy__cpu_have[NPY_CPU_FEATURE_AVX512F] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512CD] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512ER] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512PF];
npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNM] = npy__cpu_have[NPY_CPU_FEATURE_AVX512_KNL] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX5124FMAPS] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX5124VNNIW] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VPOPCNTDQ];
npy__cpu_have[NPY_CPU_FEATURE_AVX512_CLX] = npy__cpu_have[NPY_CPU_FEATURE_X86_V4] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VNNI];
npy__cpu_have[NPY_CPU_FEATURE_AVX512_CNL] = npy__cpu_have[NPY_CPU_FEATURE_X86_V4] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512IFMA] &&
npy__cpu_have[NPY_CPU_FEATURE_AVX512VBMI];

Not sure if we should drop the legacy groups. I kept them in case there are projects that depend on them via the dict __cpu_features__.
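
For projects that read these legacy group names, the detection logic in the C snippet above can be mirrored in a few lines of Python. This is a sketch under assumptions: `legacy_groups` is a hypothetical helper, and its input is any dict shaped like `__cpu_features__` (feature name mapped to a bool); the sample dict is made up for illustration.

```python
def legacy_groups(have):
    """Derive the legacy AVX512 group flags from per-feature booleans.

    Mirrors the conditions in the C snippet under review; `have` maps
    feature names to booleans, and missing names count as unsupported.
    """
    def get(name):
        return bool(have.get(name, False))

    knl = all(get(f) for f in ("AVX512F", "AVX512CD", "AVX512ER", "AVX512PF"))
    knm = knl and all(
        get(f) for f in ("AVX5124FMAPS", "AVX5124VNNIW", "AVX512VPOPCNTDQ")
    )
    clx = get("X86_V4") and get("AVX512VNNI")
    cnl = get("X86_V4") and get("AVX512IFMA") and get("AVX512VBMI")
    return {
        "AVX512_KNL": knl,
        "AVX512_KNM": knm,
        "AVX512_CLX": clx,
        "AVX512_CNL": cnl,
    }


# Made-up feature dict for a CPU with X86_V4 plus VNNI, IFMA and VBMI.
sample = {
    "X86_V4": True, "AVX512F": True, "AVX512CD": True,
    "AVX512VNNI": True, "AVX512IFMA": True, "AVX512VBMI": True,
}
print(legacy_groups(sample))
# -> {'AVX512_KNL': False, 'AVX512_KNM': False, 'AVX512_CLX': True, 'AVX512_CNL': True}
```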

are not supported by the target CPU (raises Python runtime error).
- ``cpu-baseline``: The minimum set of CPU features required to run the compiled NumPy.

* Default: ``min`` (provides compatibility across a wide range of platforms)
seiko2plus (Member, Author):

Suggested change
* Default: ``min`` (provides compatibility across a wide range of platforms)
* Default: ``min`` (provides compatibility across a wide range of platforms), see :ref:`special options <opt-special-options>` to check what ``min`` maps to for each architecture.

Provides a ref for the option min.

@@ -207,8 +226,17 @@ foreach opt_name, conf : parse_options
endforeach
else
filterd = []
# filter out the features that are in the accumulate list
# including any successor features
seiko2plus (Member, Author):

Not excluding successor features should be considered a bug and needs to be back-ported.
For example, with cpu-dispatch before this patch, to exclude AVX512 you had to exclude all successor features, otherwise it would not be disabled:

python -m build --wheel -Csetup-args=-Dcpu-dispatch="max -avx512f -avx512cd -avx512_knl -avx512_knm -avx512_skx -avx512_clx -avx512_cnl -avx512_icl -avx512_spr"

After this fix:

python -m build --wheel -Csetup-args=-Dcpu-dispatch="max -avx512f"
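
The corrected behavior of the `-` operator can be sketched in Python as follows. This is only a model of the rule described in the comment (drop the excluded feature and every successor that implies it); it is not the meson code from the PR, the helper names are hypothetical, and it uses the new group names from the hierarchy table rather than the old `avx512f` spelling.

```python
# name -> the group it directly implies (None for the root),
# taken from the feature group hierarchy in the PR description.
IMPLIES = {
    "X86_V2": None,
    "X86_V3": "X86_V2",
    "X86_V4": "X86_V3",
    "AVX512_ICL": "X86_V4",
    "AVX512_SPR": "AVX512_ICL",
}


def implies(feature, target):
    """True if `feature` is `target` or transitively implies it."""
    while feature is not None:
        if feature == target:
            return True
        feature = IMPLIES[feature]
    return False


def exclude(selected, removed):
    """Drop `removed` and every successor that implies it, keeping order."""
    return [f for f in selected if not implies(f, removed)]


dispatch = ["X86_V2", "X86_V3", "X86_V4", "AVX512_ICL", "AVX512_SPR"]
print(exclude(dispatch, "X86_V4"))
# -> ['X86_V2', 'X86_V3']
```

Before the fix, the equivalent of `exclude` removed only the named feature, which is why every successor had to be listed by hand.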

test_code: files(source_root + '/numpy/distutils/checks/cpu_avx512_cnl.c')[0]
)
HWY_SSE4_FLAGS = ['-DHWY_WANT_SSE4', '-DHWY_DISABLE_PCLMUL_AES']
# Use SSE for floating-point on x86-32 to ensure numeric consistency.
seiko2plus (Member, Author):

This fix needs to be backported too.

test_code: files(current_dir + '/test_x86_v4.c')[0],
)
if cpu_family == 'x86'
X86_V4.update(disable: 'not supported on x86-32')
seiko2plus (Member, Author):

This disable needs to be backported too. We should not generate AVX512 kernels on 32-bit systems.

AVX2.update(args: {'val': '/arch:AVX2', 'match': clear_arch})
AVX512_SKX.update(args: {'val': '/arch:AVX512', 'match': clear_arch})
X86_V3.update(args: {'val': '/arch:AVX2', 'match': clear_arch})
X86_V4.update(args: {'val': '/arch:AVX512', 'match': clear_arch})
seiko2plus (Member, Author):

By default, Highway considers AVX512 a broken platform on MSVC. We've managed to support AVX512 on MSVC through universal intrinsics. We have three options here:

  1. Keep it as-is: This leads to building AVX2 kernels with /arch:AVX512, which gives the compiler the opportunity to optimize (hopefully). Worst case: we'll have duplicate kernels.

  2. Disable AVX512 support entirely:

Suggested change
X86_V4.update(args: {'val': '/arch:AVX512', 'match': clear_arch})
X86_V4.update(disable: 'not supported by Highway')

  3. Change the default behavior and force-enable AVX512 on MSVC, and let's see how deep the rabbit hole goes:

Suggested change
X86_V4.update(args: {'val': '/arch:AVX512', 'match': clear_arch})
X86_V4.update(args: [{'val': '/arch:AVX512', 'match': clear_arch}, '-DHWY_BROKEN_MSVC=0'])

Member:

I think I'd prefer (2) or (3). If it's really very broken, disabling it seems fine (and low-effort). If it's feasible and there is energy for this, fixing it up in Highway over time seems nice.

(1) seems worst, if it doesn't do much, I'd much rather take the binary size gains from disabling it completely.

seiko2plus (Member, Author):

We still haven't converted many kernels yet from universal intrinsics to Highway, so adding extra burden isn't a wise decision. Let's go for option (2) for now and investigate later which MSVC versions are compatible with Highway.

- Disable X86_V4 (AVX-512) for MSVC builds due to Highway incompatibility
- Add FIXME comment for future MSVC compatibility investigation
- Update SIMD CI workflow to reflect x86-64-v2 baseline
- Remove redundant test configurations
- Add missing X86_V4 support to unary complex loops
Reviewers: rgommers (left review comments), Copilot (left review comments), r-devulap (awaiting requested review)
Assignees: No one assigned
Labels: 36 - Build, component: SIMD
Projects: None yet
Milestone: No milestone
Linked issue (may be closed by merging this pull request): moving x86-64 feature baseline to SSE4.2?
Participants (4): @seiko2plus, @jorenham, @tacaswell, @rgommers
