CPU Build Options#

Overview#

NumPy provides configuration options to optimize performance based on CPU capabilities. These options allow you to specify which CPU features to support, balancing performance, compatibility, and binary size. This document explains how to use these options effectively across various CPU architectures.

Key Configuration Options#

NumPy uses several build options to control CPU optimizations:

cpu-baseline: The minimum set of CPU features required to run the compiled NumPy.
- Default: min (provides compatibility across a wide range of platforms)
- If your target CPU doesn’t support all specified baseline features, NumPy will fail to load with a Python runtime error
cpu-baseline-detect: Controls detection of the CPU baseline from compiler flags.
- Default: auto (enables detection if -march= or a similar compiler flag is used)
- The other possible values are enabled and disabled, which respectively enable or disable detection unconditionally
cpu-dispatch: Additional CPU features for which optimized code paths will be generated.
- Default: max (enables all available optimizations)
- At runtime, NumPy will automatically select the fastest available code path based on your CPU’s capabilities
disable-optimization: Completely disables all CPU optimizations.
- Default: false (optimizations are enabled)
- When set to true, disables all CPU optimized code including dispatch, SIMD, and loop unrolling
- Useful for debugging, testing, or in environments where optimization causes issues

These options are specified at build time via meson-python arguments:

pip install . -Csetup-args=-Dcpu-baseline="min" -Csetup-args=-Dcpu-dispatch="max"
# or through spin
spin build -- -Dcpu-baseline="min" -Dcpu-dispatch="max"
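
When builds are scripted, the same arguments can be assembled programmatically. The following is a minimal sketch, not part of NumPy: the helper name `pip_build_command` is hypothetical, and it only constructs the argument list shown above without running it.

```python
import shlex


def pip_build_command(baseline="min", dispatch="max"):
    """Return the argv list for a source build with the given CPU options.

    Hypothetical helper: it only mirrors the pip command shown above.
    """
    return [
        "pip", "install", ".",
        f"-Csetup-args=-Dcpu-baseline={baseline}",
        f"-Csetup-args=-Dcpu-dispatch={dispatch}",
    ]


cmd = pip_build_command(baseline="min", dispatch="max -X86_V4")

# shlex.join quotes the argument that contains a space, so the printed
# line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the NumPy source directory would start the build; keeping each option as one list element avoids shell quoting problems when a value contains spaces.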

cpu-baseline and cpu-dispatch can be set to specific CPU groups, features, or special options that perform specific actions. The following sections describe these options in detail.

Common Usage Scenarios#

Building for Local Use Only#

When building for your machine only and not planning to distribute:

python -m build --wheel -Csetup-args=-Dcpu-baseline="native" -Csetup-args=-Dcpu-dispatch="none"

This automatically detects and uses all CPU features available on your machine.

Note

A fatal error will be raised if native isn’t supported by the host platform.

Excluding Specific Features#

You may want to exclude certain CPU features from the dispatched features:

# For x86-64: exclude all AVX-512 features
python -m build --wheel -Csetup-args=-Dcpu-dispatch="max -X86_V4"
# For ARM64: exclude SVE
python -m build --wheel -Csetup-args=-Dcpu-dispatch="max -SVE"

Note

Excluding a feature will also exclude any successor features that imply the excluded feature. For example, excluding X86_V4 will exclude AVX512_ICL and AVX512_SPR as well.

Targeting Older CPUs#

On x86-64, by default the baseline is set to min, which maps to X86_V2. This is unsuitable for older CPUs (before 2009) or old virtual machines. To address this, set the baseline to none:

python -m build --wheel -Csetup-args=-Dcpu-baseline="none"

This will create a build that is compatible with all x86 CPUs, but without any manual optimizations or SIMD code paths for the baseline. The build will rely only on dispatched code paths for optimization.

Targeting Newer CPUs#

Raising the baseline improves performance for two main reasons:

Dispatched kernels don’t cover all code paths
A higher baseline leads to smaller binary size as the compiler won’t generate code paths for excluded dispatched features

For CPUs from 2015 and newer, setting the baseline to X86_V3 may be suitable:

python -m build --wheel -Csetup-args=-Dcpu-baseline="min+X86_V3"

Supported CPU Features By Architecture#

NumPy supports optimized code paths for multiple CPU architectures. Below are the supported feature groups for each architecture. The name of the feature group can be used in the build options cpu-baseline and cpu-dispatch.

X86#

Name	Implies	Includes
`X86_V2`		`SSE` `SSE2` `SSE3` `SSSE3` `SSE4_1` `SSE4_2` `POPCNT` `CX16` `LAHF`
`X86_V3`	`X86_V2`	`AVX` `AVX2` `FMA3` `BMI` `BMI2` `LZCNT` `F16C` `MOVBE`
`X86_V4`	`X86_V3`	`AVX512F` `AVX512CD` `AVX512VL` `AVX512BW` `AVX512DQ`
`AVX512_ICL`	`X86_V4`	`AVX512VBMI` `AVX512VBMI2` `AVX512VNNI` `AVX512BITALG` `AVX512VPOPCNTDQ` `AVX512IFMA` `VAES` `GFNI` `VPCLMULQDQ`
`AVX512_SPR`	`AVX512_ICL`	`AVX512FP16`

These groups correspond to CPU generations:

X86_V2: x86-64-v2 microarchitectures (CPUs since 2009)
X86_V3: x86-64-v3 microarchitectures (CPUs since 2015)
X86_V4: x86-64-v4 microarchitectures (AVX-512 capable CPUs)
AVX512_ICL: Intel Ice Lake and similar CPUs
AVX512_SPR: Intel Sapphire Rapids and newer CPUs

Note

On 32-bit x86, CX16 is excluded from X86_V2.

On IBM/POWER big-endian#

Name	Implies
`VSX`
`VSX2`	`VSX`
`VSX3`	`VSX` `VSX2`
`VSX4`	`VSX` `VSX2` `VSX3`

On IBM/POWER little-endian#

Name	Implies
`VSX`	`VSX2`
`VSX2`	`VSX`
`VSX3`	`VSX` `VSX2`
`VSX4`	`VSX` `VSX2` `VSX3`

On ARMv7/A32#

Name	Implies
`NEON`
`NEON_FP16`	`NEON`
`NEON_VFPV4`	`NEON` `NEON_FP16`
`ASIMD`	`NEON` `NEON_FP16` `NEON_VFPV4`
`ASIMDHP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDDP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDFHM`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD` `ASIMDHP`

On ARMv8/A64#

Name	Implies
`NEON`	`NEON_FP16` `NEON_VFPV4` `ASIMD`
`NEON_FP16`	`NEON` `NEON_VFPV4` `ASIMD`
`NEON_VFPV4`	`NEON` `NEON_FP16` `ASIMD`
`ASIMD`	`NEON` `NEON_FP16` `NEON_VFPV4`
`ASIMDHP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDDP`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
`ASIMDFHM`	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD` `ASIMDHP`

On IBM/ZSYSTEM(S390X)#

Name	Implies
`VX`
`VXE`	`VX`
`VXE2`	`VX` `VXE`

On RISCV64#

Name	Implies
`RVV`

Special Options#

Beyond specific feature names, you can use these special values:

`NONE`#

Enables no features (equivalent to an empty string).

`NATIVE`#

Enables all features supported by the host CPU.

`DETECT`#

Detects the features enabled by the compiler. This option is appended by default to cpu-baseline if -march, -mcpu, -xhost, or /QxHost is set in the environment variable CFLAGS, unless cpu-baseline-detect is disabled.

`MIN`#

Enables the minimum CPU features for each architecture:

For Arch	Implies
x86 (32-bit)	`X86_V2`
x86-64	`X86_V2`
IBM/POWER (big-endian)	`NONE`
IBM/POWER (little-endian)	`VSX` `VSX2`
ARMv7/ARMHF	`NONE`
ARMv8/AArch64	`NEON` `NEON_FP16` `NEON_VFPV4` `ASIMD`
IBM/ZSYSTEM(S390X)	`NONE`
riscv64	`NONE`

`MAX`#

Enables all features supported by the compiler and platform.

Operators (`-`/`+`)#

Remove or add specific features, useful with MAX, MIN, and NATIVE:

Adding a feature (+) includes all implied features
Removing a feature (-) excludes all successor features that imply the removed feature

Examples:

python -m build --wheel -Csetup-args=-Dcpu-dispatch="max-X86_V4"
python -m build --wheel -Csetup-args=-Dcpu-baseline="min+X86_V4"
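
The two operator rules can be modelled with a small implication graph. This is an illustrative sketch only, not NumPy's build code: the `IMPLIES` table is copied from the X86 table in this document, the function names are hypothetical, and the full x86 feature set stands in for `max` (which in a real build depends on compiler support).

```python
# Direct "implies" edges copied from the X86 table in this document.
IMPLIES = {
    "X86_V2": [],
    "X86_V3": ["X86_V2"],
    "X86_V4": ["X86_V3"],
    "AVX512_ICL": ["X86_V4"],
    "AVX512_SPR": ["AVX512_ICL"],
}


def implied_by(feature):
    """All features that `feature` implies, directly or transitively."""
    found = set()
    stack = list(IMPLIES[feature])
    while stack:
        name = stack.pop()
        if name not in found:
            found.add(name)
            stack.extend(IMPLIES[name])
    return found


def add_feature(selected, feature):
    """Model of `+feature`: the feature and everything it implies are included."""
    return selected | {feature} | implied_by(feature)


def remove_feature(selected, feature):
    """Model of `-feature`: the feature and every successor implying it are excluded."""
    successors = {name for name in IMPLIES if feature in implied_by(name)}
    return selected - {feature} - successors


everything = set(IMPLIES)  # stand-in for "max" on x86 (assumption)
print(sorted(remove_feature(everything, "X86_V4")))  # ['X86_V2', 'X86_V3']
print(sorted(add_feature(set(), "X86_V3")))          # ['X86_V2', 'X86_V3']
```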

Usage And Behaviors#

Case Insensitivity#

CPU features and options are case-insensitive:

python -m build --wheel -Csetup-args=-Dcpu-dispatch="X86_v4"

Mixing Features across Architectures#

You can mix features from different architectures:

python -m build --wheel -Csetup-args=-Dcpu-baseline="X86_V4 VSX4 SVE"

Order Independence#

The order of specified features doesn’t matter:

python -m build --wheel -Csetup-args=-Dcpu-dispatch="SVE X86_V4 x86_v3"

Separators#

You can use spaces or commas as separators:

# All of these are equivalent
python -m build --wheel -Csetup-args=-Dcpu-dispatch="X86_V2 X86_V4"
python -m build --wheel -Csetup-args=-Dcpu-dispatch=X86_V2,X86_V4
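
Case insensitivity, order independence, and the two separators can be pictured as one normalization step. The sketch below is a simplified model under that assumption, not the parser NumPy actually uses; `parse_features` is a hypothetical name, and the `+`/`-` operators are deliberately not handled.

```python
import re


def parse_features(option_value):
    """Normalize an option string into a set of upper-case feature names.

    Splits on commas and whitespace; a set makes the result order-independent.
    """
    tokens = re.split(r"[,\s]+", option_value.strip())
    return {token.upper() for token in tokens if token}


a = parse_features("SVE X86_V4 x86_v3")
b = parse_features("X86_V3,x86_v4, sve")
print(a == b)  # True
```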

Feature Combination#

Features specified in options are automatically combined with all implied features:

python -m build --wheel -Csetup-args=-Dcpu-baseline=X86_V4

Equivalent to:

python -m build --wheel -Csetup-args=-Dcpu-baseline="X86_V2 X86_V3 X86_V4"

Baseline Overlapping#

Features specified in cpu-baseline will be excluded from the cpu-dispatch features, along with their implied features, but without excluding successor features that imply them.

For instance, if you specify cpu-baseline="X86_V4", it will exclude X86_V4 and its implied features X86_V2 and X86_V3 from the cpu-dispatch features.
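
As a rough model of this rule, the dispatch set is the requested dispatch features minus everything the baseline already covers. The sketch below is illustrative only: `IMPLIES` is copied from the X86 table in this document, the function names are hypothetical, and compiler support is ignored.

```python
# Direct "implies" edges copied from the X86 table in this document.
IMPLIES = {
    "X86_V2": [],
    "X86_V3": ["X86_V2"],
    "X86_V4": ["X86_V3"],
    "AVX512_ICL": ["X86_V4"],
    "AVX512_SPR": ["AVX512_ICL"],
}


def with_implied(features):
    """Return `features` plus everything they imply, transitively."""
    expanded = set()
    pending = list(features)
    while pending:
        name = pending.pop()
        if name not in expanded:
            expanded.add(name)
            pending.extend(IMPLIES[name])
    return expanded


def effective_dispatch(baseline, dispatch):
    """Drop baseline features (and what they imply) from the dispatch set.

    Successors of the baseline features are kept, as described above.
    """
    return with_implied(dispatch) - with_implied(baseline)


all_x86 = set(IMPLIES)  # stand-in for "max" on x86 (assumption)
print(sorted(effective_dispatch({"X86_V4"}, all_x86)))
# ['AVX512_ICL', 'AVX512_SPR']
print(sorted(effective_dispatch({"X86_V2"}, all_x86)))
# ['AVX512_ICL', 'AVX512_SPR', 'X86_V3', 'X86_V4']
```

The second call reproduces the `min`/`max` split visible in the build report later in this document, where the baseline is X86_V2 and the dispatch list holds the four remaining groups.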

Compile-time Detection#

Specifying features to cpu-dispatch or cpu-baseline doesn’t explicitly enable them. Features are detected at compile time, and the maximum available features based on your specified options will be enabled according to toolchain and platform support.

This detection occurs by testing feature availability in the compiler through compile-time source files containing common intrinsics for the specified features. If both the compiler and assembler support the feature, it will be enabled.

For example, if you specify cpu-dispatch="AVX512_ICL" but your compiler doesn’t support it, the feature will be excluded from the build. However, any implied features will still be enabled if they’re supported.

Platform differences#

Some exceptional conditions force us to link some features together when it comes to certain compilers or architectures, making it impossible to build them separately.

These conditions can be divided into two parts, as follows:

Architectural compatibility

Certain CPU features must be grouped because they are guaranteed to be supported by successive generations of the same architecture. Some cases:

On ppc64le, VSX (ISA 2.06) and VSX2 (ISA 2.07) both imply one another, since the first generation that supports little-endian mode is Power-8 (ISA 2.07)
On AArch64, NEON, NEON_FP16, NEON_VFPV4, and ASIMD imply each other, since they are part of the hardware baseline.

For example:

# On ARMv8/A64, specifying NEON enables Advanced SIMD
# and all predecessor extensions
python -m build --wheel -Csetup-args=-Dcpu-baseline=neon
# which is equivalent to
python -m build --wheel -Csetup-args=-Dcpu-baseline="neon neon_fp16 neon_vfpv4 asimd"
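
Mutual implication means the implication graph contains cycles, so any expansion has to track what it has already visited. The following sketch models the ARMv8/A64 table from this document; it is not NumPy's implementation, and `expand` is a hypothetical name.

```python
# "Implies" column of the ARMv8/A64 table in this document.
BASE = ["NEON", "NEON_FP16", "NEON_VFPV4", "ASIMD"]
AARCH64_IMPLIES = {
    "NEON": ["NEON_FP16", "NEON_VFPV4", "ASIMD"],
    "NEON_FP16": ["NEON", "NEON_VFPV4", "ASIMD"],
    "NEON_VFPV4": ["NEON", "NEON_FP16", "ASIMD"],
    "ASIMD": ["NEON", "NEON_FP16", "NEON_VFPV4"],
    "ASIMDHP": BASE,
    "ASIMDDP": BASE,
    "ASIMDFHM": BASE + ["ASIMDHP"],
}


def expand(requested):
    """Expand requested names to the full set they enable.

    The `enabled` set doubles as a visited set, so the cycles among the
    four baseline features do not cause an infinite loop.
    """
    enabled = set()
    pending = [name.upper() for name in requested]
    while pending:
        name = pending.pop()
        if name in enabled:
            continue
        enabled.add(name)
        pending.extend(AARCH64_IMPLIES[name])
    return enabled


print(sorted(expand(["neon"])))
# ['ASIMD', 'NEON', 'NEON_FP16', 'NEON_VFPV4']
```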

Note

Please take a close look at Supported CPU Features By Architecture to determine which features imply one another.

Build report#

In most cases, the CPU build options do not produce fatal errors that stop the build. Most errors that appear in the build log are effectively strong warnings caused by the compiler lacking some expected CPU features.

We therefore strongly recommend checking the final report log to see which CPU features are enabled and which are not.

You can find the final report of CPU optimizations by tracing the meson build log. Here is how it looks on x86_64/gcc:

Test features "X86_V2": Supported
Test features "X86_V3": Supported
Test features "X86_V4": Supported
Test features "AVX512_ICL": Supported
Test features "AVX512_SPR": Supported
Configuring npy_cpu_dispatch_config.h using configuration
Message:
CPU Optimization Options
  baseline:
    Requested: min
    Enabled: X86_V2
  dispatch:
    Requested: max
    Enabled: X86_V3 X86_V4 AVX512_ICL AVX512_SPR

Generating multi-targets for "_umath_tests.dispatch.h"
  Enabled targets: X86_V3, baseline
Generating multi-targets for "argfunc.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "x86_simd_argsort.dispatch.h"
  Enabled targets: X86_V4, X86_V3
Generating multi-targets for "x86_simd_qsort.dispatch.h"
  Enabled targets: X86_V4, X86_V3
Generating multi-targets for "x86_simd_qsort_16bit.dispatch.h"
  Enabled targets: AVX512_SPR, AVX512_ICL
Generating multi-targets for "highway_qsort.dispatch.h"
  Enabled targets:
Generating multi-targets for "highway_qsort_16bit.dispatch.h"
  Enabled targets:
Generating multi-targets for "loops_arithm_fp.dispatch.h"
  Enabled targets: X86_V3, baseline
Generating multi-targets for "loops_arithmetic.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_comparison.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_exponent_log.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_hyperbolic.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_logical.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_minmax.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_modulo.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_trigonometric.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "loops_umath_fp.dispatch.h"
  Enabled targets: X86_V4, baseline
Generating multi-targets for "loops_unary.dispatch.h"
  Enabled targets: X86_V4, baseline
Generating multi-targets for "loops_unary_fp.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_fp_le.dispatch.h"
  Enabled targets: baseline
Generating multi-targets for "loops_unary_complex.dispatch.h"
  Enabled targets: X86_V3, baseline
Generating multi-targets for "loops_autovec.dispatch.h"
  Enabled targets: X86_V3, baseline
Generating multi-targets for "loops_half.dispatch.h"
  Enabled targets: AVX512_SPR, X86_V4, baseline
WARNING: Project targets '>=1.5.2' but uses feature deprecated since '1.3.0': Source file src/umath/svml/linux/avx512/svml_z0_acos_d_la.s in the 'objects' kwarg is not an object..
Generating multi-targets for "_simd.dispatch.h"
  Enabled targets: X86_V3, X86_V4, baseline
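
To review such a report without reading it line by line, the "Enabled targets" entries can be extracted with a short script. This is a sketch under an assumption about layout: each `Generating multi-targets for "..."` line is followed by an `Enabled targets:` line, as in the sample string below. The exact formatting of a real meson log may differ, and `enabled_targets` is a hypothetical helper name.

```python
import re

# Matches a header line and the "Enabled targets:" line that follows it.
# [ \t]* (not \s*) keeps an empty target list from swallowing the next line.
PATTERN = re.compile(
    r'Generating\s+multi-targets\s+for\s+"([^"]+)"\s+'
    r'Enabled\s+targets:[ \t]*([^\n]*)'
)


def enabled_targets(log_text):
    """Map each dispatch header named in the log to its list of enabled targets."""
    result = {}
    for header, targets in PATTERN.findall(log_text):
        result[header] = [t.strip() for t in targets.split(",") if t.strip()]
    return result


sample = '''\
Generating multi-targets for "argfunc.dispatch.h"
  Enabled targets: X86_V4, X86_V3, baseline
Generating multi-targets for "highway_qsort.dispatch.h"
  Enabled targets:
Generating multi-targets for "loops_modulo.dispatch.h"
  Enabled targets: baseline
'''

report = enabled_targets(sample)
for header, targets in report.items():
    print(header, "->", targets or "(none)")
```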

Runtime Dispatch#

Importing NumPy triggers a scan of the available CPU features from the set of dispatchable features. You can restrict this scan by setting the environment variable NPY_DISABLE_CPU_FEATURES to a comma-, tab-, or space-separated list of features to disable.

For instance, on x86_64 this will disable X86_V4:

NPY_DISABLE_CPU_FEATURES="X86_V4"

This will raise an error if parsing fails or if the feature was not enabled through the cpu-dispatch build option. If the feature is supported by the build but not available on the current CPU, a warning will be emitted instead.
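
Because the scan happens when NumPy is imported, the variable has to be in the environment before the import. One way to arrange that from Python is to launch a child process with a modified environment. The sketch below assumes nothing about NumPy itself: the child only echoes the variable, and `env_with_disabled_features` is a hypothetical helper.

```python
import os
import subprocess
import sys


def env_with_disabled_features(features, base_env=None):
    """Return a copy of the environment with NPY_DISABLE_CPU_FEATURES set.

    Features are joined with commas, one of the accepted separators.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NPY_DISABLE_CPU_FEATURES"] = ",".join(features)
    return env


env = env_with_disabled_features(["X86_V4"])

# The child only prints the variable; in practice the -c snippet (or a
# script path) would import numpy and run the workload under test.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['NPY_DISABLE_CPU_FEATURES'])"],
    env=env, capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # X86_V4
```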

Tracking Dispatched Functions#

You can discover which CPU targets are enabled for different optimized functions using the Python function numpy.lib.introspect.opt_func_info.

This function offers two optional arguments for filtering results:

func_name - For refining function names
signature - For specifying data types in the signatures

For example:

>>> import json
>>> import numpy.lib.introspect
>>> func_info = numpy.lib.introspect.opt_func_info(func_name='add|abs', signature='float64|complex64')
>>> print(json.dumps(func_info, indent=2))
{
  "absolute": {
    "dd": {
      "current": "baseline(X86_V2)",
      "available": "baseline(X86_V2)"
    },
    "Ff": {
      "current": "X86_V3",
      "available": "X86_V3 baseline(X86_V2)"
    },
    "Dd": {
      "current": "X86_V3",
      "available": "X86_V3 baseline(X86_V2)"
    }
  },
  "add": {
    "ddd": {
      "current": "X86_V3",
      "available": "X86_V3 baseline(X86_V2)"
    },
    "FFF": {
      "current": "X86_V3",
      "available": "X86_V3 baseline(X86_V2)"
    }
  }
}

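The nested result can be flattened to spot functions that are still running on the baseline path. The sketch below works on a literal copy of the example output above, so it runs without NumPy; the helper names are hypothetical, and the real output depends on your build and CPU.

```python
# Literal copy of the example output shown above (not live data).
sample_info = {
    "absolute": {
        "dd": {"current": "baseline(X86_V2)", "available": "baseline(X86_V2)"},
        "Ff": {"current": "X86_V3", "available": "X86_V3 baseline(X86_V2)"},
        "Dd": {"current": "X86_V3", "available": "X86_V3 baseline(X86_V2)"},
    },
    "add": {
        "ddd": {"current": "X86_V3", "available": "X86_V3 baseline(X86_V2)"},
        "FFF": {"current": "X86_V3", "available": "X86_V3 baseline(X86_V2)"},
    },
}


def current_targets(func_info):
    """Flatten the nested dict into (function, signature, current target) rows."""
    return [
        (func, sig, entry["current"])
        for func, signatures in func_info.items()
        for sig, entry in signatures.items()
    ]


def baseline_only(func_info):
    """List (function, signature) pairs whose current target is the baseline."""
    return [
        (func, sig)
        for func, sig, target in current_targets(func_info)
        if target.startswith("baseline")
    ]


print(baseline_only(sample_info))  # [('absolute', 'dd')]
```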