Commit 9759ddc

ENH, SIMD: Initial implementation of Highway wrapper

A thin wrapper over Google's Highway SIMD library to simplify its interface.

This commit provides the implementation of that wrapper, consisting of:
- simd.hpp: Main header defining the SIMD namespaces and configuration
- simd.inc.hpp: Template header included multiple times with different namespaces

The wrapper eliminates Highway's class tags by:
- Using lane types directly which can be deduced from arguments
- Leveraging namespaces (np::simd and np::simd128) for different register widths

A README is included to guide usage and document design decisions.

1 parent 295e2d5, commit 9759ddc

3 files changed: 440 additions, 0 deletions


‎numpy/_core/src/common/simd/README.md

Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
# NumPy SIMD Wrapper for Highway

This directory contains a lightweight C++ wrapper over Google's [Highway](https://github.com/google/highway) SIMD library, designed specifically for NumPy's needs.

> **Note**: This directory also contains the C interface of universal intrinsics (under `simd.h`), which is no longer supported. The Highway wrapper described in this document should be used instead for all new SIMD code.

## Overview

The wrapper simplifies Highway's SIMD interface by eliminating class tags and using lane types directly, which can be deduced from arguments in most cases. This design makes the SIMD code more intuitive and easier to maintain while still leveraging Highway generic intrinsics.

## Architecture

The wrapper consists of two main headers:

1. `simd.hpp`: The main header that defines namespaces and includes configuration macros
2. `simd.inc.hpp`: Implementation details included by `simd.hpp` multiple times for different namespaces

Additionally, this directory contains legacy C interface files for universal intrinsics (`simd.h` and related files), which are deprecated and should not be used for new code. All new SIMD code should use the Highway wrapper.

## Usage

### Basic Usage

```cpp
#include "simd/simd.hpp"

float *data = /* ... */;

// Use np::simd for maximum width SIMD operations
{
    using namespace np::simd;
    Vec<float> v = LoadU(data);
    v = Add(v, v);
    StoreU(v, data);
}

// Use np::simd128 for fixed 128-bit SIMD operations.
// Each using-directive gets its own scope; with both in one scope,
// names such as Vec and LoadU would be ambiguous.
{
    using namespace np::simd128;
    Vec<float> v128 = LoadU(data);
    v128 = Add(v128, v128);
    StoreU(v128, data);
}
```

### Checking for SIMD Support

```cpp
#include "simd/simd.hpp"

// Check if SIMD is enabled
#if NPY_SIMDX
    // SIMD code
#else
    // Scalar fallback code
#endif

// Check for float64 support
#if NPY_SIMDX_F64
    // Use float64 SIMD operations
#endif

// Check for FMA support
#if NPY_SIMDX_FMA
    // Use FMA operations
#endif
```

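
As a rough illustration of how the `NPY_SIMDX` guard and a scalar fallback can share one function, here is a minimal sketch. `DoubleInPlace` is a hypothetical helper, not part of the wrapper. The sketch assumes the header is reachable as `"simd/simd.hpp"` and that `Lanes<float>()` can be called with an explicit lane type; neither is confirmed by this README. When the header is not on the include path, the sketch treats SIMD as disabled so the scalar path still compiles on its own.

```cpp
#include <cstddef>

// Assumption: the wrapper header is reachable under this path.
// If it is not, behave as if SIMD were disabled.
#if __has_include("simd/simd.hpp")
#  include "simd/simd.hpp"
#else
#  define NPY_SIMDX 0
#endif

// Hypothetical helper: doubles every element of data[0..len).
inline void DoubleInPlace(float *data, std::size_t len)
{
    std::size_t i = 0;
#if NPY_SIMDX
    using namespace np::simd;
    // Assumption: Lanes<float>() returns the number of float lanes.
    const std::size_t step = Lanes<float>();
    for (; i + step <= len; i += step) {
        Vec<float> v = LoadU(data + i);
        StoreU(Add(v, v), data + i);
    }
#endif
    // Scalar tail, and the whole loop when SIMD is disabled.
    for (; i < len; ++i) {
        data[i] += data[i];
    }
}
```

Keeping the scalar loop outside the `#if` lets it handle both the leftover tail elements and the no-SIMD build, so only one fallback has to be maintained.
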
## Type Support and Constraints

The wrapper provides type constraints to help with SFINAE (Substitution Failure Is Not An Error) and compile-time type checking:

- `kSupportLane<TLane>`: Determines whether the specified lane type is supported by the SIMD extension.
  ```cpp
  // Base template - always defined, even when SIMD is not enabled (for SFINAE)
  template <typename TLane>
  constexpr bool kSupportLane = NPY_SIMDX != 0;
  template <>
  constexpr bool kSupportLane<double> = NPY_SIMDX_F64 != 0;
  ```

```cpp
#include "simd/simd.hpp"

// Check if float64 operations are supported
if constexpr (np::simd::kSupportLane<double>) {
    // Use float64 operations
}
```

These constraints allow for compile-time checking of which lane types are supported, which can be used in SFINAE contexts to enable or disable functions based on type support.

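
To make the SFINAE use concrete, the following self-contained sketch mirrors the shape of `kSupportLane` with stand-in macros and selects between two overloads. The `DEMO_*` macros, the `demo` namespace, and `UsesSimdKernel` are hypothetical names chosen for illustration; the macro values are made up (SIMD available, no float64 lanes) and do not reflect any real build.

```cpp
#include <type_traits>

// Stand-ins for NPY_SIMDX and NPY_SIMDX_F64 with made-up values.
#define DEMO_SIMDX 1
#define DEMO_SIMDX_F64 0

namespace demo {

// Same shape as the kSupportLane definition in this README.
template <typename TLane>
constexpr bool kSupportLane = DEMO_SIMDX != 0;
template <>
constexpr bool kSupportLane<double> = DEMO_SIMDX_F64 != 0;

// Participates in overload resolution only for supported lane types.
template <typename TLane, std::enable_if_t<kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel() { return true; }

// Participates only for unsupported lane types.
template <typename TLane, std::enable_if_t<!kSupportLane<TLane>, int> = 0>
constexpr bool UsesSimdKernel() { return false; }

static_assert(UsesSimdKernel<float>(), "float picks the SIMD overload");
static_assert(!UsesSimdKernel<double>(), "double picks the scalar overload");

}  // namespace demo
```

Because the base template is always defined, code like this compiles even in builds where SIMD is off; every lane type then resolves to the scalar overload.
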
## Available Operations

The wrapper provides the following common operations that are used in NumPy:

- Vector creation operations:
  - `Zero`: Returns a vector with all lanes set to zero
  - `Set`: Returns a vector with all lanes set to the given value
  - `Undefined`: Returns an uninitialized vector

- Memory operations:
  - `LoadU`: Unaligned load of a vector from memory
  - `StoreU`: Unaligned store of a vector to memory

- Vector information:
  - `Lanes`: Returns the number of vector lanes based on the lane type

- Type conversion:
  - `BitCast`: Reinterprets a vector to a different type without modifying the underlying data
  - `VecFromMask`: Converts a mask to a vector

- Comparison operations:
  - `Eq`: Element-wise equality comparison
  - `Le`: Element-wise less than or equal comparison
  - `Lt`: Element-wise less than comparison
  - `Gt`: Element-wise greater than comparison
  - `Ge`: Element-wise greater than or equal comparison

- Arithmetic operations:
  - `Add`: Element-wise addition
  - `Sub`: Element-wise subtraction
  - `Mul`: Element-wise multiplication
  - `Div`: Element-wise division
  - `Min`: Element-wise minimum
  - `Max`: Element-wise maximum
  - `Abs`: Element-wise absolute value
  - `Sqrt`: Element-wise square root

- Logical operations:
  - `And`: Bitwise AND
  - `Or`: Bitwise OR
  - `Xor`: Bitwise XOR
  - `AndNot`: Bitwise AND NOT (`~a & b`, matching Highway's `AndNot(a, b)`)

Additional Highway operations can be accessed via the `hn` namespace alias inside the `simd` or `simd128` namespaces.

## Extending

To add more operations from Highway:

1. Import them in the `simd.inc.hpp` file with a `using` declaration if they don't require a tag:
   ```cpp
   // For operations that don't require a tag
   using hn::FunctionName;
   ```

2. Define wrapper functions for intrinsics that require a class tag:
   ```cpp
   // For operations that require a tag
   // (ReturnType, FunctionName and Args are placeholders)
   template <typename TLane, typename... Args>
   HWY_API ReturnType FunctionName(Args... args) {
       return hn::FunctionName(_Tag<TLane>(), args...);
   }
   ```

3. Add appropriate documentation and SFINAE constraints if needed

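
The two extension steps above can be tried out without Highway. The sketch below builds a toy tag-based API in a `raw` namespace and wraps it in a `wrap` namespace; everything in it (`raw`, `wrap`, `Tag`, `TagOf`, the 16-byte width) is a hypothetical stand-in, not Highway or NumPy code.

```cpp
#include <array>
#include <cstddef>

// Toy tag-based API standing in for Highway; none of this is Highway code.
namespace raw {

template <typename T, std::size_t N>
struct Tag {
    using Lane = T;
    static constexpr std::size_t kLanes = N;
};

template <class D>
using Vec = std::array<typename D::Lane, D::kLanes>;

// Needs a tag: nothing else in the arguments fixes the vector width.
template <class D>
Vec<D> Set(D /*tag*/, typename D::Lane value) {
    Vec<D> v{};
    v.fill(value);
    return v;
}

// Needs no tag: the vector arguments carry all the type information.
template <class V>
V Add(const V &a, const V &b) {
    V out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

}  // namespace raw

namespace wrap {

// Plays the role of _Tag<TLane>: a fixed 16-byte "register".
template <typename TLane>
using TagOf = raw::Tag<TLane, 16 / sizeof(TLane)>;

template <typename TLane>
using Vec = raw::Vec<TagOf<TLane>>;

// Step 1: tag-free operations are imported as-is.
using raw::Add;

// Step 2: tag-taking operations get a wrapper that supplies the tag.
template <typename TLane>
Vec<TLane> Set(TLane value) {
    return raw::Set(TagOf<TLane>(), value);
}

}  // namespace wrap
```

With this in place, `wrap::Add(wrap::Set(1.5f), wrap::Set(2.0f))` needs no tag at the call site: the lane type is deduced from the `float` argument and the wrapper fills in the tag.
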
## Build Configuration

The SIMD wrapper automatically disables SIMD operations when optimizations are disabled:

- When `NPY_DISABLE_OPTIMIZATION` is defined, SIMD operations are disabled
- SIMD is enabled only when the Highway target is not scalar (`HWY_TARGET != HWY_SCALAR`)

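
The gating described above can be mimicked with plain preprocessor logic. In this sketch every `TOY_*` macro is a made-up stand-in (for `NPY_DISABLE_OPTIMIZATION`, the scalar-target check, and a Highway capability flag), and the values are chosen only to show a case where the target reports a capability but the build has optimization turned off.

```cpp
// Stand-ins for what the build system and Highway would provide.
// All TOY_* names and values are made up for illustration.
#define TOY_DISABLE_OPTIMIZATION    // pretend the build disabled optimization
#define TOY_TARGET_IS_SCALAR 0      // pretend a real SIMD target was selected
#define TOY_HAVE_FLOAT64 1          // pretend the target has float64 lanes

#ifndef TOY_DISABLE_OPTIMIZATION
#  define TOY_SIMDX (!TOY_TARGET_IS_SCALAR)
#  define TOY_SIMDX_F64 (TOY_SIMDX && TOY_HAVE_FLOAT64)
#else
#  define TOY_SIMDX 0
#  define TOY_SIMDX_F64 0
#endif

// The raw capability flag is set, yet the gated flag is not.
constexpr bool kRawHasF64 = TOY_HAVE_FLOAT64 != 0;
constexpr bool kGatedHasF64 = TOY_SIMDX_F64 != 0;

static_assert(kRawHasF64 && !kGatedHasF64,
              "the gated flag follows the build setting, not just the target");
```

Code that tests only the raw flag would take the SIMD path here; code that tests the gated flag falls back to scalar, which is the behavior the wrapper's macros aim for.
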
## Design Notes

1. **Why avoid Highway scalar operations?**
   - NumPy already provides kernels for scalar operations
   - Compilers can better optimize standard library implementations
   - Not all Highway intrinsics are fully supported in scalar mode

2. **Legacy Universal Intrinsics**
   - The older universal intrinsics C interface (in `simd.h` and accessible via `NPY_SIMD` macros) is deprecated
   - All new SIMD code should use this Highway-based wrapper (accessible via `NPY_SIMDX` macros)
   - The legacy code is maintained for compatibility but will eventually be removed

3. **Feature Detection Constants vs. Highway Constants**
   - NumPy-specific constants (`NPY_SIMDX_F16`, `NPY_SIMDX_F64`, `NPY_SIMDX_FMA`) provide additional safety beyond raw Highway constants
   - Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check hardware capabilities but don't consider NumPy's build configuration
   - Our constants combine both checks:
     ```cpp
     #define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
     ```
   - This ensures SIMD features won't be used when:
     - Hardware supports it but NumPy optimization is disabled via meson option:
       ```
       option('disable-optimization', type: 'boolean', value: false,
              description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
       ```
     - Highway target is scalar (`HWY_TARGET == HWY_SCALAR`)
   - Using these constants ensures consistent behavior across different compilation settings
   - Without this additional layer, code might incorrectly try to use SIMD paths in scalar mode

4. **Namespace Design**
   - `np::simd`: Maximum width SIMD operations (scalable)
   - `np::simd128`: Fixed 128-bit SIMD operations
   - `hn`: Highway namespace alias (available within the SIMD namespaces)

5. **Why Namespaces and Why Not Just Use Highway Directly?**
   - Highway's design uses class tag types as template parameters (e.g., `Vec<ScalableTag<float>>`) when defining vector types
   - Many Highway functions require explicitly passing a tag instance as the first parameter
   - This class tag-based approach increases verbosity and complexity in user code
   - Our wrapper eliminates this by internally managing tags through namespaces, letting users directly use types, e.g. `Vec<float>`
   - Simple example with raw Highway:

     ```cpp
     // Highway's approach
     #include <hwy/highway.h>

     float *data = /* ... */;

     namespace hn = hwy::HWY_NAMESPACE;
     using namespace hn;

     // Full-width operations
     ScalableTag<float> df;  // Create a tag instance
     Vec<decltype(df)> v = LoadU(df, data);  // LoadU requires a tag instance
     StoreU(v, df, data);  // StoreU requires a tag instance

     // 128-bit operations
     Full128<float> df128;  // Create a 128-bit tag instance
     Vec<decltype(df128)> v128 = LoadU(df128, data);  // LoadU requires a tag instance
     StoreU(v128, df128, data);  // StoreU requires a tag instance
     ```

   - Simple example with our wrapper:

     ```cpp
     // Our wrapper approach
     float *data = /* ... */;

     // Full-width operations
     {
         using namespace np::simd;
         Vec<float> v = LoadU(data);  // Full-width vector load
         StoreU(v, data);
     }

     // 128-bit operations (separate scope so Vec and LoadU stay unambiguous)
     {
         using namespace np::simd128;
         Vec<float> v128 = LoadU(data);  // 128-bit vector load
         StoreU(v128, data);
     }
     ```

   - The namespaced approach simplifies code, reduces errors, and provides a more intuitive interface
   - It preserves all the benefits of Highway's operations while reducing cognitive overhead

6. **Why Are Namespaces Essential for This Design?**
   - Namespaces allow us to define different internal tag types (`hn::ScalableTag<TLane>` in `np::simd` vs `hn::Full128<TLane>` in `np::simd128`)
   - This provides a consistent type-based interface (`Vec<float>`) without requiring users to manually create tags
   - It enables using the same function names (like `LoadU`) with different implementations based on SIMD width
   - Without namespaces, we'd have to either reintroduce tags (defeating the purpose of the wrapper) or create different function names for each variant (e.g., `LoadU` vs `LoadU128`)

7. **Template Type Parameters**
   - `TLane`: The scalar type for each vector lane (e.g., `uint8_t`, `float`, `double`)

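
The namespace design described in these notes can be sketched in a single file. The real wrapper includes `simd.inc.hpp` once inside each namespace; this sketch uses a macro for the shared body instead so that it stays self-contained. `ToyTag`, `Tag`, `TOY_SIMD_BODY`, the `toy` namespace, and the 32-byte "widest register" are all hypothetical stand-ins, not the wrapper's actual definitions.

```cpp
#include <cstddef>

// Toy fixed-size tag; a stand-in for hn::ScalableTag / hn::Full128.
template <typename T, std::size_t kBytes>
struct ToyTag {
    static constexpr std::size_t kLanes = kBytes / sizeof(T);
};

// Shared body. The real wrapper gets the same effect by including
// simd.inc.hpp once inside each namespace; a macro keeps this sketch
// in a single file.
#define TOY_SIMD_BODY                        \
    template <typename TLane>                \
    constexpr std::size_t Lanes() {          \
        return Tag<TLane>::kLanes;           \
    }

namespace toy {

namespace simd {  // pretend the widest register is 32 bytes
template <typename TLane>
using Tag = ToyTag<TLane, 32>;
TOY_SIMD_BODY
}  // namespace simd

namespace simd128 {  // always 16 bytes
template <typename TLane>
using Tag = ToyTag<TLane, 16>;
TOY_SIMD_BODY
}  // namespace simd128

}  // namespace toy

static_assert(toy::simd::Lanes<float>() == 8, "32 bytes / 4-byte lanes");
static_assert(toy::simd128::Lanes<float>() == 4, "16 bytes / 4-byte lanes");
```

The same unqualified name `Lanes` resolves to a different tag depending on the enclosing namespace, which is why callers never have to mention a tag themselves.
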
## Requirements

- C++17 or later
- Google Highway library

## License

Same as NumPy's license

‎numpy/_core/src/common/simd/simd.hpp

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
#ifndef NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
#define NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_

/**
 * This header provides a thin wrapper over Google's Highway SIMD library.
 *
 * The wrapper aims to simplify the SIMD interface of Google's Highway by
 * getting rid of its class tags and using lane types directly, which can be
 * deduced from the args in most cases.
 */
/**
 * Since `NPY_SIMD` is limited to the NumPy C universal intrinsics,
 * `NPY_SIMDX` is defined to indicate the SIMD availability for Google's Highway
 * C++ code.
 *
 * Highway SIMD is only available when optimization is enabled.
 * When NPY_DISABLE_OPTIMIZATION is defined, SIMD operations are disabled
 * and the code falls back to scalar implementations.
 */
#ifndef NPY_DISABLE_OPTIMIZATION
#include <hwy/highway.h>

/**
 * We avoid using Highway scalar operations for the following reasons:
 * 1. We already provide kernels for scalar operations, so falling back to
 *    the NumPy implementation is more appropriate. Compilers can often
 *    optimize these better since they rely on standard libraries.
 * 2. Not all Highway intrinsics are fully supported in scalar mode.
 *
 * Therefore, we only enable SIMD when the Highway target is not scalar.
 */
#define NPY_SIMDX (HWY_TARGET != HWY_SCALAR)

// Indicates if the SIMD operations are available for float16.
#define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
// Note: Highway requires SIMD extensions with native float32 support, so we don't need
// to check for it.

// Indicates if the SIMD operations are available for float64.
#define NPY_SIMDX_F64 (NPY_SIMDX && HWY_HAVE_FLOAT64)

// Indicates if the SIMD floating-point operations natively support FMA.
#define NPY_SIMDX_FMA (NPY_SIMDX && HWY_NATIVE_FMA)

#else
#define NPY_SIMDX 0
#define NPY_SIMDX_F16 0
#define NPY_SIMDX_F64 0
#define NPY_SIMDX_FMA 0
#endif

namespace np {

/// Represents the max SIMD width supported by the platform.
namespace simd {
#if NPY_SIMDX
/// The highway namespace alias.
/// We cannot import all the symbols from the HWY_NAMESPACE because it will
/// conflict with the existing symbols in the numpy namespace.
namespace hn = hwy::HWY_NAMESPACE;
// internally used by the template header
template <typename TLane>
using _Tag = hn::ScalableTag<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd

/// Represents the 128-bit SIMD width.
namespace simd128 {
#if NPY_SIMDX
namespace hn = hwy::HWY_NAMESPACE;
template <typename TLane>
using _Tag = hn::Full128<TLane>;
#endif
#include "simd.inc.hpp"
}  // namespace simd128

}  // namespace np

#endif  // NUMPY__CORE_SRC_COMMON_SIMD_SIMD_HPP_
