Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork10.9k
ENH, SIMD: Initial implementation of Highway wrapper#28622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I was morbidly curious and read theREADME
so added minor copyedits.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Hi, Could you share the expected timeline or next steps for merging this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
First read, will process more over time 😸
Uh oh!
There was an error while loading.Please reload this page.
numpy/_core/src/common/simd/simd.hpp Outdated
* | ||
* Therefore, we only enable SIMD when the Highway target is not scalar. | ||
*/ | ||
#define NPY_SIMDX (HWY_TARGET != HWY_SCALAR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Doesn't this mean we'll have dispatch sources with scalar implementations of our own?
Potentially, this should raise a compile error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
When HWY_TARGET == HWY_SCALAR, there are two possible scenarios:
- The SIMD extension or architecture isn't supported by Highway
- Highway considers it a broken platform or compiler (which is fine if the feature is part of the baseline)
For the second case, we have two options:
- Extend meson to detect this behavior
- Consider it normal behavior since the compiler still has a chance to optimize scalar operations based on the compiler flags defined by meson CPU dispatcher.
numpy/_core/src/common/simd/simd.hpp Outdated
// Indicates if the SIMD operations are available for float16. | ||
#define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16) | ||
// Note: Highway requires SIMD extentions with native float32 support, so we don't need | ||
// to check for it. | ||
// Indicates if the SIMD operations are available for float64. | ||
#define NPY_SIMDX_F64 (NPY_SIMDX && HWY_HAVE_FLOAT64) | ||
// Indicates if the SIMD floating operations are natively supports fma. | ||
#define NPY_SIMDX_FMA (NPY_SIMDX && HWY_NATIVE_FMA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
These seem to just wrap theHWY
constants withNPY_SIMDX
, so if we removeNPY_SIMDX
we can use theHWY
variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We can use HWY variants, but under#if NPY_SIMDX
, please check design notes3. Feature Detection Constants vs. Highway Constants
Uh oh!
There was an error while loading.Please reload this page.
@tylerjereddy, Thanks, I have addressed them. Feel free to give it a second round.
@ixgbe, I'm was aiming to get feedback on the approach first that outlined in your PR#28490 but so far no strong opinions against it, then we can move forward with merging. Once this PR gets merged we can follow on your PR as fast as possible. Please, let me know if you have any specific suggestions or any concerns against this approach.
@Mousius, Thank you! I've updated the README to cover your questions. Please give it a second read. |
Uh oh!
There was an error while loading.Please reload this page.
|
Uh oh!
There was an error while loading.Please reload this page.
when I compile numpy source code(#28608) on RVV1.0 platform, We’ve identified two pain points that require further investigation: 1、 |
seiko2plus commentedApr 12, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@ixgbe, constexprint UNROLL = (NPY_SIMD ==128 ?4 :2); to: constexprintkUnrollSize =kMaxLanes<uint8_t> ==16 ?4 :2; For |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
18c6085
toc7884f0
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Minor comments but mostly LGTM. I still need to wrap my head around it and will give it a second read tomorrow.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I have had some time to sit on this a little bit and here are my thoughts. As far as I understand, the main goal of having a wrapper around highway intrinsics is to get rid of the tags that highway ops require. For example, this piece of code
namespace hn = hwy::HWY_NAMESPACE;template<typename T>HWY_ATTR void add(T* src, T* dst, size_t N) { const hn::ScalableTag<T> ftype; hn::Vec<decltype(ftype)> v = hn::LoadU(ftype, src); v = hn::Add(v, v); hn::StoreU(v, ftype, dst);}
can be written without tags:
using namespace np::simd;template<typename T>void add(T* src, T* dst, size_t N) { Vec<T> v = LoadU(src); v = Add(v, v); StoreU(v, dst);}
I do agree that the latter way looks cleaner and avoids having to redefine tags and highway vector for every kernel (like here:#28490 (comment)), I do have to point out couple of downsides of having this wrapper:
- This adds an extra layer of abstraction for the intrinsics that needs to be maintained and potentially add an extra layer to debug.
- If we want all our highway kernels to be tag free, then this wrapper needs to be expanded with time for all highway ops used which is extra work.
For example, if we want the kernel defined inloops_hyperbolic.dispatch.cpp.src
to be tags free, the wrapper needs to expanded to include all these highway ops:
hn::InterleaveLowerhn::TwoTablesLookupLaneshn::RebindMaskhn::GatherIndexhn::ConcatLowerLowerhn::BitCasthn::Subhn::Mul
While the approach proposed here avoids explicit tags in Highway operations, the added abstraction layer introduces maintenance overhead without significant benefits. Directly using Highway's native tag syntax (e.g.,hn::LoadU(ftype, src)
) appears sufficiently readable and not hard to write.
Here is my proposed alternative:
- Centralize type definitions in simd.hpp with predefined tags/vectors
- Reuse these definitions across all Highway kernels
- Avoid op-specific expansions through generic implementations
This way we avoid introducing an extra abstraction layer and eliminate the need to expand wrappers for every Highway operation.
@seiko2plus Do you think this would be a simpler and more maintainable solution? ping@jan-wassenberg and@Mousius for their opinions.
That makes sense to me. Centralized variable definitions for the tags would avoid the (minor) problem of people choosing different names for them, and shorten the code a bit. There was also a concern that tags complicate python bindings because many more wrapper functions would have to be generated there - one for half, quarter vectors etc. I suppose we can for purposes of that python binding only export the full-vector version that the currently proposed wrapper functions here would provide. |
seiko2plus commentedApr 29, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I appreciate your thoughtful analysis. You make valid points about the trade-offs involved with this wrapper approach. Also, modern C++17 features allow us to introduce simpler constants to deal with SFINAE with cleaner code: e.g.,
This wrapper is exceptionally thin. All functions are simple one-liners that just forward to Highway with the appropriate tag. There's very little to maintain or debug in the wrapper itself. If an issue occurs, the debugging process is straightforward - the problem will either be in the calling code or in Highway itself, with the wrapper simply passing values through. The mapping between wrapper functions and Highway functions is direct and transparent. See: numpy/numpy/_core/src/common/simd/simd.inc.hpp Lines 47 to 52 inc7884f0
Not all Highway operations need to be functionally wrapped. Most of Highway operations are already tag-free; extra work is nothing compared to the benefit of having cleaner kernels. Developers can still access the full Highway API through the numpy/numpy/_core/src/common/simd/README.md Lines 142 to 157 inc7884f0
The benefits are clear: code becomes more concise, more readable, less prone to errors and encourages users to produce simpler code. As we decided to replace C template sources
Not sure what the point of predefining tags is? Why not just Using your previous example, your suggestion would be like this: template<typename T>voidadd(T* src, T* dst,size_t N) {// ScalableVec<T> v = hn::LoadU(kScalable<T>, src);// bad practice to use auto everywhere but can we tolerate it?auto v =hn::LoadU(kScalable<T>, src); v =hn::Add(v, v);hn::StoreU(v,kScalable<T>, dst);} How about
I'm more concerned about maintaining the kernels than the wrapper itself.
@jan-wassenberg, Thank you for your input. If we moved the wrapper to Highway to replace |
One quick comment:
I'd advise against using auto everywhere. Kernels that involve range reduction may involve multiple types (at least int32 and f32, possibly more), and I think it's helpful to see the type mentioned in the code. |
This is potentially my most pressing question: Will it continue to be thin? It seems we'll have a lot of thin wrappers. @jan-wassenberg do you see issues with these being contributed directly to Highway? I'd imagine a lot of the C++17 stuff is a welcome addition to the macro variants? I'm not seeing anything that is NumPy-centric or needs to be NumPy-centric just yet, and it appears we already know roughly what we'd like to contribute.
I'd also suggest that the extra work may not be necessary if it's this minimal for users. I prefer@r-devulap's ideas of centralised tags for NumPy types and explicit typing in the SIMD kernels.
This could be debated. As@jan-wassenberg has mentioned above, it is valuable to see the types and conversions explicitly in SIMD kernels to avoid errors. The code itself may contain fewer characters, but that doesn't necessarily correlate with being easier to read. Personally, I've found the verbosity of the SIMD kernels justified due to their details. |
Totally agree with you as mentioned "bad practice". Based on@r-devulap's idea of using centralized tags, the most simplified example would be like: template<typename T>voidadd(T* src, T* dst,size_t N) { ScalableVec<T> v =hn::LoadU(kScalable<T>, src); v =hn::Add(v, v);hn::StoreU(v,kScalable<T>, dst);} And that's why I believe simplifying Highway will encourages users to produce more clearer and simple code.
Yes definitely. We're just erasing tags and using modern C++ features. We're in 2025 - if we have a chance to make our meta-programming much simpler by leveraging modern C++ features, we should take it. Highway had legacy reasons to support C++11, but we don't.
I agree that most of this wrapper isn't NumPy-specific and could benefit the broader Highway ecosystem. That's why it would be great to move it to Highway as we agreed before.
I think you misunderstood my comment#28622 (comment). Let me explain with more details: @r-devulap suggested using centralized tags, and my response was "there's no need for them, what's the point of having them?" Instead, we can use template variables (introduced in C++14) rather than explicitly defining each tag (if we decide to tolerate tags). So instead of explicitly defining each tag: const hn::ScalableTag<float> f32;const hn::ScalableTag<int32_t> s32; we can define a template variable like this: template<typename T>constexprautokScalable = hn::ScalableTag<T>{}; And by relying on the lane type, you can define any tag simply like this: kScalable<float>;// equivalent to f32kScalable<int32_t>;// equivalent to s32 |
r-devulap commentedApr 29, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@seiko2plus this sounds good to me. My preference would be to tolerate tags in highway ops and avoid having a wrapper around them. Thin or not, it is still additional code we have to maintain and expand over time and I personally think doesn't add much value. As for the tags and vector themselves, a central definition that kernels can draw from sounds perfect.
I think so. These are useful to be defined in a central place that all the highway kernels can use. |
I'm sorry, my question was probably misformed. What prevents us from contributing this directly to Highway now?
Couldn't we also use |
It's different preferences, I guess. To me, counting on ADL and being able to force change SIMD width with just a single line when needed is a great advantage.
But we are going to transferring the additional code from the kernels to the wrapper.
Well, once it becomes mature enough we can pass it. New ideas may emerge anytime - why should we worry about backward compatibility at this stage when other projects aren't yet depending on it?
|
@seiko2plus if you find it advantageous and feel strongly about it, I don't mind putting this in. As you said in an earlier comment: developers can still access the full Highway API through the |
@seiko2plus Can we make this change before merging?https://github.com/numpy/numpy/pull/28622/files#r2057043403 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Pull Request Overview
This PR introduces a thin Highway SIMD wrapper for NumPy by implementing a simplified interface through new headers and documentation.
- Updated loops_trigonometric.dispatch.cpp to include the new SIMD wrapper header and adjust array allocation.
- Added simd.hpp and simd.inc.hpp to provide lightweight abstractions over Highway.
- Included a README to document design decisions and usage.
Reviewed Changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
numpy/_core/src/umath/loops_trigonometric.dispatch.cpp | Updated SIMD function with new header inclusion and buffer size adjustments. |
numpy/_core/src/common/simd/simd.inc.hpp | Introduces template implementations and exposes Highway operations in an anonymous namespace. |
numpy/_core/src/common/simd/simd.hpp | Provides configuration macros and namespace-based wrappers for Highway SIMD functionality. |
numpy/_core/src/common/simd/README.md | Documents the design, usage, and configuration details of the new SIMD wrapper. |
Files not reviewed (2)
- numpy/_core/src/highway: Language not supported
- numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: Language not supported
Comments suppressed due to low confidence (2)
numpy/_core/src/umath/loops_trigonometric.dispatch.cpp:6
- The new header simd/simd.hpp is included alongside simd/simd.h. Verify whether both are necessary or if consolidation is possible to avoid potential conflicts or redundancy.
#include "simd/simd.hpp"
numpy/_core/src/umath/loops_trigonometric.dispatch.cpp:189
- Switching from hn::Lanes(f32) to hn::MaxLanes(f32) alters the allocated buffer size. Confirm that this change is intentional and that the larger buffer size matches the expected SIMD lane count without introducing unnecessary overhead.
float NPY_DECL_ALIGNED(NPY_SIMD_WIDTH) ip_fback[hn::MaxLanes(f32)];
Discussing in the Optimisation team meeting:
|
seiko2plus commentedMay 17, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Done.
As I mentioned before, I'd prefer to use it within NumPy first rather than moving it to Highway until it's mature enough. Hosting in NumPy will make it easier to make changes as we refine the implementation. |
Mousius commentedMay 19, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I believe this is already possible by only enabling
I think this is unnecessary extra work, as it's simple to contribute this to Highway directly without creating our own mini-framework straight away. In my mind, this is setting a precedent that we don't try to work with Highway in the first instance.@jan-wassenberg has already said he has no issues and it's a small change. However, if@r-devulap is happy to merge this, then I will get out of the way. |
A thin wrapper over Google's Highway SIMD library to simplify its interface.This commit provides the implementation of that wrapper, consisting of:- simd.hpp: Main header defining the SIMD namespaces and configuration- simd.inc.hpp: Template header included multiple times with different namespacesThe wrapper eliminates Highway's class tags by:- Using lane types directly which can be deduced from arguments- Leveraging namespaces (np::simd and np::simd128) for different register widthsA README is included to guide usage and document design decisions.
- Fix hardware/platform terminology in documentation for clarity - Add support for long double in template specializations - Add kMaxLanes constant to expose maximum vector width information - Follows clang formatting style for consistency with NumPy codebase.
- Add anonymous namespace around implementation to ensure each translation unit gets its own constants based on local flags - Use HWY_LANES_CONSTEXPR for Lanes function to ensure proper constexpr evaluation across platforms
…size Replace hn::Lanes(f64) with hn::MaxLanes(f64) when defining the index array size to fix error C2131: "expression did not evaluate to a constant". This error occurs because Lanes() isn't always constexpr compatible, especially with scalable vector extensions. MaxLanes() provides a compile-time constant value suitable for static array allocation and should be used with non-scalable SIMD extensions when defining fixed-size arrays.
Rename Highway wrapper macros for clarity:- NPY_SIMDX → NPY_HWY- NPY_SIMDX_F16 → NPY_HWY_F16- NPY_SIMDX_F64 → NPY_HWY_F64- NPY_SIMDX_FMA → NPY_HWY_FMATo avoids confusion with legacy SIMD macros.
Skip Highway's EMU128 in favor of NumPy's scalar implementations fordue to strict IEEE 754 floating-point compliance requirements
When the option |
We already have I'm not sure whether a lack of optimisation in |
A thin wrapper over Google's Highway SIMD library to simplify its interface. This commit provides the implementation of that wrapper, consisting of:
The wrapper eliminates Highway's class tags by:
A README is included to guide usage and document design decisions.