Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

ENH, SIMD: Initial implementation of Highway wrapper#28622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
seiko2plus wants to merge7 commits intonumpy:main
base:main
Choose a base branch
Loading
fromseiko2plus:hwy_wrapper

Conversation

seiko2plus
Copy link
Member

A thin wrapper over Google's Highway SIMD library to simplify its interface. This commit provides the implementation of that wrapper, consisting of:

  • simd.hpp: Main header defining the SIMD namespaces and configuration
  • simd.inc.hpp: Template header included multiple times with different namespaces

The wrapper eliminates Highway's class tags by:

  • Using lane types directly which can be deduced from arguments
  • Leveraging namespaces (np::simd and np::simd128) for different register widths

A README is included to guide usage and document design decisions.

ixgbe and abhishek-iitmadras reacted with thumbs up emoji
@seiko2plusseiko2plus added the component: SIMDIssues in SIMD (fast instruction sets) code or machinery labelApr 1, 2025
Copy link
Contributor

@tylerjereddytylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I was morbidly curious and read theREADME so added minor copyedits.

@ixgbe
Copy link
Contributor

Hi, Could you share the expected timeline or next steps for merging this PR? Thanks!

Copy link
Member

@MousiusMousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

First read, will process more over time 😸

*
* Therefore, we only enable SIMD when the Highway target is not scalar.
*/
#define NPY_SIMDX (HWY_TARGET != HWY_SCALAR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Doesn't this mean we'll have dispatch sources with scalar implementations of our own?

Potentially, this should raise a compile error.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

When HWY_TARGET == HWY_SCALAR, there are two possible scenarios:

  1. The SIMD extension or architecture isn't supported by Highway
  2. Highway considers it a broken platform or compiler (which is fine if the feature is part of the baseline)

For the second case, we have two options:

  1. Extend meson to detect this behavior
  2. Consider it normal behavior since the compiler still has a chance to optimize scalar operations based on the compiler flags defined by meson CPU dispatcher.

Comment on lines 34 to 43
// Indicates if the SIMD operations are available for float16.
#define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
// Note: Highway requires SIMD extentions with native float32 support, so we don't need
// to check for it.

// Indicates if the SIMD operations are available for float64.
#define NPY_SIMDX_F64 (NPY_SIMDX && HWY_HAVE_FLOAT64)

// Indicates if the SIMD floating operations are natively supports fma.
#define NPY_SIMDX_FMA (NPY_SIMDX && HWY_NATIVE_FMA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These seem to just wrap theHWY constants withNPY_SIMDX, so if we removeNPY_SIMDX we can use theHWY variants?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We can use HWY variants, but under#if NPY_SIMDX, please check design notes3. Feature Detection Constants vs. Highway Constants

@seiko2plus
Copy link
MemberAuthor

I was morbidly curious and read the README so added minor copyedits.

@tylerjereddy, Thanks, I have addressed them. Feel free to give it a second round.

Hi, Could you share the expected timeline or next steps for merging this PR? Thanks!

@ixgbe, I'm was aiming to get feedback on the approach first that outlined in your PR#28490 but so far no strong opinions against it, then we can move forward with merging. Once this PR gets merged we can follow on your PR as fast as possible. Please, let me know if you have any specific suggestions or any concerns against this approach.

First read, will process more over time

@Mousius, Thank you! I've updated the README to cover your questions. Please give it a second read.

ixgbe reacted with thumbs up emoji

@seiko2plusseiko2plus marked this pull request as ready for reviewApril 7, 2025 16:20
@ixgbe
Copy link
Contributor

I was morbidly curious and read the README so added minor copyedits.

@tylerjereddy, Thanks, I have addressed them. Feel free to give it a second round.

Hi, Could you share the expected timeline or next steps for merging this PR? Thanks!

@ixgbe, I'm was aiming to get feedback on the approach first that outlined in your PR#28490 but so far no strong opinions against it, then we can move forward with merging. Once this PR gets merged we can follow on your PR as fast as possible. Please, let me know if you have any specific suggestions or any concerns against this approach.

First read, will process more over time

@Mousius, Thank you! I've updated the README to cover your questions. Please give it a second read.
LGTM (Looks Good To Me)! Thanks for coordinating this.

@r-devulapr-devulap self-assigned thisApr 9, 2025
@ixgbe
Copy link
Contributor

when I compile numpy source code(#28608) on RVV1.0 platform, We’ve identified two pain points that require further investigation: 1、constexpr int UNROLL = (NPY_SIMD == 128 ? 4 : 2); Since we have migrated from NPY_SIMD to NPY_SIMDX, the NPY_SIMD macro is now undefined for RVV (RISC-V Vector), which will trigger compile-time errors."
2、NPY_SIMD、NPY_SIMD_F32、NPY_SIMD_F64 for RISC-V Vector are undefined. The compiler error messages are shown in the figure below. How should we resolve this? Any insights would be greatly appreciated!
image

@seiko2plus
Copy link
MemberAuthor

seiko2plus commentedApr 12, 2025
edited
Loading

@ixgbe,
I have added kMaxLanes which can be used in the first case:

constexprint UNROLL = (NPY_SIMD ==128 ?4 :2);

to:

constexprintkUnrollSize =kMaxLanes<uint8_t> ==16 ?4 :2;

Forloadable_* andstoreable_*, you should avoid using them since Highway's scatter/gather intrinsics use scalar operations. However, you will need to implement another utility function to check if the stride is divisible by the size of the type.

ixgbe reacted with thumbs up emoji

@seiko2plusseiko2plusforce-pushed thehwy_wrapper branch 2 times, most recently from18c6085 toc7884f0CompareApril 17, 2025 14:41
Copy link
Member

@r-devulapr-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Minor comments but mostly LGTM. I still need to wrap my head around it and will give it a second read tomorrow.

Copy link
Member

@r-devulapr-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I have had some time to sit on this a little bit and here are my thoughts. As far as I understand, the main goal of having a wrapper around highway intrinsics is to get rid of the tags that highway ops require. For example, this piece of code

namespace hn = hwy::HWY_NAMESPACE;template<typename T>HWY_ATTR void add(T* src, T* dst, size_t N) {    const hn::ScalableTag<T> ftype;    hn::Vec<decltype(ftype)> v = hn::LoadU(ftype, src);    v = hn::Add(v, v);    hn::StoreU(v, ftype, dst);}

can be written without tags:

using namespace np::simd;template<typename T>void add(T* src, T* dst, size_t N) {    Vec<T> v = LoadU(src);    v = Add(v, v);    StoreU(v, dst);}

I do agree that the latter way looks cleaner and avoids having to redefine tags and highway vector for every kernel (like here:#28490 (comment)), I do have to point out couple of downsides of having this wrapper:

  1. This adds an extra layer of abstraction for the intrinsics that needs to be maintained and potentially add an extra layer to debug.
  2. If we want all our highway kernels to be tag free, then this wrapper needs to be expanded with time for all highway ops used which is extra work.

For example, if we want the kernel defined inloops_hyperbolic.dispatch.cpp.src to be tags free, the wrapper needs to expanded to include all these highway ops:

hn::InterleaveLowerhn::TwoTablesLookupLaneshn::RebindMaskhn::GatherIndexhn::ConcatLowerLowerhn::BitCasthn::Subhn::Mul

While the approach proposed here avoids explicit tags in Highway operations, the added abstraction layer introduces maintenance overhead without significant benefits. Directly using Highway's native tag syntax (e.g.,hn::LoadU(ftype, src)) appears sufficiently readable and not hard to write.

Here is my proposed alternative:

  1. Centralize type definitions in simd.hpp with predefined tags/vectors
  2. Reuse these definitions across all Highway kernels
  3. Avoid op-specific expansions through generic implementations

This way we avoid introducing an extra abstraction layer and eliminate the need to expand wrappers for every Highway operation.

@seiko2plus Do you think this would be a simpler and more maintainable solution? ping@jan-wassenberg and@Mousius for their opinions.

@jan-wassenberg
Copy link
Contributor

That makes sense to me. Centralized variable definitions for the tags would avoid the (minor) problem of people choosing different names for them, and shorten the code a bit.

There was also a concern that tags complicate python bindings because many more wrapper functions would have to be generated there - one for half, quarter vectors etc. I suppose we can for purposes of that python binding only export the full-vector version that the currently proposed wrapper functions here would provide.

@seiko2plus
Copy link
MemberAuthor

seiko2plus commentedApr 29, 2025
edited
Loading

@r-devulap,

I appreciate your thoughtful analysis. You make valid points about the trade-offs involved with this wrapper approach.
The primary motivation for this wrapper is indeed to make SIMD code more approachable by eliminating the tag-based syntax, because the current way of Highway seems to encourages users to produce more complicated meta-programming code in their final kernels.

Also, modern C++17 features allow us to introduce simpler constants to deal with SFINAE with cleaner code: e.g.,kMaxLanes<float> rather thanHWY_MAX_LANES_D<ScalableTag<float>> orkSupportLane<double> rather than#ifdef HWY_HAVE_FLOAT64.

This adds an extra layer of abstraction for the intrinsics that needs to be maintained and potentially add an extra layer to debug.

This wrapper is exceptionally thin. All functions are simple one-liners that just forward to Highway with the appropriate tag. There's very little to maintain or debug in the wrapper itself. If an issue occurs, the debugging process is straightforward - the problem will either be in the calling code or in Highway itself, with the wrapper simply passing values through. The mapping between wrapper functions and Highway functions is direct and transparent. See:

template<typename TLane>
HWY_API Vec<TLane>
LoadU(const TLane *ptr)
{
returnhn::LoadU(_Tag<TLane>(), ptr);
}

If we want all our highway kernels to be tag free, then this wrapper needs to be expanded with time for all highway ops used which is extra work.

Not all Highway operations need to be functionally wrapped. Most of Highway operations are already tag-free; extra work is nothing compared to the benefit of having cleaner kernels. Developers can still access the full Highway API through thehn namespace alias that's already included in the wrapper before deciding to add a wrapper for a specific operation. Do you think this expansion process is a burden?

To add more operations from Highway:
1. Import them in the`simd.inc.hpp` file using the`using` directive if they don't require a tag:
```cpp
// For operations that don't require a tag
using hn::FunctionName;
```
2. Define wrapper functions for intrinsics that require a class tag:
```cpp
// For operations that require a tag
template<typename TLane>
HWY_API ReturnTypeFunctionName(Args... args) {
return hn::FunctionName(_Tag<TLane>(), args...);
}
```

While the approach proposed here avoids explicit tags in Highway operations, the added abstraction layer introduces maintenance overhead without significant benefits.

The benefits are clear: code becomes more concise, more readable, less prone to errors and encourages users to produce simpler code. As we decided to replace C template sources.c.src with C++ code for simplicity, would it be wise to choose a cleaner approach even if that requires a thin wrapper to Highway that potentially will be moved to Highway itself in the end?

Here is my proposed alternative:
Centralize type definitions in simd.hpp with predefined tags/vectors
Reuse these definitions across all Highway kernels
Avoid op-specific expansions through generic implementations
This way we avoid introducing an extra abstraction layer and eliminate the need to expand wrappers for every Highway operation.

Not sure what the point of predefining tags is? Why not justhn::ScalableTag<float>{}? ortemplate<typename T> constexpr auto kScalable = hn::ScalableTag<T>{}; withtemplate <typename T> using ScalableVec = hn::Vec<hn::ScalableTag<T>>;?

Using your previous example, your suggestion would be like this:

template<typename T>voidadd(T* src, T* dst,size_t N) {// ScalableVec<T> v = hn::LoadU(kScalable<T>, src);// bad practice to use auto everywhere but can we tolerate it?auto v =hn::LoadU(kScalable<T>, src);    v =hn::Add(v, v);hn::StoreU(v,kScalable<T>, dst);}

How aboutkMaxLanes,kSupportLane, and constantsNPY_SIMDX_*? Based on your alternative, are we going to define them in simd.hpp too?

Do you think this would be a simpler and more maintainable solution?

I'm more concerned about maintaining the kernels than the wrapper itself.
With your suggestion, we still create some form of abstraction - just to avoid simple one-liner wrappers where needed.
All code involves abstraction; the question is at what level. At NumPy's scale, the thin wrapper approach is absolutely worth it.

There was also a concern that tags complicate python bindings because many more wrapper functions would have to be generated there - one for half, quarter vectors etc.
I suppose we can for purposes of that python binding only export the full-vector version that the currently proposed wrapper functions here would provide.

@jan-wassenberg, Thank you for your input. If we moved the wrapper to Highway to replacegoogle-tests withpytest, it would still be possible to usehwy::scalable::<HWY_TARGETS>,hwy::fixed128::<TARGETS>, etc. in the bindings.
pytest is dynamic enough to handle this.

@jan-wassenberg
Copy link
Contributor

One quick comment:

// bad practice to use auto everywhere but can we tolerate it?

I'd advise against using auto everywhere. Kernels that involve range reduction may involve multiple types (at least int32 and f32, possibly more), and I think it's helpful to see the type mentioned in the code.

r-devulap reacted with thumbs up emoji

@Mousius
Copy link
Member

This wrapper is exceptionally thin. All functions are simple one-liners that just forward to Highway with the appropriate tag. There's very little to maintain or debug in the wrapper itself.

This is potentially my most pressing question: Will it continue to be thin? It seems we'll have a lot of thin wrappers.

@jan-wassenberg do you see issues with these being contributed directly to Highway? I'd imagine a lot of the C++17 stuff is a welcome addition to the macro variants?

I'm not seeing anything that is NumPy-centric or needs to be NumPy-centric just yet, and it appears we already know roughly what we'd like to contribute.

Not all Highway operations need to be functionally wrapped. Most of the Highway operations are already tag-free; extra work is nothing compared to the benefit of having cleaner kernels.

I'd also suggest that the extra work may not be necessary if it's this minimal for users. I prefer@r-devulap's ideas of centralised tags for NumPy types and explicit typing in the SIMD kernels.

The benefits are clear: code becomes more concise, more readable, less prone to errors and encourages users to produce simpler code.

This could be debated. As@jan-wassenberg has mentioned above, it is valuable to see the types and conversions explicitly in SIMD kernels to avoid errors. The code itself may contain fewer characters, but that doesn't necessarily correlate with being easier to read. Personally, I've found the verbosity of the SIMD kernels justified due to their details.

@seiko2plus
Copy link
MemberAuthor

@jan-wassenberg,

I'd advise against using auto everywhere. Kernels that involve range reduction may involve multiple types (at least int32 and f32, possibly more), and I think it's helpful to see the type mentioned in the code.

Totally agree with you as mentioned "bad practice". Based on@r-devulap's idea of using centralized tags, the most simplified example would be like:

template<typename T>voidadd(T* src, T* dst,size_t N) {    ScalableVec<T> v =hn::LoadU(kScalable<T>, src);    v =hn::Add(v, v);hn::StoreU(v,kScalable<T>, dst);}

And that's why I believe simplifying Highway will encourages users to produce more clearer and simple code.

@Mousius,

Will it continue to be thin? It seems we'll have a lot of thin wrappers.

Yes definitely. We're just erasing tags and using modern C++ features. We're in 2025 - if we have a chance to make our meta-programming much simpler by leveraging modern C++ features, we should take it. Highway had legacy reasons to support C++11, but we don't.

I'm not seeing anything that is NumPy-centric or needs to be NumPy-centric just yet, and it appears we already know roughly what we'd like to contribute.

I agree that most of this wrapper isn't NumPy-specific and could benefit the broader Highway ecosystem. That's why it would be great to move it to Highway as we agreed before.

This could be debated. As@jan-wassenberg has mentioned above, it is valuable to see the types and conversions explicitly in SIMD kernels to avoid errors. The code itself may contain fewer characters, but that doesn't necessarily correlate with being easier to read. Personally, I've found the verbosity of the SIMD kernels justified due to their details.

I think you misunderstood my comment#28622 (comment). Let me explain with more details:

@r-devulap suggested using centralized tags, and my response was "there's no need for them, what's the point of having them?" Instead, we can use template variables (introduced in C++14) rather than explicitly defining each tag (if we decide to tolerate tags).

So instead of explicitly defining each tag:

const hn::ScalableTag<float> f32;const hn::ScalableTag<int32_t> s32;

we can define a template variable like this:

template<typename T>constexprautokScalable = hn::ScalableTag<T>{};

And by relying on the lane type, you can define any tag simply like this:

kScalable<float>;// equivalent to f32kScalable<int32_t>;// equivalent to s32

@r-devulap
Copy link
Member

r-devulap commentedApr 29, 2025
edited
Loading

Instead, we can use template variables (introduced in C++14) rather than explicitly defining each tag (if we decide to tolerate tags).

template<typename T> constexpr auto kScalable = hn::ScalableTag<T>{};template<typename T>void add(T* src, T* dst, size_t N) {    ScalableVec<T> v = hn::LoadU(kScalable<T>, src);    v = hn::Add(v, v);    hn::StoreU(v, kScalable<T>, dst);}

@seiko2plus this sounds good to me. My preference would be to tolerate tags in highway ops and avoid having a wrapper around them. Thin or not, it is still additional code we have to maintain and expand over time and I personally think doesn't add much value. As for the tags and vector themselves, a central definition that kernels can draw from sounds perfect.

How about kMaxLanes, kSupportLane, and constants NPY_SIMDX_*? Based on your alternative, are we going to define them in simd.hpp too?

I think so. These are useful to be defined in a central place that all the highway kernels can use.

@Mousius
Copy link
Member

I agree that most of this wrapper isn't NumPy-specific and could benefit the broader Highway ecosystem. That's why it would be great to move it to Highway as we agreed before.

I'm sorry, my question was probably misformed. What prevents us from contributing this directly to Highway now?

Instead, we can use template variables (introduced in C++14) rather than explicitly defining each tag (if we decide to tolerate tags).

Couldn't we also useScalableTag<float> everywhere, which is two characters more thankScalable<float>?

@seiko2plus
Copy link
MemberAuthor

I personally think doesn't add much value

It's different preferences, I guess. To me, counting on ADL and being able to force change SIMD width with just a single line when needed is a great advantage.

Thin or not, it is still additional code

But we are going to transferring the additional code from the kernels to the wrapper.

What prevents us from contributing this directly to Highway now?

Well, once it becomes mature enough we can pass it. New ideas may emerge anytime - why should we worry about backward compatibility at this stage when other projects aren't yet depending on it?

Couldn't we also use ScalableTag everywhere, which is two characters more than kScalable?

ScalableTag is the type, whilekScalable would be the tag instance. Yes, we could use the type directly with extra characters() or{}.

@r-devulap
Copy link
Member

It's different preferences, I guess. To me, counting on ADL and being able to force change SIMD width with just a single line when needed is a great advantage.

@seiko2plus if you find it advantageous and feel strongly about it, I don't mind putting this in. As you said in an earlier comment: developers can still access the full Highway API through thehn namespace alias.

@r-devulap
Copy link
Member

@seiko2plus Can we make this change before merging?https://github.com/numpy/numpy/pull/28622/files#r2057043403

@r-devulapr-devulap requested a review fromCopilotMay 5, 2025 03:47
Copy link

@CopilotCopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Pull Request Overview

This PR introduces a thin Highway SIMD wrapper for NumPy by implementing a simplified interface through new headers and documentation.

  • Updated loops_trigonometric.dispatch.cpp to include the new SIMD wrapper header and adjust array allocation.
  • Added simd.hpp and simd.inc.hpp to provide lightweight abstractions over Highway.
  • Included a README to document design decisions and usage.

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated no comments.

FileDescription
numpy/_core/src/umath/loops_trigonometric.dispatch.cppUpdated SIMD function with new header inclusion and buffer size adjustments.
numpy/_core/src/common/simd/simd.inc.hppIntroduces template implementations and exposes Highway operations in an anonymous namespace.
numpy/_core/src/common/simd/simd.hppProvides configuration macros and namespace-based wrappers for Highway SIMD functionality.
numpy/_core/src/common/simd/README.mdDocuments the design, usage, and configuration details of the new SIMD wrapper.
Files not reviewed (2)
  • numpy/_core/src/highway: Language not supported
  • numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: Language not supported
Comments suppressed due to low confidence (2)

numpy/_core/src/umath/loops_trigonometric.dispatch.cpp:6

  • The new header simd/simd.hpp is included alongside simd/simd.h. Verify whether both are necessary or if consolidation is possible to avoid potential conflicts or redundancy.
#include "simd/simd.hpp"

numpy/_core/src/umath/loops_trigonometric.dispatch.cpp:189

  • Switching from hn::Lanes(f32) to hn::MaxLanes(f32) alters the allocated buffer size. Confirm that this change is intentional and that the larger buffer size matches the expected SIMD lane count without introducing unnecessary overhead.
float NPY_DECL_ALIGNED(NPY_SIMD_WIDTH) ip_fback[hn::MaxLanes(f32)];

@Mousius
Copy link
Member

Discussing in the Optimisation team meeting:

  • #define NPY_SIMDX (HWY_TARGET != HWY_SCALAR) will always beTrue as Highway is going to defaultEMU128 which will allow for compiler auto-vectorisation by default. ThereforeNPY_SIMDX is likely unnecessary?
  • Highway is happy to host the wrapper to remove lane tags under contrib given it's fairly clear why it exists and why it's useful, which would save us hosting it in NumPy. We should contribute it directly undercontrib for Highway which equally allows for it to evolve further if necessary?

@seiko2plus
Copy link
MemberAuthor

seiko2plus commentedMay 17, 2025
edited
Loading

@seiko2plus Can we make this change before merging?https://github.com/numpy/numpy/pull/28622/files#r2057043403

Done.

#define NPY_SIMDX (HWY_TARGET != HWY_SCALAR) will always be True as Highway is going to default EMU128 which will allow for compiler auto-vectorisation by default. Therefore NPY_SIMDX is likely unnecessary?

NPY_HWY (NPY_SIMDX) is needed for two important reasons: to completely disable optimization via disable-optimization and to clear the baseline features viacpu-baseline=none. This is important because Highway will still enableSSE3 by default even ifSSE flags aren't provided, sinceSSE3 is enabled by default ongcc/clang. By the way,Highway seems doesn't correctly emulate severalSSE41 elementary intrinsics, but this isn't a concern since PR#28896 will raise the baseline to x86-64-v2 (level 2).

Highway is happy to host the wrapper to remove lane tags under contrib given it's fairly clear why it exists and why it's useful, which would save us hosting it in NumPy. We should contribute it directly under contrib for Highway which equally allows for it to evolve further if necessary?

As I mentioned before, I'd prefer to use it within NumPy first rather than moving it to Highway until it's mature enough. Hosting in NumPy will make it easier to make changes as we refine the implementation.

@Mousius
Copy link
Member

Mousius commentedMay 19, 2025
edited
Loading

NPY_HWY (NPY_SIMDX) is needed for two important reasons: to completely disable optimization via disable-optimization and to clear the baseline features viacpu-baseline=none. This is important because Highway will still enableSSE3 by default even ifSSE flags aren't provided, sinceSSE3 is enabled by default ongcc/clang. By the way,Highway seems doesn't correctly emulate severalSSE41 elementary intrinsics, but this isn't a concern since PR#28896 will raise the baseline to x86-64-v2 (level 2).

I believe this is already possible by only enablingEMU128? I don't see why we needNPY_SIMDX for this.

As I mentioned before, I'd prefer to use it within NumPy first rather than moving it to Highway until it's mature enough. Hosting in NumPy will make it easier to make changes as we refine the implementation.

I think this is unnecessary extra work, as it's simple to contribute this to Highway directly without creating our own mini-framework straight away. In my mind, this is setting a precedent that we don't try to work with Highway in the first instance.@jan-wassenberg has already said he has no issues and it's a small change. However, if@r-devulap is happy to merge this, then I will get out of the way.

A thin wrapper over Google's Highway SIMD library to simplify its interface.This commit provides the implementation of that wrapper, consisting of:- simd.hpp: Main header defining the SIMD namespaces and configuration- simd.inc.hpp: Template header included multiple times with different namespacesThe wrapper eliminates Highway's class tags by:- Using lane types directly which can be deduced from arguments- Leveraging namespaces (np::simd and np::simd128) for different register widthsA README is included to guide usage and document design decisions.
  - Fix hardware/platform terminology in documentation for clarity  - Add support for long double in template specializations  - Add kMaxLanes constant to expose maximum vector width information  - Follows clang formatting style for consistency with NumPy codebase.
  - Add anonymous namespace around implementation to ensure each    translation unit gets its own constants based on local flags  - Use HWY_LANES_CONSTEXPR for Lanes function to ensure proper    constexpr evaluation across platforms
…size  Replace hn::Lanes(f64) with hn::MaxLanes(f64) when defining the index array  size to fix error C2131: "expression did not evaluate to a constant". This error  occurs because Lanes() isn't always constexpr compatible, especially with  scalable vector extensions. MaxLanes() provides a compile-time constant value  suitable for static array allocation and should be used with non-scalable SIMD  extensions when defining fixed-size arrays.
Rename Highway wrapper macros for clarity:- NPY_SIMDX → NPY_HWY- NPY_SIMDX_F16 → NPY_HWY_F16- NPY_SIMDX_F64 → NPY_HWY_F64- NPY_SIMDX_FMA → NPY_HWY_FMATo avoids confusion with legacy SIMD macros.
Skip Highway's EMU128 in favor of NumPy's scalar implementations fordue to strict IEEE 754 floating-point compliance requirements
@seiko2plus
Copy link
MemberAuthor

I believe this is already possible by only enabling EMU128? I don't see why we need NPY_SIMDX for this.

When the optiondisable-optimization is set to true, we should disable all CPU optimized code including dispatch, SIMD, and loop unrolling.
After a quick look at theEMU128 implementation, I realized that we should disable it too to avoid any kind of fast math operations. Also, compiling withfp-strict may not lead to auto-vectorization and may end up with register spilling. Lets stay safe for now, till we get done from moving to Highway.

@Mousius
Copy link
Member

When the optiondisable-optimization is set to true, we should disable all CPU optimized code including dispatch, SIMD, and loop unrolling.
After a quick look at theEMU128 implementation, I realized that we should disable it too to avoid any kind of fast math operations. Also, compiling withfp-strict may not lead to auto-vectorization and may end up with register spilling. Lets stay safe for now, till we get done from moving to Highway.

We already haveNPY_DISABLE_OPTIMIZATION to completely disable the optimised kernels?

I'm not sure whether a lack of optimisation inEMU128 is hugely worrying given it's the fallback.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@tylerjereddytylerjereddytylerjereddy left review comments

@amane-ameamane-ameamane-ame left review comments

@ixgbeixgbeixgbe left review comments

@r-devulapr-devulapr-devulap requested changes

Copilot code reviewCopilotCopilot left review comments

@MousiusMousiusMousius requested changes

Requested changes must be addressed to merge this pull request.

Assignees

@r-devulapr-devulap

Labels
01 - Enhancementcomponent: SIMDIssues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

7 participants
@seiko2plus@ixgbe@jan-wassenberg@Mousius@r-devulap@tylerjereddy@amane-ame

[8]ページ先頭

©2009-2025 Movatter.jp