Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork11.9k
ENH: support float and longdouble in FFT using C++ pocketfft version#25711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Uh oh!
There was an error while loading.Please reload this page.
mreineck commentedJan 28, 2024
Ah, OK, this switches to the C++ version, but uses only its 1D transforms! |
seberg commentedJan 28, 2024
That is actually a (small or not?) issue with using the gufuncs, it may be tricky to expand to arbitrary dimensions in a nice way for those (which was mentioned to me, I admit; although I didn't realize it might be an issue any time soon). And yes, so long as this looks all fine, it seems fine to not do the N-D versions. |
mhvk commentedJan 28, 2024
Indeed, I went the easy route, based on the gufunc I had just introduced (mostly to be able to allow |
mreineck commentedJan 28, 2024
No, threading/vectorization only affects 2D and higher transforms. I have played with applying this to 1D as well, but only in |
Uh oh!
There was an error while loading.Please reload this page.
mhvk commentedJan 28, 2024
Just to be clear, do you mean that vectorization helps for FTs over multiple axes, or FTs over a single axis in higher-D arrays? From the code, it looks like the latter, which might still work with the ufunc, as we get an extra axis passed in.
OK, that seems reasonable for follow-up, then. More generally, it may well make sense to use your iterator too, and/or copy what scipy does... I definitely went for what seemed the fastest route... |
| The various FFT routines in `numpy.fft` have gained an ``out`` | ||
| `numpy.fft` support for different precisions and in-place calculations | ||
| ---------------------------------------------------------------------- | ||
| The various FFT routines in `numpy.fft` now support calculations in float, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
They have always supported single precision but have upcast the result. We should be clear that this is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this sound good?
The various FFT routines in `numpy.fft` now do their calculations natively infloat, double, or long double precision, depending on the input precision,instead of always calculating in double precision. Hence, the calculation willnow be less precise for single and more precise for long double precision.The data type of the output array will now be adjusted accordingly.Furthermore, all FFT routines have gained an ``out`` argument that can be usedfor in-place calculations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe it makes sense to have a dedicated change note that points out the single precision change explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
What's the numpy policy on usingversionchanged. Should that be added to the fft docstrings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe it makes sense to have a dedicated change note that points out the single precision change explicitly.
I don't know how the numpy release notes are amalgamated, but for 2.0 it would be good to highlight all the compatibility breaks somewhere in the release notes.
mreineck commentedJan 28, 2024
Yes, it works for transforming just a single axis in a multi-D array too. The idea is to "bundle" a few of the 1D transforms into a single one of SIMD type. |
mhvk commentedJan 28, 2024
The |
mhvk commentedJan 28, 2024
@mreineck - OK, I'll look into using |
mhvk commentedJan 29, 2024
Calling |
rgommers commentedJan 29, 2024
Nice!
The cache size may help, assuming it's a useful setting in single-threaded mode (@mreineck?). I'd leave the threading along, we can't do much with that. In SciPy it's controlled by a
Probably an oversight indeed, since the SciPy licenses list has it (here). So +1 for adding it.
Adding Build time wise this PR in its current state looks fine - impact of C++ is okay overall, see quick profiling result below. Binary size is fine too, it's ~240 kb for a dev (debugoptimized) build. A note though that the SciPy pocketfft extension is more than 4x larger, so enabling all functionality that SciPy uses in the NumPy build is probably not the way to go (it'd be ~25% of the size of ![]() |
mreineck commentedJan 29, 2024
It will work in single-threaded mode. However, my feeling is that the plan caching isn't really worth it: it may cause large memory consumption (see, e.g.,scipy/scipy#12297) for very little performance gain (unless you call very short 1D FFTs repeatedly). |
30eeaad to102ac99Comparemhvk commentedJan 30, 2024
OK, I now have a version that calls the vectorized pocketfft version if possible. It still has the direct loops for the case that Some remaining questions:
|
mhvk commentedJan 30, 2024
@rgommers - I would love to have a proper |
rgommers commentedJan 30, 2024
I'm not really opposed, but it is a change, with technically only increased binary size as a penalty and no upsides for Windows and macOS arm64 users. Doing nothing and continuing to convert to I'm not sure how serious the plan was to include all the pocketfft functionality that SciPy needs into NumPy - but binary size does seem to become relevant then. If's it's <0.1 MB I'm not too worried, but I wouldn't really want to pay 0.25 MB or more for long double loops that no one has asked for. |
mhvk commentedJan 30, 2024
For comparison, |
rgommers commentedJan 31, 2024
Thanks for the comparison, that is helpful. Note that those are I did a quick test building a wheel from this branch; the pocketfft extension is 508 kb uncompressed, 208 kb compressed in it (total wheel size 7.8 MB; |
mhvk commentedJan 31, 2024
OK, great! The other transforms may not be as bad as one would think, since they mostly use the same code. But anyway, not sure those are needed, and definitely not relevant here. |
seiko2plus commentedJan 31, 2024
Its only limited to core only, I thought it was a global flag |
mhvk commentedJan 31, 2024
Are we sure those flags are seen in Lines 398 to 401 in2634f80
|
mhvk commentedJan 31, 2024
OK, I just empirically confirmed that that flag does not get passed on. And with that, that I can catch C++ exceptions. I'll push that up in a bit. |
seiko2plus commentedJan 31, 2024
If its about memory allocations, I think we should use our custom allocations over malloc or new. |
mhvk commentedJan 31, 2024
@seiko2plus - the problem is that the allocations (and possible other errors) happen inside My current solution is just to put everything in a |
mreineck commentedJan 31, 2024
I'd really recommend not doing that. Most likely these custom operations are C-style and therefore will silently return NULL on error, which goes against fundamental assumptions C++ codes are making. You'd have to patch every place in pocketfft where memory is allocated and add error handling code. |
mhvk commentedJan 31, 2024
p.s. Note that I now on purpose use |
mreineck commentedJan 31, 2024
To me this looks exactly like the right solution. It's located directly at the interface, does not require any big patching and can be extended in case more detailed error handling is required. |
mreineck commentedJan 31, 2024
|
seberg left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not a deep review (although I doubt that is needed at this point!).
Just commenting on two things so they don't get lost (one bug, one suggestion to try to have a more C++ pattern for catching the exceptions).
Submodule+license seem still two things that would be nice to do as well.
numpy/fft/_pocketfft_umath.cpp Outdated
| void *buff =NULL; | ||
| if (step_out !=sizeof(std::complex<T>)) { | ||
| buff =pocketfft::detail::aligned_alloc(sizeof(std::complex<T>), | ||
| nout *sizeof(std::complex<T>)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There is a small problem here thatbuff is avoid * and not apocketfft:detail::arr<std::complex<T>>. Which I think means that if theplan->exec fails below no cleanup will occur.
My C++ is a bit trial and error, but I think:
/* Create a temporary array to hold a buffer if op is not contiguous */bool contiguous_op = (step_out != sizeof(std::complex<T>));pocketfft::detail::arr<std::complex<T>> tmp_arr(contiguous_op ? 0 : nout);should be an option (an array of length 0 doesn't do a malloc so that is fine).
And below:
std::complex<T> *op_or_buff = (contiguous_op ? (std::complex<T> *)op : tmp_arr.data());(Clearly, the dealloc below then becomes unnecessary, which is the whole point.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, that makes a lot of sense! I implemented it (though with an opposite bool,buffered).
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Note that the C++ version is header only. We just link to upstreamvia a git submodule.
mhvk commentedFeb 3, 2024
@seberg - thanks! Both suggestions were really nice, and make the code a lot cleaner. I pushed a revised version that uses them and that also replaces the addition of the C++ pocketfft header with a new submodule (which includes the license). |
mhvk commentedFeb 3, 2024
p.s. In the checklist above, this just leaves the question of whether this is tested thoroughly enough. I could adjust more tests to try different |
mhvk commentedFeb 3, 2024
p.p.s. Well the above about tests was not actually true, but now it is, so I ticked my "more tests" item... |
| assert_allclose(x_norm, | ||
| np.linalg.norm(tmp),atol=1e-6) | ||
| @pytest.mark.parametrize("dtype", [np.half,np.single,np.double, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This one is now done much more thoroughly intest_identity_long_short andtest_identity_long_short_reversed
Also adjust python side to allow multiple double types.
Also make sure every part uses copy_input and copy_outputconsistently, and avoid compiling multithreading.
Also use internal pocketfft arr class to allocate the possibletemporary buffers, so we do not have to worry about deallocatingthem any more.
mhvk commentedFeb 9, 2024
Is there anything more to do for this? (C++ pocketfft in |
seberg left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not managing to concentrate very deeply on it, but those two C++ things were basically what I noticed and the rest looks fine.
Since nobody voiced other concerns, let's get this in. (It isn't vastly changed after all, since it is porting the C code.)
Thanks for this great new code!

Uh oh!
There was an error while loading.Please reload this page.
Using the C++ version of pocketfft turned out to be not that much of a hassle. With it, we can support all the
dtypes that it supports: float, double, and long double.@asmeurer,@mreineck - comments are most welcome! (My C++ is non-existent so this really is just a C file with templates thrown in, mostly using trial & error...)
fixes#17801
EDIT: some things to decide:
LICENSES.mdin thepocketfftdirectory. (The licence wasn't there for the C version, but that seems to have been an oversight.)If that is something we should catch, please advice how to interceptdealt with usingthrow...try/catch.