
v1.3.4

VkFFT v1.3.4 release
- Stable release that incorporates all the v1.3.3 bugfixes; no new functionality in this release.
- Tests reference: vincefn/pyvkfft#32 (comment)

v1.3.3b

Bugfix: segmentation fault with 1.3.3 (#150)

v1.3.2

VkFFT v1.3.2 release
- Added double-double support in VkFFT. It requires CPU initialization in full quad precision, so for now it is only supported with GCC and the quadmath dependency. Full FP128 support, or support through another FP128 library (such as MPIR), may be added in the future.
- Data has to be stored in double-double format before VkFFT kernel calls (there is no FP128 <-> double-double conversion on the GPU yet).
- Full 1e-32 precision, but the same range as FP64. See "Library for Double-Double and Quad-Double Arithmetic" by Y. Hida for more information on double-double.
- Double-double requires FMA contraction to be disabled (due to a rounding mismatch when a*b - c*d is contracted). It does not work on Vulkan yet, as I haven't found how to disable contraction there.
- Added DST I-IV support.
- Fixed warnings (#138).
- Added a proper check that app is zeroed before the initializeVkFFT call, and zeroing on deletion (#134).
- Added an option to provide a staging buffer in the application and VkGPU handle (#129).
- Added guards for build type (#128).
- Changed the default innermost stride for real buffers in out-of-place R2C from size[0]+2 to size[0] (#139).
- Allow specifying the glslang version (#135).
- Improved instruction count and accuracy for radix-7.
- Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
- Refactored the code generator and container struct layout for better handling of complex numbers (-5k LOC).
- Added more precision tests and benchmarks.
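A double-double value is stored as an unevaluated sum hi + lo of two doubles. The minimal C sketch below shows double-double addition built on Knuth's TwoSum, in the "sloppy add" form described by Hida et al.

The `dd` type and the `two_sum`/`dd_add` helpers are hypothetical and not part of the VkFFT API. The sketch assumes strict IEEE 754 double arithmetic (round-to-nearest, no x87 extended precision). It contains no multiplications, so FMA contraction cannot alter it. The product routines, where contraction does matter, are not shown.

```c
/* Hypothetical double-double type: value = hi + lo, with |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: returns s = fl(a + b) and the exact rounding error err,
   so that s + err == a + b exactly (assumes IEEE round-to-nearest). */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    dd r = { s, err };
    return r;
}

/* "Sloppy" double-double addition: add the high parts exactly,
   fold in the low parts, then renormalize with FastTwoSum. */
static dd dd_add(dd x, dd y) {
    dd s = two_sum(x.hi, y.hi);
    double lo = s.lo + x.lo + y.lo;
    double hi = s.hi + lo;            /* FastTwoSum, valid since |s.hi| >= |lo| */
    dd r = { hi, lo - (hi - s.hi) };
    return r;
}
```

For example, `dd_add((dd){1.0, 0.0}, (dd){1e-20, 0.0})` keeps 1.0 in `hi` and 1e-20 in `lo`. A plain double addition would round that 1e-20 away.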

Version 1.3 update of VkFFT
- Major library design change: from a single-header to a multiple-header approach, which improves structure and maintainability. Instead of copying a single file, the user now has to copy the contents of the vkFFT folder.
- VkFFT has been rewritten to follow the multiple-level platform structure described in the VkFFT whitepaper. All algorithms have been split into their respective files, which should make the library design easier for everybody to understand. Many places with duplicated code have been restructured and unified (mainly the read/write part of kernels and pre/post-processing).
- All math operations and most variables have been abstracted into a union container approach that can contain either numbers or variable names. It is not a full compiler, but the generated code is close to machine-like. There are no math sprintf calls in the actual code generator now. More details can be found here: https://youtu.be/lHlFPqlOezo
- VkFFT now supports an arbitrary number of dimensions. By defining VKFFT_MAX_FFT_DIMENSIONS (default: 4), it is now possible to mimic the FFTW guru interface. The innermost stride is always fixed to 1, but there can be an arbitrary number of outer strides. To achieve innermost batching, initialize an N+1-dimensional FFT and omit the innermost dimension using omitDimension[0] = 1.
- Enabled FP16 for all backends.
- Accuracy verification of the new version can be found here: vincefn/pyvkfft#25
- The new code structure will facilitate the implementation of many new features and performance improvements, so stay tuned.

v1.2.31

Multi-upload performance improvements + bugfixes
- Improved multi-upload FFT algorithm performance in double precision on HPC GPUs.
- Fixed double precision sincos computation. It is now possible to disable the LUT: useLUT has been switched to int64_t, where -1 disables the LUT, 0 selects an automatic decision, and 1 forces it. The LUT can also be disabled for the 4-step algorithm rotation only (useLUT_4step).
- Optimized swapTo3Stage4Step and switched it to a direct numeric value instead of a power of 2.
- Bugfixes: fixed FP64 usage in FP32 when the number suffix was not printed in kernels (important), fixed incorrect registerBoost writing, fixed #93.

v1.2.30

Metal support in VkFFT
- This update adds an Apple Metal backend to VkFFT (VKFFT_BACKEND 5).
- The Metal backend has performance similar to the other backends (tested on an M1 Pro 8c SoC).
- The Metal backend passes all VkFFT tests that OpenCL passes (tested on an M1 Pro 8c SoC).
- Current limitations of the Metal backend: no double precision, no saving/loading of binaries, a forced maximum of 256 threads, C++ bindings only, incomplete error handling.
- Bugfixes: Rader uint LUT offset not working in some cases, Mult Rader coalescing with <1024 threads, DCT-III reordering index issues with OpenCL on Intel/Apple GPUs.
- Slightly improved coalescing logic for Nvidia GPUs.
- Added precision plots.

v1.2.26

Radix 6, 8, 9, 10, 12, 14, 15, 16, 32 support + Bluestein tuning
- Added support for more composite radix kernels. This improves performance by reducing shared memory communication.
- Added advanced Bluestein sequence tuning: the sequence length to pad to can now be specified. Added default tuned values for FP32 and FP64 on Nvidia A100 and AMD MI250 for sequences up to 4096.
- Improved LUT usage: the first radix is no longer uploaded, and small stage requests use a coalesced upload to shared memory.
- Bugfixes: C2R check for big radix, matrix convolution coordinate assignment, specification of device_id in cuFFT/rocFFT scripts.
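Bluestein's algorithm evaluates a length-N DFT as a convolution, which needs a padded length M >= 2N - 1. The tuning option lets the user choose which M to pad to.

The C sketch below picks one simple default: the smallest M >= 2N - 1 whose prime factors are all at most 13. This rests on the assumption that such lengths decompose into the radix kernels without further Bluestein or Rader steps. The function names are hypothetical, and this is not VkFFT's actual selection logic. Its tuned A100/MI250 values may deliberately pick a larger M.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if m > 0 has no prime factor larger than 13. */
static int is_13_smooth(uint64_t m) {
    static const uint64_t primes[] = { 2, 3, 5, 7, 11, 13 };
    if (m == 0) return 0;
    for (size_t i = 0; i < sizeof primes / sizeof primes[0]; ++i)
        while (m % primes[i] == 0) m /= primes[i];
    return m == 1;
}

/* Hypothetical default Bluestein padding: smallest 13-smooth length >= 2n - 1.
   Returns 0 for n == 0. */
static uint64_t bluestein_pad_length(uint64_t n) {
    if (n == 0) return 0;
    uint64_t m = 2 * n - 1;
    while (!is_13_smooth(m)) ++m;
    return m;
}
```

For N = 257 (a prime), 2N - 1 = 513 = 3^3 * 19 is rejected because of the factor 19. The search stops at 520 = 2^3 * 5 * 13.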

v1.2.17

Updated accuracy of radix-11 and radix-13 multiplicative constants (#58)
- Improved printing of precision tests (-vkfft 11-18)

v1.2.12

DCT-I and odd-length DCT-IV support
- Added support for the missing base Discrete Cosine Transforms.
- Fixed DCT-IV issues in the OpenCL backend.
- Improved coalescing logic for DCT and Bluestein's algorithm.
- Bugfixes.


Tags: v1.3.4, v1.3.3b, v1.3.3, v1.3.2, v1.3.1, v1.2.31, v1.2.30, v1.2.26, v1.2.17, v1.2.12


Tags: DTolm/VkFFT