cuda-kernels
Here are 222 public repositories matching this topic...
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
- Updated Mar 12, 2025 - C
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Mar 18, 2025 - Python
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
- Updated Mar 18, 2025 - Rust
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
- Updated Mar 16, 2025 - Cuda
Deep learning in Rust, with shape-checked tensors and neural networks
- Updated Jul 23, 2024 - Rust
Safe Rust wrapper around the CUDA toolkit
- Updated Mar 12, 2025 - Rust
CUDA Kernel Benchmarking Library
- Updated Mar 12, 2025 - Cuda
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
- Updated Apr 14, 2022 - C++
Kernel Tuner
- Updated Mar 18, 2025 - Python
An archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
- Updated Jun 24, 2022 - C++
From zero to hero: CUDA for accelerating maths and machine learning on the GPU.
- Updated Jul 23, 2024 - Cuda
Amplifier allows .NET developers to easily run applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, and AMD hardware without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
- Updated Dec 8, 2022 - C#
Some CUDA design patterns and a bit of template magic for CUDA
- Updated Jun 3, 2023 - C++
Spiking Neural Networks in C++ with strong GPU acceleration through CUDA
- Updated Jul 3, 2020 - Cuda
Open-source, cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
- Updated Oct 10, 2023 - C++
Tools for CUDA kernel authors
- Updated Apr 24, 2022 - Cuda
Triton implementation of FlashAttention2 that adds custom masks.
- Updated Aug 14, 2024 - Python
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
- Updated Jul 13, 2024 - Cuda
A tool for examining GPU scheduling behavior.
- Updated Aug 17, 2024 - Cuda